- 02 Feb, 2017 2 commits
-
-
James Almer authored
Also disable ff_rv34_idct_dc_add_mmx on x86_64 as the presence of sse2 is guaranteed in such builds. Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Also disable ff_vp8_idct_dc_add_mmx on x86_64 as the presence of sse2 is guaranteed in such builds. Signed-off-by: James Almer <jamrial@gmail.com>
-
- 01 Feb, 2017 1 commit
-
-
Michael Niedermayer authored
The assumption this is based on is wrong, the code is not always run with bitexact flags This reverts commit a956164e, reversing changes made to f6005907. Approved-by: James Almer <jamrial@gmail.com>
-
- 31 Jan, 2017 1 commit
-
-
Clément Bœsch authored
-
- 13 Jan, 2017 5 commits
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Several codecs other than huffyuv use them. Signed-off-by: James Almer <jamrial@gmail.com>
-
- 02 Jan, 2017 2 commits
-
-
Michael Niedermayer authored
make fate passes Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
John Comeau authored
fixes `operation size not specified` errors as described here: http://stackoverflow.com/questions/36854583/compiling-ffmpeg-for-kali-linux-2 I rebuilt again with yasm and made sure it didn't break that. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 20 Dec, 2016 1 commit
-
-
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
- 07 Dec, 2016 1 commit
-
-
James Darnley authored
32-bit msvc.
-
- 06 Dec, 2016 4 commits
-
-
James Darnley authored
Yorkfield: - mmx2: 2.53x (504 vs. 199 cycles) - sse2: 3.83x (504 vs. 131 cycles) Nehalem: - mmx2: 2.42x (365 vs. 151 cycles) - sse2: 3.56x (365 vs. 103 cycles) Skylake: - mmx2: 1.81x (308 vs. 170 cycles) - sse2: 2.84x (308 vs. 108 cycles) - avx: 2.93x (308 vs. 105 cycles)
-
James Darnley authored
Yorkfield: - mmx2: 2.45x (279 vs. 114 cycles) - sse2: 3.36x (279 vs. 83 cycles) Nehalem: - mmx2: 2.10x (192 vs. 92 cycles) - sse2: 2.84x (192 vs. 68 cycles) Skylake: - mmx2: 1.75x (170 vs. 97 cycles) - sse2: 2.47x (170 vs. 69 cycles) - avx: 2.47x (170 vs. 69 cycles)
-
James Darnley authored
-
James Darnley authored
-
- 30 Nov, 2016 3 commits
-
-
James Darnley authored
Yorkfield: - sse2: - complex: 4.13x faster (1514 vs. 367 cycles) - simple: 4.38x faster (1836 vs. 419 cycles) Skylake: - sse2: - complex: 3.61x faster ( 936 vs. 260 cycles) - simple: 3.97x faster (1126 vs. 284 cycles) - avx (versus sse2): - complex: 1.07x faster (260 vs. 244 cycles) - simple: 1.03x faster (284 vs. 274 cycles)
-
James Darnley authored
2.87 times faster (1830 vs. 638 cycles)
-
James Darnley authored
2.1 times faster (401 vs. 194 cycles)
-
- 18 Nov, 2016 1 commit
-
-
James Almer authored
Fixes compilation with Yasm 1.1.0 and older. Signed-off-by: James Almer <jamrial@gmail.com>
-
- 15 Nov, 2016 1 commit
-
-
Ronald S. Bultje authored
Also a small cosmetic change to the avx2 idct16 version to make it explicit that one of the arguments to the write-out macros is unused for >=avx2 (it uses pmovzxbw instead of punpcklbw).
-
- 21 Oct, 2016 1 commit
-
-
Andreas Cadhalpun authored
Thanks to Mathieu Malaterre <malat@debian.org> for reporting the Que/Queue typo. (https://bugs.debian.org/839542) Reviewed-by: Lou Logan <lou@lrcd.com> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
-
- 18 Oct, 2016 1 commit
-
-
Rostislav Pehlivanov authored
Performance improvements: quant_bands: with: 681 decicycles in quant_bands, 8388453 runs, 155 skips without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips Around 42% for the function Twoloop coder: abs_pow34: with/without: 7.82s/8.17s Around 4% for the entire encoder Both: with/without: 7.15s/8.17s Around 12% for the entire encoder Fast coder: abs_pow34: with/without: 3.40s/3.77s Around 10% for the entire encoder Both: with/without: 3.02s/3.77s Around 20% faster for the entire encoder Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com> Tested-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: James Almer <jamrial@gmail.com>
-
- 02 Oct, 2016 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 01 Oct, 2016 1 commit
-
-
James Almer authored
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
-
- 23 Sep, 2016 2 commits
-
-
Hendrik Leppkes authored
Fixes trac 5579 Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Acked-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 06 Aug, 2016 1 commit
-
-
James Almer authored
Clean some header includes and use the same naming scheme as in ttaencdsp Signed-off-by: James Almer <jamrial@gmail.com>
-
- 02 Aug, 2016 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 26 Jul, 2016 3 commits
-
-
Ronald S. Bultje authored
Each takes about 0.1% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).
-
Ronald S. Bultje authored
Each takes about 0.5% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).
-
Ronald S. Bultje authored
About 1.8x speedup compared to AVX version for full IDCT. Other sub-IDCT scenarios also see speedups. Full --bench output for idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles): nop: 16.5 vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4 vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0 vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4 vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1 vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2 vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8 vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2 vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9 vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5 vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2 vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1 vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1 vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7 vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7 vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1 vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4 vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8 vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5 vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0 vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4 vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7 vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7 vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4 vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7 vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5 vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6 vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6 vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9 vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6 vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
-
- 20 Jul, 2016 7 commits
-
-
James Almer authored
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
That code is only ever initialized with that flag set.
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
-