- 26 Jul, 2016 2 commits
-
-
Ronald S. Bultje authored
Each takes about 0.1% of runtime in my profiles, and they had no SIMD so far (we only had SIMD for the npx=16 double-block versions).
-
Ronald S. Bultje authored
Each takes about 0.5% of runtime in my profiles, and they had no SIMD so far (we only had SIMD for the npx=16 double-block versions).
-
- 27 Dec, 2014 16 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88 goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
-
Ronald S. Bultje authored
The value is not used outside the branch.
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- 17 Sep, 2014 1 commit
-
-
Michael Niedermayer authored
Fixes executable stack.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 06 Aug, 2014 1 commit
-
-
Christophe Gisquet authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 05 Aug, 2014 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 20 Apr, 2014 2 commits
-
-
Clément Bœsch authored
-
Clément Bœsch authored
-
- 19 Apr, 2014 2 commits
-
-
Clément Bœsch authored
In the 2 FILTER_INIT usages, the source is already preloaded, so the extra complexity taken from FILTER_UPDATE is not necessary. Also add the forgotten "mask" argument in the FILTER_{INIT,UPDATE} comments.
-
Clément Bœsch authored
-
- 08 Feb, 2014 1 commit
-
-
Clément Bœsch authored
For non-avx optims, this saves 8 movs.
before:
1785 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524129 runs, 159 skips
3327 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262116 runs, 28 skips
2712 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193729 runs, 575 skips
3237 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524061 runs, 227 skips
after:
1768 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524062 runs, 226 skips
3310 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262107 runs, 37 skips
2719 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193954 runs, 350 skips
3184 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524236 runs, 52 skips
-
- 05 Feb, 2014 4 commits
-
-
Clément Bœsch authored
-
Clément Bœsch authored
-
Clément Bœsch authored
-
Clément Bœsch authored
-
- 30 Jan, 2014 1 commit
-
-
Clément Bœsch authored
5.40s → 5.30s overall decode time with -threads 1 on ped1080p.webm (i7 920, ssse3)
-
- 28 Jan, 2014 2 commits
-
-
James Almer authored
Similar gains to the SSSE3 version once again.
Signed-off-by: James Almer <jamrial@gmail.com>
-
Clément Bœsch authored
9680 decicycles in loop_filter_v_88_16_c, 4193765 runs, 539 skips
9233 decicycles in loop_filter_h_88_16_c, 4193751 runs, 553 skips
1929 decicycles in ff_vp9_loop_filter_v_88_16_ssse3, 4194118 runs, 186 skips
2738 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193861 runs, 443 skips
5.978 → 5.417 overall decode time on ped1080p.webm (-threads 1)
Adding SSE2 support should be relatively trivial (just a matter of replacing the pshufb [mask_mix] with something else); patches welcome.
-
- 27 Jan, 2014 3 commits
-
-
Clément Bœsch authored
Allow some macro refactoring in filter14().
-
Clément Bœsch authored
-
Clément Bœsch authored
Introduce 2 additional registers for stride3 and mstride3 to allow direct accesses (the lea instructions are dropped).
3931 → 3827 decicycles in ff_vp9_loop_filter_v_16_16_ssse3
Also use defines to clarify the code.
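As a rough illustration of why dedicated stride3 and mstride3 registers help: x86 addressing modes can scale an index register by 1, 2, 4 or 8, but not by 3, so reaching a row three strides away needs either an extra instruction or a precomputed offset. The C sketch below is not the assembly from this commit; the function name and the row layout are hypothetical, and only the names stride3 and mstride3 come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the commit): sums the first pixel of the
 * seven rows centred on dst, i.e. rows -3..+3 relative to the edge. */
static int sum_rows_around_edge(const uint8_t *dst, ptrdiff_t stride)
{
    /* Computed once and kept around, mirroring the two extra registers. */
    const ptrdiff_t stride3  = stride * 3;
    const ptrdiff_t mstride3 = -stride3;
    int sum = 0;

    sum += dst[mstride3];    /* row -3: direct access via mstride3 */
    sum += dst[-2 * stride]; /* row -2 */
    sum += dst[-stride];     /* row -1 */
    sum += dst[0];           /* row  0 */
    sum += dst[stride];      /* row +1 */
    sum += dst[2 * stride];  /* row +2 */
    sum += dst[stride3];     /* row +3: direct access via stride3 */
    return sum;
}
```

The trade-off in a sketch like this is two more live values in exchange for not recomputing the offsets on every access.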
-
- 17 Jan, 2014 1 commit
-
-
James Almer authored
Performance gains similar to the SSSE3 version.
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 15 Jan, 2014 1 commit
-
-
Clément Bœsch authored
4412 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 4193462 runs, 842 skips
3600 decicycles in ff_vp9_loop_filter_h_16_16_avx, 4193621 runs, 683 skips
3010 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 4193528 runs, 776 skips
2678 decicycles in ff_vp9_loop_filter_v_16_16_avx, 4193742 runs, 562 skips
23025 decicycles in ff_vp9_idct_idct_32x32_add_ssse3, 2096871 runs, 281 skips
19943 decicycles in ff_vp9_idct_idct_32x32_add_avx, 2096815 runs, 337 skips
4675 decicycles in ff_vp9_idct_idct_16x16_add_ssse3, 4194018 runs, 286 skips
3980 decicycles in ff_vp9_idct_idct_16x16_add_avx, 4194022 runs, 282 skips
967 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 16776972 runs, 244 skips
887 decicycles in ff_vp9_idct_idct_8x8_add_avx, 16777002 runs, 214 skips
-
- 12 Jan, 2014 1 commit
-
-
Clément Bœsch authored
16662 decicycles in loop_filter_h_16_16_c, 8387355 runs, 1253 skips
17510 decicycles in loop_filter_v_16_16_c, 8387516 runs, 1092 skips
4941 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 8387887 runs, 721 skips
3899 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 8387980 runs, 628 skips
Overall decode time goes from:
./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null - 8.10s user 0.02s system 99% cpu 8.126 total
to:
./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null - 6.15s user 0.04s system 99% cpu 6.199 total
(46 to 61 fps)
-