- 03 Aug, 2016 8 commits
-
-
Ronald S. Bultje authored
Also a slight change to the ssse3 code, which prevents a theoretical overflow in the sharp filter.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
James Almer authored
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Clément Bœsch authored
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
James Almer authored
pavgb is an sse integer instruction, so the mmxext flag is enough
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
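For readers unfamiliar with the instruction named in this commit message, pavgb computes a rounded-up average of unsigned bytes, lane by lane. The following is a minimal scalar model of that behaviour, not code from the repository; the function name `pavgb` is chosen here purely for illustration.

```python
def pavgb(a: bytes, b: bytes) -> bytes:
    """Scalar model of pavgb: per-byte unsigned average, rounded up."""
    if len(a) != len(b):
        raise ValueError("both operands must have the same number of bytes")
    # Each lane is (x + y + 1) >> 1, computed without 8-bit overflow.
    return bytes((x + y + 1) >> 1 for x, y in zip(a, b))


if __name__ == "__main__":
    print(list(pavgb(bytes([0, 1, 255, 10]), bytes([0, 2, 255, 13]))))
```

The `+ 1` before the shift is what makes halves round upward, so averaging 1 and 2 gives 2 rather than 1.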
-
Clément Bœsch authored
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Ronald S. Bultje authored
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Anton Khirnov authored
It only contains the MC SIMD, other SIMD will go into different files.
-
Christophe Gisquet authored
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- 07 Dec, 2013 1 commit
-
-
Ronald S. Bultje authored
(And in future, loopfilter or intra pred could be put in their own respective files also.)
-
- 21 Nov, 2013 1 commit
-
-
Clément Bœsch authored
-
- 15 Nov, 2013 1 commit
-
-
Ronald S. Bultje authored
Originally written by Ronald S. Bultje <rsbultje@gmail.com> and Clément Bœsch <u@pkh.me>
Further contributions by:
Anton Khirnov <anton@khirnov.net>
Diego Biurrun <diego@biurrun.de>
Luca Barbato <lu_zero@gentoo.org>
Martin Storsjö <martin@martin.st>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- 05 Nov, 2013 1 commit
-
-
Clément Bœsch authored
1789 decicycles in idct_idct_4x4_add_c, 262136 runs, 8 skips
1839 decicycles in idct_idct_4x4_add_c, 524270 runs, 18 skips
1864 decicycles in idct_idct_4x4_add_c, 1048548 runs, 28 skips
529 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 262138 runs, 6 skips
516 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 524282 runs, 6 skips
474 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 1048565 runs, 11 skips
(~3.9x faster)
7726 decicycles in idct_idct_8x8_add_c, 1048433 runs, 143 skips
7732 decicycles in idct_idct_8x8_add_c, 2096882 runs, 270 skips
7731 decicycles in idct_idct_8x8_add_c, 4193772 runs, 532 skips
1145 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 1048549 runs, 27 skips
1137 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 2097097 runs, 55 skips
1086 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 4194188 runs, 116 skips
(~7.1x faster)
Overall decode time before commit:
16.48s user 0.03s system 99% cpu 16.526 total
16.54s user 0.01s system 99% cpu 16.566 total
16.46s user 0.03s system 99% cpu 16.511 total
Overall decode time after commit:
16.34s user 0.02s system 99% cpu 16.378 total
16.28s user 0.02s system 99% cpu 16.315 total
16.32s user 0.03s system 99% cpu 16.366 total
Tested on i7 920 with 40s 1080p footage.
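The speedup figures quoted in this commit message can be reproduced from the timer output. The sketch below assumes the "~3.9x" and "~7.1x" values were taken from the last (longest-running) measurement of each group, which is the pairing that matches the quoted ratios. The dictionary layout and the `speedup` helper are illustrative and not part of the codebase.

```python
# Decicycle counts copied from the final timer line of each group above.
timings = {
    "idct_idct_4x4_add": {"c": 1864, "ssse3": 474},
    "idct_idct_8x8_add": {"c": 7731, "ssse3": 1086},
}


def speedup(c_decicycles: float, simd_decicycles: float) -> float:
    """Return how many times faster the SIMD version is than the C version."""
    return c_decicycles / simd_decicycles


for name, t in timings.items():
    print(f"{name}: ~{speedup(t['c'], t['ssse3']):.1f}x faster")

# Overall decode time: mean of the three "total" wall-clock values, in seconds.
before = [16.526, 16.566, 16.511]
after = [16.378, 16.315, 16.366]
mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
saving = 100 * (mean_before - mean_after) / mean_before
print(f"overall: about {saving:.1f}% less wall-clock time")
```

The per-function gains are large while the whole-decode gain is only around one percent, which suggests these two inverse transforms account for a small share of total decode time on this sample.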
-
- 08 Oct, 2013 1 commit
-
-
Ronald S. Bultje authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 03 Oct, 2013 2 commits
-
-
Ronald S. Bultje authored
Decoding time of ped1080p.webm goes from 11.3sec to 11.1sec.
-
Ronald S. Bultje authored
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
-