- 18 Oct, 2016 1 commit
Rostislav Pehlivanov authored
Performance improvements:
quant_bands:
with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
Around 42% fewer cycles for the function
Twoloop coder:
abs_pow34: with/without: 7.82s/8.17s
Around 4% faster for the entire encoder
Both: with/without: 7.15s/8.17s
Around 12% faster for the entire encoder
Fast coder:
abs_pow34: with/without: 3.40s/3.77s
Around 10% faster for the entire encoder
Both: with/without: 3.02s/3.77s
Around 20% faster for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
- 02 Aug, 2016 1 commit
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
- 07 Apr, 2016 1 commit
Diego Biurrun authored
Restore alphabetical order in lists, break overly long lines, do some pretty-printing, add some explanatory section comments, and group together parts that logically belong together.
- 01 Mar, 2016 1 commit
Diego Biurrun authored
- 29 Feb, 2016 1 commit
Timothy Gu authored
- 19 Feb, 2016 1 commit
Diego Biurrun authored
- 06 Feb, 2016 3 commits
James Almer authored
Up to ~4 times faster on x86_64, and ~8 times faster on x86_32 when compiling with x87 floating-point math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Timothy Gu authored
Timothy Gu authored
- 31 Jan, 2016 2 commits
- 25 Jan, 2016 1 commit
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
- 18 Jan, 2016 1 commit
Diego Biurrun authored
- 05 Dec, 2015 1 commit
Anton Khirnov authored
- 22 Oct, 2015 1 commit
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
- 21 Oct, 2015 1 commit
Timothy Gu authored
Heavily based upon ff_add_bytes by Christophe Gisquet.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
- 13 Oct, 2015 2 commits
Ronald S. Bultje authored
Christophe Gisquet authored
Modeled on the ProRes version. Clips to [0;1023] and is bitexact. Bitexactness requires adding offsets in different places than the ProRes and C versions do, which makes the function approximately 2% slower.
For 16 frames of a DNxHD 4:2:2 10-bit test sequence:
C:    60861 decicycles in idct, 1048205 runs, 371 skips
sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
avx:  26272 decicycles in idct, 1048171 runs, 405 skips
The add version is not implemented, so the corresponding dsp function pointer is set to NULL, making it obvious if any code tries to call it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
- 09 Oct, 2015 1 commit
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
- 06 Oct, 2015 1 commit
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
- 03 Oct, 2015 2 commits
Ronald S. Bultje authored
Ronald S. Bultje authored
- 30 Sep, 2015 1 commit
James Almer authored
Tested-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
- 17 Sep, 2015 2 commits
Ronald S. Bultje authored
Ronald S. Bultje authored
- 28 Aug, 2015 1 commit
Vittorio Giovara authored
Deprecated in 03/2013.
- 30 Jul, 2015 1 commit
James Almer authored
Between 1.5 and 2.5 times faster.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
- 17 Jul, 2015 2 commits
Vittorio Giovara authored
Vittorio Giovara authored
- 13 Jun, 2015 1 commit
James Almer authored
Original intrinsics version by Nicolas Bertrand.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
- 14 Mar, 2015 1 commit
Christophe Gisquet authored
Also reduce table duplication with the SSE2 code and remove duplicated macro parameters.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 13 Mar, 2015 1 commit
Christophe Gisquet authored
The main differences are properly renamed labels and letting yasm select the GPRs used for skipping 1D transforms.
Previous-version-reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 28 Feb, 2015 1 commit
Anton Khirnov authored
Only ac3dec and dcadec use it.
- 16 Feb, 2015 1 commit
James Almer authored
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
- 01 Feb, 2015 1 commit
James Almer authored
Original x86 intrinsics code and initial 8-bit yasm port by Pierre-Edouard Lepere.
10/12-bit yasm ports, refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U:
width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
- 05 Dec, 2014 1 commit
Kieran Kunhya authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
- 26 Nov, 2014 1 commit
Kieran Kunhya authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 23 Nov, 2014 2 commits
Carl Eugen Hoyos authored
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 03 Oct, 2014 1 commit
James Almer authored
2x to 2.5x faster than the C version.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>