- 21 Mar, 2017 1 commit
-
-
James Almer authored
Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>
-
- 18 Feb, 2017 3 commits
-
-
James Darnley authored
x86-64 only

Yorkfield:
  - sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
  - sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
  - sse2: ~3.10x (370 vs. 119 cycles)
  - avx: ~3.29x (370 vs. 112 cycles)
-
James Darnley authored
Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows.

x86: Avoid some bypass delays and false dependencies

A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.
-
James Darnley authored
-
- 19 Sep, 2016 1 commit
-
-
Alexandra Hájková authored
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
-
- 18 Jul, 2016 1 commit
-
-
James Almer authored
Integration to Libav by Josh de Kock <josh@itanimul.li>. Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
-
- 11 Jul, 2016 1 commit
-
-
Ronald S. Bultje authored
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs:

nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
-
- 08 Jun, 2016 2 commits
-
-
James Almer authored
Fixes failures with yasm 1.1.0 and older Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 12 Sep, 2015 1 commit
-
-
James Almer authored
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
-
- 03 Aug, 2015 1 commit
-
-
James Almer authored
Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
-
- 31 Dec, 2014 1 commit
-
-
James Almer authored
Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
-
- 05 Dec, 2014 1 commit
-
-
Kieran Kunhya authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
-
- 26 Nov, 2014 1 commit
-
-
Kieran Kunhya authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 03 Aug, 2014 1 commit
-
-
James Almer authored
Up to four fewer instructions, depending on function and instruction set. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 26 Jul, 2014 1 commit
-
-
James Almer authored
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).

Benchmarks on an Intel Core i5-4200U:

idct8x8_dc     SSE2   MMXEXT   C
cycles         22     26       57

idct16x16_dc   AVX2   SSE2     C
cycles         27     32       249

idct32x32_dc   AVX2   SSE2     C
cycles         62     126      1375

Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 15 Jun, 2014 1 commit
-
-
Christophe Gisquet authored
Those macros take a byte number as shift argument, as this argument differs between MMX and SSE2 instructions. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
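The byte-count convention in this message can be modeled with a small sketch. It assumes the difference in question is that the MMX whole-register shift (psllq, 64-bit) takes its count in bits while the SSE2 whole-register shift (pslldq, 128-bit) takes its count in bytes. The function names and the `mmsize` parameter are hypothetical and registers are modeled as plain Python integers; this is not the macro code from the commit.

```python
def shift_left_bits_mmx(value, nbits):
    # psllq-style model: 64-bit register, shift count given in bits.
    return (value << nbits) & ((1 << 64) - 1)


def shift_left_bytes_sse2(value, nbytes):
    # pslldq-style model: 128-bit register, shift count given in bytes.
    return (value << (8 * nbytes)) & ((1 << 128) - 1)


def shift_left_pixels(value, nbytes, mmsize):
    # Hypothetical wrapper: the caller always passes a byte count and the
    # wrapper converts it to whatever the underlying instruction expects.
    if mmsize == 8:
        return shift_left_bits_mmx(value, 8 * nbytes)
    return shift_left_bytes_sse2(value, nbytes)
```

With a wrapper of this shape, the same call shifts by one byte on both register widths, so code written once can be assembled for either instruction set.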
-
- 29 May, 2014 1 commit
-
-
Christophe Gisquet authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 28 May, 2014 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 17 Apr, 2014 1 commit
-
-
James Almer authored
Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 24 Feb, 2014 1 commit
-
-
James Almer authored
We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed. Emulation that doesn't obey the original instruction semantics can't be in x86inc. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
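The aliasing problem described in this message can be illustrated with a small model. It is only a sketch under assumptions: the multiply-accumulate is taken to compute dst = a * b + c per 32-bit lane (as XOP instructions such as pmacsdd do), registers are modeled as Python lists in a dictionary, and all function and register names are hypothetical. It is not the macro code from the commit.

```python
MASK32 = 0xFFFFFFFF


def mul_lanes(a, b):
    # Per-lane multiply, keeping the low 32 bits of each product.
    return [(x * y) & MASK32 for x, y in zip(a, b)]


def add_lanes(a, b):
    # Per-lane add with 32-bit wraparound.
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def macc_reference(regs, dst, a, b, c):
    # Instruction semantics: every source is read before dst is written.
    regs[dst] = add_lanes(mul_lanes(regs[a], regs[b]), regs[c])


def macc_naive(regs, dst, a, b, c):
    # Two-step emulation without a temporary.
    regs[dst] = mul_lanes(regs[a], regs[b])    # overwrites c when dst == c
    regs[dst] = add_lanes(regs[dst], regs[c])  # then adds the wrong value


def macc_with_tmp(regs, dst, a, b, c, tmp):
    # Two-step emulation with a fifth operand acting as a temporary.
    regs[tmp] = mul_lanes(regs[a], regs[b])
    regs[dst] = add_lanes(regs[tmp], regs[c])


def fresh_regs():
    return {
        "m0": [1, 2, 3, 4],
        "m1": [5, 6, 7, 8],
        "m2": [9, 10, 11, 12],
        "m3": [0, 0, 0, 0],
    }
```

Calling each variant with `dst` and `c` both set to `"m0"` shows the difference: the naive version doubles the product instead of adding the original accumulator, while the version with a temporary matches the reference.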
-
- 14 Oct, 2013 2 commits
-
-
Jason Garrett-Glaser authored
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Derek Buitenhuis authored
This is so we can sync to x264's version of FMA4 support. This partially reverts commit 79687079. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 18 Jan, 2013 2 commits
-
-
Diego Biurrun authored
This allows defining externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
Diego Biurrun authored
The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 15 Jan, 2013 3 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
-
- 14 Jan, 2013 1 commit
-
-
Diego Biurrun authored
-
- 06 Jan, 2013 1 commit
-
-
Diego Biurrun authored
-
- 05 Dec, 2012 1 commit
-
-
Justin Ruggles authored
Include x86-optimized versions for SSE2 and AVX.
-
- 18 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 13 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 11 Nov, 2012 1 commit
-
-
Diego Biurrun authored
This reduces the local difference to the x264 upstream version.
-
- 09 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 05 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 02 Nov, 2012 3 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
"mmxext" is a more sensible name and more common in outside projects.
-
- 31 Oct, 2012 1 commit
-
-
Dave Yeo authored
Unlike YASM, NASM only looks for include files in the current directory, not in the directory that included files reside in. Signed-off-by: Diego Biurrun <diego@biurrun.de>
-