- 27 Jan, 2019 1 commit
-
-
Janne Grunau authored
Fixes checkasm errors after adding the h264 deblock tests.
-
- 27 Feb, 2017 7 commits
-
-
James Darnley authored
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad. Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
-
James Darnley authored
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
-
James Darnley authored
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
-
James Darnley authored
~1.14x faster (90 vs. 78 cycles) compared with mmxext
-
James Darnley authored
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
-
James Darnley authored
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
-
James Darnley authored
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
-
- 18 Feb, 2017 3 commits
-
-
James Darnley authored
x86-64 only
Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx: ~3.29x (370 vs. 112 cycles)
-
James Darnley authored
-
James Darnley authored
-
- 30 Nov, 2016 1 commit
-
-
James Darnley authored
2.1 times faster (401 vs. 194 cycles)
-
- 05 Feb, 2016 2 commits
-
-
Henrik Gramner authored
Using rNm and x86inc's stack allocation with a negative value at the same time isn't supported, and caused the original stack pointer to be clobbered when using a compiler that doesn't support stack alignment.
-
James Darnley authored
2.6 times faster (366 vs. 142 cycles)
-
- 11 Aug, 2015 1 commit
-
-
Henrik Gramner authored
Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
- 04 Aug, 2015 1 commit
-
-
Henrik Gramner authored
Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not.
-
- 01 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 10 Jun, 2014 1 commit
-
-
Martin Storsjö authored
We know that the called function (ff_chroma_inter_body_mmxext) doesn't touch the redzone, so the redzone will be kept intact; thus, this doesn't fix any bug per se. However, valgrind's memcheck tool intentionally assumes that the redzone is clobbered on every function call and function return (see a long comment in valgrind/memcheck/mc_main.c). This change avoids false positives in that tool, at the cost of an extra stack pointer adjustment. The other alternative would be a valgrind suppression for this issue, but that's an extra burden for everybody who wants to run libavcodec within valgrind. Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 13 Mar, 2014 1 commit
-
-
Diego Biurrun authored
This helps grepping for functions, among other things.
-
- 07 Oct, 2013 1 commit
-
-
Henrik Gramner authored
Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 10 Apr, 2013 1 commit
-
-
Ronald S. Bultje authored
Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 12 Mar, 2013 1 commit
-
-
Ronald S. Bultje authored
Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 21 Feb, 2013 2 commits
-
-
Matt Wolenetz authored
Thanks-to: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
Matt Wolenetz authored
This fixes crashes in chromium on win64 on machines with AVX (crashes that apparently aren't triggered by fate). Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 12 Dec, 2012 2 commits
-
-
Ronald S. Bultje authored
Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
Ronald S. Bultje authored
Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by:
Luca Barbato <lu_zero@gentoo.org>
-
- 13 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 30 Oct, 2012 2 commits
-
-
Diego Biurrun authored
This is more consistent with the way we handle C #includes and it simplifies the build system.
-
Diego Biurrun authored
This is necessary to allow refactoring some x86util macros with cpuflags.
-
- 31 Aug, 2012 1 commit
-
-
Carl Eugen Hoyos authored
-
- 31 Jul, 2012 1 commit
-
-
Ronald S. Bultje authored
This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles faster (201->193) on x86-32.
-
- 28 Jul, 2012 1 commit
-
-
Ronald S. Bultje authored
-
- 11 Apr, 2012 1 commit
-
-
Henrik Gramner authored
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje) Fix up our asm to work with new x86inc.asm. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by:
Justin Ruggles <justin.ruggles@gmail.com>
-
- 19 Feb, 2012 1 commit
-
-
Ronald S. Bultje authored
Red zone usage is not allowed in the Win64 ABI.
-
- 17 Feb, 2012 1 commit
-
-
Michael Niedermayer authored
Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 12 Feb, 2012 1 commit
-
-
Reimar Döffinger authored
%ifdef HAVE_AVX must now be %if HAVE_AVX. Signed-off-by:
Reimar Döffinger <Reimar.Doeffinger@gmx.de>
-
- 27 Jan, 2012 1 commit
-
-
Ronald S. Bultje authored
This allows combining multiple conditionals in a single statement.
-
- 19 Oct, 2011 1 commit
-
-
Kieran Kunhya authored
Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 15 Aug, 2011 1 commit
-
-
Dave Yeo authored
Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
- 12 Aug, 2011 1 commit
-
-
Ronald S. Bultje authored
This allows using it in swscale also.
-