- 25 Jan, 2015 1 commit
James Almer authored
2 to 2.5 times faster.
Signed-off-by: James Almer <jamrial@gmail.com>
- 15 May, 2014 1 commit
Christophe Gisquet authored
From 133 (unrolled av_intfloat32 C) to 59 cycles on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 30 Aug, 2013 1 commit
Thilo Borgmann authored
- 17 Jul, 2013 1 commit
Diego Biurrun authored
- 10 May, 2013 1 commit
Christophe Gisquet authored
From 253 to 51 cycles on Arrandale and Win64; 44 cycles on Sandy Bridge.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
- 08 May, 2013 1 commit
Christophe Gisquet authored
MSVC complains about the 32-bit addressing, while MinGW/GCC does not.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 03 May, 2013 1 commit
Christophe Gisquet authored
Sandy Bridge: 47 cycles. Having a loop counter is a 7-cycle gain. Unrolling is another 7-cycle gain. Working in reverse scan saves another 6 cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
- 24 Apr, 2013 1 commit
Michael Niedermayer authored
This should fix building with MSVC until someone can change the code so it works with MSVC.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 23 Apr, 2013 1 commit
Michael Niedermayer authored
This should fix building with MSVC until someone can change the code so it works with MSVC.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 19 Apr, 2013 1 commit
Christophe Gisquet authored
233 to 105 cycles on Arrandale and Win64. Replacing the multiplication by s_m[m] with a pand and a pxor using appropriate vectors is slower. Unrolling is a 15-cycle win. An SSE version was 4 cycles slower.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 10 Apr, 2013 1 commit
Christophe Gisquet authored
From 253 to 51 cycles on Arrandale and Win64; 44 cycles on Sandy Bridge.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 08 Apr, 2013 1 commit
Christophe Gisquet authored
From 312 to 89/68 (SSE/SSE2) cycles on Arrandale and Win64. Sandy Bridge: 68/47 cycles. Having a loop counter is a 7-cycle gain. Unrolling is another 7-cycle gain. Working in reverse scan saves another 6 cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
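The loop-structure choices described above (a single loop counter, unrolling, and scanning in reverse) can be sketched in C. This is an illustration only: the kernel `mul_reverse_unrolled` and its element-wise multiply are hypothetical stand-ins, not the function this commit optimized. The original work was done in assembly, where counting down toward zero may let the loop branch on the flags set by the subtraction instead of needing a separate compare.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: dst[i] = a[i] * b[i].
 * One loop counter, unrolled by two, scanned from the end toward zero. */
static void mul_reverse_unrolled(float *dst, const float *a,
                                 const float *b, ptrdiff_t len)
{
    assert(len % 2 == 0); /* this sketch assumes an even length */
    for (ptrdiff_t i = len - 2; i >= 0; i -= 2) {
        dst[i + 0] = a[i + 0] * b[i + 0];
        dst[i + 1] = a[i + 1] * b[i + 1];
    }
}
```

Whether a C compiler emits the same instruction sequence depends on the compiler and flags; the cycle counts quoted in the commit refer to the assembly versions.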
- 05 Apr, 2013 2 commits
Christophe Gisquet authored
Timing on Arrandale (C / SSE): Win32: 57 / 44; Win64: 47 / 38. Unrolling and not storing the mask both save some cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Christophe Gisquet authored
Timing on Arrandale (C / SSE): Win32: 57 / 44; Win64: 47 / 38. Unrolling and not storing the mask both save some cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 05 Feb, 2013 1 commit
Diego Biurrun authored
- 06 Jan, 2013 2 commits
Christophe Gisquet authored
255 to 174 cycles on Arrandale / Win64. Unrolling yields no gain.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Christophe Gisquet authored
698 to 174 cycles on Arrandale. Unrolling is a 6-cycle gain.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
- 07 Dec, 2012 1 commit
Christophe Gisquet authored
Start and end indices are multiples of 2, which guarantees aligned access. This also allows generating 4 floats per loop iteration while keeping the alignment throughout. Timing: 32 bits: 326c -> 172c; 64 bits: 323c -> 156c.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
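The alignment argument can be checked with a little arithmetic, assuming each index addresses one complex value stored as two floats (8 bytes) and the buffer itself starts on a 16-byte boundary. An even index i then gives a byte offset of 8·i, a multiple of 16, so aligned 16-byte SSE loads and stores (one register of 4 floats, i.e. 2 complex values per iteration) stay valid. The type and helper below are hypothetical and only demonstrate that arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: one complex value = two floats = 8 bytes. */
typedef float complex_pair[2];

/* Hypothetical helper: returns 1 when &x[i] lies on a 16-byte boundary,
 * the requirement for aligned SSE accesses such as movaps. */
static int index_is_16_byte_aligned(complex_pair *x, ptrdiff_t i)
{
    return ((uintptr_t)&x[i] % 16u) == 0;
}
```

With a 16-byte-aligned buffer, the helper reports even indices as aligned and odd indices as misaligned by 8 bytes.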
- 08 Sep, 2012 1 commit
Diego Biurrun authored
This separates code relying on inline assembly from code relying on external assembly and fixes instances where the coalesced check was incorrect.
- 23 Feb, 2012 2 commits
Christophe GISQUET authored
Unrolling the main loop to process 8 elements instead of 4 gives a minor gain of 2 cycles (not worth the extra object size); processing 2 elements is a loss of 8 cycles. Assigning STEP to a register is a loss. The output address (Y) is almost always unaligned. Timings: C (32/64 bits): 117/109 cycles; SSE: 57 cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Christophe GISQUET authored
The 32-bit targets have been compiled with -mfpmath=sse for a proper reference. sbr_sum_square timings: C, 32-bit: 82c (unrolled) / 102c; C, 64-bit: 69c (unrolled) / 82c; SSE, 32-bit: 42c; SSE, 64-bit: 31c. Using SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations per loop iteration costs 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
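For reference, the quantity being timed can be written in plain C. This sketch assumes sbr_sum_square computes the sum of squared magnitudes of n complex values stored as interleaved (re, im) float pairs; the function names below are hypothetical, not FFmpeg's. The second version shows one common way to unroll such a loop, using two independent accumulators so consecutive additions are not all chained through a single variable.

```c
#include <assert.h>

/* Plain reference: sum of re^2 + im^2 over n complex values stored as
 * interleaved float pairs. */
static float sum_square_ref(float (*x)[2], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

/* Unrolled by two, with separate accumulators for the real and
 * imaginary parts to shorten the dependency chain. */
static float sum_square_unrolled(float (*x)[2], int n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    assert(n % 2 == 0); /* this sketch assumes an even element count */
    for (int i = 0; i < n; i += 2) {
        sum0 += x[i + 0][0] * x[i + 0][0];
        sum1 += x[i + 0][1] * x[i + 0][1];
        sum0 += x[i + 1][0] * x[i + 1][0];
        sum1 += x[i + 1][1] * x[i + 1][1];
    }
    return sum0 + sum1;
}
```

Because floating-point addition is not associative, the two versions can differ in the last bits for general inputs; when every partial sum is exactly representable (for example small integer values) they agree exactly.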
- 19 Mar, 2011 1 commit
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
- 09 Sep, 2010 1 commit
Måns Rullgård authored
Instead of defining functions in per-arch header files included by the main cpu.c, define them normally and call them from the generic one. Originally committed as revision 25084 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 08 Sep, 2010 1 commit
Stefano Sabatini authored
function and rename it to av_get_cpu_flags(). Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 22 Dec, 2008 1 commit
Diego Biurrun authored
It contains optimizations that are not specific to i386 and libavutil uses this naming scheme already. Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 12 Aug, 2008 1 commit
Loren Merritt authored
C is 1.9x faster than the previous C (on various x86 CPUs); SSE is 1.6x faster than the previous SSE. Originally committed as revision 14698 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 09 May, 2008 1 commit
Diego Biurrun authored
Originally committed as revision 13098 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 08 May, 2008 1 commit
Ramiro Polla authored
typedef x86_reg as the appropriate size and use it instead. Originally committed as revision 13081 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 16 May, 2007 1 commit
Ronald S. Bultje authored
include paths in the source files. Mostly from a patch by Ronald S. Bultje, rbultje ronald.bitfreak net Originally committed as revision 9034 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 07 Oct, 2006 1 commit
Diego Biurrun authored
and fix GPL/LGPL version mismatches. Originally committed as revision 6577 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 18 Aug, 2006 1 commit
Loren Merritt authored
2.5% faster fft, 0.5% faster vorbis. Originally committed as revision 6023 to svn://svn.ffmpeg.org/ffmpeg/trunk
- 08 Mar, 2006 1 commit
Zuxy Meng authored
Patch by Zuxy Meng, zuxy <<dot>> meng >>at<< gmail <<dot>> com Minor non-functional diff-related fixes by me. Originally committed as revision 5125 to svn://svn.ffmpeg.org/ffmpeg/trunk