- 12 Dec, 2012 2 commits
-
-
Ronald S. Bultje authored
Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Ronald S. Bultje authored
Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
-
- 13 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 30 Oct, 2012 2 commits
-
-
Diego Biurrun authored
This is more consistent with the way we handle C #includes and it simplifies the build system.
-
Diego Biurrun authored
This is necessary to allow refactoring some x86util macros with cpuflags.
-
- 07 Aug, 2012 1 commit
-
-
Mans Rullgard authored
nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 05 Jul, 2012 1 commit
-
-
Loren Merritt authored
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 04 Apr, 2012 1 commit
-
-
Christophe GISQUET authored
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 10 Mar, 2012 2 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- 04 Mar, 2012 5 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- 02 Mar, 2012 1 commit
-
-
Ronald S. Bultje authored
x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2 functions will never be used in practice.
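The reasoning in this message can be sketched with a small runtime dispatcher in C. Every name here (`CPU_FLAG_*`, `impl_id`, `pick_impl`, `x86_64_baseline_flags`) is hypothetical and is not FFmpeg's actual API. The only fact taken from the message is that x86-64 always provides SSE2, so an MMX/MMX2 choice is always overridden on that architecture.

```c
/* Hypothetical CPU feature flags, for illustration only. */
enum {
    CPU_FLAG_MMX  = 1 << 0,
    CPU_FLAG_MMX2 = 1 << 1,
    CPU_FLAG_SSE2 = 1 << 2
};

/* Hypothetical identifiers for the available implementations. */
typedef enum { IMPL_C, IMPL_MMX, IMPL_MMX2, IMPL_SSE2 } impl_id;

/* Later checks override earlier ones, so the best available version wins. */
static impl_id pick_impl(int cpu_flags)
{
    impl_id impl = IMPL_C;
    if (cpu_flags & CPU_FLAG_MMX)  impl = IMPL_MMX;
    if (cpu_flags & CPU_FLAG_MMX2) impl = IMPL_MMX2;
    if (cpu_flags & CPU_FLAG_SSE2) impl = IMPL_SSE2;
    return impl;
}

/* Assumed minimum flag set on any x86-64 CPU in this sketch:
 * SSE2 is always present, so the MMX/MMX2 results above are never kept. */
static int x86_64_baseline_flags(void)
{
    return CPU_FLAG_MMX | CPU_FLAG_MMX2 | CPU_FLAG_SSE2;
}
```

With the baseline flags, `pick_impl` can only return `IMPL_SSE2` (or something newer in a fuller dispatcher), which is why MMX/MMX2 code on x86-64 is effectively dead.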
-
- 19 Oct, 2011 1 commit
-
-
Kieran Kunhya authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 15 Aug, 2011 1 commit
-
-
Dave Yeo authored
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 12 Aug, 2011 2 commits
-
-
Ronald S. Bultje authored
This allows using it in swscale also.
-
Ronald S. Bultje authored
This allows using it in libswscale/ also.
-
- 17 May, 2011 1 commit
-
-
Daniel Kang authored
Arguments for variable size instructions are added to many macros, along with other various changes. The x86util.asm code was ported from x264. Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 14 May, 2011 1 commit
-
-
Diego Biurrun authored
-
- 19 Mar, 2011 1 commit
-
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 05 Sep, 2010 1 commit
-
-
Reimar Döffinger authored
This increases compatibility with nasm and is also more consistent, e.g. with h264_intrapred.asm and h264_chromamc.asm that already do it that way. Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 24 Aug, 2010 1 commit
-
-
Ronald S. Bultje authored
two VP8-related fate failures on Win64. Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 23 Aug, 2010 1 commit
-
-
Ronald S. Bultje authored
Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 02 Aug, 2010 1 commit
-
-
Jason Garrett-Glaser authored
Lets us do the zeroing in asm instead of C. Also makes it consistent with the way the regular iDCT code does it. Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 31 Jul, 2010 1 commit
-
-
Ronald S. Bultje authored
unchanged bytes) in the horizontal simple loopfilter. This makes the filter quite a bit faster in itself (~30 cycles less on Core1), probably mostly because we don't need a complex 4x4 transpose, but only a simple byte interleave. Also allows using pextrw on SSE4, which speeds up even more (e.g. 25% faster on Core i7). Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 26 Jul, 2010 6 commits
-
-
Ronald S. Bultje authored
Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Jason Garrett-Glaser authored
5-10% faster or more on Phenom, Athlon 64, and some others. Helps some on pre-SSSE3 Intel chips as well, but not as much. Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
mbedge loopfilter functions, by re-using space that holds a variable that we no longer need. Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
construct was always enabled, even for <ssse3 versions). Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
future new optimizations (imagine an sse5) much easier. Also fix a bug where we used the direction (%2) rather than optimization (%1) to enable this, which means it wasn't ever actually used... Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 25 Jul, 2010 1 commit
-
-
Ronald S. Bultje authored
Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 24 Jul, 2010 1 commit
-
-
Ronald S. Bultje authored
splits it into small optimization-specific macros which are selected for each DSP function. The advantage of this approach is that the sse4 functions now use the ssse3 codepath also without needing an explicit sse4 codepath. Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 23 Jul, 2010 4 commits
-
-
Jason Garrett-Glaser authored
Add MMX idct_dc_add4uv function for this case. ~40% faster chroma idct. Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Jason Garrett-Glaser authored
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Jason Garrett-Glaser authored
Take shortcuts based on statistically common situations. Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT blocks are common. TODO: tie this more directly into the MB mode, since the DC-level transform is only used for non-splitmv blocks? Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
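The 4-at-a-time DC-only idea can be illustrated with a scalar C sketch. This is not the MMX/SSE2 assembly the commit adds, and the function names, signatures, and coefficient layout are assumptions made for the example. The rounding `(dc + 4) >> 3` is assumed to match the VP8 DC-only inverse transform, and the shift of a negative value assumes an arithmetic right shift, which common compilers provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add the DC term of one 4x4 block to a 4x4 pixel area,
 * then zero the coefficient so the block is clean for reuse. */
static void idct_dc_add(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    int dc = (block[0] + 4) >> 3;  /* assumes arithmetic shift for negatives */
    block[0] = 0;
    for (int y = 0; y < 4; y++, dst += stride)
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
}

/* Handle a row of four horizontally adjacent DC-only blocks in one call. */
static void idct_dc_add4(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride)
{
    for (int i = 0; i < 4; i++)
        idct_dc_add(dst + 4 * i, block[i], stride);
}
```

Grouping the four blocks behind one entry point is what lets a SIMD version process a full 16-pixel row per instruction instead of four separate 4-pixel rows.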
-
Jason Garrett-Glaser authored
~0.3% faster overall. Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
-