- 03 Aug, 2012 1 commit
-
-
Diego Biurrun authored
Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.
-
- 02 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 31 Jul, 2012 3 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
This fixes compilation with YASM disabled.
-
Ronald S. Bultje authored
This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm version is ~10 cycles faster (182->172) on x86-64 and ~8 cycles faster (201->193) on x86-32.
-
- 28 Jul, 2012 1 commit
-
-
Ronald S. Bultje authored
-
- 23 Jun, 2012 1 commit
-
-
Diego Biurrun authored
-
- 10 Jun, 2012 1 commit
-
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 12 Feb, 2012 1 commit
-
-
Reimar Döffinger authored
Some MMX-only CPUs do not support CMOV. All SSE/MMX2 CPUs should be fine, so no check was added to those functions. See also https://sourceforge.net/tracker/?func=detail&aid=3358347&group_id=205275&atid=992986
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
-
- 21 Oct, 2011 3 commits
-
-
Ronald S. Bultje authored
Neon parts by Mans Rullgard <mans@mansr.com>.
-
Ronald S. Bultje authored
-
Baptiste Coudurier authored
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 14 Aug, 2011 1 commit
-
-
Baptiste Coudurier authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 11 Jul, 2011 1 commit
-
-
Jason Garrett-Glaser authored
Much faster high bit depth deblocking.
-
- 21 Jun, 2011 1 commit
-
-
Daniel Kang authored
Mainly ported from 8-bit H.264 weight/biweight.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 16 Jun, 2011 1 commit
-
-
Carl Eugen Hoyos authored
-
- 01 Jun, 2011 1 commit
-
-
Daniel Kang authored
Fixes regression in 836f47d3 in ICC-10.x, since ICC<=11.0 doesn't align stack upon function calls.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 31 May, 2011 2 commits
-
-
Daniel Kang authored
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
-
Daniel Kang authored
Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired by the 8-bit IDCT code in Libav; other parts are ported from x264 with relicensing permission from the author.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
-
- 16 May, 2011 1 commit
-
-
Gil Pedersen authored
This fixes linking errors due to undefined symbols on x86_64 OS X.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 11 May, 2011 3 commits
-
-
Jason Garrett-Glaser authored
Also delete some unused deblock asm macros.
-
Jason Garrett-Glaser authored
-
Jason Garrett-Glaser authored
Includes AVX versions from x264.
-
- 10 May, 2011 2 commits
-
-
Ronald S. Bultje authored
Should fix compile on systems missing yasm/nasm.
-
Oskar Arvidsson authored
This patch lets e.g. dsputil_init choose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (e.g. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth are not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
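The naming scheme described in this message can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: fill_mid, DEF_FILL_MID, fill_fn and select_fill_mid are not names from the codebase, and the real patch may generate its per-depth functions differently. It only shows how a <base name>_<bit depth>_c family can be produced from one template and picked by bit depth at init time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function-pointer type; not taken from the actual sources. */
typedef void (*fill_fn)(void *dst, size_t n);

/* Generates fill_mid_<depth>_c, following <base name>_<bit depth>_c.
 * The pixel type depends on the pixel size, not on the exact bit depth. */
#define DEF_FILL_MID(depth, pixel)                                  \
    static void fill_mid_##depth##_c(void *dst, size_t n)           \
    {                                                               \
        pixel *p = dst;                                             \
        for (size_t i = 0; i < n; i++)                              \
            p[i] = (pixel)(1 << ((depth) - 1)); /* mid-grey */      \
    }

DEF_FILL_MID(8, uint8_t)   /* defines fill_mid_8_c  */
DEF_FILL_MID(10, uint16_t) /* defines fill_mid_10_c */

/* Assumed init-time selection: return the variant for the stream's depth. */
static fill_fn select_fill_mid(int bit_depth)
{
    switch (bit_depth) {
    case 8:  return fill_mid_8_c;
    case 10: return fill_mid_10_c;
    default: return NULL; /* unsupported depth in this sketch */
    }
}
```

A caller would store the result of select_fill_mid(bit_depth) in its function table once and call it through the pointer afterwards, so the per-pixel code never branches on the bit depth.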
-
- 12 Apr, 2011 1 commit
-
-
Carl Eugen Hoyos authored
-
- 10 Apr, 2011 1 commit
-
-
Oskar Arvidsson authored
This patch lets e.g. dsputil_init choose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (e.g. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth are not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 19 Mar, 2011 1 commit
-
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 14 Jan, 2011 1 commit
-
-
Jason Garrett-Glaser authored
About 2.5x the speed. NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant were used (separate shift and qmul values), it might be possible to get some extra speed. Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 29 Sep, 2010 6 commits
-
-
Ronald S. Bultje authored
inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE breakage after r25254. Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
from memory locations/offsets depending on b_idx plus constants, rather than having gcc do this. This saves several lea calls and together saves about 10 cycles in h264_loop_filter_strength_mmx2(). Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
a pxor, or remove the instruction altogether. Altogether, this saves 1 instruction. Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
This has no measurable speed effect because the surrounding code doesn't take advantage of this yet. Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
of the d_idx variable and therefore allows for future optimizations. No speed difference by this commit itself. Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
inlining various constants within the loop code. 20 cycles faster on cathedral sample. Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 24 Sep, 2010 1 commit
-
-
Ronald S. Bultje authored
Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 21 Sep, 2010 1 commit
-
-
Måns Rullgård authored
This fixes crashes with ICC 10.1. Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 18 Sep, 2010 1 commit
-
-
Måns Rullgård authored
Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 14 Sep, 2010 1 commit
-
-
Ronald S. Bultje authored
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping. Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%. Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 10 Sep, 2010 1 commit
-
-
Jason Garrett-Glaser authored
This leaves no more GPL-only H.264 decoding asm code. Approved by Loren. Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk
-