- 13 Mar, 2014 1 commit
-
-
Diego Biurrun authored
This helps grepping for functions, among other things.
-
- 07 Oct, 2013 1 commit
-
-
Diego Biurrun authored
-
- 21 Aug, 2013 1 commit
-
-
Diego Biurrun authored
-
- 10 Apr, 2013 1 commit
-
-
Ronald S. Bultje authored
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 23 Jan, 2013 1 commit
-
-
Diego Biurrun authored
It does not help as an abstraction and adds dsputil dependencies.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 27 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 13 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 31 Oct, 2012 1 commit
-
-
Diego Biurrun authored
-
- 30 Oct, 2012 2 commits
-
-
Diego Biurrun authored
This is more consistent with the way we handle C #includes and it simplifies the build system.
-
Diego Biurrun authored
This is necessary to allow refactoring some x86util macros with cpuflags.
-
- 07 Aug, 2012 1 commit
-
-
Mans Rullgard authored
nasm prints a warning if the colon is missing.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 05 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 11 Apr, 2012 1 commit
-
-
Henrik Gramner authored
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje):
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
-
- 08 Feb, 2012 1 commit
-
-
Ronald S. Bultje authored
On Win64, these registers are callee-save, so not saving/restoring them correctly is a violation of ABI and can lead to crashes or corrupt data.
-
- 27 Jan, 2012 1 commit
-
-
Ronald S. Bultje authored
This allows combining multiple conditionals in a single statement.
-
- 15 Aug, 2011 1 commit
-
-
Dave Yeo authored
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 12 Aug, 2011 2 commits
-
-
Ronald S. Bultje authored
This allows using it in swscale also.
-
Ronald S. Bultje authored
This allows using it in libswscale/ also.
-
- 29 Jul, 2011 1 commit
-
-
Jason Garrett-Glaser authored
-
- 14 Jun, 2011 1 commit
-
-
Jason Garrett-Glaser authored
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
-
- 13 Jun, 2011 2 commits
-
-
Jason Garrett-Glaser authored
Needs some ARM/PPC asm modifications.
-
Jason Garrett-Glaser authored
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
-
- 31 May, 2011 1 commit
-
-
Daniel Kang authored
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
-
- 17 May, 2011 1 commit
-
-
Daniel Kang authored
Arguments for variable size instructions are added to many macros, along with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 14 May, 2011 1 commit
-
-
Diego Biurrun authored
-
- 19 Mar, 2011 1 commit
-
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 14 Jan, 2011 1 commit
-
-
Jason Garrett-Glaser authored
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant were used (separate shift and qmul values), it might be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 26 Sep, 2010 1 commit
-
-
Reimar Döffinger authored
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 24 Sep, 2010 2 commits
-
-
Ronald S. Bultje authored
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall. See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
code directly also and remove loop setup. 20% faster in function, 0.8% overall. See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 14 Sep, 2010 1 commit
-
-
Ronald S. Bultje authored
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping.
Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
-