- 15 May, 2017 5 commits
-
-
James Darnley authored
Kaby Lake Pentium:
  - ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext
  - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext
-
James Darnley authored
Haswell:
  - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext
Skylake-U:
  - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext
-
James Darnley authored
Haswell:
  - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
Skylake-U:
  - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
-
James Darnley authored
-
James Darnley authored
-
- 14 Mar, 2017 1 commit
-
-
Diego Biurrun authored
-
- 30 Nov, 2016 1 commit
-
-
James Darnley authored
2.87 times faster (1830 vs. 638 cycles)
-
- 16 Jun, 2016 1 commit
-
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 13 Mar, 2014 1 commit
-
-
Diego Biurrun authored
This helps grepping for functions, among other things.
-
- 07 Oct, 2013 1 commit
-
-
Diego Biurrun authored
-
- 21 Aug, 2013 1 commit
-
-
Diego Biurrun authored
-
- 10 Apr, 2013 1 commit
-
-
Ronald S. Bultje authored
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 19 Feb, 2013 1 commit
-
-
Ronald S. Bultje authored
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 23 Jan, 2013 1 commit
-
-
Diego Biurrun authored
It does not help as an abstraction and adds dsputil dependencies.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 27 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 13 Nov, 2012 1 commit
-
-
Diego Biurrun authored
-
- 31 Oct, 2012 1 commit
-
-
Diego Biurrun authored
-
- 30 Oct, 2012 2 commits
-
-
Diego Biurrun authored
This is more consistent with the way we handle C #includes and it simplifies the build system.
-
Diego Biurrun authored
This is necessary to allow refactoring some x86util macros with cpuflags.
-
- 07 Aug, 2012 1 commit
-
-
Mans Rullgard authored
nasm prints a warning if the colon is missing.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 05 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 11 Apr, 2012 1 commit
-
-
Henrik Gramner authored
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
-
- 08 Feb, 2012 1 commit
-
-
Ronald S. Bultje authored
On Win64, these registers are callee-save, so not saving/restoring them correctly is a violation of ABI and can lead to crashes or corrupt data.
-
- 27 Jan, 2012 1 commit
-
-
Ronald S. Bultje authored
This allows combining multiple conditionals in a single statement.
-
- 19 Oct, 2011 1 commit
-
-
Kieran Kunhya authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 15 Aug, 2011 1 commit
-
-
Dave Yeo authored
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 12 Aug, 2011 2 commits
-
-
Ronald S. Bultje authored
This allows using it in swscale also.
-
Ronald S. Bultje authored
This allows using it in libswscale/ also.
-
- 29 Jul, 2011 1 commit
-
-
Jason Garrett-Glaser authored
-
- 14 Jun, 2011 1 commit
-
-
Jason Garrett-Glaser authored
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
-
- 13 Jun, 2011 2 commits
-
-
Jason Garrett-Glaser authored
Needs some ARM/PPC asm modifications.
-
Jason Garrett-Glaser authored
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
-
- 31 May, 2011 1 commit
-
-
Daniel Kang authored
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
-
- 17 May, 2011 1 commit
-
-
Daniel Kang authored
Arguments for variable size instructions are added to many macros, along with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 14 May, 2011 1 commit
-
-
Diego Biurrun authored
-
- 19 Mar, 2011 1 commit
-
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 14 Jan, 2011 1 commit
-
-
Jason Garrett-Glaser authored
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 26 Sep, 2010 1 commit
-
-
Reimar Döffinger authored
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 24 Sep, 2010 2 commits
-
-
Ronald S. Bultje authored
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall. See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Ronald S. Bultje authored
code directly also and remove loop setup. 20% faster in function, 0.8% overall. See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
-