- 28 Jul, 2012 1 commit
-
-
Ronald S. Bultje authored
-
- 25 Jul, 2012 2 commits
-
-
Ronald S. Bultje authored
This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Yang Wang authored
In ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (the unrolled loop), instructions such as "movq 8%3, %%mm1 \n\t" are incorrect.

The intent of this instruction is clear: a load from p + 8. But the assembly code does not guarantee that. It only works if the compiler puts p in a register and produces an instruction like "movq 8(%edi), %mm1". During optimization, the compiler may be able to constant-propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x, and the instruction becomes "movq 810000(%edi)". That is, the displacement becomes 810000 instead of 8, which causes a segmentation fault.

This error was fixed in the second block of the assembly code, but not in the unrolled loop.

How to reproduce: the error is exposed when building with the Intel C++ Compiler with IPO+PGO optimization enabled. The crash occurred when decoding an MJPEG video. Signed-off-by:
Michael Niedermayer <michaelni@gmx.at> Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
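The failure mode described above is purely textual, so it can be illustrated without any MMX code. The sketch below is not FFmpeg code: `expand_operand3` is a hypothetical helper that mimics how a GCC-style asm template has `%3` replaced by whatever operand text the compiler picked. The third case shown in the usage note (pointer passed in a register, displacement written as `8(%3)`) is one common way to avoid the problem; the exact change made by the commit is not shown in this log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not FFmpeg code): expand an asm template the way
 * GCC-style inline asm does textually. "%3" is replaced by the operand text
 * chosen by the compiler, and "%%" becomes a literal '%'.
 */
static void expand_operand3(const char *tmpl, const char *operand,
                            char *out, size_t out_size)
{
    size_t used = 0;

    assert(out && out_size > 0);
    out[0] = '\0';

    while (*tmpl && used + 1 < out_size) {
        if (tmpl[0] == '%' && tmpl[1] == '3') {
            /* Paste the operand text verbatim; no parentheses are added. */
            int n = snprintf(out + used, out_size - used, "%s", operand);
            if (n < 0 || (size_t)n >= out_size - used)
                break;              /* output truncated, stop here */
            used += (size_t)n;
            tmpl += 2;
        } else {
            if (tmpl[0] == '%' && tmpl[1] == '%')
                tmpl++;             /* "%%" -> "%" */
            out[used++] = *tmpl++;
            out[used]   = '\0';
        }
    }
}
```

Expanding "movq 8%3, %%mm1" with the memory operand "(%edi)" gives the intended "movq 8(%edi), %mm1", but with "10000(%edi)" it gives "movq 810000(%edi), %mm1". Expanding "movq 8(%3), %%mm1" with the register operand "%edi" always keeps the displacement at 8.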
-
- 19 Jul, 2012 1 commit
-
-
Diego Biurrun authored
-
- 18 Jul, 2012 1 commit
-
-
Mans Rullgard authored
This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 23 Jun, 2012 2 commits
-
-
Diego Biurrun authored
-
Mans Rullgard authored
Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 08 Jun, 2012 1 commit
-
-
Justin Ruggles authored
Move vector_fmul() from DSPContext to AVFloatDSPContext.
-
- 21 May, 2012 1 commit
-
-
Kieran Kunhya authored
Signed-off-by:
Justin Ruggles <justin.ruggles@gmail.com>
-
- 10 May, 2012 1 commit
-
-
Christophe Gisquet authored
Code mostly inspired by vp8's MC, however:
  - its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy
  - that same coefficient redundancy allows better code for non-SSSE3 versions

Benchmark (rounded to tens of unit):
        V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C        445   358    985    1785    1559     3280
MMX*     219   271    478     714     929     1443
SSE2     131   158    294     425     515      892
SSSE3    120   122    248     387     390      763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 28 Apr, 2012 1 commit
-
-
Christophe GISQUET authored
Commit 356ee8d7 caused the initial inversion. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
- 21 Apr, 2012 2 commits
-
-
Mans Rullgard authored
This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 04 Apr, 2012 1 commit
-
-
Christophe GISQUET authored
Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
- 25 Mar, 2012 4 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
This makes them safe to use in non-fully braced if-blocks and similar.
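The commit title is not shown in this log, so the exact change is unknown. A common way to make a multi-statement macro safe in an unbraced if/else is to wrap it in do { } while (0), sketched below with a hypothetical `SWAP_INT` macro and `order_pair` function that are not taken from the source.

```c
/*
 * Hypothetical multi-statement macro. Wrapping the body in
 * do { } while (0) makes it expand to exactly one statement that
 * still requires a trailing semicolon at the call site.
 */
#define SWAP_INT(a, b)      \
    do {                    \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } while (0)

/* Orders *lo and *hi; returns 1 if a swap happened, 0 otherwise. */
static int order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi); /* one statement, so the else below still pairs */
    else
        return 0;           /* already ordered */
    return 1;
}
```

If the macro body were a bare `{ ... }` block instead, the semicolon after `SWAP_INT(*lo, *hi)` would end the if statement and the following `else` would fail to compile.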
-
- 05 Mar, 2012 1 commit
-
-
Mans Rullgard authored
This splits ff_dsputil_init_mmx() into multiple functions, one for each MMX/SSE level, somewhat simplifying the nested conditions. Signed-off-by:
Mans Rullgard <mans@mansr.com> Signed-off-by:
Diego Biurrun <diego@biurrun.de>
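As a rough illustration of the structure described above, the sketch below uses one init function per instruction-set level and a flat sequence of flag checks in place of nested conditions. All names here (`DemoDSPContext`, the `CPU_*` bits, `demo_dsputil_init`) are hypothetical and are not the actual dsputil identifiers.

```c
/* Hypothetical CPU feature bits, for illustration only. */
enum { CPU_MMX = 1 << 0, CPU_SSE2 = 1 << 1 };

typedef struct DemoDSPContext {
    int (*clear_block)(void); /* returns an id so the choice is observable */
} DemoDSPContext;

static int clear_block_c(void)    { return 0; }
static int clear_block_mmx(void)  { return 1; }
static int clear_block_sse2(void) { return 2; }

/* One init function per level; each only sets what that level provides. */
static void demo_init_mmx(DemoDSPContext *c)  { c->clear_block = clear_block_mmx; }
static void demo_init_sse2(DemoDSPContext *c) { c->clear_block = clear_block_sse2; }

/* Top-level init: later (faster) levels override earlier ones. */
static void demo_dsputil_init(DemoDSPContext *c, int cpu_flags)
{
    c->clear_block = clear_block_c;   /* portable fallback */
    if (cpu_flags & CPU_MMX)
        demo_init_mmx(c);
    if (cpu_flags & CPU_SSE2)
        demo_init_sse2(c);
}
```

Keeping the checks in ascending order means the last matching level wins, so each per-level function can stay unaware of the others.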
-
- 15 Feb, 2012 1 commit
-
-
Martin Storsjö authored
Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 30 Jan, 2012 1 commit
-
-
Christophe Gisquet authored
While pshufb allows emulating bswap on XMM registers for SSSE3, more shuffling is needed for SSE2. Alignment is critical, so specific codepaths are provided for this case.

For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2: ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads: ~ 30k cycles
Signed-off-by:
Diego Biurrun <diego@biurrun.de>
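To make the pshufb idea concrete, here is a scalar model of a byte shuffle on one 16-byte register, together with the mask that reverses the bytes of four 32-bit lanes. This is a sketch for illustration, not the SIMD code from the commit; `pshufb_model`, `bswap32_mask` and `bswap32_x4` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/*
 * Scalar model of pshufb on one 16-byte register: if bit 7 of a mask
 * byte is set, the result byte is 0; otherwise the low 4 bits of the
 * mask byte select a source byte.
 */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16])
{
    uint8_t tmp[16];
    int i;

    for (i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    memcpy(dst, tmp, sizeof(tmp)); /* temporary allows dst == src */
}

/* Mask that reverses the byte order inside each of four 32-bit lanes. */
static const uint8_t bswap32_mask[16] = {
     3,  2,  1,  0,
     7,  6,  5,  4,
    11, 10,  9,  8,
    15, 14, 13, 12
};

/* Byte-swaps four packed 32-bit words with a single shuffle. */
static void bswap32_x4(uint8_t dst[16], const uint8_t src[16])
{
    pshufb_model(dst, src, bswap32_mask);
}
```

With a single shuffle instruction doing this per register, the SSSE3 path needs no further work per 16 bytes, whereas SSE2 has no byte shuffle and needs a longer sequence of operations to reach the same result.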
-
- 29 Jan, 2012 1 commit
-
-
Ronald S. Bultje authored
-
- 25 Jan, 2012 1 commit
-
-
Ronald S. Bultje authored
Current code only writes 8 pixels of vertical edge for YUV422, which causes MC artifacts when subsequent frames use data from that edge.
-
- 14 Dec, 2011 1 commit
-
-
Diego Biurrun authored
-
- 22 Nov, 2011 1 commit
-
-
Justin Ruggles authored
This allows emulated_edge_mc_sse() and gmc_sse() to be used under AV_CPU_FLAG_SSE.
-
- 11 Nov, 2011 1 commit
-
-
Justin Ruggles authored
-
- 07 Nov, 2011 1 commit
-
-
Justin Ruggles authored
-
- 26 Oct, 2011 1 commit
-
-
Daniel Kang authored
Add whitespace. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
- 11 Oct, 2011 1 commit
-
-
Ronald S. Bultje authored
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
-
- 15 Aug, 2011 1 commit
-
-
Alex Converse authored
-
- 11 Aug, 2011 1 commit
-
-
Kostya Shishkov authored
Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
- 29 Jul, 2011 1 commit
-
-
Jason Garrett-Glaser authored
-
- 21 Jul, 2011 1 commit
-
-
Mans Rullgard authored
Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 20 Jul, 2011 1 commit
-
-
Mans Rullgard authored
Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 18 Jul, 2011 1 commit
-
-
Diego Biurrun authored
-
- 10 Jul, 2011 1 commit
-
-
Mans Rullgard authored
This macro can cause problems in conjunction with the bitdepth template expansion. It was presumably added to keep source compatibility when high bitdepth support was added. However, emulated_edge_mc is a dsputil pointer and should not be called directly, so there is little reason to keep such a macro. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 08 Jul, 2011 1 commit
-
-
Daniel Kang authored
Mainly ported from 8-bit H.264 predict. Some code ported from x264. LGPL ok by author. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
- 04 Jul, 2011 2 commits
-
-
Daniel Kang authored
Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
Daniel Kang authored
Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
- 03 Jul, 2011 1 commit
-
-
Daniel Kang authored
Mainly ported from 8-bit H.264 qpel. Some code ported from x264. LGPL ok by author. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-