- 08 Aug, 2012 1 commit
-
-
Dave Yeo authored
The a.out object format does not allow aligning sections. On OS/2 LD aligns sections to 16 bytes. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 07 Aug, 2012 3 commits
-
-
Mans Rullgard authored
yasm tolerates a mismatch between movd/movq and the source register size, adjusting the instruction according to the register. nasm is stricter. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
nasm prints a warning if the colon is missing. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Anton Khirnov authored
-
- 05 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 03 Aug, 2012 5 commits
-
-
Ronald S. Bultje authored
Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
Diego Biurrun authored
-
Diego Biurrun authored
Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.
-
Ronald S. Bultje authored
This makes add_hfyu_left_prediction_sse4() handle sources that are not 16-byte aligned in its own function rather than by proxying the call to add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the sse4 version clobbers xmm6, but the ssse3 version (which uses MMX regs) does not restore it, thus leading to XMM clobbering and RSP being off. Fixes bug 342.
-
Diego Biurrun authored
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.
-
- 02 Aug, 2012 5 commits
-
-
Diego Biurrun authored
-
Ronald S. Bultje authored
64-bit CPUs always have SSE available, thus there is no need to compile in the 3dnow functions. This results in smaller binaries.
-
Diego Biurrun authored
-
Ronald S. Bultje authored
Some calculations were changed in b6a3849a to use mmsize, which was not correct for the AVX version, which uses INIT_YMM and therefore has mmsize == 32. Fixes Bug 341. Signed-off-by:
Justin Ruggles <justin.ruggles@gmail.com>
-
Mans Rullgard authored
These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 01 Aug, 2012 2 commits
-
-
Ronald S. Bultje authored
64-bit CPUs always have SSE available, thus there is no need to compile in the 3dnow functions. This results in smaller binaries.
-
Ronald S. Bultje authored
-
- 31 Jul, 2012 3 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
This fixes compilation with YASM disabled.
-
Ronald S. Bultje authored
This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles faster (201->193) on x86-32.
-
- 28 Jul, 2012 6 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
Without this, cglobal will expand "z" to "zh" to access the high byte in a register's word, which causes a name collision with the ZH(x) macro further up in this file.
-
Ronald S. Bultje authored
64-bit CPUs always have SSE2, and a SSE2 version exists, thus the MMX version will never be used.
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- 27 Jul, 2012 5 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
All x86-64 CPUs have SSE2, so the MMX version will never be used. This leads to smaller binaries.
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
jamal authored
Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 26 Jul, 2012 2 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- 25 Jul, 2012 3 commits
-
-
Ronald S. Bultje authored
Mixing yasm and inline asm is a bad idea, since if either yasm or inline asm is not supported by your toolchain, all of the asm stops working. Thus, it is better to use either one or the other alone. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Ronald S. Bultje authored
This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Yang Wang authored
In ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t", and so forth, are problematic. From the above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes "movq 810000(%edi)". That is, it will use an offset of 810000 instead of adding 8 to the offset. This will cause a segmentation fault. This error was fixed in the second block of the assembly code, but not in the unrolled loop. How to reproduce: This error is exposed when building with the Intel C++ Compiler with IPO+PGO optimization enabled. It crashed when decoding an MJPEG video. Signed-off-by:
Michael Niedermayer <michaelni@gmx.at> Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
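The operand-pasting problem described in this commit message can be illustrated without any inline assembly. The sketch below only simulates the textual substitution the compiler performs on an asm template; expand_operand() is a hypothetical helper written for this illustration and is not part of libavcodec. The "8+" prefix is shown as one commonly used way to keep the displacement arithmetic intact; it is not necessarily the fix applied in this commit, which may instead pass the pointer in a register and write 8(%3).

```c
#include <stdio.h>
#include <string.h>

/*
 * Simulate how an operand's text is pasted into an inline-asm template.
 * `prefix` is the literal text written directly in front of the operand
 * placeholder (for example "8" or "8+"); `operand` is the address
 * expression the compiler chose for the memory operand.
 * Hypothetical helper, for illustration only.
 */
void expand_operand(char *out, size_t out_size,
                    const char *prefix, const char *operand)
{
    /* Plain string concatenation: no arithmetic happens at this stage. */
    snprintf(out, out_size, "movq %s%s, %%mm1", prefix, operand);
}

/* Print the three cases discussed in the commit message. */
void print_cases(void)
{
    char buf[64];

    /* p lives in a register: the pasted text happens to be correct. */
    expand_operand(buf, sizeof(buf), "8", "(%edi)");
    puts(buf);

    /* Constant propagation gave the operand its own displacement:
     * the digits fuse into a single, wrong displacement. */
    expand_operand(buf, sizeof(buf), "8", "10000(%edi)");
    puts(buf);

    /* With "8+" the assembler sees an expression and adds the values. */
    expand_operand(buf, sizeof(buf), "8+", "10000(%edi)");
    puts(buf);
}
```

Calling print_cases() from a main() shows the fused displacement in the second line of output, which is the failure mode reported with the Intel compiler.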
-
- 23 Jul, 2012 1 commit
-
-
yang authored
In file libavcodec/x86/dsputil_mmx.c, function ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t" etc. have a problem. From the above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes "movq 810000(%edi)". That is, it will use an offset of 810000 instead of adding 8 to the offset. This will cause a segmentation fault. This error was fixed in the second block of the assembly code, but not in the unrolled loop. How to reproduce: This error is exposed when we build ffmpeg using the Intel C++ Compiler with IPO+PGO optimization. ffmpeg crashed when decoding an MJPEG video. Signed-off-by:
Michael Niedermayer <michaelni@gmx.at>
-
- 22 Jul, 2012 1 commit
-
-
Jason Garrett-Glaser authored
Simplifies pshufb masks that operate on words.
-
- 19 Jul, 2012 1 commit
-
-
Diego Biurrun authored
-
- 18 Jul, 2012 1 commit
-
-
Mans Rullgard authored
This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-