- 16 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 15 Aug, 2012 3 commits
-
-
Martin Storsjö authored
Signed-off-by:
Martin Storsjö <martin@martin.st>
-
Diego Biurrun authored
-
Diego Biurrun authored
-
- 13 Aug, 2012 2 commits
-
-
Mans Rullgard authored
This fixes two issues preventing suncc from building this code. The undocumented 'a' operand modifier, causing gcc to omit a $ in front of immediate operands (as required in addresses), is not supported by suncc. Luckily, the also undocumented 'c' modifier has the same effect and is supported. On some asm statements with a large number of operands, suncc for no obvious reason fails to correctly substitute some of the operands. Fortunately, some of the operands in these statements are plain numbers which can be inserted directly into the code block instead of passed as operands. With these changes, the code builds correctly with both gcc and suncc. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
This code contains a C array of addresses of labels defined in inline asm. To do this, the names must be declared as external in C. The declared type does not matter since only the address is used, and for some reason, the author of the code used the 'void' type even though taking the address of a void expression is invalid. Changing the type to char, a reasonable choice since the alignment of the code labels cannot be known or guaranteed, eliminates gcc warnings and allows building with suncc. Signed-off-by:
Mans Rullgard <mans@mansr.com>
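A minimal sketch of the declaration style described in the message above, not the code from the commit. All names (sketch_label_a, sketch_label_b, sketch_label_table and the two helpers) are hypothetical, and the labels are defined here as ordinary C objects rather than in inline asm so that the sketch builds with any compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Declared with type char instead of void: only the address is used, and
 * char carries no alignment requirement. In the situation the commit
 * describes, these names would be labels defined inside inline asm. */
extern char sketch_label_a, sketch_label_b;

/* Assumption for this sketch only: define the symbols in C so that it
 * links without any asm. */
char sketch_label_a;
char sketch_label_b;

/* A C array holding the addresses of the labels. */
static const char *const sketch_label_table[] = {
    &sketch_label_a,
    &sketch_label_b,
};

/* Number of entries in the table. */
static size_t sketch_label_count(void)
{
    return sizeof(sketch_label_table) / sizeof(sketch_label_table[0]);
}

/* Address stored at index i; the index must be in range. */
static const char *sketch_label_at(size_t i)
{
    assert(i < sketch_label_count());
    return sketch_label_table[i];
}
```

With a void declaration, the &name expressions in the table are what gcc warns about; with char they are ordinary address constants.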
-
- 12 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 08 Aug, 2012 3 commits
-
-
Mans Rullgard authored
This macro is only used in two places, both in libavcodec, so this is a more sensible place for it. Two small tweaks to the macro are made: the trailing semicolon is removed, and the unnecessary 'volatile' is dropped from the x86 asm. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
This puts x86-specific things in the x86/ subdirectory where they belong. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Dave Yeo authored
The a.out object format does not allow aligning sections. On OS/2 LD aligns sections to 16 bytes. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 07 Aug, 2012 3 commits
-
-
Mans Rullgard authored
yasm tolerates a mismatch between movd/movq and the source register size, adjusting the instruction according to the register. nasm is stricter. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
nasm prints a warning if the colon is missing. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Anton Khirnov authored
-
- 05 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 03 Aug, 2012 5 commits
-
-
Ronald S. Bultje authored
Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
Diego Biurrun authored
-
Diego Biurrun authored
Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.
-
Ronald S. Bultje authored
This makes add_hfyu_left_prediction_sse4() handle sources that are not 16-byte aligned in its own function rather than by proxying the call to add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the sse4 version clobbers xmm6, but the ssse3 version (which uses MMX regs) does not restore it, thus leading to XMM clobbering and RSP being off. Fixes bug 342.
-
Diego Biurrun authored
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.
-
- 02 Aug, 2012 4 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Ronald S. Bultje authored
Some calculations were changed in b6a3849a to use mmsize, which was not correct for the AVX version, which uses INIT_YMM and therefore has mmsize == 32. Fixes Bug 341. Signed-off-by:
Justin Ruggles <justin.ruggles@gmail.com>
-
Mans Rullgard authored
These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 01 Aug, 2012 2 commits
-
-
Ronald S. Bultje authored
64-bit CPUs always have SSE available, thus there is no need to compile in the 3dnow functions. This results in smaller binaries.
-
Ronald S. Bultje authored
-
- 31 Jul, 2012 3 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
This fixes compilation with YASM disabled.
-
Ronald S. Bultje authored
This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles faster (201->193) on x86-32.
-
- 28 Jul, 2012 6 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
Without this, cglobal will expand "z" to "zh" to access the high byte in a register's word, which causes a name collision with the ZH(x) macro further up in this file.
-
Ronald S. Bultje authored
64-bit CPUs always have SSE2, and an SSE2 version exists, thus the MMX version will never be used.
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- 27 Jul, 2012 4 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
All x86-64 CPUs have SSE2, so the MMX version will never be used. This leads to smaller binaries.
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- 26 Jul, 2012 2 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-