Commits · b41d481aa4ee6ec916f7bea35dcaf8d35c0a19c1 · Linshizhi / ffmpeg.wasm-core

08 Aug, 2012 1 commit

x86: pngdsp: Fix assembly for OS/2 · 197439c1

Dave Yeo authored 12 years ago

The a.out object format does not allow aligning sections.
On OS/2 LD aligns sections to 16 bytes.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

197439c1

07 Aug, 2012 3 commits

x86: use 32-bit source registers with movd instruction · 2b140a3d

Mans Rullgard authored 12 years ago

yasm tolerates mismatch between movd/movq and source register size,
adjusting the instruction according to the register.  nasm is more
strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>

2b140a3d

x86: add colons after labels · a3df4781

Mans Rullgard authored 12 years ago

nasm prints a warning if the colon is missing.
Signed-off-by: Mans Rullgard <mans@mansr.com>

a3df4781

Replace all CODEC_ID_* with AV_CODEC_ID_* · 36ef5369
Anton Khirnov authored 12 years ago

36ef5369

05 Aug, 2012 1 commit
- x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2 · 20968575
  Diego Biurrun authored 12 years ago
  
  20968575
03 Aug, 2012 5 commits

fft: 3dnow: fix register name typo in DECL_IMDCT macro · 4a8143e7
Ronald S. Bultje authored 12 years ago
```
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
4a8143e7
x86: dct32: port to cpuflags · 0c3ff198
Diego Biurrun authored 12 years ago

0c3ff198

x86: build: replace mmx2 by mmxext · 239fdf1b

Diego Biurrun authored 12 years ago

Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.

239fdf1b

dsputil: make add_hfyu_left_prediction_sse4() support unaligned src. · da6505ad

Ronald S. Bultje authored 12 years ago

This makes add_hfyu_left_prediction_sse4() handle sources that are not
16-byte aligned in its own function rather than by proxying the call to
add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the
sse4 version clobbers xmm6, but the ssse3 version (which uses MMX regs)
does not restore it, thus leading to XMM clobbering and RSP being off.

Fixes bug 342.

da6505ad
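
The alignment decision described above can be illustrated in plain C. This is only a sketch under stated assumptions: every `example_` name is hypothetical, the scalar loop stands in for the real SIMD code, and the running-sum behaviour is my reading of what a "left prediction" computes, not a copy of the committed assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: which kind of load a SIMD body would have to use. */
enum example_load { EXAMPLE_LOAD_ALIGNED, EXAMPLE_LOAD_UNALIGNED };

/* True when the pointer sits on a 16-byte boundary. */
static int example_is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Decide inside one function instead of proxying to another version. */
static enum example_load example_pick_load(const void *src)
{
    return example_is_aligned16(src) ? EXAMPLE_LOAD_ALIGNED
                                     : EXAMPLE_LOAD_UNALIGNED;
}

/* Scalar reference: each output byte is the running sum (mod 256) of
 * the source bytes, seeded with `left`. Returns the final sum. */
static int example_left_prediction(uint8_t *dst, const uint8_t *src,
                                   int w, int left)
{
    assert(dst != NULL && src != NULL && w >= 0);
    for (int i = 0; i < w; i++) {
        left   = (left + src[i]) & 0xFF;
        dst[i] = (uint8_t)left;
    }
    return left;
}
```

Keeping both load paths in one function means a single prologue and epilogue handle register save and restore, which is the property the commit message relies on.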

x86: Use consistent 3dnowext function and macro name suffixes · ca844b7b

Diego Biurrun authored 12 years ago

Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.

ca844b7b

02 Aug, 2012 5 commits

x86: proresdsp: improve SIGNEXTEND macro comments · 03737412
Diego Biurrun authored 12 years ago

03737412

fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64. · 9f14cd91

Ronald S. Bultje authored 12 years ago

64-bit CPUs always have SSE available, thus there is no need to compile
in the 3dnow functions. This results in smaller binaries.

9f14cd91
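
The idea of leaving the 3dnow code out of 64-bit builds can be sketched with the preprocessor. The macro and function names below are hypothetical and the function bodies are empty stubs; the real build uses its own configuration macros and yasm sources.

```c
#include <stddef.h>

/* Assumption: detect x86-64 from common compiler-defined macros. */
#if defined(__x86_64__) || defined(_M_X64)
#define EXAMPLE_ARCH_X86_64 1
#else
#define EXAMPLE_ARCH_X86_64 0
#endif

typedef void (*example_fft_fn)(float *data, int n);

/* Stubs standing in for the real implementations. */
static void example_fft_c(float *data, int n)   { (void)data; (void)n; }
static void example_fft_sse(float *data, int n) { (void)data; (void)n; }

#if !EXAMPLE_ARCH_X86_64
/* Only built on 32-bit targets, where SSE may be missing. */
static void example_fft_3dnow(float *data, int n) { (void)data; (void)n; }
#endif

/* Pick an implementation from runtime CPU flags. */
static example_fft_fn example_select_fft(int has_sse, int has_3dnow)
{
    if (has_sse)
        return example_fft_sse;
#if !EXAMPLE_ARCH_X86_64
    if (has_3dnow)
        return example_fft_3dnow;
#else
    (void)has_3dnow; /* never needed: SSE is always present here */
#endif
    return example_fft_c;
}
```

On a 64-bit build the 3dnow function is never compiled, so it adds nothing to the binary.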

x86: h264dsp: K&R formatting cosmetics · 81905088
Diego Biurrun authored 12 years ago

81905088

x86: fft: fix imdct_half() for AVX · c728518b

Ronald S. Bultje authored 12 years ago

Some calculations were changed in b6a3849a to use mmsize, which was not correct
for the AVX version, which uses INIT_YMM and therefore has mmsize == 32.

Fixes Bug 341.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>

c728518b

x86: remove libmpeg2 mmx(ext) idct functions · ec7c501e

Mans Rullgard authored 12 years ago

These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate.  There is thus no reason to ever use them.
Signed-off-by: Mans Rullgard <mans@mansr.com>

ec7c501e

01 Aug, 2012 2 commits
- fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64. · b6a3849a
  Ronald S. Bultje authored 12 years ago
```
64-bit CPUs always have SSE available, thus there is no need to compile
in the 3dnow functions. This results in smaller binaries.
```
  b6a3849a
- x86/dsputilenc: bury inline asm under HAVE_INLINE_ASM. · 53dfaedc
  Ronald S. Bultje authored 12 years ago
  
  53dfaedc
31 Jul, 2012 3 commits
- x86: h264dsp: Remove unused variable ff_pb_3_1 · 6376a3ad
  Diego Biurrun authored 12 years ago
  
  6376a3ad
- x86: h264dsp: Adjust YASM #ifdefs · 8728b381
  Diego Biurrun authored 12 years ago
```
This fixes compilation with YASM disabled.
```
  8728b381
- h264: convert loop filter strength dsp function to yasm. · b829b4ce
  Ronald S. Bultje authored 12 years ago
```
This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
```
  b829b4ce
28 Jul, 2012 6 commits
- h264_idct_10bit: port x86 assembly to cpuflags. · c83f44db
  Ronald S. Bultje authored 12 years ago
  
  c83f44db
- fft: rename "z" to "zc" to prevent name collision. · b3c5ae56
  Ronald S. Bultje authored 12 years ago
```
Without this, cglobal will expand "z" to "zh" to access the high byte
in a register's word, which causes a name collision with the ZH(x) macro
further up in this file.
```
  b3c5ae56
- vp3: don't compile mmx IDCT functions on x86-64. · 4d777eed
  Ronald S. Bultje authored 12 years ago
```
64-bit CPUs always have SSE2, and an SSE2 version exists, thus the MMX
version will never be used.
```
  4d777eed
- h264_loopfilter: port x86 simd to cpuflags. · a5bbb124
  Ronald S. Bultje authored 12 years ago
  
  a5bbb124
- h264_chromamc_10bit: port x86 simd to cpuflags. · d07ff3cd
  Ronald S. Bultje authored 12 years ago
  
  d07ff3cd
- vp3: port x86 SIMD to cpuflags. · 4a26fdd8
  Ronald S. Bultje authored 12 years ago
  
  4a26fdd8
27 Jul, 2012 5 commits
- rv34: port x86 SIMD to cpuflags. · 76888c64
  Ronald S. Bultje authored 12 years ago
  
  76888c64
- vp56: only compile MMX SIMD on x86-32. · 158744a4
  Ronald S. Bultje authored 12 years ago
```
All x86-64 CPUs have SSE2, so the MMX version will never be used. This
leads to smaller binaries.
```
  158744a4
- vp56: port x86 simd to cpuflags. · 2734ba78
  Ronald S. Bultje authored 12 years ago
  
  2734ba78
- proresdsp: port x86 assembly to cpuflags. · 5361e10a
  Ronald S. Bultje authored 12 years ago
  
  5361e10a
- dwt: Fix several warnings about incompatible pointer type · 52a62f90
  jamal authored 12 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  52a62f90
26 Jul, 2012 2 commits
- mpegaudio: bury inline asm under HAVE_INLINE_ASM. · bde73f28
  Ronald S. Bultje authored 12 years ago
  
  bde73f28
- x86inc: automatically insert vzeroupper for YMM functions. · 30b45d9c
  Ronald S. Bultje authored 12 years ago
  
  30b45d9c
25 Jul, 2012 3 commits

vp3: don't use calls to inline asm in yasm code. · a1878a88

Ronald S. Bultje authored 12 years ago

Mixing yasm and inline asm is a bad idea, since if either yasm or inline
asm is not supported by your toolchain, all of the asm stops working.
Thus, better to use either one or the other alone.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

a1878a88

x86/dsputil: put inline asm under HAVE_INLINE_ASM. · 79195ce5

Ronald S. Bultje authored 12 years ago

This allows compiling with compilers that don't support gcc-style
inline assembly.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

79195ce5
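
A guard of this kind might look like the sketch below. It assumes the build system defines `HAVE_INLINE_ASM` to 0 or 1 (defaulted to 0 here so the example compiles anywhere); the byte-swap functions are hypothetical illustrations and are not taken from dsputil.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: normally supplied by the configure step. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif

/* Portable fallback, always available. */
static uint32_t example_bswap32_c(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

#if HAVE_INLINE_ASM && (defined(__i386__) || defined(__x86_64__))
/* gcc-style inline asm, only compiled when the toolchain supports it. */
static uint32_t example_bswap32_asm(uint32_t x)
{
    __asm__ ("bswap %0" : "+r"(x));
    return x;
}
#define example_bswap32 example_bswap32_asm
#else
#define example_bswap32 example_bswap32_c
#endif
```

A compiler without gcc-style inline assembly never sees the `__asm__` statement and uses the C version instead.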

dsputil_mmx: fix incorrect assembly code · 845e92fd

Yang Wang authored 12 years ago

In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
In the first block (in the unrolled loop), the instructions
"movq 8%3, %%mm1 \n\t", and so forth, have problems.

From the above instruction, it is clear what the programmer wants: a load from
p + 8. But this assembly code doesn’t guarantee that. It only works if the
compiler puts p in a register to produce an instruction like this:
"movq 8(%edi), %mm1". During compiler optimization, it is possible that the
compiler will be able to constant propagate into p. Suppose p = &x[10000].
Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction
becomes "movq 810000(%edi)". That is, the offset is 810000 instead of 8.

This will cause a segmentation fault.

This error was fixed in the second block of the assembly code, but not in
the unrolled loop.

How to reproduce:
    This error is exposed when building with the Intel C++ Compiler with
    IPO+PGO optimization enabled. ffmpeg crashed when decoding an MJPEG video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

845e92fd
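
The failure mode described above is plain text concatenation: the literal `8` in the template is pasted directly in front of whatever text the compiler chooses for the memory operand. The small C sketch below imitates that expansion with strings. The helper name is hypothetical and the code does not invoke a real assembler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the instruction text the way an asm
 * template is expanded, by pasting the operand's text directly after
 * the literal displacement written in the template. */
static void example_expand_template(char *out, size_t size,
                                    const char *disp, const char *operand)
{
    assert(out != NULL && size > 0);
    snprintf(out, size, "movq %s%s, %%mm1", disp, operand);
}
```

With the operand text `(%edi)` the result is `movq 8(%edi), %mm1`, as intended. With `10000(%edi)` the result is `movq 810000(%edi), %mm1`. A common way to avoid this, stated here as a general technique and not necessarily the exact change in this commit, is to pass the pointer itself in a register operand and write the displacement explicitly as `8(%3)`.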

23 Jul, 2012 1 commit

dsputil_mmx: fix incorrect assembly code · 6a2bad2c

yang authored 12 years ago

In libavcodec/x86/dsputil_mmx.c, the function ff_put_pixels_clamped_mmx() contains two assembly code blocks. In the first block (the unrolled loop), instructions such as "movq 8%3, %%mm1 \n\t" are problematic.
From the above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn't guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes "movq 810000(%edi)". That is, the offset is 810000 instead of 8.
This will cause a segmentation fault.
This error was fixed in the second block of the assembly code, but not in the unrolled loop.

How to reproduce:
This error is exposed when ffmpeg is built with the Intel C++ Compiler using IPO+PGO optimization. ffmpeg crashed when decoding an MJPEG video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

6a2bad2c

22 Jul, 2012 1 commit
- dsputil: x86: add SHUFFLE_MASK_W macro · 85a3c19e
  Jason Garrett-Glaser authored 12 years ago
```
Simplifies pshufb masks that operate on words.
```
  85a3c19e
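
To show what "masks that operate on words" means, here is a C analogue that expands eight word indices into the sixteen control bytes `pshufb` expects. This is a sketch of the idea and not the macro itself: the function name is hypothetical, and the handling of indices with the high bit set (copied through so that `pshufb` zeroes the word) is my assumption about the intended behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand 8 word selectors into a 16-byte pshufb control mask.
 * Selector w < 0x80 picks source word w, i.e. bytes 2w and 2w+1.
 * Selector >= 0x80 is copied to both bytes; pshufb writes zero for
 * any control byte whose high bit is set. */
static void example_shuffle_mask_w(uint8_t mask[16], const uint8_t words[8])
{
    assert(mask != NULL && words != NULL);
    for (int i = 0; i < 8; i++) {
        if (words[i] >= 0x80) {
            mask[2 * i]     = words[i];
            mask[2 * i + 1] = words[i];
        } else {
            mask[2 * i]     = (uint8_t)(words[i] * 2);
            mask[2 * i + 1] = (uint8_t)(words[i] * 2 + 1);
        }
    }
}
```

Writing eight word indices is shorter and less error-prone than writing sixteen byte indices by hand.
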
19 Jul, 2012 1 commit
- x86: dsputil: drop some unused CPU flag debug code · 9f97af26
  Diego Biurrun authored 12 years ago
  
  9f97af26
18 Jul, 2012 1 commit

vp3: move idct and loop filter pointers to new vp3dsp context · 28f9ab70

Mans Rullgard authored 12 years ago

This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context.  There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>

28f9ab70
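
A codec-specific DSP context of the kind described here is essentially a struct of function pointers filled in by an init routine. The sketch below uses hypothetical names and placeholder bodies; it does not reproduce the actual vp3dsp fields or any real IDCT or loop filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context holding only VP3-specific entry points. */
typedef struct ExampleVP3DSPContext {
    void (*idct_put)(uint8_t *dest, int line_size, int16_t *block);
    void (*h_loop_filter)(uint8_t *src, int stride, int limit);
} ExampleVP3DSPContext;

/* Placeholder, not a real IDCT: writes a mid-grey pixel and clears
 * the first coefficient so the effect of the call is observable. */
static void example_idct_put_c(uint8_t *dest, int line_size, int16_t *block)
{
    (void)line_size;
    dest[0]  = 128;
    block[0] = 0;
}

/* Placeholder loop filter that leaves the pixels untouched. */
static void example_h_loop_filter_c(uint8_t *src, int stride, int limit)
{
    (void)src; (void)stride; (void)limit;
}

/* Fill the context with C versions; an arch-specific init could
 * overwrite individual pointers afterwards. */
static void example_vp3dsp_init(ExampleVP3DSPContext *c)
{
    assert(c != NULL);
    c->idct_put      = example_idct_put_c;
    c->h_loop_filter = example_h_loop_filter_c;
}
```

Because the decoder calls through its own context, a VP3 IDCT can no longer be picked up by code that expects an MPEG2 IDCT.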