Commits · 0b40153d20ba80e842d288038a62c4dfdc5384a3 · Linshizhi / ffmpeg.wasm-core

06 Jul, 2012 1 commit

x86: h264_intrapred: Don't add the 'd' suffix to the SPLATB_REG macro · f27386cd

Martin Storsjö authored 12 years ago

The SPLATB_REG macro already adds the 'd' suffix internally.

This fixes building on Win64, which has been broken since 878e6690.

This worked on Unix, where r2 happened to be rdx in this case, which
with the first suffix became rdxd, mapped to edx, and edxd is defined back
to edx. On Win64, however, r2 happened to be R8 in this case, and
R8d maps to R8D just fine, but there's no mapping for R8Dd to anything.
Signed-off-by: Martin Storsjö <martin@martin.st>

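The failure can be replayed with a toy C model of the suffix pasting: append a 'd' to a register name and look the result up in a table of defines. The table holds only the names mentioned in the commit message and is not x86inc.asm; add_d_suffix is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the name resolution described in the commit message.
   Only the names the message mentions are listed; this is not x86inc. */
struct reg_define {
    const char *name;      /* symbol produced by pasting a 'd' suffix */
    const char *expansion; /* what the assembler ends up seeing */
};

static const struct reg_define defines[] = {
    { "rdxd", "edx" }, /* Unix64: r2 is rdx, rdx + 'd' -> edx */
    { "edxd", "edx" }, /* a second 'd' maps edx back to itself */
    { "r8d",  "r8d" }, /* Win64: r2 is r8, r8 + 'd' is a real register */
    /* note: there is deliberately no entry for "r8dd" */
};

/* Paste a 'd' onto reg and look the result up.
   Returns NULL when the pasted name is undefined (a build error). */
static const char *add_d_suffix(const char *reg)
{
    char pasted[16];
    snprintf(pasted, sizeof pasted, "%sd", reg);
    for (size_t i = 0; i < sizeof defines / sizeof defines[0]; i++)
        if (strcmp(defines[i].name, pasted) == 0)
            return defines[i].expansion;
    return NULL;
}
```

Applying add_d_suffix twice to "rdx" still yields "edx", which is why the redundant suffix went unnoticed on Unix; applied twice to "r8", the second call returns NULL, mirroring the Win64 build failure.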

05 Jul, 2012 4 commits
- x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros · 878e6690
  Diego Biurrun authored 12 years ago
  
- x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros · 4d475236
  Loren Merritt authored 12 years ago
```
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
- x86: h264_intrapred: port to cpuflag macros · d20f133e
  Diego Biurrun authored 12 years ago
  
- vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too · 07eeeb1d
  Martin Storsjö authored 12 years ago
```
This was missed in the previous commit, 70a1c800.
Signed-off-by: Martin Storsjö <martin@martin.st>
```
04 Jul, 2012 2 commits
- vp8: loopfilter >=sse2 functions need aligned stack on x86-32. · 70a1c800
  Martin Storsjö authored 12 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
- dsputilenc: group yasm and inline asm function pointer assignment. · 723b266d
  Ronald S. Bultje authored 12 years ago
  
30 Jun, 2012 2 commits
- dsputilenc_mmx: split assignment of ff_sse16_sse2 to SSE2 section. · ceabc13f
  Ronald S. Bultje authored 12 years ago
  
- x86: fmtconvert: add special asm for float_to_int16_interleave_misc_* · 66a02159
  Ronald S. Bultje authored 12 years ago
```
This gets rid of a variable-length array and a for loop in C code.
Signed-off-by: Martin Storsjö <martin@martin.st>
```
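A scalar sketch of what an interleaving float-to-int16 conversion does, assuming the input is a set of planar float buffers already scaled to the 16-bit range and that "_misc" covers channel counts without a dedicated version. Writing each sample straight to its interleaved position needs no temporary per-channel array. The function names are hypothetical, and the rounding (half away from zero, chosen to avoid a libm dependency) is an assumption rather than what the C or asm code does.

```c
#include <assert.h>
#include <stdint.h>

/* Clip to the int16 range, then round half away from zero. */
static int16_t clip_round_int16(float x)
{
    if (x >= 32767.0f)
        return 32767;
    if (x <= -32768.0f)
        return -32768;
    return (int16_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* Convert `channels` planar float buffers of `len` samples each into one
   interleaved int16 buffer of len * channels samples. Each sample is
   written directly to its final position, so no temporary per-channel
   buffer (such as a variable-length array) is needed. */
static void float_to_int16_interleave_sketch(int16_t *dst,
                                             const float *const *src,
                                             long len, int channels)
{
    for (int c = 0; c < channels; c++)
        for (long i = 0; i < len; i++)
            dst[i * channels + c] = clip_round_int16(src[c][i]);
}
```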
29 Jun, 2012 1 commit

x86: vc1: fix and enable optimised loop filter · f2fd1678

Mans Rullgard authored 12 years ago

The problem is that the ssse3 psign instruction does the wrong
thing here.  Commit ea60dfe2 incorrectly removed a macro emulating
this instruction for pre-ssse3 code.  However, the emulation is
incorrect, and the code relies on the behaviour of the macro.
Specifically, psign sets destination elements to zero where
the corresponding source element is zero, whereas the emulation
only negates destination elements where the source is negative.

Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus,
which is why the original VC-1 code had an additional right shift
when using it.  Since the psign instruction cannot be used here,
skip all the macro hell and use the working instruction sequence
directly.

None of this was noticed due to a stray return statement in
ff_vc1dsp_init_mmx(), which meant that only the MMX version of the
loop filter was ever used (before it was removed in ea60dfe2).
Signed-off-by: Mans Rullgard <mans@mansr.com>

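To make the difference concrete, here is a scalar C model of a single 16-bit lane (psignw applies the same rule to every lane of the register). It is an illustrative sketch based only on the semantics stated in the commit message; the function names are hypothetical and neither function is the VC-1 code.

```c
#include <assert.h>
#include <stdint.h>

/* psignw semantics for one lane, as described above:
   negate where src < 0, zero where src == 0, keep where src > 0. */
static int16_t psignw_lane(int16_t dst, int16_t src)
{
    if (src < 0)
        return (int16_t)-dst;
    if (src == 0)
        return 0;
    return dst;
}

/* The emulation's behaviour: negate only where src < 0.
   Written in the branch-free form typical of SIMD code: with
   mask = -1 for negative src and 0 otherwise, (dst ^ mask) - mask
   evaluates to -dst or dst respectively. */
static int16_t negate_if_negative_lane(int16_t dst, int16_t src)
{
    int mask = src < 0 ? -1 : 0;
    return (int16_t)((dst ^ mask) - mask);
}
```

Only the src == 0 case differs, which is exactly where code written against the emulation breaks once the real instruction is substituted.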

27 Jun, 2012 1 commit

x86: fft: replace call to memcpy by a loop · a5bfa66d

Christophe Gisquet authored 12 years ago

The function call was a mess to handle, and memcpy cannot make
the assumptions we do in the new code.

Tested on an IMC sample: 430c -> 370c.
Signed-off-by: Mans Rullgard <mans@mansr.com>


25 Jun, 2012 4 commits
- x86: fft: elf64: fix PIC build · 05953348
  Mans Rullgard authored 12 years ago
```
In a 64-bit PIC build, external functions must be called
through the PLT.
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
- x86: fft: win64: fix stack alignment for memcpy() call · 8725da49
  Mans Rullgard authored 12 years ago
  
- x86: fft: convert sse inline asm to yasm · 82992604
  Mans Rullgard authored 12 years ago
  
- x86: place some inline asm under #if HAVE_INLINE_ASM · 8123e090
  Ronald S. Bultje authored 12 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
23 Jun, 2012 4 commits
- h264: use asm cabac reader under a generic condition · 0b6f9736
  Mans Rullgard authored 12 years ago
```
This removes a dependency on implementation details from generic
code and allows easy addition of the equivalent optimisation for
architectures other than x86.
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
- x86: Only use optimizations with cmov if the CPU supports the instruction · fe07c9c6
  Diego Biurrun authored 12 years ago
  
- x86: remove unused inline asm macros from dsputil_mmx.h · 29686d6e
  Mans Rullgard authored 12 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
- x86: move some inline asm macros to the only places they are used · 685f5438
  Mans Rullgard authored 12 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
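The "generic condition" idea from the h264 CABAC commit above can be illustrated with the preprocessor: an architecture header announces an optimised function by defining a macro with the function's name, and generic code only tests for that macro. This is a pattern sketch under that assumption, not the actual h264 code; read_bit, read_bit_fast and the SKETCH_* names are hypothetical.

```c
#include <assert.h>

/* ---- arch-specific header (hypothetical) ----
   A header that provides a fast reader advertises it by defining a
   macro with the function's name. Nothing is defined in this build. */
#ifdef SKETCH_HAVE_FAST_READER
static int read_bit_fast(unsigned state) { return (int)(state & 1); }
#define read_bit read_bit_fast
#endif

/* ---- generic code ----
   Only checks whether an optimised version was provided; it needs no
   knowledge of which architecture or register constraints enabled it. */
#ifndef read_bit
static int read_bit(unsigned state) { return (int)(state & 1); }
#define SKETCH_USING_FAST_READER 0
#else
#define SKETCH_USING_FAST_READER 1
#endif
```

Supporting another architecture then only means adding a header that defines the macro; the generic file stays untouched.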
22 Jun, 2012 1 commit
- cosmetics: do not use full path for local headers · a5a93fa8
  Diego Biurrun authored 12 years ago
  
17 Jun, 2012 1 commit
- dwt: remove variable-length arrays · d9669eab
  Ronald S. Bultje authored 12 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
08 Jun, 2012 1 commit
- Add a float DSP framework to libavutil · d5a7229b
  Justin Ruggles authored 12 years ago
```
Move vector_fmul() from DSPContext to AVFloatDSPContext.
```
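DSP contexts such as the one vector_fmul() moves into are tables of function pointers filled once at init time according to the detected CPU, which is also what the function-pointer assignment commits listed above rearrange. A minimal sketch of that pattern, assuming vector_fmul computes the element-wise product dst[i] = src0[i] * src1[i]; the struct, the flag bit and the SIMD stub are hypothetical and do not reproduce AVFloatDSPContext.

```c
#include <assert.h>

/* Hypothetical stand-in for a float DSP context: a table of function
   pointers. Only vector_fmul is modelled; its argument order is assumed. */
typedef struct FloatDSPSketch {
    void (*vector_fmul)(float *dst, const float *src0,
                        const float *src1, int len);
} FloatDSPSketch;

/* Portable C version: element-wise product. */
static void vector_fmul_c(float *dst, const float *src0,
                          const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

#define SKETCH_CPU_FLAG_SIMD 0x1 /* hypothetical CPU flag bit */

/* Placeholder for an optimised version; it just calls the C code. */
static void vector_fmul_simd_stub(float *dst, const float *src0,
                                  const float *src1, int len)
{
    vector_fmul_c(dst, src0, src1, len);
}

/* Fill the table: start from the always-valid C default, then
   override entries the CPU flags say are supported. */
static void float_dsp_sketch_init(FloatDSPSketch *fdsp, int cpu_flags)
{
    fdsp->vector_fmul = vector_fmul_c;
    if (cpu_flags & SKETCH_CPU_FLAG_SIMD)
        fdsp->vector_fmul = vector_fmul_simd_stub;
}
```

Callers always go through the pointer, so adding an optimised version only touches the init function.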
29 May, 2012 1 commit
- x86: use new schema for ASM macros · bac0729d
  Vitor Sessak authored 12 years ago
```
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
```
22 May, 2012 1 commit
- x86: lavc: use %if HAVE_AVX guards around AVX functions in yasm code. · 713548cb
  Justin Ruggles authored 12 years ago
```
This is needed for older versions of yasm/nasm that do not support AVX.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
21 May, 2012 1 commit
- Convert vector_fmul range of functions to YASM and add AVX versions · 5ff01259
  Kieran Kunhya authored 12 years ago
```
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
```
15 May, 2012 2 commits
- x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions. · 6797d194
  Michael Kostylev authored 12 years ago
  
- ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16 · 95a98ab3
  Justin Ruggles authored 12 years ago
```
Simplifies the code by using cpuflags and a new macro.
Also fixes the invalid use of the MMX2 pshufw operation in the MMX-only
function.
```
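Going by its name, ac3_max_msb_abs_int16 finds the most significant bit over the absolute values of a block of samples. A scalar sketch under that assumption (the function name is hypothetical): ORing the absolute values is enough, because the highest set bit of the OR equals the highest set bit of the largest absolute value.

```c
#include <assert.h>
#include <stdint.h>

/* OR together |src[i]| for all samples. The result's highest set bit
   is the highest set bit of the largest magnitude in the block. */
static int max_msb_abs_int16_sketch(const int16_t *src, int len)
{
    int v = 0;
    for (int i = 0; i < len; i++) {
        int s = src[i];         /* promote first: |-32768| fits in int */
        v |= s < 0 ? -s : s;
    }
    return v;
}
```

An OR is cheaper than tracking a running maximum and maps directly onto packed SIMD operations, at the cost of returning a value whose low bits are not meaningful.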
14 May, 2012 1 commit
- x86: use more standard construct for setting ASM functions in FFT code · fcc456b8
  Vitor Sessak authored 12 years ago
```
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
12 May, 2012 1 commit
- x86: vc1: drop MMX loop filter implementation, which uses MMX2 instructions. · ea60dfe2
  Michael Kostylev authored 12 years ago
  
10 May, 2012 1 commit

rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc

Christophe Gisquet authored 12 years ago

Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions

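Benchmark (rounded to tens of units):

| Version | V8x8 | H8x8 | 2D8x8 | V16x16 | H16x16 | 2D16x16 |
|---------|-----:|-----:|------:|-------:|-------:|--------:|
| C       |  445 |  358 |   985 |   1785 |   1559 |    3280 |
| MMX*    |  219 |  271 |   478 |    714 |    929 |    1443 |
| SSE2    |  131 |  158 |   294 |    425 |    515 |     892 |
| SSSE3   |  120 |  122 |   248 |    387 |    390 |     763 |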

The end result is an overall speedup of around 15% for the SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
DSP functions take around 6%, chroma ones 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

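Per-function speedups follow directly from the benchmark table, assuming its values are timings (lower is faster). The sketch copies the four rows and divides column by column; the names are illustrative. For example, the SSSE3 2D16x16 filter comes out at roughly 4.3 times the speed of the C version.

```c
#include <assert.h>

/* The four rows of the benchmark table, columns in the same order:
   V8x8, H8x8, 2D8x8, V16x16, H16x16, 2D16x16. */
enum { IMPL_C, IMPL_MMX, IMPL_SSE2, IMPL_SSSE3, IMPL_COUNT };
enum { NUM_COLS = 6 };

static const int bench[IMPL_COUNT][NUM_COLS] = {
    {  445,  358,  985, 1785, 1559, 3280 }, /* C     */
    {  219,  271,  478,  714,  929, 1443 }, /* MMX*  */
    {  131,  158,  294,  425,  515,  892 }, /* SSE2  */
    {  120,  122,  248,  387,  390,  763 }, /* SSSE3 */
};

/* How many times faster implementation `opt` is than `base` on
   column `col`, given that the values are timings. */
static double speedup(int base, int opt, int col)
{
    return (double)bench[base][col] / (double)bench[opt][col];
}
```

These per-function factors are far larger than the roughly 15% overall gain because, per the figures quoted in the commit, MC makes up only a small share of total decoding time.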

02 May, 2012 1 commit
- snowdsp: explicitly state instruction size. · bec207f9
  Ronald S. Bultje authored 12 years ago
```
Fixes a compile error with clang at -O0.
```
28 Apr, 2012 5 commits

dsputil x86: revert a test back to its previous value · e75d1d4f

Christophe GISQUET authored 12 years ago

```
Commit 356ee8d7 caused the initial inversion.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```

rv34dsp x86: implement MMX2 inverse transform · fe5ed69d

Christophe Gisquet authored 12 years ago

```
141 cycles down to 51.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```

h264: new assembly version of get_cabac for x86_64 with PIC · 9b9df1cd

Roland Scheidegger authored 12 years ago

This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>


h264: use one table instead of several for cabac functions · 14e9ffc1

Roland Scheidegger authored 12 years ago

The reason is that this is easier for PIC code (in particular on Darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the C code can stay the same
(alternatively, the functions needing the tables could use offsets directly).
This should produce the same code as before with non-PIC and better
code (confirmed) with PIC.

The assembly uses the new table but still does not work in the PIC case.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

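The layout can be sketched as one array holding all sub-tables back to back, with the old names kept as constant pointers at fixed offsets, so C code that indexes the old names is unchanged and assembly only needs the address of a single symbol. The sizes, offsets and names below are hypothetical placeholders, not the real CABAC tables; the array is writable here only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sub-table sizes. */
#define NORM_SHIFT_SIZE   512
#define LPS_RANGE_SIZE    512
#define MLPS_STATE_SIZE   256

/* Each sub-table starts where the previous one ends. */
#define NORM_SHIFT_OFFSET 0
#define LPS_RANGE_OFFSET  (NORM_SHIFT_OFFSET + NORM_SHIFT_SIZE)
#define MLPS_STATE_OFFSET (LPS_RANGE_OFFSET + LPS_RANGE_SIZE)
#define TABLES_SIZE       (MLPS_STATE_OFFSET + MLPS_STATE_SIZE)

/* One combined array instead of three separate ones. */
static uint8_t cabac_tables[TABLES_SIZE];

/* The old names survive as constant pointers into the combined array;
   since the offsets are compile-time constants, the compiler can fold
   each access into base + immediate offset. */
static uint8_t *const norm_shift = cabac_tables + NORM_SHIFT_OFFSET;
static uint8_t *const lps_range  = cabac_tables + LPS_RANGE_OFFSET;
static uint8_t *const mlps_state = cabac_tables + MLPS_STATE_OFFSET;
```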

h264: (trivial) remove unneeded macro argument in x86/cabac.h · 444f47b5

Roland Scheidegger authored 12 years ago

```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```

21 Apr, 2012 2 commits

Remove lowres video decoding · 2bcbd984

Mans Rullgard authored 12 years ago

This feature is complex, of questionable utility, and slows down
normal decoding.
Signed-off-by: Mans Rullgard <mans@mansr.com>


avcodec: remove AVCodecContext.dsp_mask · 95510be8

Mans Rullgard authored 12 years ago

This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump.  It has been superseded
by av_set_cpu_flags_mask() which, unlike this field, works everywhere.
Signed-off-by: Mans Rullgard <mans@mansr.com>


16 Apr, 2012 1 commit
- h264: use proper PROLOGUE statement for a function using 8 registers. · 87a24634
  Ronald S. Bultje authored 12 years ago
```
Fixes crashes when using biweight on win64.
```
13 Apr, 2012 1 commit

dsputil: fix optimized emu_edge function on Win64. · b089ca87

Ronald S. Bultje authored 12 years ago

Recent register allocation changes (an x86inc.asm update) changed the
register order and thus the opcodes of the inner loops. One of them became
larger than 128 bytes, which confuses other parts of this function where it
jumps to fixed-offset positions to extend the edge by fixed amounts. A simple
register change fixes this.
