Commits · 6c593fee6335b8dda62eaa54d40504aa366b0696 · Linshizhi / ffmpeg.wasm-core

21 Jun, 2017 1 commit

build: Generalize yasm/nasm-related variable names · fd502f4f

Diego Biurrun authored 8 years ago

None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4)
Signed-off-by: James Almer <jamrial@gmail.com>

fd502f4f

28 Mar, 2017 1 commit

vp9: re-split the decoder/format/dsp interface header files. · f8c01994

Ronald S. Bultje authored 7 years ago

The advantage here is that the internal software decoder interface is
not exposed to the DSP functions or the hardware accelerations.

f8c01994

27 Mar, 2017 1 commit
- lavc/vp9: split into vp9{block,data,mvs} · 1c9f4b50
  Clément Bœsch authored 7 years ago
```
This is following Libav layout to ease merges.
```
  1c9f4b50
01 Mar, 2017 1 commit
- build: Generalize yasm/nasm-related variable names · 39e208f4
  Diego Biurrun authored 8 years ago
```
None of them are specific to the YASM assembler.
```
  39e208f4
15 Nov, 2016 1 commit

vp9: add avx2 iadst16 implementations. · 83a139e3

Ronald S. Bultje authored 8 years ago

Also a small cosmetic change to the avx2 idct16 version to make it
explicit that one of the arguments to the write-out macros is unused
for >=avx2 (it uses pmovzxbw instead of punpcklbw).

83a139e3
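
As a rough illustration of why an operand can become unnecessary, here is a scalar C model of the two instructions named in the message above. It assumes the unused macro argument is the all-zero register that punpcklbw needs in order to widen bytes to words; the commit text does not say this explicitly, and all function names below are hypothetical.

```c
#include <stdint.h>

/* Scalar model of punpcklbw: interleave the low 8 bytes of a and b.
 * With b all zero, each little-endian byte pair is a zero-extended word. */
void model_punpcklbw(uint8_t out[16], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Scalar model of pmovzxbw: zero-extend 8 bytes to 8 words directly,
 * with no second (zero) operand. */
void model_pmovzxbw(uint16_t out[8], const uint8_t a[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = a[i];
}

/* Read word i from the byte-pair output, little-endian as on x86. */
uint16_t word_at(const uint8_t bytes[16], int i)
{
    return (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
}
```

Both models produce the same eight words when the second operand of the unpack is zero, which is why a macro built on pmovzxbw would have no use for a zero argument.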

05 Nov, 2016 1 commit

x86: Drop stray semicolons after function definitions · 3cba09e5

Diego Biurrun authored 9 years ago

libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]

3cba09e5
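
One common way this warning arises is a macro that expands to a whole function definition and is then invoked with a trailing semicolon. The sketch below shows that pattern under the assumption that the affected lines were of this kind; the macro and function names are made up for illustration and are not taken from the FFmpeg sources.

```c
/* Hypothetical helper macro that expands to a complete function
 * definition, in the style of macro-generated init-file wrappers. */
#define DEFINE_SCALE_FN(name, factor) \
int name(int x)                       \
{                                     \
    return x * (factor);              \
}

/* Writing "DEFINE_SCALE_FN(scale_by_two, 2);" would leave an empty
 * declaration (a lone ';') at file scope, which is what -Wpedantic
 * reports. A function definition is not followed by a semicolon. */
DEFINE_SCALE_FN(scale_by_two, 2)
DEFINE_SCALE_FN(scale_by_three, 3)
```

Dropping the semicolon changes nothing in the generated code; it only removes the empty file-scope declaration that strict ISO C does not allow.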

03 Nov, 2016 1 commit

vp9: Flip the order of arguments in MC functions · 2e55e26b

Martin Storsjö authored 8 years ago

This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.
Signed-off-by: Martin Storsjö <martin@martin.st>

2e55e26b

04 Oct, 2016 13 commits
- vp9lpf/x86: make filter_16_h work on 32-bit. · 715f139c
  Ronald S. Bultje authored 10 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  715f139c
- vp9lpf/x86: make filter_48/84/88_h work on 32-bit. · 8915320d
  Ronald S. Bultje authored 10 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  8915320d
- vp9lpf/x86: make filter_44_h work on 32-bit. · 725a2164
  Ronald S. Bultje authored 10 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  725a2164
- vp9lpf/x86: make filter_16_v work on 32-bit. · 5bfa96c4
  Ronald S. Bultje authored 10 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  5bfa96c4
- vp9lpf/x86: make filter_48/84_v work on 32-bit. · b905e8d2
  Ronald S. Bultje authored 10 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  b905e8d2
- vp9lpf/x86: make filter_88_v work on 32-bit. · 37637e65
  Ronald S. Bultje authored 10 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  37637e65
- vp9lpf/x86: make filter_44_v work on 32-bit. · be10834b
  Ronald S. Bultje authored 10 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  be10834b
- vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}. · 0ed21bdc
  Clément Bœsch authored 11 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  0ed21bdc
- vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}(). · f2e3d706
  Clément Bœsch authored 11 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  f2e3d706
- vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16 · 92d47550
  James Almer authored 11 years ago
```
Similar gains to the ssse3 version once again.

Additional improvements by Clément Bœsch <u@pkh.me>.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  92d47550
- vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}. · 6bea4781
  Clément Bœsch authored 11 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  6bea4781
- vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2(). · 1f451eed
  James Almer authored 11 years ago
```
Similar gains in performance as the SSSE3 version
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  1f451eed
- vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16. · a692724c
  Clément Bœsch authored 11 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  a692724c
03 Aug, 2016 5 commits

vp9mc/x86: sse2 MC assembly. · 9790b44a

Ronald S. Bultje authored 10 years ago

Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.
Signed-off-by: Anton Khirnov <anton@khirnov.net>

9790b44a

vp9mc/x86: add AVX and AVX2 MC · 67922b4e

James Almer authored 10 years ago

Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

67922b4e

vp9mc/x86: rename ff_* to ff_vp9_* · 3cda179f
Clément Bœsch authored 10 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
3cda179f

vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext · 8be8444d

James Almer authored 11 years ago

pavgb is an sse integer instruction, so the mmxext flag is enough
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

8be8444d
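
For readers unfamiliar with pavgb, the following is a scalar C sketch of what the instruction computes per byte lane: an unsigned average rounded up. The helper names are hypothetical and the row function is only a conceptual stand-in for an "avg" MC routine, not the FFmpeg implementation.

```c
#include <stdint.h>

/* Scalar model of one lane of pavgb: unsigned rounded average.
 * The sum is formed in a wider type so it cannot wrap at 8 bits. */
uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Average n source pixels into dst in place, lane by lane
 * (hypothetical helper for illustration only). */
void avg_row_u8(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = avg_round_u8(dst[i], src[i]);
}
```

Because the instruction operates on 64-bit MMX registers in these 4- and 8-pixel functions, the commit's point is that the mmxext CPU flag already covers it.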

vp9mc/x86: add 16px functions (64bit only). · 3a094949
Ronald S. Bultje authored 11 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
3a094949

26 Jul, 2016 3 commits

vp9: add mmxext versions of the single-block (w=8,npx=8) h/v loopfilters. · a4edaa02

Ronald S. Bultje authored 8 years ago

Each takes about 0.1% of runtime in my profiles, and they had no SIMD
so far (we only had SIMD for the npx=16 double-block versions).

a4edaa02

vp9: add mmxext versions of the single-block (w=4,npx=8) h/v loopfilters. · 7ca422bb

Ronald S. Bultje authored 8 years ago

Each takes about 0.5% of runtime in my profiles, and they had no SIMD
so far (we only had SIMD for the npx=16 double-block versions).

7ca422bb

vp9: add 32x32 idct AVX2 implementation. · 726501a3

Ronald S. Bultje authored 8 years ago

About 1.8x speedup compared to AVX version for full IDCT. Other
sub-IDCT scenarios also see speedups. Full --bench output for
idct_32x32_add_${bpp}_${subidct}_${opt} (50k cycles):

nop: 16.5
vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0

726501a3
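
The "about 1.8x" figure can be checked against the listing with simple arithmetic. The sketch below assumes the numbers are cycle counts where lower is better and that the full IDCT corresponds to the `_32_` rows; the function and constant names are hypothetical, and the `nop` overhead is deliberately not subtracted.

```c
/* Speedup of a candidate over a baseline, given benchmark cycle
 * counts where a lower count means faster code. */
double speedup(double baseline_cycles, double candidate_cycles)
{
    return baseline_cycles / candidate_cycles;
}

/* Cycle counts copied from the listing above for the full 32x32 IDCT
 * (the rows ending in _8_32_avx and _8_32_avx2). */
static const double full_idct_avx  = 2539.6;
static const double full_idct_avx2 = 1395.0;
```

Calling `speedup(full_idct_avx, full_idct_avx2)` gives a ratio of about 1.82, in line with the message's claim.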

11 Jul, 2016 1 commit

vp9: add 16x16 idct avx2 (8-bit). · f0a2b624

Ronald S. Bultje authored 8 years ago

checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:

nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4

f0a2b624

28 May, 2016 1 commit
- Drop unnecessary libavutil/x86/asm.h #includes · dc40a70c
  Diego Biurrun authored 8 years ago
  
  dc40a70c
14 Feb, 2016 1 commit

x86: use the new helper macros where useful · 70d685a7

James Almer authored 9 years ago

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>

70d685a7

24 Oct, 2015 1 commit

all: fix -Wextra-semi reported on clang · 38f4e973

Ganesh Ajjanagadde authored 9 years ago

This fixes extra semicolons that clang 3.7 on GNU/Linux warns about.
These were triggered when built under -Wpedantic, which essentially
checks for strict ISO compliance in numerous ways.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>

38f4e973

13 Oct, 2015 1 commit
- vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function. · 1c3be325
  Ronald S. Bultje authored 9 years ago
  
  1c3be325
17 Sep, 2015 3 commits
- vp9: add subpel MC SIMD for 10/12bpp. · 344d5190
  Ronald S. Bultje authored 9 years ago
  
  344d5190
- vp9: add fullpel (avg) MC SIMD for 10/12bpp. · 77f35967
  Ronald S. Bultje authored 9 years ago
  
  77f35967
- vp9: add fullpel (put) MC SIMD for 10/12bpp. · 6354ff03
  Ronald S. Bultje authored 9 years ago
  
  6354ff03
10 Sep, 2015 1 commit

vp9: fix overflow in 8x8 topleft 32x32 idct ssse3 version. · fd8b90f5

Ronald S. Bultje authored 9 years ago

Also disable the mmx/iwht optimization when the bitexact flag is set.
With synthetically coded coefficients (i.e. those that lead to a
residual well outside the [-255,255] range), our optimizations will
overflow. It doesn't make sense to fix the overflows, since they can
only occur on synthetic input, not on real fwht-generated input. Thus,
add a bitexact flag that disables this optimization.

fd8b90f5
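
The idea of a bitexact flag that disables an optimization can be sketched as a function-pointer dispatch. This is a minimal illustration under assumed names; the struct, the stand-in transforms, and the init signature are all hypothetical and do not reproduce the actual FFmpeg init code.

```c
/* Hypothetical function-pointer type for an inverse transform. */
typedef void (*itxfm_fn)(short *block);

typedef struct DSPSketch {
    itxfm_fn iwht_4x4;
} DSPSketch;

/* Stand-ins: a reference version that is always exact, and a faster
 * version that could overflow on synthetic (out-of-range) input. */
void iwht_4x4_ref(short *block)  { block[0] = 1; }
void iwht_4x4_fast(short *block) { block[0] = 2; }

/* Start from the reference function and install the fast one only
 * when the CPU supports it and bit-exact output was not requested. */
void dsp_sketch_init(DSPSketch *dsp, int cpu_has_fast_path, int bitexact)
{
    dsp->iwht_4x4 = iwht_4x4_ref;
    if (cpu_has_fast_path && !bitexact)
        dsp->iwht_4x4 = iwht_4x4_fast;
}
```

With this shape, normal decoding keeps the fast path while conformance-style runs that request exact output fall back to the reference function.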

31 May, 2015 1 commit

x86: check for AV_CPU_FLAG_AVXSLOW where useful · c16e99e3

James Almer authored 9 years ago

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

c16e99e3

07 May, 2015 1 commit
- avcodec/x86/vp9dsp_init: Fix mix of declaration and statement · cc77bb09
  Michael Niedermayer authored 9 years ago
```
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  cc77bb09
06 May, 2015 1 commit
- vp9: add keyframe profile 2/3 support. · b224b165
  Ronald S. Bultje authored 9 years ago
  
  b224b165