Commits · f62c54456db0fdd3ff82397f9142715d5c479354 · Linshizhi / ffmpeg.wasm-core

18 Oct, 2016 1 commit

aacenc: add SIMD optimizations for abs_pow34 and quantization · d2ae5f77

Rostislav Pehlivanov authored 8 years ago

Performance improvements:

quant_bands:
with:     681 decicycles in quant_bands, 8388453 runs,    155 skips
without: 1190 decicycles in quant_bands, 8388386 runs,    222 skips
Around 42% for the function

Twoloop coder:

abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder

Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder

Fast coder:

abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder

Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>

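The percentages quoted in this commit message can be rechecked from its own figures. The following is a minimal Python sketch, assuming each percentage is the reduction relative to the unoptimized ("without") measurement. The helper name `reduction_pct` and the dictionary labels are illustrative and are not part of FFmpeg.

```python
def reduction_pct(with_opt: float, without_opt: float) -> float:
    """Reduction in cost (cycles or seconds) as a percentage of the baseline."""
    return (without_opt - with_opt) / without_opt * 100.0


# (with, without) pairs copied from the commit message above.
measurements = {
    "quant_bands (decicycles)": (681, 1190),
    "twoloop coder, abs_pow34 only (s)": (7.82, 8.17),
    "twoloop coder, both (s)": (7.15, 8.17),
    "fast coder, abs_pow34 only (s)": (3.40, 3.77),
    "fast coder, both (s)": (3.02, 3.77),
}

if __name__ == "__main__":
    for name, (with_opt, without_opt) in measurements.items():
        print(f"{name}: {reduction_pct(with_opt, without_opt):.1f}% lower")
```

The results land close to the rounded values in the message (around 42%, 4%, 12%, 10% and 20%).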

02 Aug, 2016 1 commit
- x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} · efc9d5c4
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
07 Apr, 2016 1 commit

build: miscellaneous cosmetics · 01621202

Diego Biurrun authored 9 years ago

Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.

01 Mar, 2016 1 commit
- fft: Split MDCT bits off from FFT · 1a094af6
  Diego Biurrun authored 9 years ago
  
29 Feb, 2016 1 commit
- x86/vc1dsp: Split the file into MC and loopfilter · e3461197
  Timothy Gu authored 8 years ago
  
19 Feb, 2016 1 commit
- build: Add vc1dsp component for more fine-grained dependencies · 15a24614
  Diego Biurrun authored 9 years ago
  
06 Feb, 2016 3 commits
- x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3} · 8ae74479
  James Almer authored 9 years ago
```
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
- dirac_dwt: Make x86 files/functions names consistent · 9fd6ea93
  Timothy Gu authored 9 years ago
  
- diracdsp: Make x86 files/functions names consistent · 17ab8f7e
  Timothy Gu authored 9 years ago
  
31 Jan, 2016 2 commits
- avcodec/dca: add new decoder based on libdcadec · ae5b2c52
  foo86 authored 9 years ago
  
- avcodec/dca: remove old decoder · 46089967
  foo86 authored 9 years ago
```
Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.
```
25 Jan, 2016 1 commit
- avcodec/synth_filter: split off remaining code from dcadec files · 209f50e1
  James Almer authored 9 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
18 Jan, 2016 1 commit
- x86: build: Group all encoder objects together · 03ef89fa
  Diego Biurrun authored 9 years ago
  
05 Dec, 2015 1 commit
- hevcdsp: add x86 SIMD for MC · e7078e84
  Anton Khirnov authored 9 years ago
  
22 Oct, 2015 1 commit
- x86/Makefile: move decoder/encoder objects out of the subsystems section · 73353af6
  James Almer authored 9 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
21 Oct, 2015 1 commit

huffyuvencdsp: Convert ff_diff_bytes_mmx to yasm · 6b41b441

Timothy Gu authored 9 years ago

Heavily based upon ff_add_bytes by Christophe Gisquet.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Timothy Gu <timothygu99@gmail.com>

13 Oct, 2015 2 commits

vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function. · 1c3be325

Ronald S. Bultje authored 9 years ago

x86: simple_idct(_put): 10bits versions · 4369b9dc

Christophe Gisquet authored 9 years ago

Modeled from the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires to add offsets in different places compared to
prores or C, and makes the function approximately 2% slower.

For 16 frames of a DNxHD 4:2:2 10bits test sequence:

C:    60861 decicycles in idct, 1048205 runs,    371 skips
sse2: 27567 decicycles in idct, 1048216 runs,    360 skips
avx:  26272 decicycles in idct, 1048171 runs,    405 skips

The add version is not implemented, so the corresponding dsp
function is set to NULL to make it clear in a code executing it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

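The 10-bit simple_idct commit message states that the output is clipped to [0;1023]. Below is a scalar Python sketch of that clamp, for illustration only. The actual functions are x86 SIMD, and `clip_10bit` is a hypothetical name, not an FFmpeg function.

```python
def clip_10bit(value: int) -> int:
    """Clamp a sample to the 10-bit range [0, 1023]."""
    max_value = (1 << 10) - 1  # 1023, the largest 10-bit value
    return max(0, min(value, max_value))


if __name__ == "__main__":
    # Values below 0 or above 1023 are saturated; in-range values pass through.
    for sample in (-5, 0, 512, 1023, 1500):
        print(sample, "->", clip_10bit(sample))
```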

09 Oct, 2015 1 commit
- avcodec/takdec: add x86 SIMD for rest of decorrelation modes · 35af7add
  Paul B Mahol authored 9 years ago
```
Signed-off-by: Paul B Mahol <onemda@gmail.com>
```
06 Oct, 2015 1 commit
- x86/alacdsp: add simd optimized functions · 72254b19
  James Almer authored 9 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
03 Oct, 2015 2 commits
- vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions. · 26ece7a5
  Ronald S. Bultje authored 9 years ago
  
- vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd. · db7786e8
  Ronald S. Bultje authored 9 years ago
  
30 Sep, 2015 1 commit
- x86/hevc_sao: move 10/12bit functions into a separate file · 3178931a
  James Almer authored 9 years ago
```
Tested-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
```
17 Sep, 2015 2 commits
- vp9: add subpel MC SIMD for 10/12bpp. · 344d5190
  Ronald S. Bultje authored 9 years ago
  
- vp9: add fullpel (put) MC SIMD for 10/12bpp. · 6354ff03
  Ronald S. Bultje authored 9 years ago
  
28 Aug, 2015 1 commit
- lavc: Drop deprecated deinterlace module · cad40a38
  Vittorio Giovara authored 9 years ago
```
Deprecated in 03/2013.
```
30 Jul, 2015 1 commit

x86/aacpsdsp: add SSE and SSE3 optimized functions · 9dcaae70

James Almer authored 9 years ago

Between 1.5 and 2.5 times faster
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>

17 Jul, 2015 2 commits
- configure: Factor out vp8dsp module · d42191c7
  Vittorio Giovara authored 9 years ago
  
- configure: Factor out rv34dsp module · 5cb4bdb2
  Vittorio Giovara authored 9 years ago
  
13 Jun, 2015 1 commit

avcodec/jpeg200dsp: add ff_ict_float_{sse,avx} · 7912a683

James Almer authored 9 years ago

Original intrinsics version by Nicolas Bertrand.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>

14 Mar, 2015 1 commit

x86: xvid_idct: port MMX iDCT to yasm · c3bf5271

Christophe Gisquet authored 9 years ago

Also reduce the table duplication with SSE2 code, remove duplicated
macro parameters.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

13 Mar, 2015 1 commit

x86: xvid_idct: port SSE2 iDCT to yasm · 2999bd7d

Christophe Gisquet authored 9 years ago

The main difference consists in renaming properly labels, and
letting yasm select the gprs for skipping 1D transforms.
Previous-version-reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

28 Feb, 2015 1 commit
- lavc: do not compile fmtconvert unconditionally · 71f1ad37
  Anton Khirnov authored 10 years ago
```
Only ac3dec and dcadec use it.
```
16 Feb, 2015 1 commit

x86/g722dsp: add ff_g722_apply_qmf_sse2 · 03adafb3

James Almer authored 10 years ago

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>

01 Feb, 2015 1 commit

x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2} · fa3eccb4

James Almer authored 10 years ago

Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer

Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U

width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips

width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>

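The SAO band filter benchmark lists raw decicycle counts without stating the resulting speedups. The following Python sketch derives them, assuming the first line of each width group (`sao_band_filter_0_8`) is the C baseline. The `speedup` helper and the dictionary layout are illustrative, and the row labels are shortened from the function names in the message.

```python
def speedup(baseline: float, optimized: float) -> float:
    """How many times faster the optimized version is than the baseline."""
    return baseline / optimized


# Decicycle counts copied from the commit message above, grouped by width.
benchmarks = {
    32: {"c": 40338, "sse2": 8056, "avx": 7458, "avx2": 4504},
    64: {"c": 136046, "sse2": 28576, "avx": 26707, "avx2": 14387},
}

if __name__ == "__main__":
    for width, counts in benchmarks.items():
        for isa in ("sse2", "avx", "avx2"):
            factor = speedup(counts["c"], counts[isa])
            print(f"width {width}, {isa}: {factor:.2f}x faster than C")
```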

05 Dec, 2014 1 commit

v210enc: Add SIMD optimised 8-bit and 10-bit encoders · 9a738c27

Kieran Kunhya authored 10 years ago

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>

26 Nov, 2014 1 commit
- v210enc: Add SIMD optimised 8-bit and 10-bit encoders · 36091742
  Kieran Kunhya authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
23 Nov, 2014 2 commits
- Fix standalone compilation of the apng decoder on x86. · 600e38f5
  Carl Eugen Hoyos authored 10 years ago
  
- avcodec/x86/Makefile: fix order · 65ce8f88
  Michael Niedermayer authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
03 Oct, 2014 1 commit

x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2} · 0de1d628

James Almer authored 10 years ago

2x to 2.5x faster than the C version.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
