Commits · fd30e4d57fe5841385f845440688505b88c0f4a9 · Linshizhi / ffmpeg.wasm-core

02 Feb, 2017 2 commits

x86/rv34dsp: add ff_rv34_idct_dc_add_sse2 · c8467abb

James Almer authored 8 years ago

Also disable ff_rv34_idct_dc_add_mmx on x86_64 as the presence of sse2
is guaranteed in such builds.
Signed-off-by: James Almer <jamrial@gmail.com>

x86/vp8dsp: add ff_vp8_idct_dc_add_sse2 · ab5c4d00

James Almer authored 8 years ago

Also disable ff_vp8_idct_dc_add_mmx on x86_64 as the presence of sse2
is guaranteed in such builds.
Signed-off-by: James Almer <jamrial@gmail.com>
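
Both commits above follow the same pattern: the SSE2 routine is preferred whenever the CPU reports it, and the MMX routine is only considered on 32-bit x86, since every x86_64 CPU has SSE2. A minimal Python model of that selection logic is sketched below. The function name `pick_idct_dc_add`, the flag strings, and the `is_x86_64` parameter are hypothetical and chosen for illustration; this is not FFmpeg's actual init code.

```python
def pick_idct_dc_add(cpu_flags, is_x86_64):
    """Return a label for the implementation a DSP init routine would select.

    Hypothetical model: later assignments override earlier ones, so the
    most capable available version wins.
    """
    impl = "c"  # portable fallback
    # The MMX version is skipped on x86_64, where SSE2 is always available.
    if not is_x86_64 and "mmx" in cpu_flags:
        impl = "mmx"
    if "sse2" in cpu_flags:
        impl = "sse2"
    return impl


print(pick_idct_dc_add({"mmx"}, is_x86_64=False))
print(pick_idct_dc_add({"mmx", "sse2"}, is_x86_64=True))
```

Under these assumptions, a 32-bit CPU without SSE2 still gets the MMX routine, while an x86_64 build never references it.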

01 Feb, 2017 1 commit

Revert "Merge commit ''" · 536ac72f

Michael Niedermayer authored 8 years ago

The assumption this is based on is wrong: the code is not always run with bitexact flags.

This reverts commit a956164e, reversing
changes made to f6005907.
Approved-by: James Almer <jamrial@gmail.com>

31 Jan, 2017 1 commit
- lavc/hevc: remove a few random spaces to reduce diff with libav · 7c300a8e
  Clément Bœsch authored 8 years ago
  
13 Jan, 2017 5 commits
- lossless_videodsp: rename add_hfyu_left_pred_int16 to add_left_pred_int16 · 6d4c9f2a
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
- huffyuvdsp: move functions only used by huffyuv from lossless_videodsp · 47f21232
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
- huffyuvencdsp: move shared functions to a new lossless_videoencdsp context · cf9ef839
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
- huffyuvencdsp: move functions only used by huffyuv from lossless_videodsp · 30c1f272
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
- lossless_videodsp: move shared functions from huffyuvdsp · 5ac1dd8e
  James Almer authored 8 years ago
```
Several codecs other than huffyuv use them.
Signed-off-by: James Almer <jamrial@gmail.com>
```
02 Jan, 2017 2 commits

avcodec/x86/vc1dsp_mc: Fix build with NASM 2.09.10 · aa952920

Michael Niedermayer authored 8 years ago

make fate passes
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/x86/imdct36: fix building with nasm 2.11.05 · d0651875

John Comeau authored 8 years ago

fixes `operation size not specified` errors as described here:
http://stackoverflow.com/questions/36854583/compiling-ffmpeg-for-kali-linux-2

I rebuilt again with yasm and made sure it didn't break that.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

20 Dec, 2016 1 commit
- avcodec/magicyuv: add 10 bit support · 6d09d6ed
  Paul B Mahol authored 8 years ago
```
Signed-off-by: Paul B Mahol <onemda@gmail.com>
```
07 Dec, 2016 1 commit
- avcodec/h264: resolve assert being triggered when stack is not aligned · acdd2d80
  James Darnley authored 8 years ago
```
32-bit msvc.
```
06 Dec, 2016 4 commits

avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter · 728651df

James Darnley authored 8 years ago

Yorkfield:
 - mmx2: 2.53x (504 vs. 199 cycles)
 - sse2: 3.83x (504 vs. 131 cycles)

Nehalem:
 - mmx2: 2.42x (365 vs. 151 cycles)
 - sse2: 3.56x (365 vs. 103 cycles)

Skylake:
 - mmx2: 1.81x (308 vs. 170 cycles)
 - sse2: 2.84x (308 vs. 108 cycles)
 - avx:  2.93x (308 vs. 105 cycles)
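
The speedup factors quoted in these benchmark lists are the baseline cycle count divided by the optimized cycle count. A small sketch of that arithmetic, using the Skylake figures above (the helper name `speedup` is illustrative, not something from the commit):

```python
def speedup(baseline_cycles, optimized_cycles):
    """Speedup factor: baseline cycles divided by optimized cycles."""
    if optimized_cycles <= 0:
        raise ValueError("optimized_cycles must be positive")
    return baseline_cycles / optimized_cycles


# Cycle counts copied from the Skylake figures in the commit message.
baseline = 308
skylake = {"mmx2": 170, "sse2": 108, "avx": 105}

for name, cycles in skylake.items():
    print(f"{name}: {speedup(baseline, cycles):.2f}x")
```

A factor recomputed this way can differ from the quoted one in the last digit, presumably because the cycle counts shown in the message are themselves rounded.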

avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter · add21d0b

James Darnley authored 8 years ago

Yorkfield:
 - mmx2: 2.45x (279 vs. 114 cycles)
 - sse2: 3.36x (279 vs.  83 cycles)

Nehalem:
 - mmx2: 2.10x (192 vs.  92 cycles)
 - sse2: 2.84x (192 vs.  68 cycles)

Skylake:
 - mmx2: 1.75x (170 vs.  97 cycles)
 - sse2: 2.47x (170 vs.  69 cycles)
 - avx:  2.47x (170 vs.  69 cycles)

whitespace changes after last commit · 58ca2ef6

James Darnley authored 8 years ago

avcodec/h264: clean up and expand x86 function definitions · f33714a6

James Darnley authored 8 years ago

30 Nov, 2016 3 commits

avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions · 13d71c28

James Darnley authored 8 years ago

Yorkfield:
 - sse2:
   - complex: 4.13x faster (1514 vs. 367 cycles)
   - simple:  4.38x faster (1836 vs. 419 cycles)

Skylake:
 - sse2:
   - complex: 3.61x faster ( 936 vs. 260 cycles)
   - simple:  3.97x faster (1126 vs. 284 cycles)
 - avx (versus sse2):
   - complex: 1.07x faster (260 vs. 244 cycles)
   - simple:  1.03x faster (284 vs. 274 cycles)

avcodec/h264: mmx 4:2:2 idct add8 function · 1dae7ffa

James Darnley authored 8 years ago

2.87 times faster (1830 vs. 638 cycles)

avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter · 815ea8c6

James Darnley authored 8 years ago

2.1 times faster (401 vs. 194 cycles)

18 Nov, 2016 1 commit

x86/vp9itxfm: add missing AVX2 guards · 2de1c79b

James Almer authored 8 years ago

Fixes compilation with Yasm 1.1.0 and older.
Signed-off-by: James Almer <jamrial@gmail.com>

15 Nov, 2016 1 commit

vp9: add avx2 iadst16 implementations. · 83a139e3

Ronald S. Bultje authored 8 years ago

Also a small cosmetic change to the avx2 idct16 version to make it
explicit that one of the arguments to the write-out macros is unused
for >=avx2 (it uses pmovzxbw instead of punpcklbw).
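
The remark about `pmovzxbw` versus `punpcklbw` can be illustrated with a small model of the two instructions on 128-bit registers. `punpcklbw` widens bytes to words only when its second operand is a zeroed register, whereas `pmovzxbw` zero-extends on its own, so the extra operand (presumably the zero register passed to the write-out macros) is no longer needed. The Python functions below are simplified models of the instruction semantics written for this note; they are not code from the commit.

```python
def punpcklbw(a, b):
    """Interleave the low 8 bytes of two 16-byte registers: a0, b0, a1, b1, ..."""
    assert len(a) == 16 and len(b) == 16
    out = []
    for x, y in zip(a[:8], b[:8]):
        out += [x, y]
    return out


def pmovzxbw(src):
    """Zero-extend the low 8 bytes of src to eight 16-bit words."""
    return list(src[:8])


def bytes_to_words(reg):
    """Read 16 bytes as eight little-endian 16-bit words."""
    return [reg[i] | (reg[i + 1] << 8) for i in range(0, 16, 2)]


pixels = list(range(240, 256))  # sample byte values
zero = [0] * 16                 # the extra operand punpcklbw needs

widened_old = bytes_to_words(punpcklbw(pixels, zero))
widened_new = pmovzxbw(pixels)
print(widened_old == widened_new)
```

Both paths yield the same eight words; only the `punpcklbw` path consumes the zero operand.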

21 Oct, 2016 1 commit

doc: fix spelling errors · c8a6eb58

Andreas Cadhalpun authored 8 years ago

Thanks to Mathieu Malaterre <malat@debian.org> for reporting the
Que/Queue typo. (https://bugs.debian.org/839542)
Reviewed-by: Lou Logan <lou@lrcd.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>

18 Oct, 2016 1 commit

aacenc: add SIMD optimizations for abs_pow34 and quantization · d2ae5f77

Rostislav Pehlivanov authored 8 years ago

Performance improvements:

quant_bands:
with:     681 decicycles in quant_bands, 8388453 runs,    155 skips
without: 1190 decicycles in quant_bands, 8388386 runs,    222 skips
Around 42% fewer cycles for the function

Twoloop coder:

abs_pow34:
with/without: 7.82s/8.17s
Around 4% less time for the entire encoder

Both:
with/without: 7.15s/8.17s
Around 12% less time for the entire encoder

Fast coder:

abs_pow34:
with/without: 3.40s/3.77s
Around 10% less time for the entire encoder

Both:
with/without: 3.02s/3.77s
Around 20% less time for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
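
The percentages in this message match the relative reduction in cycles or wall-clock time, that is `(without - with) / without`. A short sketch that recomputes them from the quoted measurements (the helper name `percent_reduction` and the dictionary labels are illustrative):

```python
def percent_reduction(without, with_simd):
    """Relative reduction, in percent, of the measurement with SIMD enabled."""
    return 100.0 * (without - with_simd) / without


# (without, with) pairs copied from the commit message.
measurements = {
    "quant_bands (decicycles)": (1190, 681),
    "twoloop coder, abs_pow34 (s)": (8.17, 7.82),
    "twoloop coder, both (s)": (8.17, 7.15),
    "fast coder, abs_pow34 (s)": (3.77, 3.40),
    "fast coder, both (s)": (3.77, 3.02),
}

for label, (without, with_simd) in measurements.items():
    print(f"{label}: {percent_reduction(without, with_simd):.1f}% lower")
```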

02 Oct, 2016 1 commit
- avcodec: fix arguments on xmm/neon clobber test wrappers · 42111e85
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
01 Oct, 2016 1 commit
- avcodec: add missing xmm/neon clobber test wrappers for the new encode API · 449f263f
  James Almer authored 8 years ago
```
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
23 Sep, 2016 2 commits
- x86/h264_weight: use appropriate register size for weight parameters · 5ae0ad00
  Hendrik Leppkes authored 8 years ago
```
Fixes trac 5579
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Acked-by: Michael Niedermayer <michael@niedermayer.cc>
```
- avcodec/h264: Use ptrdiff_t for (bi)weight functions · bc26fe89
  Michael Niedermayer authored 8 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
06 Aug, 2016 1 commit

avcodec/ttadsp: cosmetics · d950279c

James Almer authored 8 years ago

Clean up some header includes and use the same naming scheme as
in ttaencdsp.
Signed-off-by: James Almer <jamrial@gmail.com>

02 Aug, 2016 1 commit
- x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} · efc9d5c4
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
26 Jul, 2016 3 commits

vp9: add mmxext versions of the single-block (w=8,npx=8) h/v loopfilters. · a4edaa02

Ronald S. Bultje authored 8 years ago

Each takes about 0.1% of runtime in my profiles, and they had no SIMD
so far (we only had SIMD for the npx=16 double-block versions).

vp9: add mmxext versions of the single-block (w=4,npx=8) h/v loopfilters. · 7ca422bb

Ronald S. Bultje authored 8 years ago

Each takes about 0.5% of runtime in my profiles, and they had no SIMD
so far (we only had SIMD for the npx=16 double-block versions).

vp9: add 32x32 idct AVX2 implementation. · 726501a3

Ronald S. Bultje authored 8 years ago

About 1.8x speedup compared to AVX version for full IDCT. Other
sub-IDCT scenarios also see speedups. Full --bench output for
idct_32x32_add_${bpp}_${subidct}_${opt} (50k cycles):

nop: 16.5
vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
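
The "about 1.8x" figure can be checked against the listing by parsing the `name: value` lines and dividing the AVX result by the AVX2 result. The sketch below assumes that the `_32` sub-IDCT rows correspond to the full IDCT; the helper `parse_bench` is written for this note and is not part of FFmpeg's tooling.

```python
def parse_bench(text):
    """Map 'name: value' lines to a {name: float} dictionary."""
    results = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        results[name.strip()] = float(value)
    return results


# Rows copied from the listing in the commit message.
sample = """
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
"""

bench = parse_bench(sample)
prefix = "vp9_inv_dct_dct_32x32_add_8_32_"
avx_over_avx2 = bench[prefix + "avx"] / bench[prefix + "avx2"]
print(f"avx2 vs. avx: {avx_over_avx2:.2f}x")
```

The same dictionary can be reused to compare any other pair of rows, for example the C version against AVX2.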

20 Jul, 2016 7 commits
- x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32 · 7a15cf42
  James Almer authored 8 years ago
```
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
- x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate · d06dfaa5
  Diego Biurrun authored 9 years ago
  
- x86: Use *_FAST/*_SLOW CPU feature detection macros where appropriate · 4efab893
  Diego Biurrun authored 9 years ago
  
- x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code · 0a39c9ac
  Diego Biurrun authored 9 years ago
```
That code is only ever initialized with that flag set.
```
- x86: hpeldsp: Drop unused function parameters · 95c1df92
  Diego Biurrun authored 9 years ago
  
- x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate · c3e83ad3
  Diego Biurrun authored 9 years ago
  
- x86: hpeldsp: Split off VP3-specific bits into a separate file · 1dfc3cf8
  Diego Biurrun authored 9 years ago
  