Commits · adb54e59c18db347f39e55832104fc3e40a3c42b · Linshizhi / ffmpeg.wasm-core

17 Jan, 2017 14 commits
- vaapi_hevc: Convert to use the new VAAPI hwaccel code · adb54e59
  Anton Khirnov authored Oct 02, 2016
```
(cherry picked from commit ea8b730d)
Signed-off-by: Mark Thompson <sw@jkqxz.net>
```
  adb54e59
- vaapi_mpeg4: Convert to use the new VAAPI hwaccel code · fd1a6a01
  Mark Thompson authored Aug 07, 2016
```
(cherry picked from commit ccd0316f)
```
  fd1a6a01
- vaapi_vc1: Convert to use the new VAAPI hwaccel code · 32b3812b
  Mark Thompson authored Aug 06, 2016
```
(cherry picked from commit 520fb772)
```
  32b3812b
- vaapi_mpeg2: Convert to use the new VAAPI hwaccel code · 71acbea1
  Mark Thompson authored Aug 06, 2016
```
(cherry picked from commit 102e13c3)
```
  71acbea1
- vaapi_h264: Convert to use the new VAAPI hwaccel code · c8b26d59
  Mark Thompson authored Aug 06, 2016
```
(cherry picked from commit 2fe93244)
```
  c8b26d59
- lavc: Rewrite VAAPI decode infrastructure · 79307ae5
  Mark Thompson authored Aug 06, 2016
```
Moves much of the setup logic for VAAPI decoding into lavc; the user
now need only provide the hw_frames_ctx.

(cherry picked from commit 123ccd07)
(cherry picked from commit 5e879b54)
(cherry picked from commit 0aec37e6)
(cherry picked from commit cfa4eb4f)
```
  79307ae5
- vaapi_vc1: Remove redundant version check · d07d01bc
  Mark Thompson authored Aug 06, 2016
```
The lowest supported VAAPI version is 0.34 (checked at configure
time), so this test is no longer needed.

(cherry picked from commit 5a667322)
```
  d07d01bc
- vaapi_vc1: Constify pointers · 845c2c14
  Mark Thompson authored Aug 06, 2016
```
(cherry picked from commit 01d6f84f)
```
  845c2c14
- vaapi_mpeg2: Constify pointers · 6bc2808c
  Mark Thompson authored Aug 06, 2016
```
(cherry picked from commit ee906129)
```
  6bc2808c
- vaapi_h264: Constify pointers · d0897da9
  Mark Thompson authored Aug 06, 2016
```
(cherry picked from commit 03adfe91)
```
  d0897da9
- libavformat/mpegtsenc: support hevc with missing in stream headers like h.264 · b05d8e71
  Michael Niedermayer authored Jan 15, 2017
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  b05d8e71
- configure: Don't disable SSA Optimizer on MSVC v19.00.24218+. · 2064a3b8
  Kacper Michajłow authored Jan 12, 2017
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  2064a3b8
- Merge commit 'f450cc7b' · bdbbb8f1
  Matthieu Bouron authored Jan 16, 2017
```
* commit 'f450cc7b':
  h264: eliminate decode_postinit()

Also includes fixes from 1f7b4f9a and e344e651.

The original patch replaces H264Context.next_output_pic (H264Picture *) with
H264Context.output_frame (AVFrame *). This change is discarded as it
is incompatible with the frame reconstruction and motion vectors
display code, which needs the extra information from the H264Picture.
Merged-by: Clément Bœsch <u@pkh.me>
Merged-by: Matthieu Bouron <matthieu.bouron@gmail.com>
```
  bdbbb8f1
- avutil/tests: add aes_ctr, audio_fifo and imgutils to .gitignore · adf5dc90
  Matthieu Bouron authored Jan 16, 2017
  
  adf5dc90
16 Jan, 2017 11 commits
- configure: Fix standalone compilation of aiff and caf muxers. · e6647302
  Carl Eugen Hoyos authored Jan 16, 2017
  
  e6647302
- lavc/h264dec: reconstruct and debug flush frames as well · 9561de41
  Clément Bœsch authored Jan 13, 2017
  
  9561de41
- lavc/h264_slice: drop redundant current_slice reset · bd520e85
  Clément Bœsch authored Jan 11, 2017
```
It is done unconditionally in ff_h264_field_end()
```
  bd520e85
- lavc/pthread_frame: protect read state access in setup finish function · a91c265f
  Clément Bœsch authored Jan 11, 2017
  
  a91c265f
- avformat/aadec: use avio_get_str() · 591be9e3
  Paul B Mahol authored Jan 16, 2017
```
Signed-off-by: Paul B Mahol <onemda@gmail.com>
```
  591be9e3
- avformat/aadec: stop ignoring file metadata · e0665d38
  Paul B Mahol authored Jan 16, 2017
```
Signed-off-by: Paul B Mahol <onemda@gmail.com>
```
  e0665d38
- avcodec: add SIPR parser · 40cf9437
  Paul B Mahol authored Jan 14, 2017
```
Fixes #2056.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
```
  40cf9437
- dxva2: allow an empty array of ID3D11VideoDecoderOutputView · 8fb48659
  Steve Lhomme authored Jan 13, 2017
```
We can pick the correct slice index directly from the ID3D11VideoDecoderOutputView
cast from data[3].

Also added myself as maintainer for DXVA2 and D3D11VA.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  8fb48659
- dxva2: get the slice number directly from the surface in D3D11VA · 153b36fc
  Steve Lhomme authored Jan 13, 2017
```
No need to loop through the known surfaces, we'll use the requested surface
anyway.

The loop is only done for DXVA2.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  153b36fc
- dxva2: use a single macro to test if the DXVA context is valid · 77742c75
  Steve Lhomme authored Jan 13, 2017
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  77742c75
- libopenmpt: add missing avio_read return value check · 367cac78
  Andreas Cadhalpun authored Jan 01, 2017
```
This fixes heap-buffer-overflows in libopenmpt caused by interpreting
the negative size value as unsigned size_t.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Reviewed-by: Jörn Heusipp <osmanx@problemloesungsmaschine.de>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  367cac78
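
The libopenmpt fix above concerns a read call whose negative error return was later treated as an unsigned size. A minimal sketch of that pitfall and of the check follows; `demo_read`, `demo_size_unchecked` and `demo_size_checked` are hypothetical stand-ins written for illustration, not FFmpeg or libopenmpt functions, and the error code `-5` is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a read call: returns the number of bytes
 * read, or a negative error code on failure. */
static int demo_read(unsigned char *buf, int size, int fail)
{
    if (fail)
        return -5;                 /* illustrative negative error code */
    for (int i = 0; i < size; i++)
        buf[i] = (unsigned char)i;
    return size;
}

/* Unchecked variant: the return value is used as a size directly, so a
 * negative error code turns into a very large size_t. */
static size_t demo_size_unchecked(unsigned char *buf, int size, int fail)
{
    return (size_t)demo_read(buf, size, fail);
}

/* Checked variant: a negative return value is propagated as an error
 * and never reaches code that expects a size. */
static int demo_size_checked(unsigned char *buf, int size, int fail,
                             size_t *out_len)
{
    int ret;

    assert(buf && out_len);
    ret = demo_read(buf, size, fail);
    if (ret < 0)
        return ret;
    *out_len = (size_t)ret;
    return 0;
}
```

With a failing read, the unchecked variant yields a size far larger than the 16-byte buffer a caller might have passed, while the checked variant returns the error and leaves the output length untouched.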
15 Jan, 2017 3 commits

- dcaenc: Implementation of Huffman codes for DCA encoder · c2500d62
  Daniil Cherednik authored Jan 07, 2017
```
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
```
  c2500d62
- dcaenc: Reverse data layout to prevent data copies during Huffman encoding introduction · a6191d09
  Daniil Cherednik authored Jan 05, 2017
```
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
```
  a6191d09

- matroskaenc: remove unofficial compliance on color information · e7dec52d
  Rostislav Pehlivanov authored Jan 15, 2017

  When support for this was added the details weren't yet finalized.
  This is no longer the case.
  Fixes writing of mkv/webm files with HDR.

  Reported-by: Kagami Hiiragi <kagami@genshiken.org>
  Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
  Reviewed-by: James Almer <jamrial@gmail.com>

  e7dec52d

14 Jan, 2017 12 commits

- aarch64: vp9mc: Fix a comment to refer to a register with the right name · 0ba01875
  Martin Storsjö authored Jan 09, 2017
```
This is cherrypicked from libav commit 85ad5ea7.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  0ba01875

- aarch64: vp9dsp: Fix vertical alignment in the init file · 02cfb9a1
  Martin Storsjö authored Jan 09, 2017

  This is cherrypicked from libav commit 65074791.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  02cfb9a1

- arm: vp9mc: Fix vertical alignment of operands · 656d9109
  Martin Storsjö authored Jan 09, 2017

  This is cherrypicked from libav commit c536e5e8.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  656d9109

- aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 · 8b11a89c
  Martin Storsjö authored Jan 09, 2017

  This work is sponsored by, and copyright, Google.

  Previously all subpartitions except the eob=1 (DC) case ran with
  the same runtime:

  | Function | Runtime |
  | --- | ---: |
  | vp9_inv_dct_dct_16x16_sub16_add_neon | 1373.2 |
  | vp9_inv_dct_dct_32x32_sub32_add_neon | 8089.0 |

  By skipping individual 8x16 or 8x32 pixel slices in the first pass,
  we reduce the runtime of these functions like this:

  | Function | Runtime |
  | --- | ---: |
  | vp9_inv_dct_dct_16x16_sub1_add_neon | 235.3 |
  | vp9_inv_dct_dct_16x16_sub2_add_neon | 1036.7 |
  | vp9_inv_dct_dct_16x16_sub4_add_neon | 1036.7 |
  | vp9_inv_dct_dct_16x16_sub8_add_neon | 1036.7 |
  | vp9_inv_dct_dct_16x16_sub12_add_neon | 1372.1 |
  | vp9_inv_dct_dct_16x16_sub16_add_neon | 1372.1 |
  | vp9_inv_dct_dct_32x32_sub1_add_neon | 555.1 |
  | vp9_inv_dct_dct_32x32_sub2_add_neon | 5190.2 |
  | vp9_inv_dct_dct_32x32_sub4_add_neon | 5180.0 |
  | vp9_inv_dct_dct_32x32_sub8_add_neon | 5183.1 |
  | vp9_inv_dct_dct_32x32_sub12_add_neon | 6161.5 |
  | vp9_inv_dct_dct_32x32_sub16_add_neon | 6155.5 |
  | vp9_inv_dct_dct_32x32_sub20_add_neon | 7136.3 |
  | vp9_inv_dct_dct_32x32_sub24_add_neon | 7128.4 |
  | vp9_inv_dct_dct_32x32_sub28_add_neon | 8098.9 |
  | vp9_inv_dct_dct_32x32_sub32_add_neon | 8098.8 |

  I.e. in general a very minor overhead for the full subpartition case due
  to the additional cmps, but a significant speedup for the cases when we
  only need to process a small part of the actual input data.

  This is cherrypicked from libav commits cad42fad and a0c443a3.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  8b11a89c

- arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 · 388f6e67
  Martin Storsjö authored Jan 09, 2017

  This work is sponsored by, and copyright, Google.

  Previously all subpartitions except the eob=1 (DC) case ran with
  the same runtime:

  | Function | Cortex A7 | Cortex A8 | Cortex A9 | Cortex A53 |
  | --- | ---: | ---: | ---: | ---: |
  | vp9_inv_dct_dct_16x16_sub16_add_neon | 3188.1 | 2435.4 | 2499.0 | 1969.0 |
  | vp9_inv_dct_dct_32x32_sub32_add_neon | 18531.7 | 16582.3 | 14207.6 | 12000.3 |

  By skipping individual 4x16 or 4x32 pixel slices in the first pass,
  we reduce the runtime of these functions like this:

  | Function | Cortex A7 | Cortex A8 | Cortex A9 | Cortex A53 |
  | --- | ---: | ---: | ---: | ---: |
  | vp9_inv_dct_dct_16x16_sub1_add_neon | 274.6 | 189.5 | 211.7 | 235.8 |
  | vp9_inv_dct_dct_16x16_sub2_add_neon | 2064.0 | 1534.8 | 1719.4 | 1248.7 |
  | vp9_inv_dct_dct_16x16_sub4_add_neon | 2135.0 | 1477.2 | 1736.3 | 1249.5 |
  | vp9_inv_dct_dct_16x16_sub8_add_neon | 2446.7 | 1828.7 | 1993.6 | 1494.7 |
  | vp9_inv_dct_dct_16x16_sub12_add_neon | 2832.4 | 2118.3 | 2266.5 | 1735.1 |
  | vp9_inv_dct_dct_16x16_sub16_add_neon | 3211.7 | 2475.3 | 2523.5 | 1983.1 |
  | vp9_inv_dct_dct_32x32_sub1_add_neon | 756.2 | 456.7 | 862.0 | 553.9 |
  | vp9_inv_dct_dct_32x32_sub2_add_neon | 10682.2 | 8190.4 | 8539.2 | 6762.5 |
  | vp9_inv_dct_dct_32x32_sub4_add_neon | 10813.5 | 8014.9 | 8518.3 | 6762.8 |
  | vp9_inv_dct_dct_32x32_sub8_add_neon | 11859.6 | 9313.0 | 9347.4 | 7514.5 |
  | vp9_inv_dct_dct_32x32_sub12_add_neon | 12946.6 | 10752.4 | 10192.2 | 8280.2 |
  | vp9_inv_dct_dct_32x32_sub16_add_neon | 14074.6 | 11946.5 | 11001.4 | 9008.6 |
  | vp9_inv_dct_dct_32x32_sub20_add_neon | 15269.9 | 13662.7 | 11816.1 | 9762.6 |
  | vp9_inv_dct_dct_32x32_sub24_add_neon | 16327.9 | 14940.1 | 12626.7 | 10516.0 |
  | vp9_inv_dct_dct_32x32_sub28_add_neon | 17462.7 | 15776.1 | 13446.2 | 11264.7 |
  | vp9_inv_dct_dct_32x32_sub32_add_neon | 18575.5 | 17157.0 | 14249.3 | 12015.1 |

  I.e. in general a very minor overhead for the full subpartition case due
  to the additional loads and cmps, but a significant speedup for the cases
  when we only need to process a small part of the actual input data.

  In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
  16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
  8x8 or 16x16 subpartitions respectively.

  This is cherrypicked from libav commit 9c8bc74c.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  388f6e67
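
The slice skipping described in the two commits above can be pictured in plain C roughly as follows. This is only a sketch under stated assumptions, not the NEON implementation: the 16x16 block size with 4-column slices follows the arm commit message, but the `demo_min_eob` thresholds are illustrative values, every `demo_*` name is hypothetical, and the per-slice "transform" is a placeholder copy.

```c
#include <assert.h>
#include <string.h>

#define DEMO_N     16   /* block is DEMO_N x DEMO_N coefficients */
#define DEMO_SLICE 4    /* columns handled per first-pass iteration */

/* Illustrative thresholds (assumption): slice s is processed only if
 * eob is greater than demo_min_eob[s]. */
static const int demo_min_eob[DEMO_N / DEMO_SLICE] = { 0, 10, 38, 89 };

/* Placeholder for the 1-D first pass over one slice of columns.
 * It only copies the data; a real IDCT would transform it. */
static void demo_pass1_slice(const short *in, short *tmp, int first_col)
{
    for (int row = 0; row < DEMO_N; row++)
        for (int col = first_col; col < first_col + DEMO_SLICE; col++)
            tmp[row * DEMO_N + col] = in[row * DEMO_N + col];
}

/* First pass with slice skipping. The intermediate buffer is zeroed up
 * front, so skipped slices stay zero. Returns the number of slices
 * that were actually processed. */
static int demo_first_pass(const short *in, short *tmp, int eob)
{
    int done = 0;

    assert(in && tmp);
    memset(tmp, 0, sizeof(*tmp) * DEMO_N * DEMO_N);
    for (int s = 0; s < DEMO_N / DEMO_SLICE; s++) {
        if (eob <= demo_min_eob[s])
            break;              /* this slice and all later ones are empty */
        demo_pass1_slice(in, tmp, s * DEMO_SLICE);
        done++;
    }
    return done;
}
```

The extra comparison per slice corresponds to the small overhead the commit messages mention for the full-block case, while a low `eob` lets most of the first pass be skipped.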

- arm: vp9itxfm: Only reload the idct coeffs for the iadst_idct combination · ecd343aa
  Martin Storsjö authored Jan 09, 2017

  This avoids reloading them if they haven't been clobbered, if the
  first pass also was idct.

  This is similar to what was done in the aarch64 version.

  This is cherrypicked from libav commit 3c87039a.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  ecd343aa

- aarch64: vp9itxfm: Don't repeatedly set x9 when nothing overwrites it · 37cb224e
  Martin Storsjö authored Jan 09, 2017
```
This is cherrypicked from libav commit 2f99117f.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  37cb224e

- arm: vp9itxfm: Rename a macro parameter to fit better · f69dd26d
  Martin Storsjö authored Jan 09, 2017

  Since the same parameter is used for both input and output,
  the name inout is more fitting.

  This matches the naming used below in the dmbutterfly macro.

  This is cherrypicked from libav commit 79566ec8.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  f69dd26d

- arm/aarch64: vp9itxfm: Fix indentation of macro arguments · 4a5874ea
  Martin Storsjö authored Jan 09, 2017

  This is cherrypicked from libav commit 721bc375.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  4a5874ea

- aarch64: vp9itxfm: Use w3 instead of x3 for the int eob parameter · a95e7de4
  Martin Storsjö authored Jan 09, 2017

  The clobbering tests in checkasm are only invoked when testing
  correctness, so this bug didn't show up when benchmarking the
  dc-only version.

  This is cherrypicked from libav commit 4d960a11.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  a95e7de4

- arm: vp9itxfm: Simplify the stack alignment code · a71cd843
  Janne Grunau authored Jan 09, 2017

  This is one instruction fewer for thumb, and has only
  1/2 arm/thumb specific instructions.

  This is cherrypicked from libav commit e5b0fc17.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  a71cd843

- aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne}' · cb220eee
  Janne Grunau authored Jan 09, 2017

  The latter is 1 cycle faster on a Cortex-A53, and since the operands are
  bytewise (or larger) bitmasks (impossible to overflow to zero), both are
  equivalent.

  This is cherrypicked from libav commit e7ae8f7a.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

  cb220eee
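
The equivalence claimed in the loop filter commit above (testing `a | b` for zero versus testing `a + b` for zero when both operands are bytewise masks) can be checked exhaustively in C. The sketch below assumes 64-bit operands in which every byte is either 0x00 or 0xFF; the `demo_*` helpers are hypothetical and are not part of FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Expand an 8-bit pattern into a 64-bit bytewise mask: bit i of the
 * pattern selects whether byte i is 0xFF or 0x00. */
static uint64_t demo_byte_mask(unsigned pattern)
{
    uint64_t m = 0;

    assert(pattern < 256);
    for (int i = 0; i < 8; i++)
        if (pattern & (1u << i))
            m |= (uint64_t)0xFF << (8 * i);
    return m;
}

/* Returns 1 if "a | b is zero" and "a + b is zero" agree for every pair
 * of bytewise masks, 0 otherwise. */
static int demo_or_add_zero_tests_agree(void)
{
    for (unsigned pa = 0; pa < 256; pa++) {
        for (unsigned pb = 0; pb < 256; pb++) {
            uint64_t a = demo_byte_mask(pa), b = demo_byte_mask(pb);
            int or_zero  = (a | b) == 0;
            int add_zero = (uint64_t)(a + b) == 0;   /* wraps modulo 2^64 */

            if (or_zero != add_zero)
                return 0;
        }
    }
    return 1;
}
```

The restriction to masks matters: for arbitrary values such as 1 and `UINT64_MAX`, the sum wraps to zero while the bitwise OR does not, so the two tests are not interchangeable in general.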