Commits · 6752318c737663f0ac019de3acd63e3cea706864 · Linshizhi / ffmpeg.wasm-core

11 Mar, 2017 16 commits

aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability · 6752318c
Martin Storsjö authored Jan 03, 2017
```
This is cherrypicked from libav commit
3dd78272.
Signed-off-by: Martin Storsjö <martin@martin.st>
```
6752318c

aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible · 19a0f952

Martin Storsjö authored Jan 03, 2017

The ld1r is a leftover from the arm version, where this trick is
beneficial on some cores.

Use a single-lane load where we don't need the semantics of ld1r.

This is cherrypicked from libav commit
ed8d2933.
Signed-off-by: Martin Storsjö <martin@martin.st>

19a0f952

aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function · 3006e525
Martin Storsjö authored Jan 03, 2017
```
This is cherrypicked from libav commit
4da4b2b8.
Signed-off-by: Martin Storsjö <martin@martin.st>
```
3006e525
arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function · 1d8ab576
Martin Storsjö authored Jan 03, 2017
```
This is cherrypicked from libav commit
3933b86b.
Signed-off-by: Martin Storsjö <martin@martin.st>
```
1d8ab576

aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 · 9532a7d4

Martin Storsjö authored Nov 22, 2016

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14740 bytes to 24292 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2

This is cherrypicked from libav commit
a63da451.
Signed-off-by: Martin Storsjö <martin@martin.st>

9532a7d4

arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible · 82458955

Martin Storsjö authored Nov 22, 2016

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 12388 bytes to 19784 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0

This is cherrypicked from libav commit
5eb5aec4.
Signed-off-by: Martin Storsjö <martin@martin.st>

82458955

aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · a681c793

Martin Storsjö authored Feb 05, 2017

This allows reusing the macro for a separate implementation of the
pass2 function.

This is cherrypicked from libav commit
79d332eb.
Signed-off-by: Martin Storsjö <martin@martin.st>

a681c793

arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · 3bd9b391

Martin Storsjö authored Feb 05, 2017

This allows reusing the macro for a separate implementation of the
pass2 function.

This is cherrypicked from libav commit
47b3c2c1.
Signed-off-by: Martin Storsjö <martin@martin.st>

3bd9b391

aarch64: vp9itxfm: Make the larger core transforms standalone functions · dc47bf38

Martin Storsjö authored Nov 23, 2016

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
19496 to 14740 bytes.

This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8

This is cherrypicked from libav commit
11547601.
Signed-off-by: Martin Storsjö <martin@martin.st>

dc47bf38

arm: vp9itxfm: Make the larger core transforms standalone functions · f8fcee0d

Martin Storsjö authored Nov 23, 2016

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
15324 to 12388 bytes.

This gives a small slowdown of a couple tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_neon:    2063.4   1516.0   1719.5   1245.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3279.3   2454.5   2525.2   1982.3
vp9_inv_dct_dct_32x32_sub4_add_neon:   10750.0   7955.4   8525.6   6754.2
vp9_inv_dct_dct_32x32_sub32_add_neon:  18574.0  17108.4  14216.7  12010.2

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    2060.8   1608.5   1735.7   1262.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.2   2443.5   2546.1   1999.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10682.0   8043.8   8581.3   6810.1
vp9_inv_dct_dct_32x32_sub32_add_neon:  18522.4  17277.4  14286.7  12087.9

This is cherrypicked from libav commit
0331c3f5.
Signed-off-by: Martin Storsjö <martin@martin.st>

f8fcee0d

aarch64: vp9itxfm: Restructure the idct32 store macros · 52c7366c

Martin Storsjö authored Dec 01, 2016

This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.

This is also arguably more readable.

This is cherrypicked from libav commit
58d87e0f.
Signed-off-by: Martin Storsjö <martin@martin.st>

52c7366c

arm: vp9itxfm: Avoid .irp when it doesn't save any lines · 31e41350

Martin Storsjö authored Feb 04, 2017

This makes it more readable.

This is cherrypicked from libav commit
3bc5b28d.
Signed-off-by: Martin Storsjö <martin@martin.st>

31e41350

libavfilter/avf_showwaves: make sqrt and cbrt scale option values available to showwavespic by name · 114bbb0b

Moritz Barsnick authored Mar 09, 2017

The 'sqrt' and 'cbrt' scalers were added in commit
80262d8c, but their symbolic option values were
only made available to the showwaves filter, not showwavespic, despite
the scalers working properly by their numerical option values.
Signed-off-by: Moritz Barsnick <barsnick@gmx.net>

114bbb0b

ffprobe: add AVCodecContext help message into ffprobe · 51e35019

Steven Liu authored Mar 11, 2017

Because ffprobe can use AVCodecContext parameters.
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>

51e35019

avcodec/vp56: Reset have_undamaged_frame on resolution changes · 6e913f21

Michael Niedermayer authored Mar 09, 2017

Fixes: timeout in 758/clusterfuzz-testcase-4720832028868608

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

6e913f21

avcodec/h264_ps: Forward errors from decode_scaling_list() · dc0b9b21
Michael Niedermayer authored Mar 09, 2017
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
dc0b9b21

10 Mar, 2017 2 commits

libavcodec/libopenjpegenc: enable lossless option, remove layer option, and improve defaults · 195784ec

Aaron Boxer authored Mar 10, 2017

1. limit to single layer, as there is no current support for setting distortion/quality of multiple layers
2. encoder mode should be kept at default setting (0)
3. remove fixed_alloc parameter from context : seldom if ever used, and no way of properly configuring at the moment
4. add irreversible setting, to allow for lossless encoding. Set to OpenJPEG default (enabled)
5. set numresolution max to 33, which is the maximum number of allowed resolutions according to the J2K spec
Signed-off-by: Michael Bradshaw <mjbshaw@google.com>

195784ec

avcodec/vp8: Fix hang with slice threads · 9bbc73ae

Thomas Guilbert authored Mar 09, 2017

Fixes: 447860.webm
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

9bbc73ae

09 Mar, 2017 12 commits

avcodec/movtextdec: run mov_text_cleanup() before overwriting pointers · bac9c03e

Michael Niedermayer authored Mar 08, 2017

Fixes: memleak
Fixes: 741/clusterfuzz-testcase-586996200452915

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

bac9c03e

avcodec/mpeg4videodec: Fix runtime error: signed integer overflow: -135088512... · e2a4f1a9

Michael Niedermayer authored Mar 08, 2017

avcodec/mpeg4videodec: Fix runtime error: signed integer overflow: -135088512 * 16 cannot be represented in type 'int'

Fixes: 736/clusterfuzz-testcase-5580263943831552

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

e2a4f1a9
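
The overflow named in this commit title can be checked by hand: -135088512 * 16 is -2161416192, which is below the smallest 32-bit `int` (-2147483648), so the signed multiplication is undefined behaviour in C. The commit's actual change is not shown on this page. The sketch below only illustrates one common way to avoid this class of error, by doing the multiplication in unsigned arithmetic; `scale16_wrap` is a hypothetical helper name, not a function from libavcodec.

```c
#include <stdint.h>

/* Hypothetical helper (not from FFmpeg): multiply by 16 without
 * invoking signed-overflow undefined behaviour. */
static int32_t scale16_wrap(int32_t v)
{
    /* Unsigned arithmetic is defined to wrap modulo 2^32. */
    uint32_t u = (uint32_t)v * 16U;

    /* Converting an out-of-range value back to int32_t is
     * implementation-defined rather than undefined; on the usual
     * two's-complement targets it simply reinterprets the bits. */
    return (int32_t)u;
}
```

Whether wrapping is the right result depends on the caller; for damaged input the point is usually just that the decoder must not execute undefined behaviour.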

avcodec/h264_mvpred: Fix runtime error: left shift of negative value -1 · 222c9f03

Michael Niedermayer authored Mar 08, 2017

Fixes: 734/clusterfuzz-testcase-4821293192970240

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

222c9f03
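
"Left shift of negative value" reports like this one come from the C rule that `v << n` is undefined when `v` is a negative signed integer. A minimal sketch of one common workaround, assuming the intent is multiplication by a power of two, is to multiply instead of shift. The helper name `shl_signed` is hypothetical and the code is not taken from the commit.

```c
#include <assert.h>

/* Hypothetical helper (not from FFmpeg): scale a possibly negative
 * value by 2^n. Multiplication is well defined for negative operands
 * as long as the result fits in int, unlike a left shift. */
static int shl_signed(int v, unsigned n)
{
    assert(n < 31);        /* keep (1 << n) representable in a 32-bit int */
    return v * (1 << n);   /* caller must still ensure the product fits */
}
```

For example, `shl_signed(-1, 3)` yields -8, where `-1 << 3` would have been flagged by the undefined-behaviour sanitizer.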

avcodec/mjpegdec: Fix runtime error: left shift of negative value -127 · 800d02ab

Michael Niedermayer authored Mar 08, 2017

Fixes: 733/clusterfuzz-testcase-4682158096515072

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

800d02ab

avcodec/mpegaudiodec_template: Check for negative e · 58dd25f8

Michael Niedermayer authored Mar 08, 2017

Fixes: undefined shift
Fixes: 631/clusterfuzz-testcase-6725491035734016

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

58dd25f8

avcodec/cuvid: add support for cropping/resizing · 5cd3cd5b
Timo Rothenpieler authored Mar 05, 2017
```
Overhauled version, original patch by Miroslav Slugeň <thunder.m@email.cz>.
```
5cd3cd5b
avformat/matroskaenc: add support for Spherical Video elements · 58eb0f57
James Almer authored Mar 08, 2017
```
Reviewed-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
58eb0f57

avcodec: clarify some decoding/encoding API details · f940492b

wm4 authored Mar 06, 2017

Make it clear that there is no timing-dependent behavior. In particular,
there is no state in which both input and output are denied, and where
you have to wait for a while yourself to make progress (apparently some
hardware decoders like to do this).

Avoid wording that makes references to time. It shouldn't be mistaken
for some kind of asynchronous API (like POSIX read() can return EAGAIN
if there is no new input yet). It's a state machine, so try to use
appropriate terms.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

Merges Libav commit 8a60bba0.

f940492b
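
The state-machine wording in this commit can be illustrated with a toy model. The code below is not the libavcodec API; `ToyCodec`, `toy_send`, `toy_receive` and `toy_run` are hypothetical names, and the one-slot buffer is an assumption made to keep the sketch short. It only mirrors the idea behind `avcodec_send_packet()` and `avcodec_receive_frame()`: an EAGAIN result describes the current state, and in every state at least one of the two calls is accepted, so the caller never has to wait.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy codec holding at most one pending item. */
typedef struct ToyCodec {
    int has_item;
    int item;
} ToyCodec;

/* Input is refused only while an output is pending. */
static int toy_send(ToyCodec *c, int in)
{
    if (c->has_item)
        return -EAGAIN;          /* state: drain output first */
    c->item = in;
    c->has_item = 1;
    return 0;
}

/* Output is refused only while nothing is pending. */
static int toy_receive(ToyCodec *c, int *out)
{
    if (!c->has_item)
        return -EAGAIN;          /* state: more input needed */
    *out = c->item * 2;          /* stand-in for real decoding work */
    c->has_item = 0;
    return 0;
}

/* Feed n inputs and collect outputs; returns the number produced.
 * No sleeping or polling: every EAGAIN is resolved by calling the
 * other function. */
static int toy_run(ToyCodec *c, const int *in, int n, int *out)
{
    int i = 0, produced = 0;

    assert(!c->has_item);        /* sketch assumes an empty codec */
    while (i < n) {
        if (toy_send(c, in[i]) == 0)
            i++;
        while (toy_receive(c, &out[produced]) == 0)
            produced++;
    }
    return produced;
}
```

The loop in `toy_run` always makes progress because the two refusal conditions are mutually exclusive, which is the property the commit message asks the documentation to state clearly.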

hls: pass AVFormatContext flags to sub demuxer · 597c6b78
wm4 authored Mar 09, 2017

597c6b78
concatdec: pass AVFormatContext flags to sub demuxer · f5da453b
wm4 authored Mar 09, 2017

f5da453b

aacdec: do not mutate input packet metadata · fcfc78cb

wm4 authored Mar 08, 2017

Apparently the demuxer outputs the wrong padding for HE-AAC (based on
the raw sample rate, or so). aacdec contains a hack to adjust the muxer
padding accordingly before it's used to trim the decoder output. This
modified the packet side data, which in combination with the old
decoding API would change the packet the user passed to the decoder.
This is clearly not allowed, and it breaks running some gapless fate
tests with "-fflags +keepside" applied (without keepside, the packet
metadata is typically newly allocated, essentially making a copy and not
modifying the user's input packet).

This should probably be fixed in the demuxer (and consequently also the
muxer), but for now only fix the immediate problem.

Regression since 946ed78f (2012).

fcfc78cb

swresample/resample: do not allow odd filter_length · 53a5cea4

Muhammad Faiz authored Mar 08, 2017

except filter_length == 1

odd filter_length gives worse frequency response,
even when compared with shorter filter_length

also makes build_filter simpler
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>

53a5cea4

08 Mar, 2017 7 commits

avcodec/wavpack: Fix runtime error: left shift of negative value -5 · 3016e919

Michael Niedermayer authored Mar 06, 2017

Fixes: 729/clusterfuzz-testcase-5154831595470848

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

3016e919

avcodec/pictordec: Fix runtime error: left shift of 64 by 25 places cannot be... · 01a33b83

Michael Niedermayer authored Mar 06, 2017

avcodec/pictordec: Fix runtime error: left shift of 64 by 25 places cannot be represented in type 'int'

Fixes: 724/clusterfuzz-testcase-6738249571631104

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

01a33b83
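
The arithmetic behind this report: 64 << 25 equals 2^31 (2147483648), one more than the largest value a 32-bit signed `int` can hold, so the signed shift is undefined. A minimal sketch of the usual remedy, assuming the value is a bit pattern rather than a signed quantity, is to shift in an unsigned type. `shl_u32` is a hypothetical helper and is not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from FFmpeg): left shift in an unsigned
 * 32-bit type, where bits shifted past the top are simply discarded. */
static uint32_t shl_u32(uint32_t v, unsigned n)
{
    assert(n < 32);   /* shifting by the full width is still undefined */
    return v << n;
}
```

With this helper, `shl_u32(64, 25)` is well defined and yields 2147483648.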

fate/swresample: fix FUZZ typo · fe57bf7c

Muhammad Faiz authored Mar 08, 2017

unintentionally changed to 0.01 at
'61926b6c'
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>

fe57bf7c

avutil/tests/lfg: Remove debugging start/stop timer · 1d0bad42
Michael Niedermayer authored Mar 08, 2017
```
Fixes code with qemu ARM
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
1d0bad42

avutil/tests/lfg.c: added proper normality test · a50ccbd2

Thomas Turner authored Mar 08, 2017

The Chen-Shapiro (CS) test was used to test normality for
the Lagged Fibonacci PRNG.

Normality Hypothesis Test:

The null hypothesis formally tests if the population
the sample represents is normally-distributed. For
CS, when the normality hypothesis is True, the
distribution of QH will have a mean close to 1.

Information on CS can be found here:

http://www.stata-journal.com/sjpdf.html?articlenum=st0264
http://www.originlab.com/doc/Origin-Help/NormalityTest-Algorithm
Signed-off-by: Thomas Turner <thomastdt@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

a50ccbd2

swresample/resample: use uniform normalization · 61926b6c

Muhammad Faiz authored Mar 01, 2017

this gives better frequency response

update swresample fate and other fates
that depend on resampling
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>

61926b6c

Revert "lavu/atomic: add support for the new memory model aware gcc built-ins" · dbc932e7

James Almer authored Mar 07, 2017

This reverts commit faa9d298.

This change became superfluous when support for C11 atomics was introduced.
Reverting it will make the removal of this implementation in an upcoming
merge conflict free.
Reviewed-by: wm4 <nfxjfg@googlemail.com>
Signed-off-by: James Almer <jamrial@gmail.com>

dbc932e7

07 Mar, 2017 3 commits
- lsws/slice: Move a misplaced const. · 851f4255
  Carl Eugen Hoyos authored Feb 26, 2017
```
Fixes a gcc warning:
libswscale/slice.c:178:56: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
```
  851f4255
- lsws/input: Do not define unused functions. · a9c20598
  Carl Eugen Hoyos authored Mar 07, 2017
```
Fixes warnings like the following:
libswscale/input.c:951:13: warning: ‘planar_rgb14be_to_a’ defined but not used
```
  a9c20598
- lavc/libx265: Add gray10 and gray12 encoding support. · 587226ad
  Carl Eugen Hoyos authored Mar 07, 2017
  
  587226ad