Commits · 48ad3fe1beee5c3eb453093defde503f354c8f1f · Linshizhi / ffmpeg.wasm-core

24 Jan, 2017 21 commits

aarch64: vp9dsp: Restructure the bpp checks · 48ad3fe1

Martin Storsjö authored Dec 14, 2016

This work is sponsored by, and copyright, Google.

This is more in line with how it will be extended for more bitdepths.
Signed-off-by: Martin Storsjö <martin@martin.st>

48ad3fe1

arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter · 1e5d87ee

Martin Storsjö authored Jan 05, 2017

This work is sponsored by, and copyright, Google.

This is pretty much similar to the 8 bpp version, but in some senses
simpler. All input pixels are 16 bits, and all intermediates also fit
in 16 bits, so there's no lengthening/narrowing in the filter at all.

For the full 16 pixel wide filter, we can only process 4 pixels at a time
(using an implementation very much similar to the one for 8 bpp),
but we can do 8 pixels at a time for the 4 and 8 pixel wide filters with
a different implementation of the core filter.
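
The claim that all intermediates fit in 16 bits can be sanity-checked with plain arithmetic. The following is a minimal sketch, assuming the widest smoothing filter is a weighted sum of pixels whose weights total 16, followed by a rounding shift by 4 (an assumption about the filter shape, not something stated in this commit message). `max_weighted_sum` and `fits_in_u16` are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: worst-case value of a weighted pixel sum plus its
 * rounding constant, for unsigned pixels of the given bit depth.
 * total_weight is the sum of all filter weights (assumed to be 16 for the
 * widest filter). */
static uint32_t max_weighted_sum(int bit_depth, uint32_t total_weight)
{
    uint32_t max_pixel;

    assert(bit_depth > 0 && bit_depth <= 16);
    max_pixel = (1u << bit_depth) - 1;
    return max_pixel * total_weight + total_weight / 2;
}

/* Returns 1 if that worst case still fits in an unsigned 16 bit lane. */
static int fits_in_u16(int bit_depth, uint32_t total_weight)
{
    return max_weighted_sum(bit_depth, total_weight) <= UINT16_MAX;
}
```

Under these assumptions, 12 bit input gives 4095 * 16 + 8 = 65528, which is just below 65536, so neither 10 nor 12 bit content would need a wider accumulator.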

Examples of relative speedup compared to the C version, from checkasm:
Cortex                                          A7     A8     A9    A53
vp9_loop_filter_h_4_8_10bpp_neon:             1.83   2.16   1.40   2.09
vp9_loop_filter_h_8_8_10bpp_neon:             1.39   1.67   1.24   1.70
vp9_loop_filter_h_16_8_10bpp_neon:            1.56   1.47   1.10   1.81
vp9_loop_filter_h_16_16_10bpp_neon:           1.94   1.69   1.33   2.24
vp9_loop_filter_mix2_h_44_16_10bpp_neon:      2.01   2.27   1.67   2.39
vp9_loop_filter_mix2_h_48_16_10bpp_neon:      1.84   2.06   1.45   2.19
vp9_loop_filter_mix2_h_84_16_10bpp_neon:      1.89   2.20   1.47   2.29
vp9_loop_filter_mix2_h_88_16_10bpp_neon:      1.69   2.12   1.47   2.08
vp9_loop_filter_mix2_v_44_16_10bpp_neon:      3.16   3.98   2.50   4.05
vp9_loop_filter_mix2_v_48_16_10bpp_neon:      2.84   3.64   2.25   3.77
vp9_loop_filter_mix2_v_84_16_10bpp_neon:      2.65   3.45   2.16   3.54
vp9_loop_filter_mix2_v_88_16_10bpp_neon:      2.55   3.30   2.16   3.55
vp9_loop_filter_v_4_8_10bpp_neon:             2.85   3.97   2.24   3.68
vp9_loop_filter_v_8_8_10bpp_neon:             2.27   3.19   1.96   3.08
vp9_loop_filter_v_16_8_10bpp_neon:            3.42   2.74   2.26   4.40
vp9_loop_filter_v_16_16_10bpp_neon:           2.86   2.44   1.93   3.88

The speedup vs C code measured in checkasm is around 1.1-4x.
These numbers are quite inconclusive though, since the checkasm test
runs multiple filterings on top of each other, so later rounds might
end up with different codepaths (different decisions on which filter
to apply, based on input pixel differences).

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 2-4x.
Signed-off-by: Martin Storsjö <martin@martin.st>

1e5d87ee

arm: Add NEON optimizations for 10 and 12 bit vp9 itxfm · 2ed67eba

Martin Storsjö authored Dec 17, 2016

This work is sponsored by, and copyright, Google.

This is structured similarly to the 8 bit version. In the 8 bit
version, the coefficients are 16 bits, and intermediates are 32 bits.

Here, the coefficients are 32 bit. For the 4x4 transforms for 10 bit
content, the intermediates also fit in 32 bits, but for all other
transforms (4x4 for 12 bit content, and 8x8 and larger for both 10
and 12 bit) the intermediates are 64 bit.
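
Why a 64 bit intermediate becomes necessary can be seen from operand sizes alone. The sketch below assumes the transform multiplies each coefficient by a 14 bit fixed-point constant (11585, roughly cos(pi/4) * 2^14, is used as an example value) and then rounds and shifts right by 14. Both the constant and the shift amount are assumptions for illustration, and `mul_round_shift14` is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiply a 32 bit coefficient by a fixed-point
 * constant, keep the full product in 64 bits, then round and shift back.
 * The constant is expected to fit in 16 bits, as the table entries do.
 * Note: for negative products this relies on arithmetic right shift,
 * which common compilers provide but the C standard does not guarantee. */
static int32_t mul_round_shift14(int32_t coef, int32_t constant)
{
    int64_t product;

    assert(constant >= INT16_MIN && constant <= INT16_MAX);
    product = (int64_t)coef * constant;   /* widening multiply */
    return (int32_t)((product + (1 << 13)) >> 14);
}
```

For instance, a coefficient of 2^19 times 11585 is 6073876480, which exceeds the signed 32 bit range, yet the rounded and shifted result (370720) fits in 32 bits again. Whether real coefficients reach that magnitude depends on bit depth and transform size, as the commit message describes.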

For the existing 8 bit case, the 8x8 transform fit all coefficients in
registers; for 10/12 bit, when the coefficients are 32 bit, the 8x8
transform also has to be done in slices of 4 pixels (just as 16x16 and
32x32 for 8 bit).

The slice width also shrinks from 4 elements to 2 elements in parallel
for the 16x16 and 32x32 cases.

The 16 bit coefficients from idct_coeffs and similar tables also need
to be lengthened to 32 bit in order to be used in multiplication with
vectors with 32 bit elements. This leads to the fixed coefficient
vectors needing more space, leading to more cases where they have to
be reloaded within the transform (in iadst16).

This technically would need testing in checkasm for subpartitions
in increments of 2, but that slows down normal checkasm runs
excessively.

Examples of relative speedup compared to the C version, from checkasm:
Cortex                                          A7     A8     A9    A53
vp9_inv_adst_adst_4x4_sub4_add_10_neon:       4.83  11.36   5.22   6.77
vp9_inv_adst_adst_8x8_sub8_add_10_neon:       4.12   7.60   4.06   4.84
vp9_inv_adst_adst_16x16_sub16_add_10_neon:    3.93   8.16   4.52   5.35
vp9_inv_dct_dct_4x4_sub1_add_10_neon:         1.36   2.57   1.41   1.61
vp9_inv_dct_dct_4x4_sub4_add_10_neon:         4.24   8.66   5.06   5.81
vp9_inv_dct_dct_8x8_sub1_add_10_neon:         2.63   4.18   1.68   2.87
vp9_inv_dct_dct_8x8_sub4_add_10_neon:         4.52   9.47   4.24   5.39
vp9_inv_dct_dct_8x8_sub8_add_10_neon:         3.45   7.34   3.45   4.30
vp9_inv_dct_dct_16x16_sub1_add_10_neon:       3.56   6.21   2.47   4.32
vp9_inv_dct_dct_16x16_sub2_add_10_neon:       5.68  12.73   5.28   7.07
vp9_inv_dct_dct_16x16_sub8_add_10_neon:       4.42   9.28   4.24   5.45
vp9_inv_dct_dct_16x16_sub16_add_10_neon:      3.41   7.29   3.35   4.19
vp9_inv_dct_dct_32x32_sub1_add_10_neon:       4.52   8.35   3.83   6.40
vp9_inv_dct_dct_32x32_sub2_add_10_neon:       5.86  13.19   6.14   7.04
vp9_inv_dct_dct_32x32_sub16_add_10_neon:      4.29   8.11   4.59   5.06
vp9_inv_dct_dct_32x32_sub32_add_10_neon:      3.31   5.70   3.56   3.84
vp9_inv_wht_wht_4x4_sub4_add_10_neon:         1.89   2.80   1.82   1.97

The speedup compared to the C functions is around 1.3 to 7x for the
full transforms, even higher for the smaller subpartitions.
Signed-off-by: Martin Storsjö <martin@martin.st>

2ed67eba

arm: Add NEON optimizations for 10 and 12 bit vp9 MC · a4d4bad7

Martin Storsjö authored Dec 08, 2016

This work is sponsored by, and copyright, Google.

The plain pixel put/copy functions are used from the 8 bit version,
for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new
copy128 is added.

Compared with the 8 bit version, the filters can no longer use the
trick to accumulate in 16 bit with only saturation at the end, but now
the accumulators need to be 32 bit. This avoids the need to keep track
of which filter index is the largest though, reducing the size of the
executable code for these filters.
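
The accumulator widening described above can be illustrated with a scalar model of one 8-tap output pixel. This is a minimal sketch, assuming the taps sum to 128 and the result is rounded, shifted right by 7 and clipped to the pixel range. `clip_pixel` and `filter_8tap` are hypothetical names, and the scalar loop only models the arithmetic, not the NEON code in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Clip a filtered value to the valid pixel range for the bit depth. */
static uint16_t clip_pixel(int32_t v, int bit_depth)
{
    int32_t max = (1 << bit_depth) - 1;
    return (uint16_t)(v < 0 ? 0 : (v > max ? max : v));
}

/* Hypothetical scalar model of one 8-tap output pixel.
 * src points at the first of 8 consecutive input pixels.
 * The accumulator is 32 bit: with 12 bit pixels, partial sums can exceed
 * the 16 bit range long before the final shift. */
static uint16_t filter_8tap(const uint16_t *src, const int16_t taps[8],
                            int bit_depth)
{
    int32_t acc = 0;
    int i;

    assert(bit_depth >= 8 && bit_depth <= 12);
    for (i = 0; i < 8; i++)
        acc += (int32_t)src[i] * taps[i];
    /* Taps are assumed to sum to 128, hence the rounding shift by 7. */
    return clip_pixel((acc + 64) >> 7, bit_depth);
}
```

With an illustrative tap set such as {-1, 6, -19, 78, 78, -19, 6, -1} and two adjacent 12 bit pixels at 4095, the accumulator reaches 638820, far outside what a 16 bit lane can hold, which is why saturating 16 bit accumulation no longer works here.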

For the horizontal filters, we only do 4 or 8 pixels wide in parallel
(while doing two rows at a time), since we don't have enough register
space to filter 16 pixels wide.

For the vertical filters, we still do 4 and 8 pixels in parallel just
as in the 8 bit case, but we need to store the output after every 2
rows instead of after every 4 rows.

Examples of relative speedup compared to the C version, from checkasm:
Cortex                                          A7     A8     A9    A53
vp9_avg4_10bpp_neon:                          2.25   2.44   3.05   2.16
vp9_avg8_10bpp_neon:                          3.66   8.48   3.86   3.50
vp9_avg16_10bpp_neon:                         3.39   8.26   3.37   2.72
vp9_avg32_10bpp_neon:                         4.03  10.20   4.07   3.42
vp9_avg64_10bpp_neon:                         4.15  10.01   4.13   3.70
vp9_avg_8tap_smooth_4h_10bpp_neon:            3.38   6.22   3.41   4.75
vp9_avg_8tap_smooth_4hv_10bpp_neon:           3.89   6.39   4.30   5.32
vp9_avg_8tap_smooth_4v_10bpp_neon:            5.32   9.73   6.34   7.31
vp9_avg_8tap_smooth_8h_10bpp_neon:            4.45   9.40   4.68   6.87
vp9_avg_8tap_smooth_8hv_10bpp_neon:           4.64   8.91   5.44   6.47
vp9_avg_8tap_smooth_8v_10bpp_neon:            6.44  13.42   8.68   8.79
vp9_avg_8tap_smooth_64h_10bpp_neon:           4.66   9.02   4.84   7.71
vp9_avg_8tap_smooth_64hv_10bpp_neon:          4.61   9.14   4.92   7.10
vp9_avg_8tap_smooth_64v_10bpp_neon:           6.90  14.13   9.57  10.41
vp9_put4_10bpp_neon:                          1.33   1.46   2.09   1.33
vp9_put8_10bpp_neon:                          1.57   3.42   1.83   1.84
vp9_put16_10bpp_neon:                         1.55   4.78   2.17   1.89
vp9_put32_10bpp_neon:                         2.06   5.35   2.14   2.30
vp9_put64_10bpp_neon:                         3.00   2.41   1.95   1.66
vp9_put_8tap_smooth_4h_10bpp_neon:            3.19   5.81   3.31   4.63
vp9_put_8tap_smooth_4hv_10bpp_neon:           3.86   6.22   4.32   5.21
vp9_put_8tap_smooth_4v_10bpp_neon:            5.40   9.77   6.08   7.21
vp9_put_8tap_smooth_8h_10bpp_neon:            4.22   8.41   4.46   6.63
vp9_put_8tap_smooth_8hv_10bpp_neon:           4.56   8.51   5.39   6.25
vp9_put_8tap_smooth_8v_10bpp_neon:            6.60  12.43   8.17   8.89
vp9_put_8tap_smooth_64h_10bpp_neon:           4.41   8.59   4.54   7.49
vp9_put_8tap_smooth_64hv_10bpp_neon:          4.43   8.58   5.34   6.63
vp9_put_8tap_smooth_64v_10bpp_neon:           7.26  13.92   9.27  10.92

For the larger 8tap filters, the speedup vs C code is around 4-14x.
Signed-off-by: Martin Storsjö <martin@martin.st>

a4d4bad7

arm: vp9dsp: Restructure the bpp checks · cda9a3e8

Martin Storsjö authored Dec 08, 2016

This work is sponsored by, and copyright, Google.

This is more in line with how it will be extended for more bitdepths.
Signed-off-by: Martin Storsjö <martin@martin.st>

cda9a3e8

Merge commit '' · 1400598c

Clément Bœsch authored Jan 24, 2017

* commit 'fd5e6a09':
  x86util: Extend SPLATW for avx2

This commit is a noop, see 1ace9573
(only libavutil/x86/x86util.asm chunk).
Merged-by: Clément Bœsch <u@pkh.me>

1400598c

Merge commit '' · f84ece0a

Clément Bœsch authored Jan 24, 2017

* commit '37961044':
  checkasm: arm: Ignore changes to bits 0-4 and 7 of FPSCR
  checkasm/arm: remove NEON instructions from checkasm_checked_call_vfp
  checkasm: arm: Don't start new const blocks for each string

This merge is a noop: the changes were included in 9f1c81e5.
Merged-by: Clément Bœsch <u@pkh.me>

f84ece0a

Merge commit '' · 727c463f

Clément Bœsch authored Jan 24, 2017

* commit '5ece6911':
  apichanges: Fill in missing hashes and dates

This commit is a noop as we need to fill with our own hashes.
Merged-by: Clément Bœsch <u@pkh.me>

727c463f

Merge commit '' · 4181d774

Clément Bœsch authored Jan 24, 2017

* commit 'facdfe40':
  swscale: Add proper ff_ prefix to init functions

This commit is a noop, see e8c37160

I'm keeping our ff_sws_ vs ff_ since we use ff_sws_ in other places in
swscale.
Merged-by: Clément Bœsch <u@pkh.me>

4181d774

Merge commit '' · 4ad5b936

Clément Bœsch authored Jan 24, 2017

* commit 'c0fd2fb2':
  swscale: Rename sws_context_class to ff_sws_context_class

This commit is a noop, see 8bfbc8c5
Merged-by: Clément Bœsch <u@pkh.me>

4ad5b936

Merge commit '' · 9f1c81e5

Clément Bœsch authored Jan 24, 2017

* commit '71a04721':
  checkasm: arm: report the first clobbered register in checkasm_checked_call

Also includes 446353ea, 59aeed93, and 37961044 to avoid breaking
too much stuff.
Merged-by: Clément Bœsch <u@pkh.me>

9f1c81e5

avcodec/mjpegdec: Check remaining bitstream in ljpeg_decode_yuv_scan() · 755933cb

Michael Niedermayer authored Jan 24, 2017

Fixes timeout
Fixes: 445/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_MJPEG_fuzzer
Fixes: 456/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_JPEGLS_fuzzer

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

755933cb

Merge commit '' · 8504d64b

Clément Bœsch authored Jan 24, 2017

* commit 'a8fce24b':
  avconv_dxva2: support HEVC Main10 decoding

This commit is a noop, see 1ec14612
Merged-by: Clément Bœsch <u@pkh.me>

8504d64b

Merge commit '' · 5f74ce0e

Clément Bœsch authored Jan 24, 2017

* commit '33f6690e':
  hevc: offer DXVA2 for 10bit 420

This commit is a noop, see ccb94789
Merged-by: Clément Bœsch <u@pkh.me>

5f74ce0e

Merge commit '' · 74480198

Clément Bœsch authored Jan 17, 2017

* commit '38efff92':
  FATE: add a test for H.264 with two fields per packet
  h264: fix decoding multiple fields per packet with slice threads

This merge includes two commits because the FATE test was useful in
order to make proper testing.

The merge gets rid of the now unused:
- SLICE_SINGLETHREAD and SLICE_SKIPED macros
- max_contexts
- "again" label in decode_nal_units()

This commit also includes the fix from d3e4d406.

Thanks to wm4 and Michael Niedermayer for their testing.
Merged-by: Clément Bœsch <u@pkh.me>
Merged-by: Matthieu Bouron <matthieu.bouron@gmail.com>

74480198

avformat/hlsenc: improve to write m3u8 head block · 1033f56b
Steven Liu authored Jan 24, 2017
```
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
```
1033f56b

avcodec/h264dec: Fix regression with "make fate-h264-attachment-631 THREADS=8" · 25f4f08b

Michael Niedermayer authored Jan 23, 2017

This treats the case of no slices like no frames which it basically is.

The field is added to the context as other nal related fields are also there
and passing the has_slices field per *arguments is ugly and not consistent

Found-by: ubitux
Approved-by: ubitux
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

25f4f08b

avfilter: add EIA-608 line extractor · 08e57323

Paul B Mahol authored Jan 14, 2017

Signed-off-by: Dave Rice <dave@dericed.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>

08e57323

avformat/flvenc: refine the flvenc shift_data code · 1bb192ef
Steven Liu authored Jan 24, 2017
```
refine the flvenc shift_data move data option
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
```
1bb192ef

avformat/hlsenc: refine the code readable for time unit · 2f7cc21b

Steven Liu authored Jan 24, 2017

Reviewed-by: Bodecs Bela <bodecsb@vivanet.hu>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>

2f7cc21b

libavformat/tee: tee was passing a wrong option name for fifo's format_options · b7665642

Felipe Astroza authored Jan 23, 2017

If fifo is enabled on tee muxer, ffmpeg exits because of an unknown option passed to fifo muxer.
Option name "format_options" was replaced by "format_opts" on tee muxer.
Signed-off-by: Felipe Astroza <felipe@astroza.cl>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

b7665642

23 Jan, 2017 4 commits

avcodec/cuvid: fail early if GPU can't handle video resolution · 9ea29985

Pavel Koshevoy authored Jan 22, 2017

CUVID on GeForce GT 730 and GeForce GTX 1060 does not report any error when
decoding 8K h264 packets. However, it does return an error during
cuvidCreateDecoder call if the indicated video resolution is not
supported.

Given that stream resolution is typically known as a result of probing
it is better to use this information during avcodec_open2 call to fail
immediately, rather than proceeding to decode and never receiving any
frames from the decoder nor receiving any indication of decode failure.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>

9ea29985

hwcontext_cuda: implement frames_get_constraints · c16fe143
wm4 authored Jan 16, 2017
```
Copied and modified from hwcontext_qsv.c.
```
c16fe143

lavf/segment: fix crash when failing to open segment list · 2b202900

Rodger Combs authored Jan 21, 2017

This happens because segment_end() returns an error, so seg_write_packet
never proceeds to segment_start(), and seg->avf->pb is never re-set,
so we crash with a null pb when av_write_trailer flushes the packet
queue.

This doesn't seem to be clearly recoverable, so I'm just failing more
gracefully.

Repro:
ffmpeg -i input.ts -f segment -c copy -segment_list /noaxx.m3u8 test-%05d.ts

(assuming you don't have write access to /)

2b202900

avcodec/pngdec: Fix off by 1 size in decode_zbuf() · e371f031

Michael Niedermayer authored Jan 23, 2017

Fixes out of array access
Fixes: 444/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_PNG_fuzzer

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

e371f031

22 Jan, 2017 9 commits

avcodec/error_resilience: update indention after last commit · a0341b4d
Michael Niedermayer authored Jan 22, 2017
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
a0341b4d

avcodec/error_resilience: Optimize motion recovery code by using block lists · d9d9fd94

Michael Niedermayer authored Jan 22, 2017

This makes the code 7 times faster with the testcase from libfuzzer
and should reduce the amount of timeouts we hit in automated fuzzing.
(for example 438/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_RV40_fuzzer)

The code is also faster with more realistic input though the difference
is small here as that is far from the worst cases the fuzzers pick out

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

d9d9fd94

ffplay: fix indentation after last commit · f1214ad5
Marton Balint authored Jan 15, 2017
```
Signed-off-by: Marton Balint <cus@passwd.hu>
```
f1214ad5

ffplay: do not preallocate video texture · 076fc75b

Marton Balint authored Jul 27, 2014

Since the uploads happen in the main display function, it does not matter much.
Signed-off-by: Marton Balint <cus@passwd.hu>

076fc75b

avformat: add MIDI Sample Dump Standard demuxer · 7f9978b0
Paul B Mahol authored Jan 20, 2017
```
Signed-off-by: Paul B Mahol <onemda@gmail.com>
```
7f9978b0

avcodec/ac3dec: add consistent noise generation option. · d5d474ae

Jonathan Campbell authored Sep 03, 2016

use av_lfg_init_from_data() to seed AC-3 dithering from the AC-3 frame
data to make it consistent given the same AC-3 frame, if option is set.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

d5d474ae
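
The idea of seeding a noise generator from the frame data, so that the same frame always yields the same dither, can be sketched without libavutil. The example below is not FFmpeg's implementation: it swaps in an FNV-1a hash for the seed and xorshift32 for the generator purely to show the principle, and `DemoRng`, `demo_rng_init_from_data` and `demo_rng_next` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PRNG context; not the AVLFG structure. */
typedef struct DemoRng {
    uint32_t state;
} DemoRng;

/* Derive the seed from the bytes of a frame using FNV-1a, so identical
 * input data always produces an identical generator state. */
static void demo_rng_init_from_data(DemoRng *r, const uint8_t *data,
                                    size_t len)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    size_t i;

    assert(r && (data || len == 0));
    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;                  /* FNV-1a prime */
    }
    r->state = h ? h : 1u;               /* xorshift32 needs a nonzero state */
}

/* One xorshift32 step; returns the next pseudo-random value. */
static uint32_t demo_rng_next(DemoRng *r)
{
    uint32_t x = r->state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    r->state = x;
    return x;
}
```

Two decoders that initialise the generator from the same frame bytes then draw the same noise sequence, which is the property the option in this commit is after.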

libavutil: add av_lfg_init_from_data() function · 76c5a69e
Jonathan Campbell authored Sep 03, 2016
```
seeds an AVLFG from binary data.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
76c5a69e
avfilter/af_hdcd: Fix leak of memory allocated by ff_make_format_list() · 0a5add45
Michael Niedermayer authored Jan 21, 2017
```
Fixes CID1396265
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
0a5add45
vaapi_mpeg4: Restore changes overwritten by merge · d40a1ae7
Mark Thompson authored Jan 21, 2017
```
From 2aa8e33d.
```
d40a1ae7

21 Jan, 2017 5 commits
- avfilter/avf_showspectrum: Fix memleak of text allocated by av_asprintf() · 61164112
  Michael Niedermayer authored Jan 21, 2017
```
Fixes CID1396261
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  61164112
- avfilter/vf_palettegen: Fix leak and simplify code · e740e9c7
  Michael Niedermayer authored Jan 21, 2017
```
Fixes CID1270818
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  e740e9c7
- avcodec/fraps: add support for PAL8 · d60f090d
  Paul B Mahol authored Jan 19, 2017
```
Signed-off-by: Paul B Mahol <onemda@gmail.com>
```
  d60f090d
- avcodec: Add FF_CODEC_CAP_SKIP_FRAME_FILL_PARAM to most h263 based codecs · cde007dc
  Michael Niedermayer authored Dec 22, 2016
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  cde007dc
- avfilter/avfiltergraph: Add assert to write down in machine readable form what... · 5f2b360f
  Michael Niedermayer authored Jan 21, 2017
```
avfilter/avfiltergraph: Add assert to write down in machine readable form what is assumed about sample rates in swap_samplerates_on_filter()

Fixes CID1397292
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  5f2b360f
20 Jan, 2017 1 commit
- lavc/h264dec: re-indent after previous commit · cf3affab
  Matthieu Bouron authored Jan 20, 2017
  
  cf3affab