Commits · 37394ef01b040605f8e1c98e73aa12b1c0bcba07 · Linshizhi / ffmpeg.wasm-core

19 Feb, 2019 15 commits

aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 · 37394ef0

Martin Storsjö authored Feb 01, 2019

This makes it similar to put_epel16_v6, and gives a large speedup
on Cortex A53, a minor speedup on A72 and a very minor slowdown on
A73.

Before:                 Cortex A53     A72     A73
vp8_put_epel16_h6v6_neon:   2211.4  1586.5  1431.7
After:
vp8_put_epel16_h6v6_neon:   1736.9  1522.0  1448.1
Signed-off-by: Martin Storsjö <martin@martin.st>

37394ef0

arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 · cef914e0

Martin Storsjö authored Feb 01, 2019

This makes it similar to put_epel16_v6, and gives a 10-25%
speedup of this function.

Before:                   Cortex A7       A8       A9      A53     A72
vp8_put_epel16_h6v6_neon:    3058.0   2218.5   2459.8   2183.0  1572.2
After:
vp8_put_epel16_h6v6_neon:    2670.8   1934.2   2244.4   1729.4  1503.9
Signed-off-by: Martin Storsjö <martin@martin.st>

cef914e0
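
The before/after table above can be turned into per-core percentages with a few lines of arithmetic. The following is a minimal sketch, not part of the commit: the cycle counts are copied from the table, while the helper name `speedup_percent` and the choice to define speedup as `before / after - 1` are assumptions made for illustration. Other definitions (for example the reduction in cycles relative to the old count) give somewhat different percentages.

```python
# Cycle counts copied from the table above (lower is better).
CORES = ["A7", "A8", "A9", "A53", "A72"]
BEFORE = [3058.0, 2218.5, 2459.8, 2183.0, 1572.2]
AFTER = [2670.8, 1934.2, 2244.4, 1729.4, 1503.9]


def speedup_percent(before, after):
    """Relative speedup in percent, defined here as before / after - 1."""
    return (before / after - 1.0) * 100.0


# Print one line per core so the numbers can be compared side by side.
for core, before, after in zip(CORES, BEFORE, AFTER):
    print(f"Cortex {core}: {speedup_percent(before, after):5.1f}% faster")
```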

aarch64: vp8: Port bilin functions from arm version · e39a9212

Martin Storsjö authored Feb 01, 2019

                      Cortex A53     A72     A73
vp8_put_bilin4_h_c:        303.8   102.2   161.8
vp8_put_bilin4_h_neon:     100.0    40.9    41.2
vp8_put_bilin4_hv_c:       322.8   201.0   305.9
vp8_put_bilin4_hv_neon:    156.8    72.6    77.0
vp8_put_bilin4_v_c:        304.7   101.7   166.5
vp8_put_bilin4_v_neon:      82.7    41.2    33.0
vp8_put_bilin8_h_c:       1192.7   352.5   623.8
vp8_put_bilin8_h_neon:     213.5    70.2    87.8
vp8_put_bilin8_hv_c:      1098.6   769.2  1041.9
vp8_put_bilin8_hv_neon:    324.0   123.5   146.0
vp8_put_bilin8_v_c:       1193.9   350.4   617.7
vp8_put_bilin8_v_neon:     183.9    60.7    64.7
vp8_put_bilin16_h_c:      2353.1   671.2  1223.3
vp8_put_bilin16_h_neon:    261.9   140.7   145.0
vp8_put_bilin16_hv_c:     2453.2  1470.9  2355.2
vp8_put_bilin16_hv_neon:   383.9   196.0   217.0
vp8_put_bilin16_v_c:      2349.3   669.8  1251.2
vp8_put_bilin16_v_neon:    202.9   110.7    96.2
Signed-off-by: Martin Storsjö <martin@martin.st>

e39a9212

aarch64: vp8: Port epel4 functions from arm version · 58d15492

Martin Storsjö authored Feb 01, 2019

                      Cortex A53    A72    A73
vp8_put_epel4_h4_c:        631.4  291.7  367.8
vp8_put_epel4_h4_neon:     241.0  131.0  155.7
vp8_put_epel4_h4v4_c:      967.5  529.3  667.7
vp8_put_epel4_h4v4_neon:   429.3  241.8  279.7
vp8_put_epel4_h4v6_c:     1374.7  657.5  864.5
vp8_put_epel4_h4v6_neon:   515.5  295.5  334.7
vp8_put_epel4_h6_c:        851.0  421.0  486.0
vp8_put_epel4_h6_neon:     321.5  195.0  217.7
vp8_put_epel4_h6v4_c:     1111.3  621.1  781.2
vp8_put_epel4_h6v4_neon:   539.2  328.0  365.3
vp8_put_epel4_h6v6_c:     1561.3  763.3  999.7
vp8_put_epel4_h6v6_neon:   645.5  401.0  434.7
vp8_put_epel4_v4_c:        663.8  298.3  357.0
vp8_put_epel4_v4_neon:     116.0   81.5   72.5
vp8_put_epel4_v6_c:        870.5  437.0  507.4
vp8_put_epel4_v6_neon:     147.7  108.8   92.0
Signed-off-by: Martin Storsjö <martin@martin.st>

58d15492

aarch64: vp8: Port missing epel8 functions from arm version · cc7ba00c

Martin Storsjö authored Feb 01, 2019

                      Cortex A53     A72     A73
vp8_put_epel8_h4_c:       2594.8  1159.6  1374.8
vp8_put_epel8_h4_neon:     506.4   244.2   314.0
vp8_put_epel8_h6_c:       3445.8  1677.1  1811.3
vp8_put_epel8_h6_neon:     634.4   371.7   433.0
vp8_put_epel8_v4_c:       2614.0  1174.8  1378.0
vp8_put_epel8_v4_neon:     321.0   221.7   235.8
vp8_put_epel8_v6_c:       3635.5  1703.0  2079.2
vp8_put_epel8_v6_neon:     416.9   317.0   295.5
Signed-off-by: Martin Storsjö <martin@martin.st>

cc7ba00c

aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version · 52c9b0a6

Martin Storsjö authored Feb 01, 2019

                     Cortex A53    A72    A73
vp8_luma_dc_wht_c:        115.7   75.7   90.7
vp8_luma_dc_wht_neon:      60.7   41.2   45.7
vp8_idct_dc_add4uv_c:     376.1  262.9  282.5
vp8_idct_dc_add4uv_neon:   52.0   29.0   37.0
Signed-off-by: Martin Storsjö <martin@martin.st>

52c9b0a6

aarch64: vp8: Fix a typo in a comment · c513fcd7

Martin Storsjö authored Jan 31, 2019

Signed-off-by: Martin Storsjö <martin@martin.st>

c513fcd7

aarch64: vp8: Reorder the function pointer inits to match the arm original · f1011ea2

Martin Storsjö authored Jan 31, 2019

Signed-off-by: Martin Storsjö <martin@martin.st>

f1011ea2

aarch64: vp8: Move the vp8dsp makefile entries to the right places · b4b27dce

Martin Storsjö authored Jan 31, 2019

Even if NEON is disabled, the init functions should be built,
as they are called as long as ARCH_AARCH64 is set.

These functions are part of a generic DSP subsystem, not tied directly
to one decoder. (They should be built if the vp7 decoder is enabled,
even if the vp8 decoder is disabled.)
Signed-off-by: Martin Storsjö <martin@martin.st>

b4b27dce

aarch64: vp8: Remove superfluous includes · ad32f7b1

Martin Storsjö authored Jan 31, 2019

This fixes building with MSVC, which lacks unistd.h.
Signed-off-by: Martin Storsjö <martin@martin.st>

ad32f7b1

aarch64: vp8: Use the proper aarch64 form for conditional branches · 85bfaa49

Martin Storsjö authored Feb 01, 2019

The previous form also does seem to assemble on current tools,
but I think it might fail on some older aarch64 tools.
Signed-off-by: Martin Storsjö <martin@martin.st>

85bfaa49

aarch64: vp8: Fix assembling with armasm64 · 2eeac799

Martin Storsjö authored Jan 31, 2019

Signed-off-by: Martin Storsjö <martin@martin.st>

2eeac799

aarch64: vp8: Fix assembling with clang · 26d7af4c

Martin Storsjö authored Jan 31, 2019

This also partially fixes assembling with MS armasm64 (via
gas-preprocessor).

The movrel macro invocations need to pass the offset via a separate
parameter. Mach-O and COFF relocations don't allow a negative
offset to a symbol, which is handled properly if the offset is passed
via the parameter. If no offset parameter is given, the macro
evaluates to something like "adrp x17, subpel_filters-16+(0)", which
older clang versions also fail to parse (the older clang versions
only support one single offset term, although it can be a parenthesis).
Signed-off-by: Martin Storsjö <martin@martin.st>

26d7af4c

libavcodec: vp8 neon optimizations for aarch64 · 0801853e

Magnus Röös authored Jan 31, 2019

Partial port of the ARM NEON code to aarch64.

Benchmarks from fate:

benchmarking with Linux Perf Monitoring API
nop: 58.6
checkasm: using random seed 1760970128
NEON:
 - vp8dsp.idct       [OK]
 - vp8dsp.mc         [OK]
 - vp8dsp.loopfilter [OK]
checkasm: all 21 tests passed
vp8_idct_add_c: 201.6
vp8_idct_add_neon: 83.1
vp8_idct_dc_add_c: 107.6
vp8_idct_dc_add_neon: 33.8
vp8_idct_dc_add4y_c: 426.4
vp8_idct_dc_add4y_neon: 59.4
vp8_loop_filter8uv_h_c: 688.1
vp8_loop_filter8uv_h_neon: 216.3
vp8_loop_filter8uv_inner_h_c: 649.3
vp8_loop_filter8uv_inner_h_neon: 195.3
vp8_loop_filter8uv_inner_v_c: 544.8
vp8_loop_filter8uv_inner_v_neon: 131.3
vp8_loop_filter8uv_v_c: 706.1
vp8_loop_filter8uv_v_neon: 141.1
vp8_loop_filter16y_h_c: 668.8
vp8_loop_filter16y_h_neon: 242.8
vp8_loop_filter16y_inner_h_c: 647.3
vp8_loop_filter16y_inner_h_neon: 224.6
vp8_loop_filter16y_inner_v_c: 647.8
vp8_loop_filter16y_inner_v_neon: 128.8
vp8_loop_filter16y_v_c: 721.8
vp8_loop_filter16y_v_neon: 154.3
vp8_loop_filter_simple_h_c: 387.8
vp8_loop_filter_simple_h_neon: 187.6
vp8_loop_filter_simple_v_c: 384.1
vp8_loop_filter_simple_v_neon: 78.6
vp8_put_epel8_h4v4_c: 3971.1
vp8_put_epel8_h4v4_neon: 855.1
vp8_put_epel8_h4v6_c: 5060.1
vp8_put_epel8_h4v6_neon: 989.6
vp8_put_epel8_h6v4_c: 4320.8
vp8_put_epel8_h6v4_neon: 1007.3
vp8_put_epel8_h6v6_c: 5449.3
vp8_put_epel8_h6v6_neon: 1158.1
vp8_put_epel16_h6_c: 6683.8
vp8_put_epel16_h6_neon: 831.8
vp8_put_epel16_h6v6_c: 11110.8
vp8_put_epel16_h6v6_neon: 2214.8
vp8_put_epel16_v6_c: 7024.8
vp8_put_epel16_v6_neon: 799.6
vp8_put_pixels8_c: 112.8
vp8_put_pixels8_neon: 78.1
vp8_put_pixels16_c: 131.3
vp8_put_pixels16_neon: 129.8

This contains a fix to include guards by Carl Eugen Hoyos.
Signed-off-by: Martin Storsjö <martin@martin.st>

0801853e
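
To compare the C and NEON numbers above at a glance, the listing can be reduced to one ratio per function. The sketch below is illustrative only: `neon_speedups` is a hypothetical helper, and it assumes the `name_c: cycles` / `name_neon: cycles` line format shown above, using a short excerpt of the listing as input.

```python
import re

# Excerpt of the benchmark listing above; lines that are not "<name>_c" or
# "<name>_neon" measurements (such as "nop") are ignored by the parser.
SAMPLE = """\
nop: 58.6
vp8_idct_add_c: 201.6
vp8_idct_add_neon: 83.1
vp8_put_epel16_h6v6_c: 11110.8
vp8_put_epel16_h6v6_neon: 2214.8
vp8_put_pixels16_c: 131.3
vp8_put_pixels16_neon: 129.8
"""

LINE_RE = re.compile(r"^(\w+)_(c|neon):[ \t]*([\d.]+)[ \t]*$", re.MULTILINE)


def neon_speedups(bench_text):
    """Return {function name: C cycles / NEON cycles} for each measured pair."""
    cycles = {}
    for name, impl, value in LINE_RE.findall(bench_text):
        cycles.setdefault(name, {})[impl] = float(value)
    return {
        name: pair["c"] / pair["neon"]
        for name, pair in cycles.items()
        if "c" in pair and "neon" in pair
    }


for name, ratio in neon_speedups(SAMPLE).items():
    print(f"{name}: {ratio:.2f}x")
```

A ratio close to 1 (as for `vp8_put_pixels16`) means the NEON version is roughly on par with the C version; larger ratios mean fewer cycles for NEON.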

Unbreak travis on macos · 899ee030

Luca Barbato authored Feb 12, 2019

899ee030

16 Feb, 2019 11 commits
- tests: Add a convenience function for video-only lavf tests · f8df5e2f
  Diego Biurrun authored Aug 08, 2018
```
Rename a test in the process for consistency and simplicity and
remove the remnants of the now-unused lavf regression test scripts.
```
  f8df5e2f
- tests: Convert lavf container tests to non-legacy test scripts · 618d02c1
  Diego Biurrun authored Feb 02, 2019
```
Rename some tests in the process for consistency and simplicity.
```
  618d02c1
- tests: Convert lavf pixfmt conversion tests to non-legacy test scripts · 896fe15d
  Diego Biurrun authored Aug 14, 2018
```
Also split monolithic lavf-pixfmt test into individual tests.
```
  896fe15d
- tests: Convert lavf image tests to non-legacy test scripts · a957e937
  Diego Biurrun authored Feb 02, 2019
```
Rename some tests in the process for consistency and simplicity.
```
  a957e937
- tests: Convert audio-only lavf tests to non-legacy test scripts · eb8a8115
  Diego Biurrun authored Feb 02, 2019
```
Rename some tests in the process for consistency and simplicity.
```
  eb8a8115
- tests: Convert image2pipe tests to non-legacy test scripts · a70eac7a
  Diego Biurrun authored Aug 08, 2018
  
  a70eac7a
- tests: Use a predefined function for lavf-rm test · 5846b496
  Diego Biurrun authored Aug 14, 2018
  
  5846b496
- tests: Enable CRC test for yuv4mpeg · dad5fd59
  Diego Biurrun authored Aug 14, 2018
  
  dad5fd59
- tests: Drop duplicate variable declaration · 86291498
  Diego Biurrun authored Sep 20, 2018
  
  86291498
- tests: Unify output directory creation · e22ffb38
  Diego Biurrun authored Feb 01, 2019
  
  e22ffb38
- build: Rename OBJDIRS variable to OUTDIRS · 7e5bde93
  Diego Biurrun authored Feb 03, 2019
```
These directories are not just for object files.
```
  7e5bde93

12 Feb, 2019 1 commit

srt: Set srto_sender flag to sender srt socket · 90b15f60

Sven Dueking authored Feb 07, 2019

SRT API Documentation:
This flag is superfluous if both parties are at least version 1.3.0
(this shall be enforced by setting this value to SRTO_MINVERSION if
you expect that it be true) and therefore support HSv5 handshake,
where the SRT extended handshake is done with the overall handshake
process.

This flag is however obligatory if at least one party may be using
SRT below version 1.3.0 and does not support HSv5.

90b15f60
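
The rule quoted above reduces to a version comparison. As a rough sketch (the function name and the tuple representation of versions are hypothetical, and this is not code from libsrt or from the patch), the decision could be modelled like this:

```python
# First SRT version with the HSv5 handshake, per the documentation quoted above.
HSV5_MIN_VERSION = (1, 3, 0)


def sender_flag_required(local_version, peer_version):
    """Return True if the sender flag is obligatory for this pair of endpoints.

    Versions are (major, minor, patch) tuples. The flag is only superfluous
    when both parties are at least 1.3.0 and therefore support HSv5.
    """
    both_support_hsv5 = (
        local_version >= HSV5_MIN_VERSION and peer_version >= HSV5_MIN_VERSION
    )
    return not both_support_hsv5


print(sender_flag_required((1, 3, 1), (1, 2, 0)))
print(sender_flag_required((1, 3, 1), (1, 3, 0)))
```

When the peer's version is not known in advance, setting the flag on the sending socket covers both cases.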

27 Jan, 2019 1 commit
- h264/x86: sign extend int stride in deblock functions · 156ea66c
  Janne Grunau authored Jan 27, 2019
```
Fixes checkasm errors after adding the h264 deblock tests.
```
  156ea66c
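
One way to see why the extension matters: on x86-64 a 32-bit `int` argument occupies the low half of a 64-bit register, and the upper half is not guaranteed to hold a sign extension, so assembly that uses the full register for address arithmetic has to sign-extend it first. The following sketch models this with plain integers; the helper names and the sample base address are made up for illustration and are not taken from the patch.

```python
def sign_extend_32(register):
    """Interpret the low 32 bits of a 64-bit register as a signed int."""
    low = register & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def zero_extend_32(register):
    """Keep the low 32 bits and treat them as unsigned (the buggy reading)."""
    return register & 0xFFFFFFFF


# A negative stride of -16 stored in the low half of a register whose upper
# half happens to be zero.
register = (-16) & 0xFFFFFFFF
base = 0x10000  # hypothetical pixel address

print(hex(base + sign_extend_32(register)))  # 16 bytes before base
print(hex(base + zero_extend_32(register)))  # almost 4 GiB past base
```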

26 Jan, 2019 5 commits

libopenh264dec: Use a newer decoding entry point function · eec93e57

Martin Storsjö authored Jan 25, 2019

The "new" entry point actually has existed since OpenH264 1.4 in
2015 and is the recommended decoding entry point.

The name of this function, DecodeFrameNoDelay, is rather backwards
considering that it doesn't return the latest decoded frame immediately,
but actually does proper delaying and reordering of frames.
Signed-off-by: Martin Storsjö <martin@martin.st>

eec93e57

h264/aarch64: add intra loop filter neon asm · 28a8b541

Janne Grunau authored Aug 13, 2018

Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported
(x264 uses nv12 chroma) and optimized.

Cycle count for checkasm --bench on a Snapdragon 820e:
h264_h_loop_filter_luma_intra_8bpp_c: 60.0
h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
h264_v_loop_filter_luma_intra_8bpp_c: 148.3
h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3

28a8b541

h264/aarch64: optimize neon loop filter · 846c3d6a

Janne Grunau authored Jan 01, 2019

Exit as soon as possible if no filtering will be done.

Improves the checkasm --bench cycle count on a Snapdragon 820e:
h264_h_loop_filter_luma_8bpp_c:      72.4 ->  72.5
h264_h_loop_filter_luma_8bpp_neon:   97.1 ->  56.3
h264_v_loop_filter_luma_8bpp_c:     174.0 -> 173.5
h264_v_loop_filter_luma_8bpp_neon:   62.9 ->  60.9
h264_h_loop_filter_chroma_8bpp_c:    30.2 ->  30.3
h264_h_loop_filter_chroma_8bpp_neon: 51.6 ->  25.7
h264_v_loop_filter_chroma_8bpp_c:    57.3 ->  57.3
h264_v_loop_filter_chroma_8bpp_neon: 28.0 ->  24.0

846c3d6a

checkasm/h264: add loop filter tests · d7f4f5c4

Janne Grunau authored Jan 01, 2019

d7f4f5c4

h264/aarch64: sign extend int stride in loop filter asm · bb515e3a

Janne Grunau authored Jan 01, 2019

bb515e3a

25 Jan, 2019 1 commit

arm: Create proper .rdata sections for COFF · 41cf3e3b

Martin Storsjö authored Jan 11, 2019

As .rodata isn't one of the default created sections for COFF, it was
created as a read-write data section. By using the default .rdata
section name for COFF, it automatically becomes a read-only data section.
The existing ".section .rodata" works as intended for ELF though.

This is based on an original patch and diagnosis by Tom Tan
<Tom.Tan@microsoft.com>.
Signed-off-by: Martin Storsjö <martin@martin.st>

41cf3e3b

23 Jan, 2019 1 commit

avcodec/libdav1d: properly free all output picture references · ca44fa5d

James Almer authored Jan 23, 2019

Dav1dPictures contain more than one buffer reference, so we're forced to use the
API properly to free them all.
Signed-off-by: James Almer <jamrial@gmail.com>

ca44fa5d

17 Jan, 2019 1 commit
- cook: Use the correct table for 6-bit stereo coupling · 90adbf4a
  Luca Barbato authored Oct 14, 2018
```
Thanks to Kostya for digging it out and telling me.
```
  90adbf4a

12 Dec, 2018 1 commit

libdav1d: update API usage to the first stable release · 70ab2778

James Almer authored Dec 11, 2018

The color fields were moved to another struct, and a way to propagate
timestamps and other input metadata was introduced, so the packet
fifo can be removed.

Add support for 12bit streams, an option to disable film grain, and
read the profile from the sequence header referenced by the output
picture instead of guessing based on output pix_fmt.
Signed-off-by: James Almer <jamrial@gmail.com>

70ab2778

15 Nov, 2018 1 commit
- libdav1d: fix build after a recent API break · 56f50183
  James Almer authored Nov 15, 2018
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
  56f50183

13 Nov, 2018 1 commit

qsvenc: Add VDENC support for H264 and HEVC · e716323f

Linjie Fu authored Nov 05, 2018

Add VDENC (low-power mode) support for QSV H264 and HEVC.

It's an experimental function (like lowpower in vaapi) with
some limitations:
- CBR/VBR require HuC, which should be explicitly loaded via the i915
  module parameter (i915.enable_guc=2 for Linux kernel version >= 4.16)
- HEVC VDENC is supported on Ice Lake and later

Use option "-low_power 1" to enable VDENC.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>

e716323f

06 Nov, 2018 1 commit

avcodec: libdav1d AV1 decoder wrapper. · 9bf9358b

James Almer authored Nov 06, 2018

Originally written by Ronald S. Bultje, with fixes, optimizations and
improvements by James Almer.
Signed-off-by: James Almer <jamrial@gmail.com>

9bf9358b