Commits · f015711ed1244fe8c09fd470262b57dfce6a2dbf · Linshizhi / ffmpeg.wasm-core

14 Jan, 2017 1 commit

arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 · 388f6e67

Martin Storsjö authored 8 years ago

This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

```
Cortex                                      A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3
```

By skipping individual 4x16 or 4x32 pixel slices that contain no nonzero
coefficients in the first pass, we reduce the runtime of these functions
as follows:

```
Cortex                                      A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1
```

I.e. in general there is a very minor overhead in the full-subpartition
case (roughly 0.1% to 3.5% in the figures above) due to the additional
loads and cmps, but a significant speedup in the cases where we only need
to process a small part of the actual input data.

In a few inspected clips of common VP9 content, 70-90% of the non-DC-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper-left
8x8 or 16x16 subpartition, respectively.

This is cherry-picked from libav commit 9c8bc74c.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

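
The first-pass shortcut can be modelled in plain C. This is a minimal sketch under stated assumptions, not the NEON implementation: the block is transformed in two passes, the first pass handles 4-column slices, `transform_1d()` is a hypothetical linear stand-in for the real VP9 1-D IDCT, and a slice counts as empty when a scan finds no nonzero coefficient (the assembly makes that decision with its own loads and compares). Because the stand-in is linear, an all-zero slice always produces an all-zero intermediate slice, so writing zeros is equivalent to transforming it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16 /* block size; 32x32 works the same way */
#define SLICE 4  /* the first pass handles 4 columns at a time */

/* Hypothetical stand-in for a 1-D IDCT (NOT the VP9 transform). It is
 * linear, so an all-zero input always yields an all-zero output. */
static void transform_1d(const int32_t *in, int stride, int32_t *out)
{
    for (int k = 0; k < N; k++) {
        int32_t acc = 0;
        for (int n = 0; n < N; n++)
            acc += in[n * stride] * ((n * k) % 7 - 3);
        out[k] = acc;
    }
}

/* Returns 1 if columns [c0, c0 + SLICE) hold only zero coefficients. */
static int slice_is_empty(int32_t coef[N][N], int c0)
{
    assert(c0 >= 0 && c0 + SLICE <= N);
    for (int r = 0; r < N; r++)
        for (int c = c0; c < c0 + SLICE; c++)
            if (coef[r][c])
                return 0;
    return 1;
}

/* Two-pass 2-D transform; skip_empty enables the first-pass shortcut. */
static void itxfm_2d(int32_t coef[N][N], int32_t out[N][N], int skip_empty)
{
    int32_t tmp[N][N]; /* tmp[c][k]: first-pass output for column c */

    /* First pass: columns, in 4-column slices. */
    for (int c0 = 0; c0 < N; c0 += SLICE) {
        if (skip_empty && slice_is_empty(coef, c0)) {
            /* Zero in, zero out: clear the slice instead of transforming. */
            memset(tmp[c0], 0, sizeof(tmp[0]) * SLICE);
            continue;
        }
        for (int c = c0; c < c0 + SLICE; c++)
            transform_1d(&coef[0][c], N, tmp[c]);
    }

    /* Second pass: runs over the full intermediate buffer either way. */
    for (int k = 0; k < N; k++)
        transform_1d(&tmp[0][k], N, out[k]);
}
```

With nonzero coefficients only in the upper-left 8x8, the two right-hand slices of the 16x16 block are skipped and both variants produce identical output; in this sketch the second pass still runs over the whole intermediate buffer.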

27 Dec, 2016 1 commit
- checkasm/vp9: benchmark all sub-IDCTs (but not WHT or ADST). · 1c8fbd7b
  Ronald S. Bultje authored 8 years ago
  
03 Aug, 2016 1 commit
- checkasm: add vp9 MC tests. · e99ecda5
  Ronald S. Bultje authored 9 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
27 Jul, 2016 1 commit
- checkasm/vp9dsp: use declare_func_emms in check_loopfilter · 54a0a52b
  James Almer authored 8 years ago
```
Fixes checkasm failures on mmxext functions
Signed-off-by: James Almer <jamrial@gmail.com>
```
13 Oct, 2015 1 commit

vp9: add itxfm_add eob shortcuts to 10/12bpp functions. · eb4b5ff7

Ronald S. Bultje authored 9 years ago

These aren't quite as helpful as the ones in 8bpp: over there, we can
use pmulhrsw, but here the coefficients have too many bits to take
advantage of it. However, we can still skip columns for which all
coefficients are 0, and instead just zero the input data for the row
inverse transform (itx). This speeds up overall decoding by a few percent.

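
A scalar model shows why pmulhrsw stops being usable once coefficients outgrow 16 bits. It assumes, for illustration, that the transform multiplies by 14-bit fixed-point constants with rounding, (x * c + 2^13) >> 14, and that 8bpp SIMD code can obtain the same result from pmulhrsw by passing a pre-doubled constant. The function names are hypothetical, and the sketch assumes arithmetic right shifts on signed values, as mainstream compilers implement them.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of SSSE3 pmulhrsw:
 * (a * b + 2^14) >> 15, keeping the low 16 bits. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b + (1 << 14)) >> 15);
}

/* Rounding multiply by a 14-bit fixed-point constant (assumed form):
 * (x * c + 2^13) >> 14. int64_t leaves room for high-bitdepth inputs. */
static int64_t round_mul14(int64_t x, int32_t c)
{
    return (x * c + (1 << 13)) >> 14;
}

/* Passing 2 * c to pmulhrsw turns its >> 15 into >> 14, but only if
 * both x and 2 * c fit in a signed 16-bit lane. */
static int fits_pmulhrsw(int64_t x, int32_t c)
{
    return x >= INT16_MIN && x <= INT16_MAX &&
           2 * (int64_t)c >= INT16_MIN && 2 * (int64_t)c <= INT16_MAX;
}

/* 16-bit fast path; only valid when fits_pmulhrsw() holds. */
static int64_t round_mul14_16bit(int64_t x, int32_t c)
{
    assert(fits_pmulhrsw(x, c));
    return pmulhrsw_lane((int16_t)x, (int16_t)(2 * c));
}
```

Skipping all-zero columns needs no multiply at all, which is consistent with that shortcut still paying off at 10/12bpp while the pmulhrsw trick does not carry over.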

28 Sep, 2015 2 commits
- checkasm/vp9dsp: Fix iszero() to read the correct data · 69e456d7
  Henrik Gramner authored 9 years ago
  
- checkasm: add vp9dsp.itxfm_add tests. · 0b227c6d
  Ronald S. Bultje authored 9 years ago
  
26 Sep, 2015 3 commits
- checkasm/vp9dsp: add const to suppress "discards const qualifier" warnings · 4e03f0ab
  James Almer authored 9 years ago
```
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
- checkasm: clip vp9 loopfilter test pixels inside allowed bitdepth range. · 7a4b97e9
  Ronald S. Bultje authored 9 years ago
  
- tests/checkasm: make randomize_buffers a function for easier debugging · f559812a
  Rodger Combs authored 9 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
24 Sep, 2015 1 commit

tests/checkasm/vp9dsp: Revert first hunk of · 5ba40c3c

Michael Niedermayer authored 9 years ago

The change was wrong; also add a comment explaining it.

Found-by: BBB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>


22 Sep, 2015 1 commit
- vp9: fix loopfilter test code to address Hendrik's comments. · 350e9c67
  Ronald S. Bultje authored 9 years ago
```
(I forgot to actually merge them into the patch I just pushed.)
```
20 Sep, 2015 3 commits

tests/checkasm: fix stack smash in check_loopfilter · df2a2643
Rodger Combs authored 9 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
tests/checkasm/vp9dsp: Add () to protect macro arguments · bddcf758
Michael Niedermayer authored 9 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
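
Parenthesizing macro arguments is a standard C precaution: the preprocessor pastes arguments in as raw tokens, so an argument that is itself an expression can bind to the operators around it. The macros actually changed in tests/checkasm/vp9dsp are not reproduced in this log; `SQUARE_UNSAFE` and `SQUARE` below are hypothetical illustrations.

```c
/* Unprotected: the argument is pasted in as raw tokens. */
#define SQUARE_UNSAFE(x) x * x
/* Protected: each use of the argument, and the whole result, is
 * parenthesized. */
#define SQUARE(x) ((x) * (x))

/* SQUARE_UNSAFE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5, not 9. */
static int square_unsafe_demo(void) { return SQUARE_UNSAFE(1 + 2); }
static int square_demo(void)        { return SQUARE(1 + 2); }
```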

checkasm: add VP9 loopfilter tests. · b0743674

Ronald S. Bultje authored 9 years ago

The randomize_buffer() implementation ensures that "most of the time"
we'll get a good mix of wide16/wide8/hev/regular/no filters for complete
code coverage. However, this is not mathematically guaranteed, because
guaranteeing it would make the code either much more complex or much
less random.

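
The "most of the time" property can be illustrated with a toy generator. Everything here is hypothetical and much simpler than ffmpeg's randomize_buffer(): `classify()` is a simplified approximation of the 8-bit filter decisions for one 8-pixel edge (the wide16 case is left out), `random_edge()` picks a random roughness per edge and walks outwards from the edge with random steps, and the thresholds E, I and H are parameters a test would choose. Smooth edges usually land in WIDE8 and rough ones in NO_FILTER, but no individual draw is forced into a category.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { NO_FILTER, REGULAR, REGULAR_HEV, WIDE8, NUM_KINDS };

/* Simplified 8-bit decision for one edge px = {p3,p2,p1,p0,q0,q1,q2,q3}
 * with edge limit E, interior limit I and hev threshold H. */
static int classify(const uint8_t px[8], int E, int I, int H)
{
    int p3 = px[0], p2 = px[1], p1 = px[2], p0 = px[3];
    int q0 = px[4], q1 = px[5], q2 = px[6], q3 = px[7];

    assert(E >= 0 && I >= 0 && H >= 0);
    if (abs(p3 - p2) > I || abs(p2 - p1) > I || abs(p1 - p0) > I ||
        abs(q1 - q0) > I || abs(q2 - q1) > I || abs(q3 - q2) > I ||
        abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > E)
        return NO_FILTER;
    if (abs(p1 - p0) <= 1 && abs(p2 - p0) <= 1 && abs(p3 - p0) <= 1 &&
        abs(q1 - q0) <= 1 && abs(q2 - q0) <= 1 && abs(q3 - q0) <= 1)
        return WIDE8;
    return (abs(p1 - p0) > H || abs(q1 - q0) > H) ? REGULAR_HEV : REGULAR;
}

static uint8_t clip_pixel(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical generator: random roughness s per edge, then random steps
 * of at most s from p0 outwards to p3 and from q0 outwards to q3. */
static void random_edge(uint8_t px[8])
{
    static const int roughness[4] = { 1, 3, 8, 40 };
    int s = roughness[rand() % 4];
    int v = 64 + rand() % 128;             /* p0, away from the clip range */

    px[3] = clip_pixel(v);
    for (int i = 2; i >= 0; i--) {         /* p1, p2, p3 */
        v += rand() % (2 * s + 1) - s;
        px[i] = clip_pixel(v);
    }
    v = px[3] + rand() % (2 * s + 1) - s;  /* q0 */
    px[4] = clip_pixel(v);
    for (int i = 5; i < 8; i++) {          /* q1, q2, q3 */
        v += rand() % (2 * s + 1) - s;
        px[i] = clip_pixel(v);
    }
}
```

For example, with E = 40, I = 10 and H = 2, drawing 10000 edges should hit every category; as the commit message says, that outcome is very likely rather than guaranteed.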

16 Sep, 2015 1 commit
- tests/checkasm/vp9dsp: Use snprintf() for safety · a860adb4
  Michael Niedermayer authored 9 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
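
A short illustration of what snprintf() buys over sprintf(). The call site changed in this commit isn't shown in the log, so `format_name()` and its name format are hypothetical; the behavior relied on is standard C99: snprintf() writes at most `size` bytes including the terminating NUL and returns the length the full output would have needed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes a name such as "idct_32x32" into buf.
 * snprintf() never writes more than size bytes (NUL included) and
 * returns the untruncated length, so a return value >= size signals
 * truncation. sprintf() has no such bound. */
static int format_name(char *buf, size_t size, const char *kind, int w, int h)
{
    int n = snprintf(buf, size, "%s_%dx%d", kind, w, h);
    return n >= 0 && (size_t)n < size; /* 1 if it fit, 0 if truncated */
}
```

Formatting "idct_32x32" into an 8-byte buffer, for instance, stores the truncated "idct_32" and reports the truncation instead of overrunning the buffer.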
15 Sep, 2015 2 commits
- checkasm: add vp9 intra pred tests. · bbd44e12
  Ronald S. Bultje authored 9 years ago
  
- checkasm: add vp9 MC tests. · 084451e1
  Ronald S. Bultje authored 9 years ago
  