Commits · 032ad7a0bb3951180182668b84471daaa530bd45 · Linshizhi / ffmpeg.wasm-core

27 Mar, 2017 1 commit
- lavc/vp9: split into vp9{block,data,mvs} · 1c9f4b50
  Clément Bœsch authored 7 years ago
```
This is following Libav layout to ease merges.
```
  1c9f4b50
14 Jan, 2017 1 commit

arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 · 388f6e67

Martin Storsjö authored 7 years ago

This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                      Cortex A7        A8        A9       A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1    2435.4    2499.0    1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7   16582.3   14207.6   12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:      274.6     189.5     211.7     235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2064.0    1534.8    1719.4    1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:     2135.0    1477.2    1736.3    1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2446.7    1828.7    1993.6    1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2832.4    2118.3    2266.5    1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.7    2475.3    2523.5    1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      756.2     456.7     862.0     553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10682.2    8190.4    8539.2    6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10813.5    8014.9    8518.3    6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:    11859.6    9313.0    9347.4    7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:   12946.6   10752.4   10192.2    8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   14074.6   11946.5   11001.4    9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:   15269.9   13662.7   11816.1    9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:   16327.9   14940.1   12626.7   10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17462.7   15776.1   13446.2   11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:   18575.5   17157.0   14249.3   12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.
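
The slice skipping described in this message can be illustrated with a minimal scalar sketch in C. It is not the NEON code from the commit: `toy_1d`, `toy_2d`, and the `nonzero_cols` parameter are hypothetical names, the 1-D transform is a simple running sum standing in for the real IDCT, and the caller passes the number of potentially nonzero columns directly instead of deriving it from eob. The only property it relies on is that a linear transform of an all-zero column is all zero, so empty slices can be written as zeros without being transformed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16  /* block size (16x16) */
#define SLICE 4   /* columns handled per first-pass iteration */

/* Stand-in 1-D transform (a running sum), NOT the VP9 IDCT. It is linear,
 * so an all-zero input gives an all-zero output. */
static void toy_1d(const int32_t *in, int stride, int32_t *out)
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        acc += in[i * stride];
        out[i] = acc;
    }
}

/* Two-pass 2-D transform over coef[row * N + col].
 * nonzero_cols (hypothetical parameter): number of leading columns that
 * may contain nonzero coefficients. */
static void toy_2d(const int32_t *coef, int32_t *dst, int nonzero_cols)
{
    int32_t tmp[N * N];

    assert(nonzero_cols >= 0 && nonzero_cols <= N);

    /* First pass: transform columns slice by slice, storing the result
     * transposed (row x of tmp holds transformed column x). */
    for (int x0 = 0; x0 < N; x0 += SLICE) {
        if (x0 >= nonzero_cols) {
            /* All remaining slices are empty: zero them and stop. */
            memset(&tmp[x0 * N], 0, sizeof(int32_t) * N * (N - x0));
            break;
        }
        for (int x = x0; x < x0 + SLICE; x++)
            toy_1d(&coef[x], N, &tmp[x * N]);
    }

    /* Second pass: always runs over the full block. */
    for (int y = 0; y < N; y++)
        toy_1d(&tmp[y], N, &dst[y * N]);
}
```

Calling `toy_2d(coef, dst, N)` and `toy_2d(coef, dst, 8)` on a block whose nonzero coefficients all lie in the upper left 8x8 produces identical output, while the second call transforms only two of the four slices in the first pass.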

This is cherrypicked from libav commit 9c8bc74c.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

388f6e67

27 Dec, 2016 1 commit
- checkasm/vp9: benchmark all sub-IDCTs (but not WHT or ADST). · 1c8fbd7b
  Ronald S. Bultje authored 8 years ago
  
  1c8fbd7b
30 Nov, 2016 1 commit

arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 · 9c8bc74c

Martin Storsjö authored 8 years ago

This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                      Cortex A7        A8        A9       A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1    2435.4    2499.0    1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7   16582.3   14207.6   12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.
Signed-off-by: Martin Storsjö <martin@martin.st>

9c8bc74c

23 Nov, 2016 2 commits
- checkasm: vp9dsp: benchmark all sub-IDCTs (but not WHT or ADST). · 06fec74c
  Ronald S. Bultje authored 8 years ago
```
Signed-off-by: Martin Storsjö <martin@martin.st>
```
  06fec74c
- Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately" · effc1430
  Martin Storsjö authored 8 years ago
```
This reverts commit 81d7f0bb.

Instead of just benchmarking dc separately, test all relevant subparts
(in the next commit).
Signed-off-by: Martin Storsjö <martin@martin.st>
```
  effc1430
16 Nov, 2016 1 commit

checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately · 81d7f0bb

Martin Storsjö authored 8 years ago

The dc-only mode is already checked to work correctly above, but this
allows benchmarking this mode for performance tuning, and allows making
sure that it actually is correctly hooked up.
Signed-off-by: Martin Storsjö <martin@martin.st>

81d7f0bb

11 Nov, 2016 1 commit

checkasm: add vp9dsp.itxfm_add tests. · 0b37cd09

Ronald S. Bultje authored 9 years ago

This includes fixes by Henrik Gramner.

The forward transforms are derived from the reference encoder.
Signed-off-by: Martin Storsjö <martin@martin.st>

0b37cd09

03 Nov, 2016 1 commit

vp9: Flip the order of arguments in MC functions · 2e55e26b

Martin Storsjö authored 8 years ago

This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.
Signed-off-by: Martin Storsjö <martin@martin.st>

2e55e26b

04 Oct, 2016 1 commit

checkasm: add VP9 loopfilter tests. · c935b54b

Ronald S. Bultje authored 9 years ago

The randomize_buffer() implementation assures that "most of the time",
we'll do a good mix of wide16/wide8/hev/regular/no filters for complete
code coverage. However, this is not mathematically assured because that
would make the code either much more complex, or much less random.

Some fixes and improvements by Rodger Combs <rodger.combs@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

c935b54b

03 Aug, 2016 1 commit
- checkasm: add vp9 MC tests. · e99ecda5
  Ronald S. Bultje authored 9 years ago
```
Signed-off-by: Anton Khirnov <anton@khirnov.net>
```
  e99ecda5
27 Jul, 2016 1 commit
- checkasm/vp9dsp: use declare_func_emms in check_loopfilter · 54a0a52b
  James Almer authored 8 years ago
```
Fixes checkasm failures on mmxext functions
Signed-off-by: James Almer <jamrial@gmail.com>
```
  54a0a52b
13 Oct, 2015 1 commit

vp9: add itxfm_add eob shortcuts to 10/12bpp functions. · eb4b5ff7

Ronald S. Bultje authored 9 years ago

These aren't quite as helpful as the ones in 8bpp, since over there,
we can use pmulhrsw, but here the coefficients have too many bits to
be able to take advantage of pmulhrsw. However, we can still skip
cols for which all coefs are 0, and instead just zero the input data
for the row itx. This helps a few % on overall decoding speed.
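
For readers unfamiliar with pmulhrsw, the following scalar C model may help. It is a sketch of what one 16-bit lane of the x86 instruction computes (a Q15 multiply with rounding), not code from this commit; `mulhrs16`, `fits_in_int16`, and `demo_mulhrs` are hypothetical names, and the value `1 << 17` is only an illustrative stand-in for a coefficient that is too wide for a 16-bit lane.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of PMULHRSW: (a * b) / 2^15, rounded.
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes an arithmetic shift. */
static int16_t mulhrs16(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * b;
    return (int16_t)(((prod >> 14) + 1) >> 1);
}

/* The instruction is only usable when the operand fits a signed 16-bit lane. */
static int fits_in_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

static void demo_mulhrs(void)
{
    /* 23170 is roughly 0.7071 in Q15, so 100 * 0.7071 rounds to 71. */
    assert(mulhrs16(100, 23170) == 71);
    assert(fits_in_int16(32767));
    /* Illustrative value only: wider than 16 bits, so the lane cannot hold it. */
    assert(!fits_in_int16(1 << 17));
}
```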

eb4b5ff7

28 Sep, 2015 2 commits
- checkasm/vp9dsp: Fix iszero() to read the correct data · 69e456d7
  Henrik Gramner authored 9 years ago
  
  69e456d7
- checkasm: add vp9dsp.itxfm_add tests. · 0b227c6d
  Ronald S. Bultje authored 9 years ago
  
  0b227c6d
26 Sep, 2015 3 commits
- checkasm/vp9dsp: add const to suppress "discards const qualifier" warnings · 4e03f0ab
  James Almer authored 9 years ago
```
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
  4e03f0ab
- checkasm: clip vp9 loopfilter test pixels inside allowed bitdepth range. · 7a4b97e9
  Ronald S. Bultje authored 9 years ago
  
  7a4b97e9
- tests/checkasm: make randomize_buffers a function for easier debugging · f559812a
  Rodger Combs authored 9 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  f559812a
24 Sep, 2015 1 commit

tests/checkasm/vp9dsp: Revert first hunk of · 5ba40c3c

Michael Niedermayer authored 9 years ago

The change was wrong, also add a comment explaining it

Found-by: BBB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

5ba40c3c

22 Sep, 2015 1 commit
- vp9: fix loopfilter test code to address Hendrik's comments. · 350e9c67
  Ronald S. Bultje authored 9 years ago
```
(I forgot to actually merge them into the patch I just pushed.)
```
  350e9c67
20 Sep, 2015 3 commits

- tests/checkasm: fix stack smash in check_loopfilter · df2a2643
  Rodger Combs authored 9 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  df2a2643
- tests/checkasm/vp9dsp: Add () to protect macro arguments · bddcf758
  Michael Niedermayer authored 9 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  bddcf758
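
The macro change in bddcf758 is not shown on this page, so the following is only a generic C sketch of why macro arguments are wrapped in parentheses. `SCALE_UNSAFE`, `SCALE_SAFE`, and the helper functions are hypothetical names, not the macros from vp9dsp.c.

```c
#include <assert.h>

/* Unprotected: the argument text is pasted as-is, so the precedence of the
 * surrounding expression applies to it. */
#define SCALE_UNSAFE(x) (x * 2)

/* Protected: the argument is evaluated as a unit before the multiply. */
#define SCALE_SAFE(x)   ((x) * 2)

static int scale_unsafe(int a, int b)
{
    return SCALE_UNSAFE(a + b);   /* expands to (a + b * 2) */
}

static int scale_safe(int a, int b)
{
    return SCALE_SAFE(a + b);     /* expands to ((a + b) * 2) */
}

static void demo_macros(void)
{
    assert(scale_unsafe(1, 2) == 5);  /* 1 + 2 * 2 */
    assert(scale_safe(1, 2) == 6);    /* (1 + 2) * 2 */
}
```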

checkasm: add VP9 loopfilter tests. · b0743674

Ronald S. Bultje authored 9 years ago

b0743674

16 Sep, 2015 1 commit
- tests/checkasm/vp9dsp: Use snprintf() for safety · a860adb4
  Michael Niedermayer authored 9 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  a860adb4
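
As a general illustration of why snprintf() is the safer choice, here is a minimal C sketch. It is not the code changed by a860adb4; `format_name` and the name pattern it builds are hypothetical. Unlike sprintf(), snprintf() never writes more than the given size and returns the length the full string would have had, so truncation can be detected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats a hypothetical test name into buf.
 * Returns 1 if the whole name fit, 0 if it was truncated or an error occurred. */
static int format_name(char *buf, size_t size, const char *type, int w, int h)
{
    int n;

    assert(size > 0);
    /* snprintf writes at most size - 1 characters plus the terminating NUL. */
    n = snprintf(buf, size, "%s_%dx%d", type, w, h);
    return n >= 0 && (size_t)n < size;
}
```

With an 8-byte buffer, `format_name(buf, 8, "vp9_put", 64, 64)` reports truncation and leaves a NUL-terminated `"vp9_put"` in the buffer instead of writing past its end.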
15 Sep, 2015 2 commits
- checkasm: add vp9 intra pred tests. · bbd44e12
  Ronald S. Bultje authored 9 years ago
  
  bbd44e12
- checkasm: add vp9 MC tests. · 084451e1
  Ronald S. Bultje authored 9 years ago
  
  084451e1