- 14 Jan, 2017 1 commit
-
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.

This is cherrypicked from libav commit 9c8bc74c.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
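
The slice-skipping idea described in this commit message can be sketched in plain C. This is only an illustration under stated assumptions, not the NEON implementation: `transform_1d` is a placeholder linear transform (a running sum) standing in for the real 1-D IDCT, and `nonzero_slices` is a hypothetical parameter passed in directly, whereas the message indicates the actual code derives the decision from eob using extra loads and compares.

```c
#include <stdint.h>

#define N     16   /* block is N x N coefficients */
#define SLICE 4    /* first pass works on slices of 4 columns */

/* Placeholder 1-D transform (running sum), NOT the VP9 IDCT.
 * Like the IDCT it is linear, so an all-zero input gives an all-zero output. */
static void transform_1d(const int32_t *in, int in_stride,
                         int32_t *out, int out_stride)
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        acc += in[i * in_stride];
        out[i * out_stride] = acc;
    }
}

/* Two-pass separable transform over coef[row * N + col].
 * nonzero_slices (assumption): number of leading 4-column slices that may
 * contain nonzero coefficients; later slices are known to be all zero. */
void inverse_2d(const int32_t *coef, int32_t *dst, int nonzero_slices)
{
    int32_t tmp[N * N];

    /* First pass: transform columns, one 4-column slice at a time. */
    for (int s = 0; s < N / SLICE; s++) {
        for (int c = s * SLICE; c < (s + 1) * SLICE; c++) {
            if (s < nonzero_slices) {
                transform_1d(coef + c, N, tmp + c, N);
            } else {
                /* Skipped slice: the transform of zeros is zeros,
                 * so just clear the intermediate column. */
                for (int r = 0; r < N; r++)
                    tmp[r * N + c] = 0;
            }
        }
    }

    /* Second pass: every row still has to be transformed. */
    for (int r = 0; r < N; r++)
        transform_1d(tmp + r * N, 1, dst + r * N, 1);
}
```

Calling `inverse_2d(coef, dst, 2)` on a block whose nonzero coefficients all lie in the upper left 8x8 region gives the same output as `inverse_2d(coef, dst, 4)`, while doing only half of the first-pass work.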
-
- 27 Dec, 2016 1 commit
-
-
Ronald S. Bultje authored
-
- 03 Aug, 2016 1 commit
-
-
Ronald S. Bultje authored
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- 27 Jul, 2016 1 commit
-
-
James Almer authored
Fixes checkasm failures on mmxext functions.

Signed-off-by: James Almer <jamrial@gmail.com>
-
- 13 Oct, 2015 1 commit
-
-
Ronald S. Bultje authored
These aren't quite as helpful as the ones in 8bpp, since over there, we can use pmulhrsw, but here the coefficients have too many bits to be able to take advantage of pmulhrsw. However, we can still skip cols for which all coefs are 0, and instead just zero the input data for the row itx. This helps a few % on overall decoding speed.
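
To make the pmulhrsw remark concrete, here is a scalar C model of one 16-bit lane of that instruction, compared against a rounding multiply by a 14-bit fixed-point constant. It is a sketch under assumptions: the constant 11585 (roughly cos(pi/4) scaled by 2^14) is used as an illustrative transform constant, the function names are hypothetical, and right shifts of negative values are assumed to be arithmetic, as they are on common compilers.

```c
#include <stdint.h>

/* Scalar model of one signed 16-bit lane of x86 PMULHRSW:
 * result = (((a * b) >> 14) + 1) >> 1, i.e. (a * b + 0x4000) >> 15.
 * Both operands must fit in 16 bits. */
int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * b;
    return (int16_t)(((prod >> 14) + 1) >> 1);
}

/* Reference rounding multiply by a 14-bit fixed-point constant c:
 * (x * c + 2^13) >> 14. Works for inputs wider than 16 bits. */
int32_t round_mul14(int32_t x, int32_t c)
{
    return (int32_t)(((int64_t)x * c + 8192) >> 14);
}
```

With the constant doubled, `pmulhrsw_lane(x, 2 * 11585)` matches `round_mul14(x, 11585)` for every 16-bit `x`, which is why the trick works when the coefficients fit in a 16-bit lane. Once the coefficients need more than 16 bits, as this commit message says of the high-bitdepth case, they no longer fit the lane and the wider multiply is required.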
-
- 28 Sep, 2015 2 commits
-
-
Henrik Gramner authored
-
Ronald S. Bultje authored
-
- 26 Sep, 2015 3 commits
-
-
James Almer authored
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
-
Ronald S. Bultje authored
-
Rodger Combs authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 24 Sep, 2015 1 commit
-
-
Michael Niedermayer authored
The change was wrong; also add a comment explaining it.

Found-by: BBB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 22 Sep, 2015 1 commit
-
-
Ronald S. Bultje authored
(I forgot to actually merge them into the patch I just pushed.)
-
- 20 Sep, 2015 3 commits
-
-
Rodger Combs authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Ronald S. Bultje authored
The randomize_buffer() implementation assures that "most of the time", we'll do a good mix of wide16/wide8/hev/regular/no filters for complete code coverage. However, this is not mathematically assured because that would make the code either much more complex, or much less random.
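
The trade-off between randomness and guaranteed coverage can be illustrated with a small C sketch. This is not the checkasm randomize_buffer() code: the real function generates pixel data, while this hypothetical helper only draws a filter class label per edge position and reports how many distinct classes were hit. All names here are assumptions made for illustration.

```c
#include <stdint.h>

/* Filter classes named in the commit message (labels are illustrative). */
enum {
    FILTER_NONE, FILTER_REGULAR, FILTER_HEV,
    FILTER_WIDE8, FILTER_WIDE16, FILTER_CLASSES
};

/* Small deterministic LCG so that runs are reproducible (hypothetical helper). */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Draw one filter class per edge position, fill counts[] with how often
 * each class was drawn, and return the number of distinct classes hit.
 * Nothing forces the return value to equal FILTER_CLASSES: full coverage
 * is likely for large n, but not guaranteed. */
int pick_filter_classes(uint8_t *cls, int n, uint32_t seed,
                        int counts[FILTER_CLASSES])
{
    int covered = 0;

    for (int k = 0; k < FILTER_CLASSES; k++)
        counts[k] = 0;
    for (int i = 0; i < n; i++) {
        cls[i] = (uint8_t)(lcg_next(&seed) % FILTER_CLASSES);
        counts[cls[i]]++;
    }
    for (int k = 0; k < FILTER_CLASSES; k++)
        covered += counts[k] > 0;
    return covered;
}
```

Guaranteeing that every class appears would mean forcing some positions to fixed classes, which is the added complexity or loss of randomness the message refers to.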
-
- 16 Sep, 2015 1 commit
-
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 15 Sep, 2015 2 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-