    arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 · 9c8bc74c
    Martin Storsjö authored
    This work is sponsored by, and copyright, Google.
    
    Previously all subpartitions except the eob=1 (DC) case ran with
    the same runtime:
    
                                         Cortex A7       A8       A9      A53
    vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
    vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3
    
    By skipping individual 4x16 or 4x32 pixel slices in the first pass,
    we reduce the runtime of these functions like this:
    
                                         Cortex A7       A8       A9      A53
    vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
    vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
    vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
    vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
    vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
    vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
    vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
    vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
    vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
    vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
    vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
    vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
    vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
    vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
    vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
    vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1
    
    I.e. in general a very minor overhead for the full subpartition case due
    to the additional loads and cmps, but a significant speedup for the cases
    when we only need to process a small part of the actual input data.
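
The slice skipping described above can be sketched in C roughly as follows. This is a minimal illustration, not the NEON implementation: `toy_transform_1d`, `two_pass`, `skipping_matches_full`, the column-slice layout and the `min_eob` threshold values are all assumptions made for the example. The sketch only relies on the fact that a linear transform maps all-zero input to all-zero output, so a slice known to be empty can be replaced by zeros in the intermediate buffer.

```c
#include <assert.h>
#include <string.h>

#define N     16   /* block size (16x16) */
#define SLICE 4    /* columns handled per first-pass slice */

/* Hypothetical thresholds: slice s is processed only if eob > min_eob[s].
 * The values are illustrative placeholders. */
static const int min_eob[N / SLICE] = { 0, 10, 38, 89 };

/* Stand-in for a real 1-D inverse DCT. It is a running sum, which is
 * linear, so all-zero input gives all-zero output. */
static void toy_transform_1d(const int *in, int stride, int *out)
{
    int acc = 0;
    for (int i = 0; i < N; i++) {
        acc += in[i * stride];
        out[i] = acc;
    }
}

/* Two-pass transform of a row-major N x N block.
 * Returns the number of first-pass slices actually processed. */
static int two_pass(const int coef[N * N], int eob, int out[N * N])
{
    int tmp[N * N];
    int slices_done = 0;

    assert(eob >= 0);

    /* First pass: transform columns, SLICE columns at a time. */
    for (int s = 0; s < N / SLICE; s++) {
        if (eob <= min_eob[s]) {
            /* This slice and all later ones are assumed empty:
             * write zeros to the intermediate buffer and stop. */
            for (int r = 0; r < N; r++)
                for (int c = s * SLICE; c < N; c++)
                    tmp[r * N + c] = 0;
            break;
        }
        for (int c = s * SLICE; c < (s + 1) * SLICE; c++) {
            int col[N];
            toy_transform_1d(coef + c, N, col);
            for (int r = 0; r < N; r++)
                tmp[r * N + c] = col[r];
        }
        slices_done++;
    }

    /* Second pass: transform every row of the intermediate buffer. */
    for (int r = 0; r < N; r++)
        toy_transform_1d(tmp + r * N, 1, out + r * N);

    return slices_done;
}

/* Returns 1 if skipping yields the same output as processing every slice
 * for a block whose nonzero coefficients sit in the upper left 8x8. */
static int skipping_matches_full(void)
{
    int coef[N * N] = { 0 }, fast[N * N], full[N * N];

    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            coef[r * N + c] = r * 8 + c + 1;

    int fast_slices = two_pass(coef, 38, fast);     /* skips slices 2 and 3 */
    int full_slices = two_pass(coef, N * N, full);  /* processes all four   */

    return fast_slices == 2 && full_slices == 4 &&
           memcmp(fast, full, sizeof(fast)) == 0;
}
```

The per-slice comparison is the extra cost paid in the full case, which matches the small overhead noted above, while a block with few coefficients pays for only the slices it needs.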
    
    In a few inspected clips of common VP9 content, 70-90% of the
    non-DC-only 16x16 and 32x32 IDCTs have nonzero coefficients only in
    the upper left 8x8 or 16x16 subpartition, respectively.
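
A statistic like the one above could be gathered with a check along these lines. This is a hedged sketch only: the function name `nonzero_only_in_corner` and the row-major `int16_t` coefficient layout are assumptions for illustration, and the commit does not say how the clips were actually inspected.

```c
#include <stdint.h>

/* Returns 1 if every nonzero coefficient of an n x n block (row-major)
 * lies inside the upper left sub x sub corner, 0 otherwise. */
static int nonzero_only_in_corner(const int16_t *coef, int n, int sub)
{
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++)
            if ((r >= sub || c >= sub) && coef[r * n + c] != 0)
                return 0;
    return 1;
}
```

Calling it with `n = 16, sub = 8` or `n = 32, sub = 16` for each non-DC-only block and counting the hits would give the kind of percentage quoted above.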
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
Name
Makefile
aac.h
aacpsdsp_init_arm.c
aacpsdsp_neon.S
ac3dsp_arm.S
ac3dsp_armv6.S
ac3dsp_init_arm.c
ac3dsp_neon.S
apedsp_init_arm.c
apedsp_neon.S
asm-offsets.h
audiodsp_arm.h
audiodsp_init_arm.c
audiodsp_init_neon.c
audiodsp_neon.S
blockdsp_arm.h
blockdsp_init_arm.c
blockdsp_init_neon.c
blockdsp_neon.S
cabac.h
dca.h
dcadsp_init_arm.c
dcadsp_neon.S
dcadsp_vfp.S
fft_fixed_init_arm.c
fft_fixed_neon.S
fft_init_arm.c
fft_neon.S
fft_vfp.S
flacdsp_arm.S
flacdsp_init_arm.c
fmtconvert_init_arm.c
fmtconvert_neon.S
fmtconvert_vfp.S
g722dsp_init_arm.c
g722dsp_neon.S
h264chroma_init_arm.c
h264cmc_neon.S
h264dsp_init_arm.c
h264dsp_neon.S
h264idct_neon.S
h264pred_init_arm.c
h264pred_neon.S
h264qpel_init_arm.c
h264qpel_neon.S
hpeldsp_arm.S
hpeldsp_arm.h
hpeldsp_armv6.S
hpeldsp_init_arm.c
hpeldsp_init_armv6.c
hpeldsp_init_neon.c
hpeldsp_neon.S
idct.h
idctdsp_arm.S
idctdsp_arm.h
idctdsp_armv6.S
idctdsp_init_arm.c
idctdsp_init_armv5te.c
idctdsp_init_armv6.c
idctdsp_init_neon.c
idctdsp_neon.S
int_neon.S
jrevdct_arm.S
mathops.h
mdct_fixed_init_arm.c
mdct_fixed_neon.S
mdct_init_arm.c
mdct_neon.S
mdct_vfp.S
me_cmp_armv6.S
me_cmp_init_arm.c
mlpdsp_armv5te.S
mlpdsp_armv6.S
mlpdsp_init_arm.c
mpegaudiodsp_fixed_armv6.S
mpegaudiodsp_init_arm.c
mpegvideo_arm.c
mpegvideo_arm.h
mpegvideo_armv5te.c
mpegvideo_armv5te_s.S
mpegvideo_neon.S
mpegvideoencdsp_armv6.S
mpegvideoencdsp_init_arm.c
neon.S
neontest.c
pixblockdsp_armv6.S
pixblockdsp_init_arm.c
rdft_init_arm.c
rdft_neon.S
rv34dsp_init_arm.c
rv34dsp_neon.S
rv40dsp_init_arm.c
rv40dsp_neon.S
sbrdsp_init_arm.c
sbrdsp_neon.S
simple_idct_arm.S
simple_idct_armv5te.S
simple_idct_armv6.S
simple_idct_neon.S
startcode.h
startcode_armv6.S
synth_filter_neon.S
synth_filter_vfp.S
vc1dsp.h
vc1dsp_init_arm.c
vc1dsp_init_neon.c
vc1dsp_neon.S
videodsp_arm.h
videodsp_armv5te.S
videodsp_init_arm.c
videodsp_init_armv5te.c
vorbisdsp_init_arm.c
vorbisdsp_neon.S
vp3dsp_init_arm.c
vp3dsp_neon.S
vp56_arith.h
vp6dsp_init_arm.c
vp6dsp_neon.S
vp8.h
vp8_armv6.S
vp8dsp.h
vp8dsp_armv6.S
vp8dsp_init_arm.c
vp8dsp_init_armv6.c
vp8dsp_init_neon.c
vp8dsp_neon.S
vp9dsp_init_arm.c
vp9itxfm_neon.S
vp9lpf_neon.S
vp9mc_neon.S