    aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 · cad42fad
    Martin Storsjö authored
    This work is sponsored by, and copyright, Google.
    
    Previously all subpartitions except the eob=1 (DC) case ran with
    the same runtime:
    
    vp9_inv_dct_dct_16x16_sub16_add_neon:   1373.2
    vp9_inv_dct_dct_32x32_sub32_add_neon:   8089.0
    
    By skipping individual 8x16 or 8x32 pixel slices in the first pass,
    we reduce the runtime of these functions like this:
    
    vp9_inv_dct_dct_16x16_sub1_add_neon:     235.3
    vp9_inv_dct_dct_16x16_sub2_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub8_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub12_add_neon:   1372.1
    vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.1
    vp9_inv_dct_dct_32x32_sub1_add_neon:     555.1
    vp9_inv_dct_dct_32x32_sub2_add_neon:    5190.2
    vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
    vp9_inv_dct_dct_32x32_sub8_add_neon:    5183.1
    vp9_inv_dct_dct_32x32_sub12_add_neon:   6161.5
    vp9_inv_dct_dct_32x32_sub16_add_neon:   6155.5
    vp9_inv_dct_dct_32x32_sub20_add_neon:   7136.3
    vp9_inv_dct_dct_32x32_sub24_add_neon:   7128.4
    vp9_inv_dct_dct_32x32_sub28_add_neon:   8098.9
    vp9_inv_dct_dct_32x32_sub32_add_neon:   8098.8
    
    I.e. in general a very minor overhead for the full subpartition case due
    to the additional cmps, but a significant speedup for the cases when we
    only need to process a small part of the actual input data.
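The slice-skipping idea described above can be illustrated with a minimal C sketch. This is not the NEON implementation from the commit. The threshold table `min_eob`, the function names, and the placeholder 1-D transform are all hypothetical and chosen only to show the control flow: the first pass walks the input in slices of 8, compares `eob` against a per-slice threshold, and once a slice is known to be empty it zero-fills the rest of the intermediate buffer instead of transforming it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { N = 16, SLICE = 8, NB_SLICES = N / SLICE };

/* Hypothetical thresholds, for illustration only: slice i is assumed to be
 * all zero whenever eob <= min_eob[i]. These are NOT the values used by the
 * actual assembly. */
static const int min_eob[NB_SLICES] = { 0, 38 };

/* Placeholder for a 1-D transform of one line of N coefficients.
 * It only copies its input; a real implementation would run the 1-D IDCT. */
static void transform_1d_placeholder(const int16_t *in, int16_t *out)
{
    memcpy(out, in, N * sizeof(*in));
}

/* First pass over an N x N block. Returns the number of slices that were
 * actually transformed; skipped slices are zero-filled in tmp so that the
 * second pass can run unchanged. */
static int first_pass(const int16_t *coef, int16_t *tmp, int eob)
{
    int processed = 0;

    assert(coef && tmp);
    for (int i = 0; i < NB_SLICES; i++) {
        int16_t *dst = tmp + i * SLICE * N;

        /* The first slice is always processed; later ones cost one compare. */
        if (i > 0 && eob <= min_eob[i]) {
            /* This slice and all following ones are empty: zero them. */
            memset(dst, 0, (size_t)(NB_SLICES - i) * SLICE * N * sizeof(*tmp));
            break;
        }
        for (int r = 0; r < SLICE; r++)
            transform_1d_placeholder(coef + (i * SLICE + r) * N, dst + r * N);
        processed++;
    }
    return processed;
}
```

With these assumed thresholds, a 16x16 block with a small `eob` runs only one of the two first-pass slices, while a full block runs both and pays only for the extra comparison. That matches the shape of the timings above, where the 32x32 runtime rises in four steps as more 8x32 slices have to be processed.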

    Signed-off-by: Martin Storsjö <martin@martin.st>
Name
Makefile
asm-offsets.h
cabac.h
dcadsp_init.c
dcadsp_neon.S
fft_init_aarch64.c
fft_neon.S
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S
h264dsp_init_aarch64.c
h264dsp_neon.S
h264idct_neon.S
h264pred_init.c
h264pred_neon.S
h264qpel_init_aarch64.c
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
imdct15_init.c
imdct15_neon.S
mdct_init.c
mdct_neon.S
mpegaudiodsp_init.c
mpegaudiodsp_neon.S
neon.S
neontest.c
rv40dsp_init_aarch64.c
synth_filter_neon.S
vc1dsp_init_aarch64.c
videodsp.S
videodsp_init.c
vorbisdsp_init.c
vorbisdsp_neon.S
vp9dsp_init_aarch64.c
vp9itxfm_neon.S
vp9lpf_neon.S
vp9mc_neon.S