    aarch64: vp9: Add NEON itxfm routines · 3c9546df
    Martin Storsjö authored
    This work is sponsored by, and copyright, Google.
    
    These are ported from the ARM version; thanks to the larger
    number of registers available, we can do the 16x16 and 32x32
    transforms in slices 8 pixels wide instead of 4. This gives
    a speedup of around 1.4x compared to the 32-bit version.
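
The slicing idea can be illustrated with a minimal C sketch. It is not the NEON code from this commit: `column_op` is a hypothetical stand-in for a real 1-D inverse transform, and `transform_in_slices` is an assumed name. It only shows that a 16x16 block takes four passes with 4-column slices but two passes with 8-column slices, with identical results.

```c
#include <assert.h>
#include <stdint.h>

#define N 16

/* Hypothetical stand-in for a 1-D transform on one column: a running sum.
 * The real VP9 inverse DCT/ADST is far more involved. */
void column_op(int32_t *col, int stride)
{
    for (int i = 1; i < N; i++)
        col[i * stride] += col[(i - 1) * stride];
}

/* Process an N x N block in vertical slices of slice_width columns and
 * return how many slices were needed. */
int transform_in_slices(int32_t *block, int slice_width)
{
    int slices = 0;

    assert(N % slice_width == 0);
    for (int x = 0; x < N; x += slice_width) {
        /* In SIMD code each column of the slice would occupy one lane, so
         * the inner loop here models a single vectorized pass. */
        for (int lane = 0; lane < slice_width; lane++)
            column_op(block + x + lane, N);
        slices++;
    }
    return slices;
}
```

Calling `transform_in_slices(block, 4)` and `transform_in_slices(block, 8)` on copies of the same block yields the same output; only the number of passes differs.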
    
    The fact that aarch64 doesn't have the same d/q register
    aliasing makes some of the macros quite a bit simpler as well.
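
As a rough illustration of the aliasing difference, the register naming can be modeled in C. The function names below are hypothetical and exist only for this sketch. On 32-bit ARM NEON, each q register overlays a pair of d registers; on AArch64, d<n> is simply the low 64 bits of v<n>.

```c
/* AArch32 NEON: q<n> is the pair d<2n> (low half) and d<2n+1> (high half),
 * so macros working on q registers must track two d register numbers. */
int arm32_q_low_d(int q)
{
    return 2 * q;
}

int arm32_q_high_d(int q)
{
    return 2 * q + 1;
}

/* AArch64: d<n> names the low 64 bits of v<n>; the high half has no
 * separate d register name, so the register number stays the same. */
int aarch64_v_low_d(int v)
{
    return v;
}
```

For example, q8 on 32-bit ARM covers d16 and d17, while v8 on AArch64 only overlaps d8.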
    
    Examples of runtimes vs the 32-bit version, on a Cortex A53:
                                           ARM  AArch64
    vp9_inv_adst_adst_4x4_add_neon:       90.0     87.7
    vp9_inv_adst_adst_8x8_add_neon:      400.0    354.7
    vp9_inv_adst_adst_16x16_add_neon:   2526.5   1827.2
    vp9_inv_dct_dct_4x4_add_neon:         74.0     72.7
    vp9_inv_dct_dct_8x8_add_neon:        271.0    256.7
    vp9_inv_dct_dct_16x16_add_neon:     1960.7   1372.7
    vp9_inv_dct_dct_32x32_add_neon:    11988.9   8088.3
    vp9_inv_wht_wht_4x4_add_neon:         63.0     57.7
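
The per-function speedup can be read off the table above by dividing the ARM column by the AArch64 column. The following C sketch does that arithmetic; the struct and function names are made up for illustration, and only the numbers come from the table.

```c
#include <stddef.h>

struct timing {
    const char *name;
    double arm;     /* 32-bit ARM runtime from the table */
    double aarch64; /* AArch64 runtime from the table */
};

/* Cortex A53 figures copied from the table in the commit message. */
const struct timing a53_timings[] = {
    { "vp9_inv_adst_adst_4x4_add_neon",      90.0,   87.7 },
    { "vp9_inv_adst_adst_8x8_add_neon",     400.0,  354.7 },
    { "vp9_inv_adst_adst_16x16_add_neon",  2526.5, 1827.2 },
    { "vp9_inv_dct_dct_4x4_add_neon",        74.0,   72.7 },
    { "vp9_inv_dct_dct_8x8_add_neon",       271.0,  256.7 },
    { "vp9_inv_dct_dct_16x16_add_neon",    1960.7, 1372.7 },
    { "vp9_inv_dct_dct_32x32_add_neon",   11988.9, 8088.3 },
    { "vp9_inv_wht_wht_4x4_add_neon",        63.0,   57.7 },
};

const size_t a53_timing_count = sizeof(a53_timings) / sizeof(a53_timings[0]);

/* Speedup of the AArch64 version over the 32-bit ARM version. */
double speedup(const struct timing *t)
{
    return t->arm / t->aarch64;
}
```

The 16x16 and 32x32 entries, which are the ones processed in wider slices, come out between roughly 1.38x and 1.48x, while the 4x4 and 8x8 entries stay close to 1x.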
    
    The speedup vs C code (2-4x) is smaller than in the 32-bit case,
    mostly because the C code ends up significantly faster (around
    1.6x faster, with GCC 5.4) when built for aarch64.
    
    Examples of runtimes vs C on a Cortex A57 (for a slightly older version
    of the patch):
                                    A57 gcc-5.3   neon
    vp9_inv_adst_adst_4x4_add_neon:       152.2   60.0
    vp9_inv_adst_adst_8x8_add_neon:       948.2  288.0
    vp9_inv_adst_adst_16x16_add_neon:    4830.4 1380.5
    vp9_inv_dct_dct_4x4_add_neon:         153.0   58.6
    vp9_inv_dct_dct_8x8_add_neon:         789.2  180.2
    vp9_inv_dct_dct_16x16_add_neon:      3639.6  917.1
    vp9_inv_dct_dct_32x32_add_neon:     20462.1 4985.0
    vp9_inv_wht_wht_4x4_add_neon:          91.0   49.8
    
    The asm is around 3-4x faster than C on the Cortex A57, and around
    30-50% faster on the A57 than on the A53.
    Signed-off-by: Martin Storsjö <martin@martin.st>
Name
Makefile
asm-offsets.h
cabac.h
dcadsp_init.c
dcadsp_neon.S
fft_init_aarch64.c
fft_neon.S
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S
h264dsp_init_aarch64.c
h264dsp_neon.S
h264idct_neon.S
h264pred_init.c
h264pred_neon.S
h264qpel_init_aarch64.c
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
imdct15_init.c
imdct15_neon.S
mdct_init.c
mdct_neon.S
mpegaudiodsp_init.c
mpegaudiodsp_neon.S
neon.S
neontest.c
rv40dsp_init_aarch64.c
synth_filter_neon.S
vc1dsp_init_aarch64.c
videodsp.S
videodsp_init.c
vorbisdsp_init.c
vorbisdsp_neon.S
vp9dsp_init_aarch64.c
vp9itxfm_neon.S
vp9mc_neon.S