    aarch64: vp9: Add NEON itxfm routines · 3c9546df
    Martin Storsjö authored
    This work is sponsored by, and copyright, Google.
    
    These are ported from the ARM version; thanks to the larger
    number of registers available, we can do the 16x16 and 32x32
    transforms in slices 8 pixels wide instead of 4. This gives
    a speedup of around 1.4x compared to the 32 bit version.
    
    The fact that aarch64 doesn't have the same d/q register
    aliasing makes some of the macros quite a bit simpler as well.
    
    Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                           ARM  AArch64
    vp9_inv_adst_adst_4x4_add_neon:       90.0     87.7
    vp9_inv_adst_adst_8x8_add_neon:      400.0    354.7
    vp9_inv_adst_adst_16x16_add_neon:   2526.5   1827.2
    vp9_inv_dct_dct_4x4_add_neon:         74.0     72.7
    vp9_inv_dct_dct_8x8_add_neon:        271.0    256.7
    vp9_inv_dct_dct_16x16_add_neon:     1960.7   1372.7
    vp9_inv_dct_dct_32x32_add_neon:    11988.9   8088.3
    vp9_inv_wht_wht_4x4_add_neon:         63.0     57.7
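
As a rough cross-check of the "around 1.4x" figure, the per-function ratios can be recomputed from the table above. This is only a sketch: the `runtimes` dictionary and the `speedup` helper are illustrative names, not part of the patch, and the numbers are copied verbatim from the table in whatever units the benchmark reported.

```python
# Runtimes copied from the Cortex A53 table above: (ARM, AArch64).
# Names of the dictionary and helper are hypothetical, for illustration only.
runtimes = {
    "vp9_inv_adst_adst_4x4_add_neon": (90.0, 87.7),
    "vp9_inv_adst_adst_8x8_add_neon": (400.0, 354.7),
    "vp9_inv_adst_adst_16x16_add_neon": (2526.5, 1827.2),
    "vp9_inv_dct_dct_4x4_add_neon": (74.0, 72.7),
    "vp9_inv_dct_dct_8x8_add_neon": (271.0, 256.7),
    "vp9_inv_dct_dct_16x16_add_neon": (1960.7, 1372.7),
    "vp9_inv_dct_dct_32x32_add_neon": (11988.9, 8088.3),
    "vp9_inv_wht_wht_4x4_add_neon": (63.0, 57.7),
}


def speedup(arm, aarch64):
    """Return how many times faster the AArch64 run is than the ARM run."""
    return arm / aarch64


# Ratio per function; values above 1.0 mean the AArch64 version is faster.
speedups = {name: speedup(arm, a64) for name, (arm, a64) in runtimes.items()}

for name, ratio in speedups.items():
    print(f"{name:34s} {ratio:.2f}x")
```

The 16x16 and 32x32 entries, which are the ones that benefit from the wider 8 pixel slices, come out at roughly 1.4x to 1.5x, while the 4x4 and 8x8 entries gain considerably less.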
    
    The speedup vs C code (2-4x) is smaller than in the 32 bit case,
    mostly because the C code ends up significantly faster (around
    1.6x faster, with GCC 5.4) when built for aarch64.
    
    Examples of runtimes vs C on a Cortex A57 (for a slightly older version
    of the patch):
                                    A57 gcc-5.3   neon
    vp9_inv_adst_adst_4x4_add_neon:       152.2   60.0
    vp9_inv_adst_adst_8x8_add_neon:       948.2  288.0
    vp9_inv_adst_adst_16x16_add_neon:    4830.4 1380.5
    vp9_inv_dct_dct_4x4_add_neon:         153.0   58.6
    vp9_inv_dct_dct_8x8_add_neon:         789.2  180.2
    vp9_inv_dct_dct_16x16_add_neon:      3639.6  917.1
    vp9_inv_dct_dct_32x32_add_neon:     20462.1 4985.0
    vp9_inv_wht_wht_4x4_add_neon:          91.0   49.8
    
    The asm is around a factor of 3-4 faster than C on the Cortex A57, and
    around 30-50% faster on the A57 than on the A53.
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
Name
compat
doc
libavcodec
libavdevice
libavfilter
libavformat
libavresample
libavutil
libswscale
presets
tests
tools
.gitattributes
.gitignore
.travis.yml
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
Changelog
INSTALL
LICENSE
Makefile
README
README.md
RELEASE
arch.mak
avconv.c
avconv.h
avconv_dxva2.c
avconv_filter.c
avconv_opt.c
avconv_qsv.c
avconv_vaapi.c
avconv_vda.c
avconv_vdpau.c
avplay.c
avprobe.c
cmdutils.c
cmdutils.h
cmdutils_common_opts.h
common.mak
configure
library.mak
version.sh