    aarch64: vp9: Add NEON optimizations of VP9 MC functions · 1f7801c2
    Martin Storsjö authored
    This work is sponsored by, and copyright, Google.
    
    These are ported from the ARM version; it is essentially a 1:1
    port with no extra added features, but with some hand tuning
    (especially for the plain copy/avg functions). The ARM version
    isn't very register starved to begin with, so there's not much
    to be gained from having more spare registers here - we only
    avoid having to clobber callee-saved registers.
    
    Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                         ARM   AArch64
    vp9_avg4_neon:                      27.2      23.7
    vp9_avg8_neon:                      56.5      54.7
    vp9_avg16_neon:                    169.9     167.4
    vp9_avg32_neon:                    585.8     585.2
    vp9_avg64_neon:                   2460.3    2294.7
    vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
    vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
    vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
    vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
    vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
    vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
    vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
    vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
    vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
    vp9_put4_neon:                      18.0      17.2
    vp9_put8_neon:                      40.2      37.7
    vp9_put16_neon:                     97.4      99.5
    vp9_put32_neon/armv8:              346.0     307.4
    vp9_put64_neon/armv8:             1319.0    1107.5
    vp9_put_8tap_smooth_4h_neon:       126.7     118.2
    vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
    vp9_put_8tap_smooth_4v_neon:       113.0      86.5
    vp9_put_8tap_smooth_8h_neon:       229.7     221.6
    vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
    vp9_put_8tap_smooth_8v_neon:       215.0     187.5
    vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
    vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
    vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4
    
    These are generally about as fast as the corresponding ARM
    routines on the same CPU (at least on the A53), in most cases
    marginally faster.
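
The comparison above can be checked by dividing each ARM runtime by the corresponding AArch64 runtime. The following is a minimal sketch, not part of the commit: the `speedup` and `summarize` helpers are hypothetical names, and only a subset of the rows from the table is included. A ratio above 1.0 means the AArch64 version is faster.

```python
# Runtimes copied from a subset of the table above (Cortex A53).
# Each value is (ARM, AArch64); units are as reported in the commit message.
RUNTIMES = {
    "vp9_avg4_neon": (27.2, 23.7),
    "vp9_avg_8tap_smooth_4v_neon": (126.0, 93.7),
    "vp9_avg_8tap_smooth_64h_neon": (11273.2, 11280.1),
    "vp9_put16_neon": (97.4, 99.5),
    "vp9_put64_neon/armv8": (1319.0, 1107.5),
    "vp9_put_8tap_smooth_64v_neon": (9635.0, 9632.4),
}


def speedup(arm, aarch64):
    """Return how many times faster the AArch64 version is (hypothetical helper)."""
    return arm / aarch64


def summarize(runtimes):
    """Map each function name to its ARM/AArch64 ratio, rounded to 3 decimals."""
    return {name: round(speedup(arm, a64), 3) for name, (arm, a64) in runtimes.items()}


if __name__ == "__main__":
    for name, ratio in summarize(RUNTIMES).items():
        verdict = "faster" if ratio > 1.0 else "not faster"
        print(f"{name:36s} {ratio:6.3f}  ({verdict} on AArch64)")
```

In this subset, the vertical 4-pixel filter and the 64-pixel plain copy show the clearest gains, while `vp9_put16_neon` and `vp9_avg_8tap_smooth_64h_neon` are slightly slower on AArch64.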
    
    The speedup vs C code is pretty much the same as for the 32 bit
    case; on the A53 it's around 6-13x for the larger 8tap filters.
    The exact speedup varies a little, since the C versions generally
    don't end up exactly as slow/fast as on 32 bit.
    
    This is an adapted cherry-pick from libav commit
    383d96aa.
    Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Name
compat
doc
libavcodec
libavdevice
libavfilter
libavformat
libavresample
libavutil
libpostproc
libswresample
libswscale
presets
tests
tools
.gitattributes
.gitignore
.travis.yml
CONTRIBUTING.md
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
Changelog
INSTALL.md
LICENSE.md
MAINTAINERS
Makefile
README.md
RELEASE
arch.mak
cmdutils.c
cmdutils.h
cmdutils_common_opts.h
cmdutils_opencl.c
common.mak
configure
ffmpeg.c
ffmpeg.h
ffmpeg_cuvid.c
ffmpeg_dxva2.c
ffmpeg_filter.c
ffmpeg_opt.c
ffmpeg_qsv.c
ffmpeg_vaapi.c
ffmpeg_vdpau.c
ffmpeg_videotoolbox.c
ffplay.c
ffprobe.c
ffserver.c
ffserver_config.c
ffserver_config.h
library.mak
version.sh