    arm: vp9itxfm: Avoid reloading the idct32 coefficients · 600f4c9b
    Martin Storsjö authored
    The idct32x32 function actually pushed q4-q7 onto the stack even
    though it didn't clobber them; there are plenty of registers that
    can be used to allow keeping all the idct coefficients in registers
    without having to reload different subsets of them at different
    stages in the transform.
    
    Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
    q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
    in the idct16 function), and the lanewise vmul needs a register in
    the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
    while doing idct16.
    
    Even while keeping these coefficients in registers, we can still skip
    pushing q7.
    
    Before:                              Cortex A7       A8       A9      A53
    vp9_inv_dct_dct_32x32_sub32_add_neon:  18553.8  17182.7  14303.3  12089.7
    After:
    vp9_inv_dct_dct_32x32_sub32_add_neon:  18470.3  16717.7  14173.6  11860.8
    
    This is cherry-picked from libav commit 402546a1.
    Signed-off-by: Martin Storsjö <martin@martin.st>
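
The Before/After figures in the commit message are easier to compare as relative changes. The following is a minimal sketch, not part of the commit: it copies the eight numbers verbatim and computes the percentage reduction per core. The names `CORES`, `BEFORE`, `AFTER` and `relative_improvement` are hypothetical helpers introduced here, and the commit message does not state the unit of the figures, so the sketch only assumes that lower is better.

```python
# Benchmark figures copied from the commit message
# (assumption: lower is better; the unit is not stated in the message).
CORES = ["Cortex A7", "A8", "A9", "A53"]
BEFORE = [18553.8, 17182.7, 14303.3, 12089.7]
AFTER = [18470.3, 16717.7, 14173.6, 11860.8]


def relative_improvement(before, after):
    """Return the percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100.0


# Map each core name to its percentage reduction.
improvements = {
    core: relative_improvement(b, a)
    for core, b, a in zip(CORES, BEFORE, AFTER)
}

for core, pct in improvements.items():
    print(f"{core:>9}: {pct:.2f}% lower")
```

On these numbers the reduction is small on every core, ranging from roughly 0.5% on the Cortex A7 to roughly 2.7% on the A8.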
Name
compat
doc
libavcodec
libavdevice
libavfilter
libavformat
libavresample
libavutil
libpostproc
libswresample
libswscale
presets
tests
tools
.gitattributes
.gitignore
.travis.yml
CONTRIBUTING.md
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
Changelog
INSTALL.md
LICENSE.md
MAINTAINERS
Makefile
README.md
RELEASE
arch.mak
cmdutils.c
cmdutils.h
cmdutils_common_opts.h
cmdutils_opencl.c
common.mak
configure
ffmpeg.c
ffmpeg.h
ffmpeg_cuvid.c
ffmpeg_dxva2.c
ffmpeg_filter.c
ffmpeg_opt.c
ffmpeg_qsv.c
ffmpeg_vaapi.c
ffmpeg_vdpau.c
ffmpeg_videotoolbox.c
ffplay.c
ffprobe.c
ffserver.c
ffserver_config.c
ffserver_config.h
library.mak
version.sh