    aarch64: vp8: Optimize vp8_idct_add_neon for aarch64 · 7e42d5f0
    Martin Storsjö authored
    The previous version was a fairly direct translation of the arm
    version. This version does some unnecessary arithmetic (it performs
    more operations on vectors that are only half filled; it uses 4
    uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
    of packing data together (which could be done for free in the arm
    version).
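
As a rough illustration of the two instructions named above, here is a scalar C model of what one lane of `uaddw` and `sqxtun` does when a residual is added to a pixel. This is a sketch under stated assumptions, not the code from this commit: the helper names (`uaddw_lane`, `sqxtun_lane`, `add_residual_4x4`) are hypothetical, and the real routine works on NEON vectors rather than one value at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one UADDW lane: widen an unsigned 8-bit
 * pixel and add it to a 16-bit value, wrapping modulo 2^16.
 * The final cast assumes two's-complement conversion, as on AArch64. */
static int16_t uaddw_lane(int16_t residual, uint8_t pixel)
{
    return (int16_t)(uint16_t)((uint16_t)residual + pixel);
}

/* Hypothetical scalar model of one SQXTUN lane: narrow a signed 16-bit
 * value to unsigned 8 bits, saturating to the range [0, 255]. */
static uint8_t sqxtun_lane(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Illustrative only: add a 4x4 block of residuals to a 4x4 block of
 * pixels, one row at a time, clamping each result to 8 bits. */
static void add_residual_4x4(uint8_t *dst, ptrdiff_t stride,
                             const int16_t res[16])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] =
                sqxtun_lane(uaddw_lane(res[y * 4 + x], dst[y * stride + x]));
}
```

A 128-bit vector holds eight 16-bit lanes, so the 16 values of a 4x4 block fit in two full vectors. Handling each 4-pixel row in its own half-filled vector would account for the 4 `uaddw` and 4 `sqxtun` mentioned in the message, though that reading is an inference from the text rather than something taken from the assembly.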
    
    This gives a decent speedup on Cortex A53 (about 15%), a minor
    speedup on A72 (about 4%) and a very minor slowdown on Cortex A73
    (about 2.6%).
    
    Before:              Cortex A53    A72    A73
    vp8_idct_add_neon:         79.7   67.5   65.0
    After:
    vp8_idct_add_neon:         67.7   64.8   66.7

    Signed-off-by: Martin Storsjö <martin@martin.st>
Name
avbuild
avtools
compat
doc
libavcodec
libavdevice
libavfilter
libavformat
libavresample
libavutil
libswscale
presets
tests
tools
.gitattributes
.gitignore
.travis.yml
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
Changelog
INSTALL
LICENSE
Makefile
README.md
RELEASE
configure