    arm: vp9lpf: Implement the mix2_44 function with one single filter pass · 575e31e9
    Martin Storsjö authored
    For this case, with 8 inputs but only changing 4 of them, we can fit
    all 16 input pixels into a q register, and still have enough temporary
    registers for doing the loop filter.
    
    The wd=8 filters would require too many temporary registers for
    processing all 16 pixels at once though.
    
    Before:                          Cortex A7      A8     A9     A53
    vp9_loop_filter_mix2_v_44_16_neon:   289.7   256.2  237.5   181.2
    After:
    vp9_loop_filter_mix2_v_44_16_neon:   221.2   150.5  177.7   138.0

    Signed-off-by: Martin Storsjö <martin@martin.st>
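
The idea in the commit message can be sketched in plain C. This is only an illustration of the data flow, not the NEON code from the commit: all function names below are hypothetical, the filter arithmetic is loosely modeled on a wd=4 loop filter (8 pixels read per column, 4 written) and has not been checked against the codec's reference output, and the packing of the two sets of thresholds into the low and high byte of `E`, `I` and `H` is an assumption. The two-pass version filters each 8-column half separately; the one-pass version first expands the thresholds to one value per column and then walks all 16 columns in a single loop, which is roughly what one q register worth of lanes would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Helpers: clamp to int8 / uint8 range, and a portable arithmetic shift. */
static int clamp_s8(int v) { return v < -128 ? -128 : v > 127 ? 127 : v; }
static uint8_t clip_u8(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }
static int asr(int v, int s) { return v >= 0 ? v >> s : -((-v + (1 << s) - 1) >> s); }

/* Filter one column across a horizontal edge. dst points at q0; p3..p0 are
 * the four rows above it. Reads 8 pixels, writes at most 4 (p1, p0, q0, q1). */
static void filter4_column(uint8_t *dst, ptrdiff_t stride, int E, int I, int H)
{
    int p3 = dst[-4 * stride], p2 = dst[-3 * stride];
    int p1 = dst[-2 * stride], p0 = dst[-1 * stride];
    int q0 = dst[0], q1 = dst[stride], q2 = dst[2 * stride], q3 = dst[3 * stride];

    /* Filter mask: skip columns whose neighbours differ too much. */
    int fm = abs(p3 - p2) <= I && abs(p2 - p1) <= I && abs(p1 - p0) <= I &&
             abs(q1 - q0) <= I && abs(q2 - q1) <= I && abs(q3 - q2) <= I &&
             abs(p0 - q0) * 2 + (abs(p1 - q1) >> 1) <= E;
    if (!fm)
        return;

    /* High edge variance: only p0/q0 are adjusted in that case. */
    int hev = abs(p1 - p0) > H || abs(q1 - q0) > H;
    int f = hev ? clamp_s8(p1 - q1) : 0;
    f = clamp_s8(3 * (q0 - p0) + f);
    int f1 = asr(clamp_s8(f + 4), 3);
    int f2 = asr(clamp_s8(f + 3), 3);

    dst[-1 * stride] = clip_u8(p0 + f2);
    dst[0]           = clip_u8(q0 - f1);
    if (!hev) {
        f = asr(f1 + 1, 1);
        dst[-2 * stride] = clip_u8(p1 + f);
        dst[stride]      = clip_u8(q1 - f);
    }
}

/* Filter n adjacent columns with one shared set of thresholds. */
static void filter_block(uint8_t *dst, ptrdiff_t stride, int n, int E, int I, int H)
{
    assert(n > 0);
    for (int x = 0; x < n; x++)
        filter4_column(dst + x, stride, E, I, H);
}

/* Two passes of 8 columns each (assumed packing: low byte = first half). */
static void mix2_44_two_pass(uint8_t *dst, ptrdiff_t stride, int E, int I, int H)
{
    filter_block(dst,     stride, 8, E & 0xff, I & 0xff, H & 0xff);
    filter_block(dst + 8, stride, 8, E >> 8,   I >> 8,   H >> 8);
}

/* One pass over all 16 columns with per-column ("per-lane") thresholds. */
static void mix2_44_one_pass(uint8_t *dst, ptrdiff_t stride, int E, int I, int H)
{
    int e[16], i[16], h[16];
    for (int x = 0; x < 16; x++) {
        int sel = x < 8 ? 0 : 8;
        e[x] = (E >> sel) & 0xff;
        i[x] = (I >> sel) & 0xff;
        h[x] = (H >> sel) & 0xff;
    }
    for (int x = 0; x < 16; x++)
        filter4_column(dst + x, stride, e[x], i[x], h[x]);
}

/* Returns 1 if both variants produce identical output on a test buffer. */
int mix2_passes_agree(void)
{
    uint8_t a[8 * 16], b[8 * 16];
    unsigned seed = 1;
    for (int k = 0; k < 8 * 16; k++) {
        seed = seed * 1103515245u + 12345u;      /* small deterministic noise */
        a[k] = (uint8_t)(120 + ((seed >> 16) % 16));
    }
    memcpy(b, a, sizeof(a));

    int E = (40 << 8) | 20, I = (10 << 8) | 6, H = (3 << 8) | 1;
    mix2_44_two_pass(a + 4 * 16, 16, E, I, H);   /* edge sits at row 4 */
    mix2_44_one_pass(b + 4 * 16, 16, E, I, H);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling `mix2_passes_agree()` from a small `main` checks that the single pass gives the same pixels as the two separate passes. Because each column is filtered independently, merging the halves changes only how much work is done per iteration, not the result.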
Name
avbuild
avtools
compat
doc
libavcodec
libavdevice
libavfilter
libavformat
libavresample
libavutil
libswscale
presets
tests
tools
.gitattributes
.gitignore
.travis.yml
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
Changelog
INSTALL
LICENSE
Makefile
README
README.md
RELEASE
configure