    aarch64: vp9itxfm: Avoid reloading the idct32 coefficients · 2905657b
    Martin Storsjö authored
    The idct32x32 function actually pushed d8-d15 onto the stack even
    though it didn't clobber them; there are plenty of registers that
    can be used to allow keeping all the idct coefficients in registers
    without having to reload different subsets of them at different
    stages in the transform.
    
    After this change, we can still skip pushing d12-d15.
    
    Before:
    vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
    After:
    vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3
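
To put the two figures above in proportion, here is a minimal sketch that computes the absolute and relative difference. It assumes lower numbers are better, as is usual for this kind of benchmark output; the helper name `relative_saving` is hypothetical and not part of the commit.

```python
def relative_saving(before: float, after: float) -> float:
    """Fractional reduction in cost when going from `before` to `after`.

    Assumes both values are in the same unit and that lower is better.
    """
    return (before - after) / before


# Figures quoted in the commit message for
# vp9_inv_dct_dct_32x32_sub32_add_neon.
before = 8128.3
after = 8053.3

saving = relative_saving(before, after)
print(f"absolute: {before - after:.1f}, relative: {saving:.2%}")
# prints "absolute: 75.0, relative: 0.92%"
```

The gain is a little under one percent, which fits a change that only removes redundant coefficient reloads and leaves the transform arithmetic untouched.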
    
    This is cherrypicked from libav commit
    65aa002d.
    Signed-off-by: Martin Storsjö <martin@martin.st>
vp9itxfm_neon.S 62 KB