    aarch64: vp9itxfm: Avoid reloading the idct32 coefficients · 65aa002d
    Martin Storsjö authored
    The idct32x32 function actually pushed d8-d15 onto the stack even
    though it didn't clobber them; there are plenty of registers that
    can be used to allow keeping all the idct coefficients in registers
    without having to reload different subsets of them at different
    stages in the transform.
    
    After this, we can still skip pushing d12-d15.
    
    Before:
    vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
    After:
    vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
vp9itxfm_neon.S 62 KB