    arm: vp9itxfm16: Make the larger core transforms standalone functions · 0ea60320
    Martin Storsjö authored
    This work is sponsored by, and copyright, Google.
    
    This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from
    17500 to 14516 bytes.
    
    This gives a small slowdown, from a couple of tens of cycles up to
    around 150 cycles for the full case of the largest transform, but
    makes it more feasible to add more optimized versions of these
    transforms.
    
    Before:                                 Cortex A7       A8       A9      A53
    vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4237.4   3561.5   3971.8   2525.3
    vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6371.9   5452.0   5779.3   3910.5
    vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22068.8  17867.5  19555.2  13871.6
    vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37268.9  38684.2  32314.2  23969.0
    
    After:                                  Cortex A7       A8       A9      A53
    vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4375.1   3571.9   4283.8   2567.2
    vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6415.6   5578.9   5844.6   3948.3
    vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22653.7  18079.7  19603.7  13905.3
    vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37593.2  38862.2  32235.8  24070.9
    Signed-off-by: Martin Storsjö <martin@martin.st>
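
The trade-off described in the message can be checked directly against the numbers it quotes. The following is a small sketch, not part of the commit: it subtracts the "Before" cycle counts from the "After" ones for each core and computes the object-size saving. The names `CORES`, `BEFORE`, `AFTER`, `deltas` and `size_saving` are illustrative only, and the shortened row labels stand in for the full `vp9_inv_dct_dct_*_add_10_neon` function names.

```python
# Cycle counts copied from the Before/After tables in the commit message.
# Column order follows the table header: Cortex A7, A8, A9, A53.
CORES = ("A7", "A8", "A9", "A53")

BEFORE = {
    "16x16_sub4":  (4237.4, 3561.5, 3971.8, 2525.3),
    "16x16_sub16": (6371.9, 5452.0, 5779.3, 3910.5),
    "32x32_sub4":  (22068.8, 17867.5, 19555.2, 13871.6),
    "32x32_sub32": (37268.9, 38684.2, 32314.2, 23969.0),
}

AFTER = {
    "16x16_sub4":  (4375.1, 3571.9, 4283.8, 2567.2),
    "16x16_sub16": (6415.6, 5578.9, 5844.6, 3948.3),
    "32x32_sub4":  (22653.7, 18079.7, 19603.7, 13905.3),
    "32x32_sub32": (37593.2, 38862.2, 32235.8, 24070.9),
}


def deltas(before, after):
    """Return after - before per benchmark and core, rounded to 0.1 cycles."""
    return {
        name: tuple(round(a - b, 1) for b, a in zip(before[name], after[name]))
        for name in before
    }


def size_saving(old_bytes, new_bytes):
    """Return (bytes saved, percentage saved relative to the old size)."""
    saved = old_bytes - new_bytes
    return saved, round(100.0 * saved / old_bytes, 2)


if __name__ == "__main__":
    # Object sizes for vp9itxfm_16bpp_neon.o as quoted in the message.
    saved, percent = size_saving(17500, 14516)
    print(f"code size: {saved} bytes saved ({percent}%)")

    # Positive values are slowdowns, negative values are speedups.
    for name, row in deltas(BEFORE, AFTER).items():
        cells = ", ".join(f"{core}: {d:+.1f}" for core, d in zip(CORES, row))
        print(f"{name}: {cells}")
```

Printing signed differences keeps slowdowns and the occasional speedup visually distinct, which is easier to scan than two separate tables.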
vp9itxfm_16bpp_neon.S 55.8 KB