    lavc/arm: fix lack of precision in ff_ps_stereo_interpolate_neon · e4a27e2f
    Clément Bœsch authored
    The code originally pre-multiplied the steps by 2, causing the running sum
    of the h factors to drift due to the lack of precision. This quickly
    leads to an inaccuracy > 0.01.
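
To illustrate the kind of drift described above, here is a minimal C sketch. It
assumes the old scheme kept two lanes (h + step and h + 2*step) and advanced
each by the pre-doubled step, and it compares them against a reference that
adds the step once per sample. The function name max_drift and the two-lane
layout are assumptions made for illustration, not taken from the FFmpeg
sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest absolute difference between
 *   - a reference running sum that adds `step` once per sample, and
 *   - a two-lane scheme that advances each lane by a pre-doubled step.
 * Doubling the step is exact in binary floating point, but the lanes
 * round after different intermediate sums than the reference does, so
 * the two sequences can slowly diverge. */
static float max_drift(float h, float step, size_t len)
{
    float ref = h;
    float step2 = step + step;               /* pre-doubled step */
    float lane[2] = { h + step, h + step2 }; /* samples 0 and 1 */
    float worst = 0.0f;

    assert(len > 0);
    for (size_t i = 0; i < len; i++) {
        ref += step;                         /* one rounding per sample */
        float diff = lane[i & 1] - ref;
        if (diff < 0.0f)
            diff = -diff;
        if (diff > worst)
            worst = diff;
        lane[i & 1] += step2;                /* lane skips ahead two samples */
    }
    return worst;
}
```

When every intermediate sum is exactly representable (for example h = 0 and
step = 0.5) both schemes agree and the function returns 0. With arbitrary
inputs the result depends on the values and on len.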
    
    I tried various approaches, such as multiplying by 2.0 (instead of adding
    the value to itself), without success.
    
    I'm unable to benchmark the impact of this change; feel free to compare.
    
    This commit fixes the incoming aacpsdsp tests.
    
    The following is an alternative, simplified function (matching the
    incoming AArch64 code) that may be used:
    
    function ff_ps_stereo_interpolate_neon, export=1
            vld1.32         {q0}, [r2]
            vld1.32         {q1}, [r3]
            ldr             r12, [sp]
            vmov.f32        q8, q0
            vmov.f32        q9, q1
            vzip.32         q8, q0
            vzip.32         q9, q1
    1:
            vld1.32         {d4}, [r0,:64]
            vld1.32         {d6}, [r1,:64]
            vadd.f32        q8, q8, q9
            vadd.f32        q0, q0, q1
            vmov.f32        d5, d4
            vmov.f32        d7, d6
            vmul.f32        q2, q2, q8
            vmla.f32        q2, q3, q0
            vst1.32         {d4}, [r0,:64]!
            vst1.32         {d5}, [r1,:64]!
            subs            r12, r12, #1
            bgt             1b
            bx              lr
    endfunc
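
For readers less familiar with NEON, the loop above can be modelled in scalar
C roughly as follows. This is a sketch derived from reading the assembly, not
a copy of the C reference implementation. The name stereo_interpolate_scalar
and the flat 4-element h and h_step arrays are assumptions (the assembly only
loads four floats from each pointer).

```c
#include <assert.h>

/* Hypothetical scalar model of the simplified NEON loop.
 * l and r are arrays of complex samples stored as {re, im} pairs.
 * h holds the four mixing factors, h_step their per-sample increments. */
static void stereo_interpolate_scalar(float (*l)[2], float (*r)[2],
                                      const float h[4],
                                      const float h_step[4], int len)
{
    float h0 = h[0], h1 = h[1], h2 = h[2], h3 = h[3];

    assert(len > 0); /* the assembly loop body always runs at least once */
    for (int n = 0; n < len; n++) {
        float l_re = l[n][0], l_im = l[n][1];
        float r_re = r[n][0], r_im = r[n][1];

        /* One step per sample, as in the vadd.f32 pair at the loop top. */
        h0 += h_step[0];
        h1 += h_step[1];
        h2 += h_step[2];
        h3 += h_step[3];

        /* vmul.f32 followed by vmla.f32: h0/h1 scale l, h2/h3 scale r. */
        l[n][0] = h0 * l_re + h2 * r_re;
        l[n][1] = h0 * l_im + h2 * r_im;
        r[n][0] = h1 * l_re + h3 * r_re;
        r[n][1] = h1 * l_im + h3 * r_im;
    }
}
```

Bit-exact agreement with the assembly is not guaranteed: a compiler may
contract the multiply-add into a fused operation, whereas vmla.f32 rounds the
product before accumulating.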
aacpsdsp_neon.S 9.73 KB