1. 28 Jun, 2017 1 commit
    • lavc/arm: fix lack of precision in ff_ps_stereo_interpolate_neon · e4a27e2f
      Clément Bœsch authored
      The code originally pre-multiplied the steps by 2, causing the running
      sum of the h factors to drift due to the lack of precision. The
      inaccuracy quickly exceeds 0.01.
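
A minimal C sketch of why the two accumulation orders can disagree. It assumes the old code kept two running sums (one for even samples, one for odd) and advanced both by a pre-doubled step; that layout is an assumption made for illustration, and `max_accum_diff` is a hypothetical helper, not FFmpeg code. The reference order adds the step once per sample, so the two paths round at different points.

```c
#include <assert.h>

/* Hypothetical helper: largest absolute difference between
 *   (a) adding hs once per sample, and
 *   (b) two running sums advanced by a pre-doubled step (assumed layout).
 */
static float max_accum_diff(float h, float hs, int len)
{
    assert(len >= 0);

    float seq  = h;        /* (a) reference running sum */
    float hs2  = hs + hs;  /* pre-doubled step (exact barring overflow) */
    float even = h + hs;   /* (b) value used for even sample indices */
    float odd  = h + hs2;  /* (b) value used for odd sample indices */
    float max_diff = 0.0f;

    for (int n = 0; n < len; n++) {
        seq += hs;
        float paired = (n & 1) ? odd : even;

        float d = seq - paired;
        if (d < 0.0f)
            d = -d;
        if (d > max_diff)
            max_diff = d;

        /* After each pair of samples, advance both sums by 2 * hs. */
        if (n & 1) {
            even += hs2;
            odd  += hs2;
        }
    }
    return max_diff;
}
```

When every intermediate value is exactly representable (for example h = 1.0 and hs = 0.5) the two orders agree bit for bit; with arbitrary inputs they need not, and the size of the gap depends on the magnitudes involved.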
      
      I tried various approaches, such as multiplying by 2.0 (instead of
      adding the value to itself), without success.
      
      I'm unable to benchmark the impact of this change; feel free to compare.
      
      This commit fixes the upcoming aacpsdsp tests.
      
      The following is an alternative, simplified function (matching the
      upcoming AArch64 code) that may be used instead:
      
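      function ff_ps_stereo_interpolate_neon, export=1
              vld1.32         {q0}, [r2]              @ q0 = h0 h1 h2 h3
              vld1.32         {q1}, [r3]              @ q1 = per-sample steps for h0..h3
              ldr             r12, [sp]               @ r12 = sample count (stack argument)
              vmov.f32        q8, q0                  @ copy h so vzip can duplicate lanes
              vmov.f32        q9, q1                  @ copy the steps likewise
              vzip.32         q8, q0                  @ q8 = h0 h0 h1 h1, q0 = h2 h2 h3 h3
              vzip.32         q9, q1                  @ same layout for the steps
      1:
              vld1.32         {d4}, [r0,:64]          @ d4 = l_re l_im
              vld1.32         {d6}, [r1,:64]          @ d6 = r_re r_im
              vadd.f32        q8, q8, q9              @ h0, h1 += step (once per sample)
              vadd.f32        q0, q0, q1              @ h2, h3 += step
              vmov.f32        d5, d4                  @ q2 = l_re l_im l_re l_im
              vmov.f32        d7, d6                  @ q3 = r_re r_im r_re r_im
              vmul.f32        q2, q2, q8              @ q2 = l * (h0 h0 h1 h1)
              vmla.f32        q2, q3, q0              @ q2 += r * (h2 h2 h3 h3)
              vst1.32         {d4}, [r0,:64]!         @ store new l, advance pointer
              vst1.32         {d5}, [r1,:64]!         @ store new r, advance pointer
              subs            r12, r12, #1
              bgt             1b                      @ loop while samples remain
              bx              lr
      endfunc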
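
For readers who do not read NEON, the listing above can be sketched in scalar C as follows. This is read off the assembly, not taken from the FFmpeg sources: the function name, the parameter names, and the mapping of r0..r3 and the stack slot to `l`, `r`, `h`, `h_step`, and `len` are assumptions. It is not claimed to be bit-exact with the NEON code, since a C compiler may contract the multiply-adds differently.

```c
#include <assert.h>

/* Hypothetical scalar equivalent of the NEON listing (names assumed).
 * l and r are arrays of (re, im) pairs, updated in place. */
static void stereo_interpolate_sketch(float (*l)[2], float (*r)[2],
                                      const float h[4],
                                      const float h_step[4], int len)
{
    /* The assembly loop always runs at least once, so len > 0 is assumed. */
    assert(len > 0);

    float h0 = h[0], h1 = h[1], h2 = h[2], h3 = h[3];

    for (int n = 0; n < len; n++) {
        float l_re = l[n][0], l_im = l[n][1];
        float r_re = r[n][0], r_im = r[n][1];

        /* One addition per sample: no pre-multiplied step. */
        h0 += h_step[0];
        h1 += h_step[1];
        h2 += h_step[2];
        h3 += h_step[3];

        l[n][0] = h0 * l_re + h2 * r_re;
        l[n][1] = h0 * l_im + h2 * r_im;
        r[n][0] = h1 * l_re + h3 * r_re;
        r[n][1] = h1 * l_im + h3 * r_im;
    }
}
```

Each output pair is a mix of the left and right input pairs, weighted by factors that advance by a fixed step on every sample.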
  2. 27 Jun, 2017 34 commits
  3. 26 Jun, 2017 5 commits