1. 28 Jun, 2017 6 commits
    • avcodec/utvideodec: Move bitstream end check out of inner loop · 1835c5e7
      Michael Niedermayer authored
      This is not needed when the buffer is large enough for the worst case of a line
      
      2% faster vlc reading
      Reviewed-by: Paul B Mahol <onemda@gmail.com>
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
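The idea of hoisting the end-of-bitstream check can be sketched in C as follows. This is a toy illustration, not the utvideodec code: the `BitReader` type, the unary code, `MAX_CODE_BITS`, and `decode_line` are all hypothetical names chosen for the example. When the remaining buffer covers the worst case of a whole line, a single check before the loop suffices; otherwise the sketch falls back to checking before every bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed worst-case length of one code, in bits (hypothetical value). */
#define MAX_CODE_BITS 8

typedef struct BitReader {
    const uint8_t *buf;   /* input bytes, read MSB first */
    size_t size_bits;     /* total number of valid bits in buf */
    size_t pos;           /* current read position, in bits */
} BitReader;

/* Read one bit without any bounds check. */
static unsigned read_bit(BitReader *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Toy variable-length code: count 1 bits until a 0 bit or until
 * MAX_CODE_BITS ones were read. Consumes at most MAX_CODE_BITS bits. */
static unsigned read_symbol(BitReader *br)
{
    unsigned n = 0;
    while (n < MAX_CODE_BITS && read_bit(br))
        n++;
    return n;
}

/* Decode one line of `width` symbols.
 * Returns 0 on success, -1 if the bitstream ends early. */
static int decode_line(BitReader *br, unsigned *dst, size_t width)
{
    assert(br->pos <= br->size_bits);
    size_t left = br->size_bits - br->pos;

    if (width <= left / MAX_CODE_BITS) {
        /* Fast path: one check covers the worst case of the whole line,
         * so the inner loop carries no end-of-buffer test. */
        for (size_t i = 0; i < width; i++)
            dst[i] = read_symbol(br);
        return 0;
    }

    /* Slow path: not enough room for the worst case, check every bit. */
    for (size_t i = 0; i < width; i++) {
        unsigned n = 0;
        while (n < MAX_CODE_BITS) {
            if (br->pos >= br->size_bits)
                return -1;
            if (!read_bit(br))
                break;
            n++;
        }
        dst[i] = n;
    }
    return 0;
}
```

In this sketch the fast path is taken whenever `width * MAX_CODE_BITS` bits are still available, which is the "buffer is large enough for the worst case of a line" condition from the commit message.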
    • Clément Bœsch
    • lavc/aarch64: add a few SIMD functions for AAC PS · ff0ecef6
      Clément Bœsch authored
      ☭ tests/checkasm/checkasm --bench --test=aacpsdsp
      checkasm: using random seed 3318985180
      MMX implied by specified flags
      MMX implied by specified flags
      NEON:
       - aacpsdsp.add_squares        [OK]
       - aacpsdsp.mul_pair_single    [OK]
       - aacpsdsp.hybrid_analysis    [OK]
       - aacpsdsp.stereo_interpolate [OK]
      checkasm: all 5 tests passed
      nop: 10.0
      ps_add_squares_c: 63221.2
      ps_add_squares_neon: 22311.7
      ps_hybrid_analysis_c: 2466.6
      ps_hybrid_analysis_neon: 1521.9
      ps_mul_pair_single_c: 68592.0
      ps_mul_pair_single_neon: 17426.6
      ps_stereo_interpolate_c: 72344.3
      ps_stereo_interpolate_neon: 72308.8
      ps_stereo_interpolate_ipdopd_c: 117415.2
      ps_stereo_interpolate_ipdopd_neon: 113386.3
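The C-versus-NEON figures above are easier to compare as ratios. The following is a small sketch that derives them from the numbers in the benchmark output; the `BenchPair` type and the `speedup` and `print_speedups` helpers are hypothetical names introduced for illustration, and the timings are taken as reported without assuming a particular unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct BenchPair {
    const char *name;  /* function name as printed by checkasm */
    double c;          /* timing of the C version */
    double neon;       /* timing of the NEON version */
} BenchPair;

/* Timings copied from the benchmark output above. */
static const BenchPair bench[] = {
    { "ps_add_squares",               63221.2,  22311.7  },
    { "ps_hybrid_analysis",           2466.6,   1521.9   },
    { "ps_mul_pair_single",           68592.0,  17426.6  },
    { "ps_stereo_interpolate",        72344.3,  72308.8  },
    { "ps_stereo_interpolate_ipdopd", 117415.2, 113386.3 },
};

/* Ratio of the C timing to the SIMD timing; above 1.0 means SIMD is faster. */
static double speedup(double c_time, double simd_time)
{
    assert(simd_time > 0.0);
    return c_time / simd_time;
}

/* Print one line per function with its C/NEON ratio. */
static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof(bench) / sizeof(bench[0]); i++)
        printf("%-30s %.2fx\n", bench[i].name,
               speedup(bench[i].c, bench[i].neon));
}
```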
    • Clément Bœsch's avatar
      9bbb0fbd
    • checkasm: add AAC PS tests · edd041e6
      Clément Bœsch authored
      This includes various fixes and improvements from James Almer.
      Signed-off-by: James Almer <jamrial@gmail.com>
    • lavc/arm: fix lack of precision in ff_ps_stereo_interpolate_neon · e4a27e2f
      Clément Bœsch authored
      The code originally pre-multiplied the steps by 2, causing the running
      sum of the h factors to drift due to the lack of precision. This quickly
      causes an inaccuracy > 0.01.
      
      I tried various approaches, such as multiplying by 2.0 (instead of
      adding the value to itself), without success.
      
      I'm unable to bench the impact of this change, feel free to compare.
      
      This commit fixes the incoming aacpsdsp tests.
      
      Following is an alternative simplified function (matching the incoming
      AArch64 code) that may be used:
      
      function ff_ps_stereo_interpolate_neon, export=1
              vld1.32         {q0}, [r2]
              vld1.32         {q1}, [r3]
              ldr             r12, [sp]
              vmov.f32        q8, q0
              vmov.f32        q9, q1
              vzip.32         q8, q0
              vzip.32         q9, q1
      1:
              vld1.32         {d4}, [r0,:64]
              vld1.32         {d6}, [r1,:64]
              vadd.f32        q8, q8, q9
              vadd.f32        q0, q0, q1
              vmov.f32        d5, d4
              vmov.f32        d7, d6
              vmul.f32        q2, q2, q8
              vmla.f32        q2, q3, q0
              vst1.32         {d4}, [r0,:64]!
              vst1.32         {d5}, [r1,:64]!
              subs            r12, r12, #1
              bgt             1b
              bx              lr
      endfunc
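For readers less familiar with NEON, the simplified assembly above can be read as the following scalar C sketch. It is an interpretation of the register usage, not code from the commit: the function name `stereo_interpolate_ref` and the parameter names are assumptions, with `h` and `h_step` standing for the four floats loaded from `r2` and `r3`. The step is added once per sample, so the running sum is never built from a pre-multiplied step. Unlike the assembly loop, which always runs at least once, the sketch does nothing for `len <= 0`.

```c
#include <assert.h>

/* Scalar sketch of the simplified loop (hypothetical reference helper).
 * l, r   : interleaved complex samples (re, im), updated in place
 * h      : four running interpolation factors (copied, not modified)
 * h_step : per-sample increment for each factor
 * len    : number of complex samples */
static void stereo_interpolate_ref(float (*l)[2], float (*r)[2],
                                   const float h[4], const float h_step[4],
                                   int len)
{
    assert(len >= 0);
    float h0 = h[0], h1 = h[1], h2 = h[2], h3 = h[3];

    for (int n = 0; n < len; n++) {
        const float l_re = l[n][0], l_im = l[n][1];
        const float r_re = r[n][0], r_im = r[n][1];

        /* Advance the factors by one step per sample (the vadd.f32 pair). */
        h0 += h_step[0];
        h1 += h_step[1];
        h2 += h_step[2];
        h3 += h_step[3];

        /* vmul.f32 followed by vmla.f32: mix left and right inputs. */
        l[n][0] = h0 * l_re + h2 * r_re;
        l[n][1] = h0 * l_im + h2 * r_im;
        r[n][0] = h1 * l_re + h3 * r_re;
        r[n][1] = h1 * l_im + h3 * r_im;
    }
}
```

A scalar version like this can serve as a baseline when comparing the output of a SIMD implementation sample by sample.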
  2. 27 Jun, 2017 34 commits