1. 13 Apr, 2014 1 commit
  2. 04 Apr, 2014 3 commits
  3. 05 Mar, 2014 1 commit
  4. 28 Feb, 2014 3 commits
    • dcadec: simplify decoding of VQ high frequencies · 4cb69642
      Christophe Gisquet authored
      The vector dequantization has a test in a loop preventing effective SIMD
      implementation. By moving it out of the loop, this loop can be DSPized.
      
      Therefore, modify the current DSP implementation. In particular, the
      DSP implementation no longer has to handle null loop sizes.
      
      The decode_hf implementations have the following timings:
      
      For x86 Arrandale:
              C  SSE SSE2 SSE4
      win32: 260 162  119  104
      win64: 242 N/A   89   72
      
      The arm NEON optimizations follow in a later patch as external asm. The
      now unused check for the y modifier in arm inline asm is removed from
      configure.
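The restructuring described in the dcadec commit message above (moving a test out of the dequantization loop so the loop body becomes a plain kernel) can be sketched in C as follows. This is only an illustrative sketch, not the actual libav code: the function names, the `active` flag and the parameter layout are all assumptions made for the example. The kernel uses a `do`/`while` loop to reflect the remark that the DSP routine no longer has to handle null loop sizes.

```c
#include <stdint.h>

/* Hypothetical "before" form: the test is evaluated on every iteration,
 * which gets in the way of a straightforward SIMD version of the loop. */
static void dequant_test_inside(float *dst, const int8_t *vq,
                                float scale, int len, int active)
{
    for (int i = 0; i < len; i++) {
        if (active)
            dst[i] = vq[i] * scale;
    }
}

/* Hypothetical kernel: branch-free body, and the caller guarantees
 * len > 0, so no zero-length case needs to be handled here. This is the
 * part that a SIMD (DSP) implementation would replace. */
static void dequant_kernel(float *dst, const int8_t *vq,
                           float scale, int len)
{
    int i = 0;
    do {
        dst[i] = vq[i] * scale;
    } while (++i < len);
}

/* Hypothetical "after" form: the test is done once, outside the loop,
 * and the kernel is only called for non-empty ranges. */
static void dequant_test_hoisted(float *dst, const int8_t *vq,
                                 float scale, int len, int active)
{
    if (!active || len <= 0)
        return;
    dequant_kernel(dst, vq, scale, len);
}
```

Both forms write the same values to `dst`. The difference is that the hoisted form leaves an inner loop with no conditional, which is the shape a vectorized routine can take over directly.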
    • x86: synth filter float: implement SSE2 version · 08e3ea60
      Christophe Gisquet authored
      Timings for Arrandale:
                C    SSE
      win32:  2108   334
      win64:  1152   322
      
      Factoring the inner loop out with a call/jmp costs more than 15 cycles,
      even with the jmp destination aligned.
      
      Unrolling for ARCH_X86_64 gains 20 cycles.
      Signed-off-by: Janne Grunau <janne-libav@jannau.net>
    • x86: dcadsp: implement SSE lfe_dir · ad507d79
      Christophe Gisquet authored
      Results for Arrandale/Windows:
      32: 1670 -> 316
      64:  728 -> 298
      Signed-off-by: Janne Grunau <janne-libav@jannau.net>
  5. 07 Feb, 2014 1 commit
  6. 29 Aug, 2013 1 commit
  7. 17 Jul, 2013 1 commit
  8. 05 Feb, 2013 1 commit
  9. 23 Jan, 2013 1 commit
  10. 21 Jan, 2013 1 commit
  11. 20 Jan, 2013 1 commit