1. 23 Jun, 2017 1 commit
  2. 07 Apr, 2017 1 commit
  3. 22 Mar, 2017 1 commit
  4. 14 Feb, 2017 1 commit
  5. 05 Jan, 2017 2 commits
    • imdct15: replace the FFT with a faster PFA FFT algorithm · 2d208aaa
      Rostislav Pehlivanov authored
      This commit replaces the current inefficient non-power-of-two FFT with a
      much faster FFT based on the Prime Factor Algorithm.
      Even without SIMD it is already much faster than the old algorithm, and
      because it builds on the existing, thoroughly SIMD-optimized
      power-of-two FFT, performance improves even more on every platform for
      which we have SIMD support.
      
      Most of the work was done by Peter Barfuss, who passed the code on to me
      to integrate into the iMDCT and the current codebase. The 5-point and
      15-point FFT code was derived from the previous implementation, but
      optimized and simplified, which should make writing SIMD for it easier
      in the future. The 15-point FFT currently accounts for 6% of the
      decoder's overall run time.
      
      The FFT can now easily be used as a forward transform: simply do not
      multiply the 5-point FFT's imaginary component by -1 (negating the angle
      of a complex exponential negates its imaginary part, which reverses the
      direction of the transform), and multiply the "theta" angle of the main
      exptab by -1. This is why the multiplication by -1 at the end was
      deliberately left in place.
      
      FATE passes, and performance reports on other platforms/CPUs are
      welcome.
      
      Performance comparisons:
      
      iMDCT, PFA:
      101127 decicycles in speed,   32765 runs,      3 skips
      iMDCT, Old:
      211022 decicycles in speed,   32768 runs,      0 skips
      
      Standalone FFT, 300000 transforms of size 960:
          PFA        Old FFT     kiss_fft    libfftw3f
          3.659695s, 15.726912s, 13.300789s, 1.182222s
      
      Being only 3x slower than libfftw3f is a big achievement by itself.
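
      To put the figures above in proportion, the small C helper below just
      divides the reported numbers. It assumes that "decicycles" follows the
      usual START_TIMER/STOP_TIMER convention of tenths of a CPU cycle per
      call; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the benchmark output in the commit message. */
#define IMDCT_OLD_DECICYCLES 211022.0
#define IMDCT_PFA_DECICYCLES 101127.0
#define FFT_CALLS 300000.0   /* transforms of size 960 per run */
#define T_PFA  3.659695      /* seconds */
#define T_OLD  15.726912
#define T_KISS 13.300789
#define T_FFTW 1.182222

/* Ratio of two costs (cycles or seconds); > 1 means "after" is faster. */
static double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}

static void print_summary(void)
{
    /* Assumes 1 decicycle = 0.1 cycles (START_TIMER/STOP_TIMER convention). */
    printf("iMDCT: %.0f -> %.0f cycles per call (%.2fx faster)\n",
           IMDCT_OLD_DECICYCLES / 10.0, IMDCT_PFA_DECICYCLES / 10.0,
           speedup(IMDCT_OLD_DECICYCLES, IMDCT_PFA_DECICYCLES));
    printf("FFT: PFA vs old %.2fx, vs kiss_fft %.2fx; libfftw3f vs PFA %.2fx\n",
           speedup(T_OLD, T_PFA), speedup(T_KISS, T_PFA), speedup(T_PFA, T_FFTW));
    printf("PFA: %.2f us per 960-point transform\n", T_PFA / FFT_CALLS * 1e6);
}
```

      Calling `print_summary()` reports roughly a 2.09x iMDCT speedup, a 4.30x
      FFT speedup over the old code, 3.63x over kiss_fft, a factor of 3.10
      relative to libfftw3f, and about 12.2 us per 960-point PFA transform.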
      
      Something appears to be capping performance on the iMDCT side, possibly
      the pre-stage reindexing. However, it is certainly fast enough for now.
      Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
    • imdct15: remove the AArch64 assembly · 4fdacf4c
      Rostislav Pehlivanov authored
      Prep work for the next commit, which adds a new FFT algorithm that makes
      the iMDCT over 3x faster than it currently is (standalone, the FFT is
      over 10x faster for some frame sizes).
      
      The new FFT algorithm builds on the already thoroughly SIMD-optimized
      power-of-two FFT, which has SIMD for AArch64 as well, so users of that
      platform will still see an improvement.
      
      The previous FFT+SIMD was barely 2.5x faster than the C versions on these
      platforms.
      Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
  6. 02 Feb, 2015 1 commit
  7. 12 Jan, 2015 1 commit
  8. 15 May, 2014 2 commits
    • aarch64: opus NEON iMDCT and FFT · d3f5b947
      Janne Grunau authored
      Opus celt decoding 11% faster and the iMDCT over 2.5 times faster on
      Apple's A7.
    • lavc: add a native Opus decoder. · b70d7a4a
      Anton Khirnov authored
      Initial implementation by Andrew D'Addesio <modchipv12@gmail.com> during
      GSoC 2012.
      
      Completion by Anton Khirnov <anton@khirnov.net>, sponsored by the
      Mozilla Corporation.
      
      Further contributions by:
      Christophe Gisquet <christophe.gisquet@gmail.com>
      Janne Grunau <janne-libav@jannau.net>
      Luca Barbato <lu_zero@gentoo.org>