1. 30 Jul, 2017 1 commit
    • mdct15: add inverse transform postrotation SIMD · 70eb77b3
      Rostislav Pehlivanov authored
      2.5ms frames:
      Before   (c):  2638 decicycles in postrotate, 2097040 runs,    112 skips
      After (sse3):  1467 decicycles in postrotate, 2097083 runs,     69 skips
      After (avx2):  1244 decicycles in postrotate, 2097085 runs,     67 skips
      
      5ms frames:
      Before   (c):  4987 decicycles in postrotate, 1048371 runs,    205 skips
      After (sse3):  2644 decicycles in postrotate, 1048509 runs,     67 skips
      After (avx2):  2031 decicycles in postrotate, 1048523 runs,     53 skips
      
      10ms frames:
      Before   (c):  9153 decicycles in postrotate,  523575 runs,    713 skips
      After (sse3):  5110 decicycles in postrotate,  523726 runs,    562 skips
      After (avx2):  3738 decicycles in postrotate,  524223 runs,     65 skips
      
      20ms frames:
      Before   (c): 17857 decicycles in postrotate,  261866 runs,    278 skips
      After (sse3): 10041 decicycles in postrotate,  261746 runs,    398 skips
      After (avx2):  7050 decicycles in postrotate,  262116 runs,     28 skips
      
      Improves total decoding performance for real world content by 9% with avx2.
      Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
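
The benchmark figures above can be turned into speedup factors with a few lines of arithmetic. The sketch below only divides the mean decicycle counts quoted in the commit message; the names `TIMINGS` and `speedups` are made up for illustration and are not part of FFmpeg. With these figures, SSE3 comes out at roughly 1.8x to 1.9x over C and AVX2 at roughly 2.1x to 2.5x, depending on frame size.

```python
# Mean decicycle counts copied from the commit message, keyed by frame size.
TIMINGS = {
    "2.5ms": {"c": 2638, "sse3": 1467, "avx2": 1244},
    "5ms": {"c": 4987, "sse3": 2644, "avx2": 2031},
    "10ms": {"c": 9153, "sse3": 5110, "avx2": 3738},
    "20ms": {"c": 17857, "sse3": 10041, "avx2": 7050},
}


def speedups(timings):
    """Return {frame: {impl: c_cycles / impl_cycles}} for each SIMD variant."""
    return {
        frame: {
            impl: row["c"] / cycles
            for impl, cycles in row.items()
            if impl != "c"
        }
        for frame, row in timings.items()
    }


if __name__ == "__main__":
    for frame, row in speedups(TIMINGS).items():
        line = ", ".join(f"{impl} {ratio:.2f}x" for impl, ratio in row.items())
        print(f"{frame}: {line}")
```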
  2. 11 Jul, 2017 1 commit
  3. 23 Jun, 2017 1 commit
  4. 14 Feb, 2017 1 commit
  5. 05 Jan, 2017 2 commits
    • imdct15: replace the FFT with a faster PFA FFT algorithm · 2d208aaa
      Rostislav Pehlivanov authored
      This commit replaces the current inefficient non-power-of-two FFT with a
      much faster FFT based on the Prime Factor Algorithm.
      Although it is already much faster than the old algorithm without SIMD,
      the new algorithm makes use of the already very thoroughly SIMD'd power
      of two FFT, which improves performance even more across all platforms
      which we have SIMD support for.
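
For readers unfamiliar with the Prime Factor Algorithm, here is a minimal sketch of the Good-Thomas index mapping it relies on. This is not the code from this commit: the function names are hypothetical, a naive O(n^2) DFT stands in for the sub-transforms, and the example length 15 = 3 x 5 is chosen only because the two factors are coprime. The modular inverse via `pow(x, -1, m)` assumes Python 3.8 or newer.

```python
import cmath
from math import gcd


def naive_dft(x):
    """Reference O(n^2) DFT; stands in for the small sub-transforms."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def pfa_fft(x, n1, n2):
    """DFT of length n1*n2 (n1, n2 coprime) via Good-Thomas index mapping."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1 * n2 with coprime n1 and n2")
    # Input map: index (n2*a + n1*b) mod n spreads x over an n1 x n2 grid.
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    # Length-n2 transforms along each row, then length-n1 along each column.
    rows = [naive_dft(row) for row in grid]
    cols = [naive_dft([rows[a][k2] for a in range(n1)]) for k2 in range(n2)]
    # Output map (Chinese Remainder Theorem): k = k1 mod n1 and k = k2 mod n2.
    # No twiddle factors are needed between the two stages.
    c1 = n2 * pow(n2, -1, n1)
    c2 = n1 * pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(k1 * c1 + k2 * c2) % n] = cols[k2][k1]
    return out
```

Calling `pfa_fft(x, 3, 5)` on a 15-element list should agree with `naive_dft(x)` up to rounding. In a real implementation either stage could be swapped for an optimized transform, which is the property the commit message describes when it reuses the power-of-two FFT.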
      
      Most of the work was done by Peter Barfuss, who passed the code to me to
      implement into the iMDCT and the current codebase. The code for a
      5-point and 15-point FFT was derived from the previous implementation,
      although it was optimized and simplified, which will make its future
      SIMD easier. The 15-point FFT is currently using 6% of the current
      overall decoder overhead.
      
      The FFT can now easily be used as a forward transform by simply not
      multiplying the 5-point FFT's imaginary component by -1 (negating the
      complex exponential's angle negates the imaginary part of its output)
      and by multiplying the "theta" angle of the main exptab by -1. Hence
      the multiplication by -1 that was deliberately left in at the end.
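
The relationship between the sign of the angle and the transform direction can be illustrated with a small sketch. It assumes a plain DFT driven by a table of complex exponentials; `exptab` and `dft` here are hypothetical helpers, not FFmpeg functions, and which direction a given codebase calls "forward" is a matter of convention that this example does not settle.

```python
import cmath


def exptab(n, inverse=False):
    """Table of exp(i*theta*k); flipping the sign of theta flips direction."""
    theta = (2.0 if inverse else -2.0) * cmath.pi / n
    return [cmath.exp(1j * theta * k) for k in range(n)]


def dft(x, inverse=False):
    """Unnormalized O(n^2) DFT using the table above."""
    n = len(x)
    tab = exptab(n, inverse)
    return [sum(x[m] * tab[(m * k) % n] for m in range(n)) for k in range(n)]


if __name__ == "__main__":
    x = [1 + 2j, -3j, 4.5, 0.25 - 1j]
    spectrum = dft(x)
    # Running the opposite direction and dividing by n recovers the input.
    restored = [v / len(x) for v in dft(spectrum, inverse=True)]
    print(max(abs(a - b) for a, b in zip(x, restored)))
```

Each entry of the inverse table is the complex conjugate of the matching forward entry, so the two directions differ only in the sign of the imaginary components.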
      
      FATE passes, and performance reports on other platforms/CPUs are
      welcome.
      
      Performance comparisons:
      
      iMDCT, PFA:
      101127 decicycles in speed,   32765 runs,      3 skips
      iMDCT, Old:
      211022 decicycles in speed,   32768 runs,      0 skips
      
      Standalone FFT, 300000 transforms of size 960:
          PFA        Old FFT     kiss_fft    libfftw3f
          3.659695s, 15.726912s, 13.300789s, 1.182222s
      
      Being only 3x slower than libfftw3f is a big achievement by itself.
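
As a quick check of the ratios implied by these figures, the sketch below divides the quoted timings by one another. The dictionary and function names are made up for illustration; the numbers are the ones reported above.

```python
# Wall-clock seconds for 300000 transforms of size 960, as quoted above.
SECONDS = {
    "PFA": 3.659695,
    "Old FFT": 15.726912,
    "kiss_fft": 13.300789,
    "libfftw3f": 1.182222,
}
TRANSFORMS = 300000

# Mean decicycles for the full iMDCT, as quoted above.
IMDCT_PFA = 101127
IMDCT_OLD = 211022


def relative_to(times, baseline):
    """Return each timing divided by the baseline's timing."""
    return {name: t / times[baseline] for name, t in times.items()}


if __name__ == "__main__":
    for name, ratio in relative_to(SECONDS, "PFA").items():
        print(f"{name}: {ratio:.2f}x the PFA time")
    micros = SECONDS["PFA"] / TRANSFORMS * 1e6
    print(f"PFA: {micros:.1f} microseconds per transform")
    print(f"iMDCT old/new: {IMDCT_OLD / IMDCT_PFA:.2f}x")
```

On these numbers the standalone PFA FFT takes about 3.1 times as long as libfftw3f and is about 4.3 times faster than the old FFT, while the full iMDCT improves by about 2.1 times.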
      
      There appears to be something capping the performance in the iMDCT side
      of things, possibly during the pre-stage reindexing. However, it is
      certainly fast enough for now.
      Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
    • imdct15: remove the AArch64 assembly · 4fdacf4c
      Rostislav Pehlivanov authored
      Prep work for the next commit, which will add a new FFT algorithm
      that makes the iMDCT over 3x faster than it is currently (standalone,
      the FFT is over 10x faster at some frame sizes).
      
      The new FFT algorithm uses the already thoroughly SIMD'd power of two
      FFT which already has SIMD for AArch64, so users of that platform will
      still see an improvement.
      
      The previous FFT+SIMD was barely 2.5x faster than the C versions on these
      platforms.
      Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
  6. 02 Feb, 2015 1 commit
  7. 15 May, 2014 1 commit
  8. 22 Mar, 2014 1 commit
  9. 13 Mar, 2014 1 commit
  10. 19 Apr, 2013 1 commit
  11. 13 Mar, 2013 1 commit
  12. 09 Feb, 2013 1 commit
    • rnd_avg: fix author attribution · 5fd6d85d
      Michael Niedermayer authored
      Reference:
      commit 41fda91d
      Author: BERO <bero@geocities.co.jp>
      Date:   Wed May 14 17:46:55 2003 +0000
      
          aligned dsputil (for sh4) patch by (BERO <bero at geocities dot co dot jp>)
      
          Originally committed as revision 1880 to svn://svn.ffmpeg.org/ffmpeg/trunk
      
      commit 8dbe5856
      Author: Oskar Arvidsson <oskar@irock.se>
      Date:   Tue Mar 29 17:48:59 2011 +0200
      
          Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
      
          This patch lets e.g. dsputil_init choose dsp functions with respect to
          the bit depth to decode. The naming scheme of bit depth dependent
          functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
          clear_blocks_c is now named clear_blocks_8_c).
      
          Note: Some of the functions for high bit depth are not dependent on the
          bit depth, but only on the pixel size. This leaves some room for
          optimizing binary size.
      
          Preparatory patch for high bit depth h264 decoding support.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
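
The naming scheme quoted above, `<base name>_<bit depth>[_<prefix>]`, can be sketched as a small helper. This is purely illustrative: `dsp_name` is a hypothetical function, not something that exists in the codebase, and it only reproduces the one example given in the quoted message.

```python
def dsp_name(base, bit_depth, tag=None):
    """Join base name, bit depth, and an optional trailing tag with '_'."""
    parts = [base, str(bit_depth)]
    if tag:
        parts.append(tag)
    return "_".join(parts)


if __name__ == "__main__":
    # The quoted message renames clear_blocks_c following this pattern.
    print(dsp_name("clear_blocks", 8, "c"))
```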
  13. 08 Feb, 2013 1 commit
  14. 19 Mar, 2011 1 commit
  15. 10 Jul, 2010 1 commit
  16. 20 Apr, 2010 1 commit
  17. 09 Mar, 2010 1 commit
  18. 01 Feb, 2009 1 commit
  19. 21 Oct, 2008 1 commit
  20. 31 Aug, 2008 1 commit
  21. 09 May, 2008 1 commit
  22. 17 Oct, 2007 1 commit
  23. 20 May, 2007 2 commits
  24. 07 Oct, 2006 1 commit
  25. 19 Sep, 2006 1 commit
  26. 12 Jan, 2006 1 commit
  27. 26 May, 2005 2 commits
  28. 03 Mar, 2003 1 commit
  29. 11 Feb, 2003 1 commit
  30. 20 Nov, 2002 1 commit
  31. 19 Nov, 2002 2 commits
  32. 25 Oct, 2002 1 commit
  33. 06 Oct, 2002 1 commit
  34. 25 May, 2002 1 commit
  35. 13 Aug, 2001 1 commit