1. 09 Dec, 2014 1 commit
  2. 08 Dec, 2014 2 commits
  3. 17 Jul, 2014 1 commit
    • armv6: Accelerate ff_fft_calc for general case (nbits != 4) · 87552d54
      Ben Avison authored
      The previous implementation targeted DTS Coherent Acoustics, which only
      requires nbits == 4 (fft16()). That case was, and still is, linked directly
      rather than being dispatched through ff_fft_calc_vfp(), but now every
      transform size from 4 points up to 65536 points is available. This benefits
      other codecs such as AAC and AC3.
      
      The implementation is based upon the C version: each routine for more than
      16 points calls a hierarchy of smaller FFT functions, then performs a
      post-processing pass. This pass benefits greatly from loop unrolling, which
      counters the long pipelines in the VFP. A relaxed calling standard also
      reduces the overhead of the call hierarchy, and avoiding the excessive
      inlining performed by GCC probably helps with I-cache utilisation too.
      
      I benchmarked the result by counting the gperftools samples that hit
      anywhere in the AAC decoder (starting from aac_decode_frame()) or
      specifically in the FFT routines (fft4() to fft512() and pass()) while
      decoding the same sample AAC stream. All figures are sample counts, so
      lower is better:

                    Before          After
                    Mean    StdDev  Mean    StdDev  Confidence  Speed-up
      Audio decode  2245.5  53.1    1599.6  43.8    100.0%      +40.4%
      FFT routines   940.6  22.0     348.1  20.8    100.0%      +170.2%

      Signed-off-by: Martin Storsjö <martin@martin.st>
  4. 22 Jul, 2013 1 commit
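The percentage column in the benchmark table of commit 87552d54 is easy to misread as a reduction. Checking it against the means shows it is the relative speed-up, Before/After − 1; the equivalent reduction in sample count is 1 − After/Before. The small C sketch below reproduces both from the table's means; the helper names speedup_percent() and reduction_percent() are hypothetical.

```c
#include <assert.h>

/* Relative speed-up in percent: how much more work fits in the same time. */
double speedup_percent(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return (before / after - 1.0) * 100.0;
}

/* Reduction in sample count (time spent) in percent. */
double reduction_percent(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return (1.0 - after / before) * 100.0;
}
```

With the table's means, speedup_percent(2245.5, 1599.6) is about 40.4 and speedup_percent(940.6, 348.1) about 170.2, matching the published column, while the corresponding reductions in sample count are about 28.8% for the whole decoder and 63.0% for the FFT routines.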