1. 25 Sep, 2014 1 commit
  2. 23 Sep, 2014 1 commit
  3. 02 Sep, 2014 1 commit
  4. 15 Aug, 2014 2 commits
  5. 12 Aug, 2014 1 commit
  6. 06 Aug, 2014 1 commit
  7. 04 Aug, 2014 2 commits
  8. 27 Jul, 2014 1 commit
  9. 25 Jul, 2014 1 commit
  10. 21 Jul, 2014 2 commits
  11. 20 Jul, 2014 1 commit
  12. 18 Jul, 2014 4 commits
  13. 17 Jul, 2014 3 commits
    • armv6: Accelerate ff_fft_calc for general case (nbits != 4) · 87552d54
      Ben Avison authored
      The previous implementation targeted DTS Coherent Acoustics, which only
      requires nbits == 4 (fft16()). This case was (and still is) linked directly
      rather than being indirected through ff_fft_calc_vfp(), but now the full
      range from radix-4 up to radix-65536 is available. This benefits other codecs
      such as AAC and AC3.
      
      The implementation is based upon the C version, with each routine larger than
      radix-16 calling a hierarchy of smaller FFT functions, then performing a
      post-processing pass. This pass benefits a lot from loop unrolling to
      counter the long pipelines in the VFP. A relaxed calling standard also
      reduces the overhead of the call hierarchy, and avoiding the excessive
      inlining performed by GCC probably helps with I-cache utilisation too.
      
      I benchmarked the result by measuring the number of gperftools samples that
      hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
      specifically in the FFT routines (fft4() to fft512() and pass()) for the
      same sample AAC stream:
      
                    Before          After
                    Mean   StdDev   Mean   StdDev  Confidence  Change
      Audio decode  2245.5 53.1     1599.6 43.8    100.0%      +40.4%
      FFT routines  940.6  22.0     348.1  20.8    100.0%      +170.2%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      87552d54
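The call hierarchy described in this commit message (a transform built from smaller transforms, followed by a post-processing pass) can be illustrated with a minimal C sketch. This is not the code from the commit: it uses plain radix-2 decimation in time, which may differ from the decomposition used by the C version, and it is limited to 16 points. The names `cplx`, `cos16`, `twiddle16`, `pass_sketch` and `fft_sketch` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

typedef struct { double re, im; } cplx;

/* cos(2*pi*k/16) for k = 0..4; the other twiddle values follow by symmetry. */
static const double cos16[5] = {
    1.0, 0.92387953251128674, 0.70710678118654752, 0.38268343236508977, 0.0
};

/* Twiddle factor exp(-2*pi*i*k/16), valid for 0 <= k < 8. */
static cplx twiddle16(size_t k)
{
    cplx w;
    if (k <= 4) {
        w.re = cos16[k];
        w.im = -cos16[4 - k];
    } else {
        w.re = -cos16[8 - k];
        w.im = -cos16[k - 4];
    }
    return w;
}

/* Post-processing pass: merge the two half-size transforms stored in
 * out[0..n/2) and out[n/2..n) into one n-point transform, in place. */
static void pass_sketch(cplx *out, size_t n)
{
    size_t h = n / 2;
    for (size_t k = 0; k < h; k++) {
        cplx w = twiddle16(k * (16 / n));
        cplx e = out[k];
        cplx o = out[k + h];
        cplx t = { w.re * o.re - w.im * o.im, w.re * o.im + w.im * o.re };
        out[k].re     = e.re + t.re;
        out[k].im     = e.im + t.im;
        out[k + h].re = e.re - t.re;
        out[k + h].im = e.im - t.im;
    }
}

/* n-point FFT for n a power of two with 1 <= n <= 16. Each size calls the
 * next smaller size twice (even and odd samples), then runs the pass. */
static void fft_sketch(const cplx *in, cplx *out, size_t n, size_t stride)
{
    if (n == 1) {
        out[0] = in[0];
        return;
    }
    fft_sketch(in, out, n / 2, 2 * stride);
    fft_sketch(in + stride, out + n / 2, n / 2, 2 * stride);
    pass_sketch(out, n);
}
```

Calling `fft_sketch(in, out, 16, 1)` transforms 16 input samples into `out`. In this sketch the recursion cost is visible in the nested calls, which is the overhead the commit message says a relaxed calling standard reduces.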
    • armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) · 5c22e8e4
      Ben Avison authored
      The previous implementation targeted DTS Coherent Acoustics, which only
      requires mdct_bits == 6. This relatively small size lent itself to
      unrolling the loops a small number of times, and encoding offsets
      calculated at assembly time within the load/store instructions of each
      iteration.
      
      In the more general case (codecs such as AAC and AC3) much larger arrays
      are used - mdct_bits == [8, 9, 11]. The old method does not scale for
      these cases, so more integer registers are used with non-unrolled versions
      of the loops (and with some stack spillage). The postrotation filter loop
      is still unrolled by a factor of 2 to permit the double-buffering of some
      VFP registers to facilitate overlap of neighbouring iterations.
      
      I benchmarked the result by measuring the number of gperftools samples
      that hit anywhere in the AAC decoder (starting from aac_decode_frame())
      or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
      example AAC stream:
      
                        Before          After
                        Mean   StdDev   Mean   StdDev  Confidence  Change
      aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
      ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%
      Signed-off-by: Martin Storsjö <martin@martin.st>
      5c22e8e4
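To make the sizes in this commit message concrete, here is a small C sketch of how `mdct_bits` could map to buffer and FFT sizes. It assumes the common convention that the transform length is `1 << mdct_bits`, that the half IMDCT writes half that many samples, and that the inner FFT covers a quarter as many complex points. That assumption is consistent with the two commit messages (mdct_bits == 6 pairs with nbits == 4, and the AAC profile mentions fft512()), but the struct and function names below are hypothetical.

```c
/* Hypothetical helper type: sizes implied by a given mdct_bits value. */
typedef struct {
    unsigned n;          /* full transform length, 1 << mdct_bits */
    unsigned outputs;    /* samples produced by the half IMDCT (assumed n/2) */
    unsigned fft_points; /* complex points in the inner FFT (assumed n/4) */
    unsigned fft_nbits;  /* log2 of fft_points */
} imdct_layout;

/* Valid for mdct_bits >= 2. */
static imdct_layout imdct_layout_for(unsigned mdct_bits)
{
    imdct_layout l;
    l.n          = 1u << mdct_bits;
    l.outputs    = l.n / 2;
    l.fft_points = l.n / 4;
    l.fft_nbits  = mdct_bits - 2;
    return l;
}
```

Under these assumptions, mdct_bits == 6 gives a 16-point inner FFT, while mdct_bits == 11 gives a 2048-sample transform with a 512-point inner FFT, which suggests why offsets fixed at assembly time stop being practical for the larger sizes.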
  14. 16 Jul, 2014 1 commit
  15. 13 Jul, 2014 1 commit
    • armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) · 42c1cc35
      Ben Avison authored
      The previous implementation targeted DTS Coherent Acoustics, which only
      requires mdct_bits == 6. This relatively small size lent itself to
      unrolling the loops a small number of times, and encoding offsets
      calculated at assembly time within the load/store instructions of each
      iteration.
      
      In the more general case (codecs such as AAC and AC3) much larger arrays
      are used - mdct_bits == [8, 9, 11]. The old method does not scale for
      these cases, so more integer registers are used with non-unrolled versions
      of the loops (and with some stack spillage). The postrotation filter loop
      is still unrolled by a factor of 2 to permit the double-buffering of some
      VFP registers to facilitate overlap of neighbouring iterations.
      
      I benchmarked the result by measuring the number of gperftools samples
      that hit anywhere in the AAC decoder (starting from aac_decode_frame())
      or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
      example AAC stream:
      
                        Before          After
                        Mean   StdDev   Mean   StdDev  Confidence  Change
      aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
      ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
      42c1cc35
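The "Change" column in the benchmark tables above appears to express speedup as the ratio of sample counts, before over after, minus one. The commit messages do not state the formula, so the following C sketch is an assumption checked only against the published numbers; `change_percent` is a hypothetical name.

```c
/* Assumed formula for the "Change" column: how much faster the new code is,
 * in percent, given mean profiler sample counts before and after. */
static double change_percent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

With the rounded means from the tables, this gives about 40.4 for the audio decode row, 170.2 for the FFT routines row and 82.1 for the ff_imdct_half_* row. The aac_decode_frame row comes out at roughly 11.85, close to the reported +11.8%, which would be explained if the original figure was computed from unrounded means.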
  16. 11 Jul, 2014 1 commit
  17. 09 Jul, 2014 1 commit
  18. 08 Jul, 2014 1 commit
  19. 06 Jul, 2014 1 commit
  20. 30 Jun, 2014 1 commit
  21. 23 Jun, 2014 1 commit
  22. 22 Jun, 2014 1 commit
  23. 19 Jun, 2014 1 commit
  24. 18 Jun, 2014 1 commit
  25. 05 Jun, 2014 1 commit
  26. 03 Jun, 2014 1 commit
    • arm: check if AS supports .dn · 896a5bff
      Janne Grunau authored
      Move the GNU as check before the arch specific asm checks since the .dn
      check requires gas compatible assembler.
      
      Disable the VC-1 motion compensation NEON asm which is the only part
      using that directive. The integrated assembler in the upcoming clang 3.5
      does not support .dn/.qn without plans to change that. Too much effort
      to implement it while it is rarely used.
      
      http://llvm.org/bugs/show_bug.cgi?id=18199.
      896a5bff
  27. 29 May, 2014 1 commit
  28. 29 Apr, 2014 1 commit
  29. 25 Apr, 2014 2 commits
  30. 24 Apr, 2014 1 commit
  31. 20 Apr, 2014 1 commit