1. 03 Jul, 2016 1 commit
  2. 25 Jun, 2016 1 commit
  3. 11 May, 2016 1 commit
  4. 04 May, 2016 1 commit
  5. 07 Apr, 2016 1 commit
    • build: miscellaneous cosmetics · 01621202
      Diego Biurrun authored
      Restore alphabetical order in lists, break overly long lines, do some
      prettyprinting, add some explanatory section comments, group parts
      together that belong together logically.
  6. 26 Mar, 2016 1 commit
    • aarch64: Make transpose_4x4H do a regular transpose · cdb1665f
      Martin Storsjö authored
      Previously, ff_h264_idct_add_neon (originally in the arm version) used
      a non-regular transpose in order to be able to use more instructions
      that deal with registers as 128 bit register pairs. The aarch64
      translation doesn't do it to the same extent, but brought along the
      same structure since it was a straight translation.
      
      This reshuffles ff_h264_idct_add_neon, bringing it closer to
      the C implementation, making the transpose_4x4H macro do a regular
      transpose, usable for other algorithms as well.
      
      Previously, the third and fourth output from transpose_4x4H were
      swapped, and prior to cc29d96d, the same inputs as well. In
      addition to just swapping the outputs, also renumber the intermediate
      registers for better readability (making the register order match
      transpose_4x8B).
      
      This runs with the same number of cycles as before.
      Signed-off-by: Martin Storsjö <martin@martin.st>
  7. 01 Mar, 2016 1 commit
  8. 26 Feb, 2016 1 commit
  9. 31 Jan, 2016 2 commits
  10. 25 Jan, 2016 1 commit
  11. 24 Dec, 2015 1 commit
  12. 21 Dec, 2015 1 commit
  13. 19 Dec, 2015 1 commit
  14. 17 Dec, 2015 1 commit
  15. 14 Dec, 2015 3 commits
    • arm64: int32_to_float_fmul neon asm · a0fc780a
      Janne Grunau authored
      3% faster dts decoding on a cortex-a57.
      
                                       cortex-a57   cortex-a53
      int32_to_float_fmul_array8_c:    1270.9       4475.6
      int32_to_float_fmul_array8_neon:  328.6        569.2
      int32_to_float_fmul_scalar_c:     928.5       4119.6
      int32_to_float_fmul_scalar_neon:  309.1        524.1
    • arm64: port synth_filter_float_neon from arm · 705f5e5e
      Janne Grunau authored
      ~25% faster dts decoding overall. The checkasm CPU cycles numbers are
      not that useful since synth_filter_float() calls FFTContext.imdct_half().
      
                               cortex-a57   cortex-a53
      synth_filter_float_c:    1866.2       3490.9
      synth_filter_float_neon:  915.0       1531.5
      
      With fftc.imdct_half forced to imdct_half_neon:
                               cortex-a57   cortex-a53
      synth_filter_float_c:    1718.4       3025.3
      synth_filter_float_neon:  926.2       1530.1
    • arm64: convert dcadsp neon asm from arm · c33c1fa8
      Janne Grunau authored
      ~2% faster dts decoding overall.
      
                          cortex-a57   cortex-a53
      dca_decode_hf_c:    474.8        1659.9
      dca_decode_hf_neon: 225.2         301.1
      dca_lfe_fir0_c:     913.2        1537.7
      dca_lfe_fir0_neon:  286.8         451.9
      dca_lfe_fir1_c:     848.7        1711.5
      dca_lfe_fir1_neon:  387.1         506.4
  16. 12 Dec, 2015 1 commit
  17. 20 Jul, 2015 1 commit
  18. 24 Jun, 2015 1 commit
  19. 02 Feb, 2015 1 commit
  20. 31 Jan, 2015 1 commit
  21. 09 Dec, 2014 1 commit
  22. 15 Nov, 2014 1 commit
  23. 30 Aug, 2014 1 commit
  24. 03 Aug, 2014 1 commit
  25. 23 Jun, 2014 1 commit
  26. 15 May, 2014 1 commit
  27. 13 May, 2014 1 commit
  28. 22 Apr, 2014 4 commits
  29. 06 Apr, 2014 1 commit
  30. 20 Mar, 2014 1 commit
  31. 08 Mar, 2014 1 commit
    • aarch64: get_cabac inline asm · dfe224f3
      Janne Grunau authored
      Based on the x86 branchless get_cabac asm. get_cabac_noinline() gets
      approximately 20% faster (no cycle counts available) compared to clang
      from Xcode 5.1 beta5. More than 6% faster overall. A part of the overall
      speedup might be explained by additional inlining of get_cabac().
  32. 20 Feb, 2014 1 commit
  33. 15 Jan, 2014 2 commits
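
Note on commit cdb1665f (transpose_4x4H): the message distinguishes a "regular" transpose from the earlier variant whose third and fourth outputs were swapped. The plain C sketch below only models that difference on a 4x4 block of 16-bit values. It is not the NEON macro itself. The function names, the flat row-major layout, and the index table are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/*
 * Regular 4x4 transpose of 16-bit elements stored row-major:
 * element (row, col) of the input becomes element (col, row) of the output.
 */
static void transpose_4x4_regular(int16_t out[16], const int16_t in[16])
{
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            out[col * 4 + row] = in[row * 4 + col];
}

/*
 * Illustrative model (an assumption, not the original macro) of the older
 * behaviour described in the commit message: the same transpose, except
 * that the third and fourth output rows are exchanged.
 */
static void transpose_4x4_swapped(int16_t out[16], const int16_t in[16])
{
    /* Input column -> output row; columns 2 and 3 trade places. */
    static const int out_row[4] = { 0, 1, 3, 2 };

    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            out[out_row[col] * 4 + row] = in[row * 4 + col];
}
```

A caller of the swapped variant has to compensate by reading output rows 2 and 3 in reverse order. According to the commit message, making the macro a regular transpose removes that requirement, so it is usable for other algorithms as well.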