  1. 11 Jan, 2016 4 commits
  2. 08 Jan, 2016 1 commit
  3. 07 Jan, 2016 8 commits
  4. 04 Jan, 2016 2 commits
  5. 03 Jan, 2016 2 commits
  6. 01 Jan, 2016 3 commits
  7. 31 Dec, 2015 1 commit
  8. 30 Dec, 2015 1 commit
    • x86: use emms after ff_int32_to_float_fmul_scalar_sse · 8563f988
      Janne Grunau authored
      Intel's Instruction Set Reference (as of September 2015) clearly states
      that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
      source is a memory location. The Instruction Set Reference from 1999
      (Order Number 243191) describes this behaviour, but all later versions
      I've seen make no distinction between an MMX register and a memory
      location as the source.
      The documentation for the matching SSE2 instruction to convert to double
      (cvtpi2pd) was fixed (see the valgrind bug
      https://bugs.kde.org/show_bug.cgi?id=210264).
      
      It will take time to get a clarification and fixes in place. In the
      meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
      be correct according to the documentation. The vast majority of users
      will have SSE2 so a change to the SSE version has little effect.
      
      Fixes fate-checkasm on x86 valgrind targets.
      
      Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
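
To illustrate the pattern the commit message describes, here is a minimal sketch in C intrinsics, not the actual assembly from the patch. The function name `int32_to_float_fmul_sketch` is hypothetical, and the plain C fallback is there only so the sketch also builds where MMX/SSE are unavailable. `_mm_cvtpi32_ps` is the intrinsic form of cvtpi2ps, and `_mm_empty` emits emms.

```c
#include <stdint.h>

#if defined(__MMX__) && defined(__SSE__)
#include <mmintrin.h>
#include <xmmintrin.h>
#define SKETCH_HAVE_MMX_SSE 1
#else
#define SKETCH_HAVE_MMX_SSE 0
#endif

/* Hypothetical helper: dst[i] = src[i] * mul for an even len. */
static void int32_to_float_fmul_sketch(float *dst, const int32_t *src,
                                       float mul, int len)
{
#if SKETCH_HAVE_MMX_SSE
    const __m128 vmul = _mm_set1_ps(mul);
    for (int i = 0; i < len; i += 2) {
        /* Two 32-bit integers in an MMX value (low element first). */
        __m64 pair = _mm_set_pi32(src[i + 1], src[i]);
        /* cvtpi2ps: convert both integers to float in the low half. */
        __m128 f = _mm_cvtpi32_ps(_mm_setzero_ps(), pair);
        f = _mm_mul_ps(f, vmul);
        /* Store only the two converted floats. */
        _mm_storel_pi((__m64 *)&dst[i], f);
    }
    /* emms: leave MMX state so later x87 code sees a clean FPU. */
    _mm_empty();
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
#endif
}
```

Calling `_mm_empty()` once after the loop follows the documented behaviour regardless of whether a given CPU actually switches state for a memory source.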
  9. 29 Dec, 2015 2 commits
  10. 26 Dec, 2015 2 commits
  11. 24 Dec, 2015 1 commit
  12. 23 Dec, 2015 2 commits
    • dca: change the core to work with integer coefficients. · aebf0707
      Alexandra Hájková authored
      The DCA core decoder converts integer coefficients read from the
      bitstream to floats just after reading them (along with dequantization).
      All the other steps of the audio reconstruction are done with floats
      which makes the output for the DTS lossless extension (XLL)
      actually lossy.
      This patch changes the DCA core to work with integer coefficients
      until QMF. At this point the integer coefficients are converted to floats.
      The coefficients for the LFE channel (lfe_data) are not touched.
      This is the first step towards truly lossless XLL decoding.
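
One generic reason a float pipeline cannot be bit-exact is that a 32-bit float has only a 24-bit significand. The sketch below shows that effect in isolation; it is not code from the DCA decoder, and the helper name `round_trip_through_float` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert an integer to float and back.
 * Restricted to |v| <= 2^30 so the cast back cannot overflow. */
static int32_t round_trip_through_float(int32_t v)
{
    assert(v >= -(1 << 30) && v <= (1 << 30));
    /* volatile forces the value to be rounded to float precision. */
    volatile float f = (float)v;
    return (int32_t)f;
}
```

Under the default rounding mode, `round_trip_through_float(16777216)` returns its input, while `round_trip_through_float(16777217)` returns 16777216, because 2^24 + 1 is not representable as a float. Keeping the coefficients as integers for as long as possible avoids this kind of rounding.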
    • dca: Add math helpers. · 85990140
      Alexandra Hájková authored
      They will be used by the integer core decoder.
  13. 21 Dec, 2015 6 commits
  14. 16 Dec, 2015 2 commits
  15. 14 Dec, 2015 3 commits
    • arm: add ff_int32_to_float_fmul_array8_neon · 90b1b935
      Janne Grunau authored
      Quite a bit faster than int32_to_float_fmul_array8_c calling
      ff_int32_to_float_fmul_scalar_neon through FmtConvertContext.
      Number of cycles per int32_to_float_fmul_array8 call while decoding
      padded.dts on exynos5422:
      
                     before  after   change
      cortex-a7:     1270     951    -25%
      cortex-a15:     434     285    -34%
      
      checkasm --bench cycle counts:     cortex-a15   cortex-a7
      int32_to_float_fmul_array8_c:      1730.4       4384.5
      int32_to_float_fmul_array8_neon_c:  571.5       1694.3
      int32_to_float_fmul_array8_neon:    374.0       1448.8
      
      The differences between int32_to_float_fmul_array8_neon_c and
      int32_to_float_fmul_array8_neon are interesting. The former is the
      current behaviour of calling ff_int32_to_float_fmul_scalar_neon
      repeatedly from the C function. The raw numbers differ since checkasm
      uses different lengths than the dca decoder.
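
The two code paths being compared can be sketched in plain C as follows. This is a simplified illustration, not the NEON assembly: `ConvCtx` is a hypothetical stand-in for FmtConvertContext, and the function names are made up for the example. The sketch assumes that the array8 variant applies one multiplier per block of 8 samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for FmtConvertContext. */
typedef struct ConvCtx {
    void (*int32_to_float_fmul_scalar)(float *dst, const int32_t *src,
                                       float mul, int len);
} ConvCtx;

/* Scalar variant: one multiplier for the whole run. */
static void fmul_scalar_ref(float *dst, const int32_t *src,
                            float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* Indirect path: one call through the context per block of 8. */
static void fmul_array8_indirect(const ConvCtx *c, float *dst,
                                 const int32_t *src, const float *mul,
                                 int len)
{
    assert(len % 8 == 0);
    for (int i = 0; i < len; i += 8)
        c->int32_to_float_fmul_scalar(&dst[i], &src[i], *mul++, 8);
}

/* Fused path: the same arithmetic in a single function, with no
 * per-block indirect call. */
static void fmul_array8_fused(float *dst, const int32_t *src,
                              const float *mul, int len)
{
    assert(len % 8 == 0);
    for (int i = 0; i < len; i += 8) {
        const float m = mul[i / 8];
        for (int j = 0; j < 8; j++)
            dst[i + j] = src[i + j] * m;
    }
}
```

Both paths produce identical output; the fused form simply removes the call overhead paid once per 8 samples, which is plausibly where part of the measured difference comes from.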
    • arm64: int32_to_float_fmul neon asm · a0fc780a
      Janne Grunau authored
      3% faster dts decoding on a cortex-a57.
      
                                       cortex-a57   cortex-a53
      int32_to_float_fmul_array8_c:    1270.9       4475.6
      int32_to_float_fmul_array8_neon:  328.6        569.2
      int32_to_float_fmul_scalar_c:     928.5       4119.6
      int32_to_float_fmul_scalar_neon:  309.1        524.1
    • arm64: port synth_filter_float_neon from arm · 705f5e5e
      Janne Grunau authored
      ~25% faster dts decoding overall. The checkasm CPU cycle counts are
      not that useful since synth_filter_float() calls FFTContext.imdct_half().
      
                               cortex-a57   cortex-a53
      synth_filter_float_c:    1866.2       3490.9
      synth_filter_float_neon:  915.0       1531.5
      
      With fftc.imdct_half forced to imdct_half_neon:
                               cortex-a57   cortex-a53
      synth_filter_float_c:    1718.4       3025.3
      synth_filter_float_neon:  926.2       1530.1