  1. 07 Jan, 2016 8 commits
  2. 04 Jan, 2016 2 commits
  3. 03 Jan, 2016 2 commits
  4. 01 Jan, 2016 3 commits
  5. 31 Dec, 2015 1 commit
  6. 30 Dec, 2015 1 commit
    • x86: use emms after ff_int32_to_float_fmul_scalar_sse · 8563f988
      Janne Grunau authored
      Intel's Instruction Set Reference (as of September 2015) clearly states
      that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
      source is a memory location. The Instruction Set Reference from 1999
      (Order Number 243191) describes this behaviour, but all later versions
      I've seen make no distinction between MMX register and memory sources.
      The documentation for the matching SSE2 instruction to convert to double
      (cvtpi2pd) was fixed (see the valgrind bug
      https://bugs.kde.org/show_bug.cgi?id=210264).
      
      It will take time to get a clarification and fixes in place. In the
      meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
      be correct according to the documentation. The vast majority of users
      will have SSE2, so changing the SSE version affects few of them.
      
      Fixes fate-checkasm on x86 valgrind targets.
      
      Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
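A minimal intrinsics sketch of the pattern this commit applies, for readers less familiar with the MMX/x87 state issue: convert pairs of int32 values with cvtpi2ps (the `_mm_cvtpi32_ps` intrinsic), scale them with SSE, then execute emms (`_mm_empty()`) before returning so that later x87 floating-point code does not run in MMX state. This is not the Libav assembly; the function name is hypothetical, `len` is assumed to be even, and builds without SSE/MMX fall back to a plain C loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__) && defined(__MMX__)
#include <xmmintrin.h>   /* SSE intrinsics; also pulls in the MMX ones */
#define SKETCH_HAVE_SSE_MMX 1
#else
#define SKETCH_HAVE_SSE_MMX 0
#endif

/* Hypothetical name: dst[i] = src[i] * mul, len assumed even. */
static void int32_to_float_fmul_scalar_sse_sketch(float *dst,
                                                  const int32_t *src,
                                                  float mul, int len)
{
    assert(len % 2 == 0);
#if SKETCH_HAVE_SSE_MMX
    const __m128 vmul = _mm_set1_ps(mul);
    for (int i = 0; i < len; i += 2) {
        __m64 pair;
        float tmp[4];
        memcpy(&pair, &src[i], sizeof(pair));        /* two int32 values */
        /* cvtpi2ps: documented as switching the FPU to MMX state */
        __m128 f = _mm_cvtpi32_ps(_mm_setzero_ps(), pair);
        f = _mm_mul_ps(f, vmul);
        _mm_storeu_ps(tmp, f);                       /* unaligned store */
        dst[i]     = tmp[0];
        dst[i + 1] = tmp[1];
    }
    _mm_empty();  /* emms: leave MMX state before any x87 code runs */
#else
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
#endif
}
```

Using the intrinsic form leaves the choice between a register or memory operand to the compiler, which is exactly why an unconditional emms is the safe choice when following the documented behaviour.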
  7. 29 Dec, 2015 2 commits
  8. 26 Dec, 2015 2 commits
  9. 24 Dec, 2015 1 commit
  10. 23 Dec, 2015 2 commits
    • dca: change the core to work with integer coefficients. · aebf0707
      Alexandra Hájková authored
      The DCA core decoder converts the integer coefficients read from the
      bitstream to floats immediately after reading them (along with
      dequantization). All other steps of the audio reconstruction are done
      with floats, which makes the output for the DTS lossless extension (XLL)
      actually lossy.
      This patch changes the DCA core to work with integer coefficients
      until the QMF stage, where they are converted to floats.
      The coefficients for the LFE channel (lfe_data) are not touched.
      This is the first step towards truly lossless XLL decoding.
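A small illustration of the precision argument (not code from the patch): a 32-bit float has a 24-bit significand, so not every integer above 2^24 survives a round trip through float. A pipeline that goes int to float and back therefore cannot guarantee bit-exact output for wide integer samples, which is why keeping the core in integers until the final conversion matters for XLL. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if x converts to float and back without changing value.
 * volatile forces a real rounding to float even on x87 targets that
 * would otherwise keep extra precision in registers. */
static int survives_float_roundtrip(int32_t x)
{
    volatile float f = (float)x;
    return (int32_t)f == x;
}
```

For example, 16777216 (2^24) survives, while 16777217 rounds to 16777216 and comes back changed.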
    • dca: Add math helpers. · 85990140
      Alexandra Hájková authored
      They will be used by the integer core decoder.
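The commit message does not list the helpers, so the following is only a guess at the kind of fixed-point arithmetic an integer DCA core typically needs: a 32x32 to 64-bit multiply followed by a rounding right shift, and a clip to a signed 24-bit range. The names `mul_round_shift` and `clip_intp2_sketch` are hypothetical and are not the helpers added by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) / 2^bits, rounded to nearest with halves rounded up.
 * Assumes the result fits in int32_t; >> on a negative int64_t is
 * an arithmetic shift on mainstream compilers. */
static int32_t mul_round_shift(int32_t a, int32_t b, int bits)
{
    assert(bits >= 1 && bits <= 62);
    int64_t p = (int64_t)a * b;
    return (int32_t)((p + ((int64_t)1 << (bits - 1))) >> bits);
}

/* Clamp a to the signed range [-2^p, 2^p - 1], e.g. p = 23 for 24 bits. */
static int32_t clip_intp2_sketch(int32_t a, int p)
{
    assert(p >= 0 && p <= 30);
    const int32_t hi = (INT32_C(1) << p) - 1;
    const int32_t lo = -(INT32_C(1) << p);
    return a > hi ? hi : a < lo ? lo : a;
}
```

With `bits = 23`, multiplying by `1 << 22` is a multiply by 0.5 in this fixed-point format, so 3 becomes 2 (1.5 rounded up) and -3 becomes -1.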
  11. 21 Dec, 2015 6 commits
  12. 16 Dec, 2015 2 commits
  13. 14 Dec, 2015 8 commits
    • arm: add ff_int32_to_float_fmul_array8_neon · 90b1b935
      Janne Grunau authored
      Quite a bit faster than int32_to_float_fmul_array8_c calling
      ff_int32_to_float_fmul_scalar_neon through FmtConvertContext.
      Number of cycles per int32_to_float_fmul_array8 call while decoding
      padded.dts on exynos5422:
      
                     before  after   change
      cortex-a7:     1270     951    -25%
      cortex-a15:     434     285    -34%
      
      checkasm --bench cycle counts:     cortex-a15   cortex-a7
      int32_to_float_fmul_array8_c:      1730.4       4384.5
      int32_to_float_fmul_array8_neon_c:  571.5       1694.3
      int32_to_float_fmul_array8_neon:    374.0       1448.8
      
      The interesting part is the difference between
      int32_to_float_fmul_array8_neon_c and int32_to_float_fmul_array8_neon.
      The former is the current behaviour of calling
      ff_int32_to_float_fmul_scalar_neon repeatedly from the C function.
      The raw numbers differ from the decoder measurements above since
      checkasm uses different lengths than the dca decoder.
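For reference, a plain C sketch of the call pattern the NEON function replaces: one scale factor per block of eight samples, with each block dispatched through a function pointer to the scalar routine. `FmtConvertSketch` is a hypothetical stand-in for FmtConvertContext that carries only the one pointer needed here, and the function bodies follow the description in the commit message rather than the exact Libav source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FmtConvertContext. */
typedef struct FmtConvertSketch {
    void (*int32_to_float_fmul_scalar)(float *dst, const int32_t *src,
                                       float mul, int len);
} FmtConvertSketch;

/* dst[i] = src[i] * mul */
static void fmul_scalar_c_sketch(float *dst, const int32_t *src,
                                 float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* One multiplier per block of 8 samples, each block going through an
 * indirect call: the overhead the dedicated array8 NEON function avoids. */
static void fmul_array8_c_sketch(FmtConvertSketch *c, float *dst,
                                 const int32_t *src, const float *mul,
                                 int len)
{
    assert(len % 8 == 0);
    for (int i = 0; i < len; i += 8)
        c->int32_to_float_fmul_scalar(&dst[i], &src[i], *mul++, 8);
}
```

The indirect call per eight samples is cheap in isolation but is paid on every block, which is consistent with the gap between the `_neon_c` and `_neon` rows above.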
    • arm64: int32_to_float_fmul neon asm · a0fc780a
      Janne Grunau authored
      3% faster dts decoding on a cortex-a57.
      
                                       cortex-a57   cortex-a53
      int32_to_float_fmul_array8_c:    1270.9       4475.6
      int32_to_float_fmul_array8_neon:  328.6        569.2
      int32_to_float_fmul_scalar_c:     928.5       4119.6
      int32_to_float_fmul_scalar_neon:  309.1        524.1
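The commit adds hand-written assembly; an equivalent written with ACLE NEON intrinsics might look like the sketch below (a hypothetical illustration, not the committed code): load four int32 lanes, convert with `vcvtq_f32_s32` (scvtf), multiply by the scalar with `vmulq_n_f32`, and store. Builds without NEON use only the C loop, which also handles any tail that is not a multiple of four.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical name: dst[i] = src[i] * mul */
static void int32_to_float_fmul_scalar_neon_sketch(float *dst,
                                                   const int32_t *src,
                                                   float mul, int len)
{
    assert(len >= 0);
    int i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= len; i += 4) {
        int32x4_t v = vld1q_s32(src + i);       /* 4 x int32 */
        float32x4_t f = vcvtq_f32_s32(v);       /* int -> float */
        vst1q_f32(dst + i, vmulq_n_f32(f, mul));/* scale and store */
    }
#endif
    for (; i < len; i++)                        /* scalar tail / fallback */
        dst[i] = src[i] * mul;
}
```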
    • arm64: port synth_filter_float_neon from arm · 705f5e5e
      Janne Grunau authored
      ~25% faster dts decoding overall. The checkasm CPU cycle counts are of
      limited use since synth_filter_float() calls FFTContext.imdct_half(),
      so they also include the time spent in whichever imdct_half
      implementation is selected.
      
                               cortex-a57   cortex-a53
      synth_filter_float_c:    1866.2       3490.9
      synth_filter_float_neon:  915.0       1531.5
      
      With fftc.imdct_half forced to imdct_half_neon:
                               cortex-a57   cortex-a53
      synth_filter_float_c:    1718.4       3025.3
      synth_filter_float_neon:  926.2       1530.1
    • arm64: convert dcadsp neon asm from arm · c33c1fa8
      Janne Grunau authored
      ~2% faster dts decoding overall.
      
                          cortex-a57   cortex-a53
      dca_decode_hf_c:    474.8        1659.9
      dca_decode_hf_neon: 225.2         301.1
      dca_lfe_fir0_c:     913.2        1537.7
      dca_lfe_fir0_neon:  286.8         451.9
      dca_lfe_fir1_c:     848.7        1711.5
      dca_lfe_fir1_neon:  387.1         506.4
    • arm: add a cpu flag for the VFPv2 vector mode · e2710e79
      Janne Grunau authored
      The vector mode was deprecated in ARMv7-A/VFPv3, and various CPU
      implementations do not support it in hardware. Depending on the OS,
      vector mode code will either be emulated in software or result in an
      illegal instruction on CPUs which do not support it. This was not
      really a problem in practice since NEON implementations of the same
      functions are preferred. It will, however, become a problem for
      checkasm, which tests every CPU flag separately.

      Since newer CPUs no longer support this feature, this flag behaves
      differently from the other flags: it can only be activated by runtime
      CPU feature selection.
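A minimal sketch of the flag logic described above, under two stated assumptions: the bit values are hypothetical, and "VFP present but neither VFPv3 nor NEON" is used as the runtime heuristic for vector-mode support. What it illustrates is that flags derived from the build configuration never include the vector-mode flag; only runtime detection can set it.

```c
#include <assert.h>

/* Hypothetical bit values for this sketch only. */
enum {
    SK_CPU_FLAG_VFP    = 1 << 0,
    SK_CPU_FLAG_VFPV3  = 1 << 1,
    SK_CPU_FLAG_NEON   = 1 << 2,
    SK_CPU_FLAG_VFP_VM = 1 << 3,
};

/* Flags assumed from compile-time configuration: never VFP_VM. */
static int sk_flags_from_build(int configured)
{
    return configured & ~SK_CPU_FLAG_VFP_VM;
}

/* Flags from runtime detection: vector mode is inferred only here
 * (assumed rule: VFP without VFPv3 or NEON). */
static int sk_flags_from_runtime(int detected)
{
    int flags = detected & ~SK_CPU_FLAG_VFP_VM;
    if ((flags & SK_CPU_FLAG_VFP) &&
        !(flags & (SK_CPU_FLAG_VFPV3 | SK_CPU_FLAG_NEON)))
        flags |= SK_CPU_FLAG_VFP_VM;
    return flags;
}
```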
    • arm64: add cycle counter support · 64034849
      Janne Grunau authored
      The ISB (instruction synchronization barrier) might be too heavy for
      START/STOPTIMER use, but it should be more accurate in checkasm, where
      the timing overhead is subtracted.
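A sketch of the "ISB then read a counter" idea, with one loud substitution: the message above does not name the counter register, so this sketch reads the generic timer's virtual count (`cntvct_el0`), which Linux normally exposes to user space, rather than a true CPU cycle counter. The function name is hypothetical, and non-AArch64 builds fall back to `CLOCK_MONOTONIC` so the sketch still runs elsewhere.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp. On AArch64 the isb keeps the counter read from
 * being executed early, ahead of the code being timed. */
static inline uint64_t read_time_sketch(void)
{
#if defined(__aarch64__)
    uint64_t t;
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(t) :: "memory");
    return t;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

The barrier is what makes each read more precise and also what makes it expensive, which matches the trade-off described in the commit: acceptable in checkasm, where the constant overhead is measured and subtracted.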
    • libavutil: move FFALIGN macro from common.h to macros.h · 50078c1c
      Janne Grunau authored
      Include macros.h explicitly in common.h so that external code using
      FFALIGN does not break. It was already implicitly included through
      version.h. Include macros.h in lls.h and internal.h for FFALIGN.
      lls.h was including common.h only for FFALIGN, and internal.h was
      missing the include for FFALIGN. `make checkheaders` did not catch the
      missing include because internal.h is an internal header.
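For context, FFALIGN rounds a value up to the next multiple of an alignment. The sketch below reproduces the usual power-of-two form of that macro, assuming the alignment `a` is a power of two; treat macros.h as the authoritative definition. The `#ifndef` guard lets the sketch coexist with a real definition if one is already visible.

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two,
 * so ~(a - 1) is a mask that clears the low bits. */
#ifndef FFALIGN
#define FFALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
#endif
```

For example, FFALIGN(13, 8) evaluates to 16, while FFALIGN(16, 8) stays 16.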