1. 10 Nov, 2016 8 commits
    • arm: vp9mc: Minor adjustments from review of the aarch64 version · 557c1675
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      The speedup for the large horizontal filters is surprisingly
      big on A7 and A53, while there's a minor slowdown (almost within
      measurement noise) on A8 and A9.
      
                                  Cortex    A7        A8        A9       A53
      orig:
      vp9_put_8tap_smooth_64h_neon:    20270.0   14447.3   19723.9   10910.9
      new:
      vp9_put_8tap_smooth_64h_neon:    20165.8   14466.5   19730.2   10668.8
      Signed-off-by: Martin Storsjö <martin@martin.st>
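To put the four timing pairs above in perspective, the relative change per core can be worked out as in the following minimal sketch. The numbers are copied from the commit message (the commit does not state their unit; lower is better), and the helper name `percent_change` is made up for this illustration.

```python
# Timings for vp9_put_8tap_smooth_64h_neon, copied from the commit message.
# The unit is not stated there; lower values mean faster code.
cores = ["A7", "A8", "A9", "A53"]
orig = [20270.0, 14447.3, 19723.9, 10910.9]
new = [20165.8, 14466.5, 19730.2, 10668.8]


def percent_change(before, after):
    """Return the relative change in percent; a negative value means faster."""
    return (after - before) / before * 100.0


# Map each core to its relative change between the two runs.
changes = {core: percent_change(b, a) for core, b, a in zip(cores, orig, new)}

for core, delta in changes.items():
    print(f"Cortex-{core}: {delta:+.2f}%")
```

Negative values correspond to the speedup reported for the A7 and A53; the small positive values for the A8 and A9 correspond to the slowdown the message describes as almost within measurement noise.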
    • aarch64: vp9: Add NEON optimizations of VP9 MC functions · 383d96aa
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      These are ported from the ARM version; it is essentially a 1:1
      port with no extra added features, but with some hand tuning
      (especially for the plain copy/avg functions). The ARM version
      isn't very register starved to begin with, so there's not much
      to be gained from having more spare registers here - we only
      avoid having to clobber callee-saved registers.
      
      Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                           ARM   AArch64
      vp9_avg4_neon:                      27.2      23.7
      vp9_avg8_neon:                      56.5      54.7
      vp9_avg16_neon:                    169.9     167.4
      vp9_avg32_neon:                    585.8     585.2
      vp9_avg64_neon:                   2460.3    2294.7
      vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
      vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
      vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
      vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
      vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
      vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
      vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
      vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
      vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
      vp9_put4_neon:                      18.0      17.2
      vp9_put8_neon:                      40.2      37.7
      vp9_put16_neon:                     97.4      99.5
      vp9_put32_neon/armv8:              346.0     307.4
      vp9_put64_neon/armv8:             1319.0    1107.5
      vp9_put_8tap_smooth_4h_neon:       126.7     118.2
      vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
      vp9_put_8tap_smooth_4v_neon:       113.0      86.5
      vp9_put_8tap_smooth_8h_neon:       229.7     221.6
      vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
      vp9_put_8tap_smooth_8v_neon:       215.0     187.5
      vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
      vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
      vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4
      
      These are generally about as fast as the corresponding ARM
      routines on the same CPU (at least on the A53), in most cases
      marginally faster.
      
      The speedup vs C code is pretty much the same as for the 32 bit
      case; on the A53 it's around 6-13x for the larger 8tap filters.
      The exact speedup varies a little, since the C versions generally
      don't end up exactly as slow/fast as on 32 bit.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: Add an offset parameter to the movrel macro · c44a8a3e
      Martin Storsjö authored
      With Apple tools, the linker fails with errors like the following
      if the offset is negative:
      
      ld: in section __TEXT,__text reloc 8: symbol index out of range for architecture arm64
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • vp9: Make the subpel filters non-static · a4cfcddc
      Martin Storsjö authored
      Make them aligned, to allow efficient access to them from simd.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • matroskaenc: write updated STREAMINFO metadata for FLAC streams if available · 98cae966
      James Almer authored
      FLAC streams originating from the FLAC encoder send updated and more
      complete STREAMINFO metadata as part of the last packet, so write that
      to CodecPrivate instead of the incomplete one available in extradata
      during init.
      Signed-off-by: James Almer <jamrial@gmail.com>
      Signed-off-by: Anton Khirnov <anton@khirnov.net>
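To illustrate why the final STREAMINFO can be more complete than the one available at init, here is a minimal sketch that packs and parses the 34-byte STREAMINFO layout from the FLAC format specification. The helper names (`build_streaminfo`, `parse_streaminfo`) and all sample values are made up for this example; this is not the muxer's code. In the FLAC format, a value of 0 for the frame sizes or the total sample count means "unknown", and such fields can only be filled in once encoding has finished.

```python
import struct

STREAMINFO_SIZE = 34  # fixed size of the FLAC STREAMINFO block body


def build_streaminfo(min_block, max_block, min_frame, max_frame,
                     sample_rate, channels, bits_per_sample,
                     total_samples, md5):
    """Pack the fields into the 34-byte STREAMINFO layout (illustration only)."""
    # 20 bits sample rate, 3 bits (channels - 1), 5 bits (bps - 1), 36 bits samples
    packed = ((sample_rate << 44) | ((channels - 1) << 41)
              | ((bits_per_sample - 1) << 36) | total_samples)
    return (struct.pack(">HH", min_block, max_block)
            + min_frame.to_bytes(3, "big") + max_frame.to_bytes(3, "big")
            + packed.to_bytes(8, "big") + md5)


def parse_streaminfo(data):
    """Decode a 34-byte STREAMINFO block into a dict."""
    if len(data) != STREAMINFO_SIZE:
        raise ValueError("STREAMINFO must be exactly 34 bytes")
    min_block, max_block = struct.unpack(">HH", data[0:4])
    packed = int.from_bytes(data[10:18], "big")
    return {
        "min_block": min_block,
        "max_block": max_block,
        "min_frame": int.from_bytes(data[4:7], "big"),   # 0 means unknown
        "max_frame": int.from_bytes(data[7:10], "big"),  # 0 means unknown
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),       # 0 means unknown
        "md5": data[18:34],
    }


# Illustrative values: a block as it might look before encoding starts ...
initial = build_streaminfo(4608, 4608, 0, 0, 44100, 2, 16, 0, bytes(16))
# ... and one with the statistics that are only known after the last frame.
final = build_streaminfo(4608, 4608, 1200, 9800, 44100, 2, 16, 441000,
                         bytes(range(16)))

info_initial = parse_streaminfo(initial)
info_final = parse_streaminfo(final)
print(info_initial["total_samples"], info_final["total_samples"])
```

A muxer that only sees the first block would store the placeholder values, which is the situation the commit avoids by writing the updated block to CodecPrivate.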
    • matroskaenc: fix muxing AAC streams when using aac_adtstoasc bsf · f4bf2363
      James Almer authored
      aac_adtstoasc makes the aac extradata available only after the first packet
      is filtered, and as packet side data.
      
      Assume extradata will be available as part of the first packet if
      avpriv_mpeg4audio_get_config() fails the first time due to missing extradata
      and reserve space for the OutputSampleRate element in the Tracks master.
      Signed-off-by: James Almer <jamrial@gmail.com>
      Signed-off-by: Anton Khirnov <anton@khirnov.net>
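The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the muxer's implementation: `parse_asc_header` and `config_for_track` are hypothetical names, and only the first two bytes of an MPEG-4 AudioSpecificConfig are decoded (5 bits object type, 4 bits sampling frequency index, 4 bits channel configuration). Escape codes and SBR signalling, which is what makes a separate output sample rate relevant, are deliberately left out.

```python
# Standard MPEG-4 sampling frequency table (index -> Hz).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_asc_header(extradata):
    """Decode the leading fields of an AudioSpecificConfig.

    Returns None when no extradata is available yet, which stands in for
    the "fails the first time due to missing extradata" case.
    """
    if not extradata or len(extradata) < 2:
        return None
    bits = int.from_bytes(extradata[:2], "big")
    object_type = bits >> 11          # top 5 bits
    freq_index = (bits >> 7) & 0xF    # next 4 bits
    channel_config = (bits >> 3) & 0xF  # next 4 bits
    if object_type == 31 or freq_index >= len(SAMPLE_RATES):
        raise ValueError("escape codes and reserved indices are not handled here")
    return {
        "object_type": object_type,
        "sample_rate": SAMPLE_RATES[freq_index],
        "channel_config": channel_config,
    }


def config_for_track(header_extradata, first_packet_extradata):
    """Hypothetical helper: use the header extradata if present, otherwise
    fall back to the extradata delivered with the first packet."""
    config = parse_asc_header(header_extradata)
    if config is None:
        config = parse_asc_header(first_packet_extradata)
    return config


# 0x12 0x10 is the common AAC-LC, 44.1 kHz, stereo configuration.
late_config = config_for_track(b"", bytes([0x12, 0x10]))
print(late_config)
```

Because the real values are not known when the header is written, reserving space for the element up front lets them be filled in once the first packet arrives.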
  2. 09 Nov, 2016 9 commits
  3. 08 Nov, 2016 13 commits
  4. 07 Nov, 2016 10 commits