  1. 29 Jan, 2017 · 2 commits
  2. 28 Jan, 2017 · 6 commits
  3. 27 Jan, 2017 · 7 commits
  4. 26 Jan, 2017 · 4 commits
  5. 25 Jan, 2017 · 7 commits
  6. 24 Jan, 2017 · 14 commits
    • Michael Niedermayer
      avcodec/utils: correct align value for interplay · 2080bc33
      Michael Niedermayer authored
      Fixes out of array access
      Fixes: 452/fuzz-1-ffmpeg_VIDEO_AV_CODEC_ID_INTERPLAY_VIDEO_fuzzer
      
      Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
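The out-of-array access comes down to how frame dimensions are padded before the picture buffer is allocated. The sketch below is illustrative only: it assumes a decoder that always writes whole 8x8 blocks at one byte per pixel, and the names `ALIGN_UP`, `BLOCK`, `bytes_written` and `bytes_allocated` are hypothetical (the rounding arithmetic is the same as libavutil's `FFALIGN`). It does not claim which align value the commit actually uses.

```c
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two); same
 * arithmetic as libavutil's FFALIGN macro. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical block size: the decoder is assumed to write whole
 * BLOCK x BLOCK blocks, even at the right and bottom edges. */
enum { BLOCK = 8 };

/* Bytes touched when every block covering a w x h frame is written
 * (one byte per pixel assumed). */
static size_t bytes_written(int w, int h)
{
    int bw = (w + BLOCK - 1) / BLOCK;   /* blocks per row, rounded up */
    int bh = (h + BLOCK - 1) / BLOCK;   /* block rows, rounded up */
    return (size_t)bw * BLOCK * (size_t)bh * BLOCK;
}

/* Bytes available when both dimensions were padded to 'align'. */
static size_t bytes_allocated(int w, int h, int align)
{
    return (size_t)ALIGN_UP(w, align) * (size_t)ALIGN_UP(h, align);
}
```

For a 100x60 frame, padding to 4 gives a 6000-byte buffer while the 8x8 blocks touch 6656 bytes. Padding to the block size (8) makes the two equal.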
    • Carl Eugen Hoyos
      Cosmetics: Reindent after last commit. · 0b607228
      Carl Eugen Hoyos authored
    • Carl Eugen Hoyos
    • Carl Eugen Hoyos
      lavf/rtmpproto: Make bytes_read variables 64bit. · 75bd4ea0
      Carl Eugen Hoyos authored
      When bytes_read overflowed, last_bytes_read had not yet overflowed,
      so no bytes-read report was created, leading to a timeout.
      
      Analyzed-by: Thomas Bernhard
      
      Fixes ticket #5836.
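The overflow can be reproduced in isolation. This is a simplified model, not the rtmpproto code: it assumes a report is sent when `bytes_read - last_bytes_read` exceeds a report size, and it emulates a wrapped 32-bit counter through `uint32_t` because signed overflow is undefined in C. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Emulate a 32-bit signed counter that has wrapped around. Converting an
 * out-of-range value to int32_t is implementation-defined in C, but wraps
 * modulo 2^32 on mainstream compilers (GCC, Clang, MSVC). */
static int32_t wrap32(int64_t v)
{
    return (int32_t)(uint32_t)v;
}

/* Old behaviour: both counters are stored as 32-bit ints. */
static bool need_report_32(int64_t total, int64_t last, int64_t report_size)
{
    int32_t bytes_read = wrap32(total);
    int32_t last_bytes_read = wrap32(last);
    return (int64_t)bytes_read - last_bytes_read > report_size;
}

/* New behaviour: both counters are 64-bit, so they do not wrap in practice. */
static bool need_report_64(int64_t total, int64_t last, int64_t report_size)
{
    return total - last > report_size;
}
```

Once `bytes_read` has wrapped to a large negative value while `last_bytes_read` is still just below `INT32_MAX`, the 32-bit difference stays negative: no report is sent, `last_bytes_read` is never updated, and the peer eventually times out. The 64-bit variant reports as expected.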
    • Marton Balint
      avfilter/formats: do not allow unknown layouts in ff_parse_channel_layout if nret is not set · 977fd884
      Marton Balint authored
      The current code returned the number of channels as the channel layout in
      that case; moreover, if nret is not set, unknown layouts are typically not
      supported.
      
      Also use the common parsing code. Use a temporary workaround to parse an
      unknown channel layout such as '13c'; after a 1 year grace period, only
      '13C' will work.
      Signed-off-by: Marton Balint <cus@passwd.hu>
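The rule can be modelled in a few lines. The helper below is hypothetical (its name and signature are not FFmpeg's). It assumes that a parsed layout value of 0 means "unknown layout, only the channel count is known", and it returns `-EINVAL` in place of `AVERROR(EINVAL)`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the check: a layout of 0 is assumed to mean
 * "unknown layout; only the channel count is known". */
static int store_parsed_layout(uint64_t layout, int nb_channels,
                               uint64_t *ret, int *nret)
{
    /* Without nret the caller can only receive the layout, so an unknown
     * layout would lose its channel count. Reject it instead of returning
     * the channel count disguised as a layout. */
    if (layout == 0 && nret == NULL)
        return -EINVAL;

    *ret = layout;
    if (nret)
        *nret = nb_channels;
    return 0;
}
```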
    • Marton Balint
      avutil/channel_layout: add av_get_extended_channel_layout · c4618f84
      Marton Balint authored
      Return a channel layout and the number of channels based on the specified name.
      
      This function is similar to av_get_channel_layout(), but can also parse unknown
      channel layout specifications.
      
      Unknown channel layout specifications consist of a decimal number with a
      capital 'C' suffix. The capital letter avoids breaking compatibility with
      the lowercase 'c' suffix, which is used for a guessed channel layout with
      the specified number of channels.
      Signed-off-by: Marton Balint <cus@passwd.hu>
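The suffix rule can be sketched on its own. The hypothetical `parse_unknown_layout` handles only the unknown-layout form. It accepts a decimal count followed by a capital 'C' and reports layout 0 with that channel count. Everything else is rejected, including the lowercase 'c' form, which is left to the guessed-layout path. The upper bound below 64 channels is an assumption made for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: parse "<N>C" into an unknown layout (0) with N
 * channels. Returns 0 on success, -EINVAL otherwise. */
static int parse_unknown_layout(const char *name, uint64_t *layout,
                                int *nb_channels)
{
    char *end;
    long n;

    errno = 0;
    n = strtol(name, &end, 10);
    /* Require at least one digit, then a capital 'C' and nothing after it. */
    if (errno || end == name || end[0] != 'C' || end[1] != '\0')
        return -EINVAL;
    if (n <= 0 || n >= 64)      /* assumed limit, for illustration */
        return -EINVAL;

    *layout = 0;                /* 0 = no known speaker positions */
    *nb_channels = (int)n;
    return 0;
}
```

With this sketch, "13C" yields layout 0 with 13 channels, while "13c", "C" and "13CC" are rejected.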
    • Marton Balint
    • Carl Eugen Hoyos
      lavc/svq3: Fail for media key encryption. · 6d6faa2a
      Carl Eugen Hoyos authored
      Tested-by: ami_stuff
      
      Fixes part of ticket #6094.
    • Michael Niedermayer
      avcodec/vp56: Check for the bitstream end, pass error codes on · 9e6a2427
      Michael Niedermayer authored
      Fixes timeout
      Fixes: 446/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_VP6_fuzzer
      
      Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
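The timeout fix follows a common pattern: a loop that consumes bits stops with an error once the input is exhausted, and that error is passed on to the caller. The reader below is a simplified, hypothetical sketch of the pattern, not the vp56 range decoder. `ERR_INVALIDDATA` is a local stand-in for `AVERROR_INVALIDDATA`.

```c
#include <stddef.h>
#include <stdint.h>

#define ERR_INVALIDDATA (-1)   /* stand-in for AVERROR_INVALIDDATA */

/* Hypothetical MSB-first bit reader with an explicit end check. */
typedef struct {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;                /* index of the next bit to read */
} BitReader;

/* Returns the next bit (0 or 1), or ERR_INVALIDDATA past the end. */
static int read_bit(BitReader *br)
{
    if (br->pos >= br->size_bits)
        return ERR_INVALIDDATA;
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Count zero bits before the first 1 (a unary code). If the reader
 * returned zeros past the end instead of an error, an all-zero or truncated
 * input would keep this loop running for a very long time. */
static int read_unary(BitReader *br)
{
    int n = 0;
    for (;;) {
        int bit = read_bit(br);
        if (bit < 0)
            return bit;        /* pass the error code on */
        if (bit)
            return n;
        n++;
    }
}
```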
    • Martin Storsjö
      aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter · 9f10cff6
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      This is similar to the arm version, but due to the larger registers
      on aarch64, we can do 8 pixels at a time for all filter sizes.
      
      Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                                   ARM AArch64
      vp9_loop_filter_h_4_8_10bpp_neon:          213.2   172.6
      vp9_loop_filter_h_8_8_10bpp_neon:          281.2   244.2
      vp9_loop_filter_h_16_8_10bpp_neon:         657.0   444.5
      vp9_loop_filter_h_16_16_10bpp_neon:       1280.4   877.7
      vp9_loop_filter_mix2_h_44_16_10bpp_neon:   397.7   358.0
      vp9_loop_filter_mix2_h_48_16_10bpp_neon:   465.7   429.0
      vp9_loop_filter_mix2_h_84_16_10bpp_neon:   465.7   428.0
      vp9_loop_filter_mix2_h_88_16_10bpp_neon:   533.7   499.0
      vp9_loop_filter_mix2_v_44_16_10bpp_neon:   271.5   244.0
      vp9_loop_filter_mix2_v_48_16_10bpp_neon:   330.0   305.0
      vp9_loop_filter_mix2_v_84_16_10bpp_neon:   329.0   306.0
      vp9_loop_filter_mix2_v_88_16_10bpp_neon:   386.0   365.0
      vp9_loop_filter_v_4_8_10bpp_neon:          150.0   115.2
      vp9_loop_filter_v_8_8_10bpp_neon:          209.0   175.5
      vp9_loop_filter_v_16_8_10bpp_neon:         492.7   345.2
      vp9_loop_filter_v_16_16_10bpp_neon:        951.0   682.7
      
      This is significantly faster than the ARM version in almost
      all cases except for the mix2 functions.
      
      Based on START_TIMER/STOP_TIMER wrapping around a few individual
      functions, the speedup vs C code is around 2-3x.
      Signed-off-by: Martin Storsjö <martin@martin.st>
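The observation that the mix2 functions gain least can be checked against the table. The program below recomputes the ARM/AArch64 runtime ratio for four rows copied from it. A ratio above 1 means AArch64 is faster. The `Row` struct and `speedup` helper are only for illustration.

```c
/* A few rows copied from the Cortex A53 table above (lower is faster). */
typedef struct {
    const char *name;
    double arm;
    double aarch64;
} Row;

static const Row rows[] = {
    { "vp9_loop_filter_h_16_8_10bpp_neon",       657.0, 444.5 },
    { "vp9_loop_filter_v_16_16_10bpp_neon",      951.0, 682.7 },
    { "vp9_loop_filter_mix2_h_44_16_10bpp_neon", 397.7, 358.0 },
    { "vp9_loop_filter_mix2_v_88_16_10bpp_neon", 386.0, 365.0 },
};

/* ARM time divided by AArch64 time: how many times faster AArch64 is. */
static double speedup(const Row *r)
{
    return r->arm / r->aarch64;
}
```

For these rows the ratios are roughly 1.48, 1.39, 1.11 and 1.06. The 16-pixel filters gain the most, and the two mix2 rows gain the least.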
    • Martin Storsjö
      aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm · ceb36b81
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      Compared to the arm version, on aarch64 we can keep the full 8x8
      transform in registers, and for 16x16 and 32x32, we can process
      it in slices of 4 pixels instead of 2.
      
      Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                                      ARM  AArch64
      vp9_inv_adst_adst_4x4_sub4_add_10_neon:       111.0    109.7
      vp9_inv_adst_adst_8x8_sub8_add_10_neon:       914.0    733.5
      vp9_inv_adst_adst_16x16_sub16_add_10_neon:   5184.0   3745.7
      vp9_inv_dct_dct_4x4_sub1_add_10_neon:          65.0     65.7
      vp9_inv_dct_dct_4x4_sub4_add_10_neon:         100.0     96.7
      vp9_inv_dct_dct_8x8_sub1_add_10_neon:         111.0    119.7
      vp9_inv_dct_dct_8x8_sub8_add_10_neon:         618.0    494.7
      vp9_inv_dct_dct_16x16_sub1_add_10_neon:       295.1    284.6
      vp9_inv_dct_dct_16x16_sub2_add_10_neon:      2303.2   1883.9
      vp9_inv_dct_dct_16x16_sub8_add_10_neon:      2984.8   2189.3
      vp9_inv_dct_dct_16x16_sub16_add_10_neon:     3890.0   2799.4
      vp9_inv_dct_dct_32x32_sub1_add_10_neon:      1044.4   1012.7
      vp9_inv_dct_dct_32x32_sub2_add_10_neon:     13333.7   9695.1
      vp9_inv_dct_dct_32x32_sub16_add_10_neon:    18531.3  12459.8
      vp9_inv_dct_dct_32x32_sub32_add_10_neon:    24470.7  16160.2
      vp9_inv_wht_wht_4x4_sub4_add_10_neon:          83.0     79.7
      
      The larger transforms are significantly faster than the corresponding
      ARM versions.
      
      The speedup vs C code is smaller than in 32 bit mode, probably
      because the 64 bit intermediates in the C code can be expressed
      more efficiently in aarch64.
      Signed-off-by: Martin Storsjö <martin@martin.st>
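The remark about 64 bit intermediates can be illustrated with a single butterfly multiply. The sketch assumes the usual VP9 fixed-point convention: constants are scaled by 2^14 (here cos(pi/4) is approximately 11585/16384), followed by a rounding shift by 14. With a large coefficient (2^19 is chosen for illustration), the product no longer fits in 32 bits. On aarch64 such a product occupies one general-purpose register, whereas 32-bit ARM needs a register pair. The helper names are hypothetical.

```c
#include <stdint.h>

#define DCT_CONST_BITS 14
#define COSPI_16_64    11585   /* round(cos(pi/4) * 2^14) */

/* Rounding right shift by DCT_CONST_BITS on a 64-bit intermediate.
 * (Right-shifting a negative value is implementation-defined in C, but is
 * an arithmetic shift on mainstream compilers.) */
static int64_t round_shift(int64_t x)
{
    return (x + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS;
}

/* Multiply a coefficient by cos(pi/4) in fixed point. The product is
 * formed in 64 bits because it can exceed INT32_MAX for large inputs. */
static int32_t mul_cospi_16_64(int32_t a)
{
    return (int32_t)round_shift((int64_t)a * COSPI_16_64);
}
```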
    • Martin Storsjö
      aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC · 638eceed
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      This has mostly the same differences from the 8 bit version as the
      arm version has. For the horizontal filters, we do 16 pixels
      in parallel as well. For the 8 pixel wide vertical filters, we can
      accumulate 4 rows before storing, just as in the 8 bit version.
      
      Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                                 ARM   AArch64
      vp9_avg4_10bpp_neon:                      35.7      30.7
      vp9_avg8_10bpp_neon:                      93.5      84.7
      vp9_avg16_10bpp_neon:                    324.4     296.6
      vp9_avg32_10bpp_neon:                   1236.5    1148.2
      vp9_avg64_10bpp_neon:                   4639.6    4571.1
      vp9_avg_8tap_smooth_4h_10bpp_neon:       130.0     128.0
      vp9_avg_8tap_smooth_4hv_10bpp_neon:      440.0     440.5
      vp9_avg_8tap_smooth_4v_10bpp_neon:       114.0     105.5
      vp9_avg_8tap_smooth_8h_10bpp_neon:       327.0     314.0
      vp9_avg_8tap_smooth_8hv_10bpp_neon:      918.7     865.4
      vp9_avg_8tap_smooth_8v_10bpp_neon:       330.0     300.2
      vp9_avg_8tap_smooth_16h_10bpp_neon:     1187.5    1155.5
      vp9_avg_8tap_smooth_16hv_10bpp_neon:    2663.1    2591.0
      vp9_avg_8tap_smooth_16v_10bpp_neon:     1107.4    1078.3
      vp9_avg_8tap_smooth_64h_10bpp_neon:    17754.6   17454.7
      vp9_avg_8tap_smooth_64hv_10bpp_neon:   33285.2   33001.5
      vp9_avg_8tap_smooth_64v_10bpp_neon:    16066.9   16048.6
      vp9_put4_10bpp_neon:                      25.5      21.7
      vp9_put8_10bpp_neon:                      56.0      52.0
      vp9_put16_10bpp_neon/armv8:              183.0     163.1
      vp9_put32_10bpp_neon/armv8:              678.6     563.1
      vp9_put64_10bpp_neon/armv8:             2679.9    2195.8
      vp9_put_8tap_smooth_4h_10bpp_neon:       120.0     118.0
      vp9_put_8tap_smooth_4hv_10bpp_neon:      435.2     435.0
      vp9_put_8tap_smooth_4v_10bpp_neon:       107.0      98.2
      vp9_put_8tap_smooth_8h_10bpp_neon:       303.0     290.0
      vp9_put_8tap_smooth_8hv_10bpp_neon:      893.7     828.7
      vp9_put_8tap_smooth_8v_10bpp_neon:       305.5     263.5
      vp9_put_8tap_smooth_16h_10bpp_neon:     1089.1    1059.2
      vp9_put_8tap_smooth_16hv_10bpp_neon:    2578.8    2452.4
      vp9_put_8tap_smooth_16v_10bpp_neon:     1009.5     933.5
      vp9_put_8tap_smooth_64h_10bpp_neon:    16223.4   15918.6
      vp9_put_8tap_smooth_64hv_10bpp_neon:   32153.0   31016.2
      vp9_put_8tap_smooth_64v_10bpp_neon:    14516.5   13748.1
      
      These are generally about as fast as the corresponding ARM
      routines on the same CPU (at least on the A53), in most cases
      marginally faster.
      
      The speedup vs C code is around 4-9x.
      Signed-off-by: Martin Storsjö <martin@martin.st>
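The core of each MC function is an 8-tap FIR filter. The sketch below is a scalar model of one horizontal output pixel at 10 bpp. It assumes taps that sum to 128, so the result is rounded and shifted right by 7, and it clips the output to the 10-bit range. The tap values are chosen for illustration and are not copied from FFmpeg's filter tables. With 10-bit input, every product and sum fits comfortably in 32 bits.

```c
#include <stdint.h>

#define FILTER_BITS 7          /* taps sum to 1 << FILTER_BITS = 128 */
#define PIXEL_MAX   1023       /* 10 bits per pixel */

/* Illustrative half-pel taps (sum = 128), chosen for this sketch. */
static const int taps[8] = { -1, 6, -19, 78, 78, -19, 6, -1 };

static uint16_t clip_pixel(int v)
{
    return v < 0 ? 0 : v > PIXEL_MAX ? PIXEL_MAX : (uint16_t)v;
}

/* One output pixel computed from src[-3] .. src[4]. */
static uint16_t filter8_h(const uint16_t *src)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += taps[i] * src[i - 3];
    /* Round to nearest, then drop the 7 fractional bits and clip. */
    return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}
```

A flat 1023 input stays at 1023. A step from 0 to 1023 lands halfway, at 512, and an overshooting peak is clipped back to 1023.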
    • Martin Storsjö
      aarch64: vp9dsp: Restructure the bpp checks · 48ad3fe1
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      This is more in line with how it will be extended for more bitdepths.
      Signed-off-by: Martin Storsjö <martin@martin.st>
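A bpp-keyed init function of the kind described might look roughly like the sketch below. It is hypothetical: the context struct and function names are invented, and this is not the actual vp9dsp init code. Each supported bit depth gets its own branch that returns early, so supporting another depth means adding one more branch.

```c
/* Hypothetical DSP context with a single function pointer. */
typedef struct {
    const char *(*name)(void);
} DSPContext;

static const char *name_8bpp(void)  { return "8bpp";  }
static const char *name_10bpp(void) { return "10bpp"; }
static const char *name_12bpp(void) { return "12bpp"; }

/* One branch per bit depth. Unsupported depths return without touching
 * the context, keeping whatever (C) functions were set before. */
static void dsp_init_arch(DSPContext *dsp, int bpp)
{
    if (bpp == 10) {
        dsp->name = name_10bpp;
        return;
    } else if (bpp == 12) {
        dsp->name = name_12bpp;
        return;
    } else if (bpp != 8) {
        return;
    }
    dsp->name = name_8bpp;
}
```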
    • Martin Storsjö
      arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter · 1e5d87ee
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      This is largely similar to the 8 bpp version, but in some respects
      simpler. All input pixels are 16 bits, and all intermediates also fit
      in 16 bits, so there's no lengthening/narrowing in the filter at all.
      
      For the full 16 pixel wide filter, we can only process 4 pixels at a time
      (using an implementation very similar to the one for 8 bpp),
      but we can do 8 pixels at a time for the 4 and 8 pixel wide filters with
      a different implementation of the core filter.
      
      Examples of relative speedup compared to the C version, from checkasm:
                                         Cortex    A7     A8     A9    A53
      vp9_loop_filter_h_4_8_10bpp_neon:          1.83   2.16   1.40   2.09
      vp9_loop_filter_h_8_8_10bpp_neon:          1.39   1.67   1.24   1.70
      vp9_loop_filter_h_16_8_10bpp_neon:         1.56   1.47   1.10   1.81
      vp9_loop_filter_h_16_16_10bpp_neon:        1.94   1.69   1.33   2.24
      vp9_loop_filter_mix2_h_44_16_10bpp_neon:   2.01   2.27   1.67   2.39
      vp9_loop_filter_mix2_h_48_16_10bpp_neon:   1.84   2.06   1.45   2.19
      vp9_loop_filter_mix2_h_84_16_10bpp_neon:   1.89   2.20   1.47   2.29
      vp9_loop_filter_mix2_h_88_16_10bpp_neon:   1.69   2.12   1.47   2.08
      vp9_loop_filter_mix2_v_44_16_10bpp_neon:   3.16   3.98   2.50   4.05
      vp9_loop_filter_mix2_v_48_16_10bpp_neon:   2.84   3.64   2.25   3.77
      vp9_loop_filter_mix2_v_84_16_10bpp_neon:   2.65   3.45   2.16   3.54
      vp9_loop_filter_mix2_v_88_16_10bpp_neon:   2.55   3.30   2.16   3.55
      vp9_loop_filter_v_4_8_10bpp_neon:          2.85   3.97   2.24   3.68
      vp9_loop_filter_v_8_8_10bpp_neon:          2.27   3.19   1.96   3.08
      vp9_loop_filter_v_16_8_10bpp_neon:         3.42   2.74   2.26   4.40
      vp9_loop_filter_v_16_16_10bpp_neon:        2.86   2.44   1.93   3.88
      
      The speedup vs C code measured in checkasm is around 1.1-4x.
      These numbers are quite inconclusive though, since the checkasm test
      runs multiple filterings on top of each other, so later rounds might
      end up with different codepaths (different decisions on which filter
      to apply, based on input pixel differences).
      
      Based on START_TIMER/STOP_TIMER wrapping around a few individual
      functions, the speedup vs C code is around 2-4x.
      Signed-off-by: Martin Storsjö <martin@martin.st>
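The statement that all intermediates fit in 16 bits can be checked with worst-case arithmetic at 12 bpp. The sketch assumes flat filters shaped like those in the VP9 C reference. In the 8-pixel filter the tap weights total 8, with a rounding term of 4 and a shift by 3. In the 16-pixel filter the weights total 16, with a rounding term of 8 and a shift by 4. It also assumes that the narrow filter term `3 * (q0 - p0) + clamp(p1 - q1)` clamps to a signed bpp-bit range. The helper names are hypothetical.

```c
#include <stdint.h>

/* Largest pixel value for a given bit depth. */
static int32_t pixel_max(int bpp)
{
    return (1 << bpp) - 1;
}

/* Worst-case flat-filter sum: every tap sees the maximum pixel value.
 * weight_total is the sum of the tap weights (8 or 16); the rounding
 * term is assumed to be half of it. */
static int32_t flat_sum_max(int bpp, int weight_total)
{
    return weight_total * pixel_max(bpp) + weight_total / 2;
}

/* Worst-case magnitude of 3 * (q0 - p0) + clamp(p1 - q1), assuming the
 * clamp limits p1 - q1 to a signed bpp-bit range. */
static int32_t narrow_term_max(int bpp)
{
    return 3 * pixel_max(bpp) + ((1 << (bpp - 1)) - 1);
}
```

Under these assumptions, the 8-pixel flat sum peaks at 32764 (a signed 16-bit lane). The 16-pixel sum peaks at 65528, which fits an unsigned 16-bit lane but not a signed one. The narrow filter term stays within ±14333.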