1. 11 Mar, 2017 15 commits
    • aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible · 19a0f952
      Martin Storsjö authored
      The ld1r is a leftover from the arm version, where this trick is
      beneficial on some cores.
      
      Use a single-lane load where we don't need the semantics of ld1r.
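To illustrate the difference in semantics, here is a minimal plain-Python model of the register lanes. It is not NEON code and is not taken from the commit; the helper names `ld1r` and `ld1_lane` are hypothetical. The replicating load fills every lane with the loaded element, while a single-lane load writes one lane and leaves the others untouched.

```python
def ld1r(value, lanes=8):
    """Model of ld1r: load one element and replicate it to all lanes."""
    return [value] * lanes


def ld1_lane(reg, lane, value):
    """Model of a single-lane ld1: replace one lane, keep the others."""
    out = list(reg)  # copy so the caller's register model is not mutated
    out[lane] = value
    return out


if __name__ == "__main__":
    reg = [0] * 8
    print(ld1r(7))              # every lane holds the loaded value
    print(ld1_lane(reg, 0, 7))  # only lane 0 changes
```

When only one lane of the result is read afterwards, both forms give the same value in that lane, which is why the replicating form can be dropped in those places.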
      
      This is cherrypicked from libav commit ed8d2933.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function · 3006e525
      Martin Storsjö authored
      This is cherrypicked from libav commit 4da4b2b8.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function · 1d8ab576
      Martin Storsjö authored
      This is cherrypicked from libav commit 3933b86b.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 · 9532a7d4
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      This avoids loading and calculating coefficients that we know will
      be zero, and avoids filling the temp buffer with zeros in places
      where we know the second pass won't read.
      
      This gives a pretty substantial speedup for the smaller subpartitions.
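The idea of skipping coefficients known to be zero can be sketched with a textbook floating-point 1-D inverse DCT. This is only an illustration under stated assumptions: the real code is fixed-point NEON assembly using butterfly stages, and the function name `idct_1d` and its `nonzero` parameter are hypothetical. The 2-D passes and the temp buffer handling are not modelled.

```python
import math


def idct_1d(coeffs, nonzero=None):
    """Naive orthonormal inverse DCT-II (direct summation).

    If `nonzero` is given, only the first `nonzero` coefficients are
    read; the caller promises that the remaining ones are zero.
    """
    n = len(coeffs)
    k_max = n if nonzero is None else nonzero
    out = []
    for x in range(n):
        acc = 0.0
        for k in range(k_max):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * coeffs[k] * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(acc)
    return out


if __name__ == "__main__":
    # 16-point input where only the first quarter is non-zero.
    coeffs = [100.0, -30.0, 12.0, 5.0] + [0.0] * 12
    full = idct_1d(coeffs)
    quarter = idct_1d(coeffs, nonzero=4)
    print(max(abs(a - b) for a, b in zip(full, quarter)))
```

The "quarter" call does a quarter of the multiply-accumulate work and gives the same output, which is the kind of saving the commit describes for the smaller subpartitions.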
      
      The code size increases from 14740 bytes to 24292 bytes.
      
      The idct16/32_end macros are moved above the individual functions; the
      instructions themselves are unchanged, but since new functions are added
      at the same place where the code is moved from, the diff looks rather
      messy.
      
      Before:
      vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
      vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
      vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
      vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
      vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
      vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
      vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
      vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
      vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
      vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
      vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
      vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
      vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
      vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
      vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
      vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7
      
      After:
      vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
      vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
      vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
      vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
      vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
      vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
      vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
      vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
      vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
      vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
      vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
      vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
      vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
      vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
      vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
      vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2
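To put the tables above in perspective, the following sketch computes the before/after ratio for a few of the rows that changed. The numbers are copied from the tables; the assumption that lower is better (as for cycle counts) and the variable names are mine.

```python
# Benchmark figures copied from the Before/After tables above.
before = {
    "vp9_inv_dct_dct_16x16_sub2_add_neon": 1051.0,
    "vp9_inv_dct_dct_16x16_sub8_add_neon": 1051.0,
    "vp9_inv_dct_dct_32x32_sub2_add_neon": 5198.5,
    "vp9_inv_dct_dct_32x32_sub16_add_neon": 6174.3,
}
after = {
    "vp9_inv_dct_dct_16x16_sub2_add_neon": 640.8,
    "vp9_inv_dct_dct_16x16_sub8_add_neon": 842.0,
    "vp9_inv_dct_dct_32x32_sub2_add_neon": 3685.5,
    "vp9_inv_dct_dct_32x32_sub16_add_neon": 5315.4,
}

# Ratio > 1 means the new code is faster, assuming lower figures are better.
speedup = {name: before[name] / after[name] for name in before}

if __name__ == "__main__":
    for name, ratio in speedup.items():
        print(f"{name}: {ratio:.2f}x")
```

The rows for sub12 and above in the 16x16 case, and sub20 and above in the 32x32 case, are essentially unchanged, since those still go through the full transform.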
      
      This is cherrypicked from libav commit a63da451.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible · 82458955
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      This avoids loading and calculating coefficients that we know will
      be zero, and avoids filling the temp buffer with zeros in places
      where we know the second pass won't read.
      
      This gives a pretty substantial speedup for the smaller subpartitions.
      
      The code size increases from 12388 bytes to 19784 bytes.
      
      The idct16/32_end macros are moved above the individual functions; the
      instructions themselves are unchanged, but since new functions are added
      at the same place where the code is moved from, the diff looks rather
      messy.
      
      Before:                              Cortex A7       A8       A9      A53
      vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
      vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
      vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
      vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
      vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
      vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
      vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
      vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
      vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
      vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
      vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
      vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
      vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
      vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
      vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
      vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7
      
      After:
      vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
      vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
      vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
      vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
      vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
      vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
      vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
      vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
      vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
      vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
      vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
      vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
      vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
      vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
      vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
      vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0
      
      This is cherrypicked from libav commit 5eb5aec4.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · a681c793
      Martin Storsjö authored
      This allows reusing the macro for a separate implementation of the
      pass2 function.
      
      This is cherrypicked from libav commit 79d332eb.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function · 3bd9b391
      Martin Storsjö authored
      This allows reusing the macro for a separate implementation of the
      pass2 function.
      
      This is cherrypicked from libav commit 47b3c2c1.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: vp9itxfm: Make the larger core transforms standalone functions · dc47bf38
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
      19496 to 14740 bytes.
      
      This gives a small slowdown of a couple of tens of cycles, but makes
      it more feasible to add more optimized versions of these transforms.
      
      Before:
      vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
      vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
      vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
      vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7
      
      After:
      vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
      vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
      vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
      vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8
      
      This is cherrypicked from libav commit 11547601.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • arm: vp9itxfm: Make the larger core transforms standalone functions · f8fcee0d
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
      15324 to 12388 bytes.
      
      This gives a small slowdown of a couple tens of cycles, up to around
      150 cycles for the full case of the largest transform, but makes
      it more feasible to add more optimized versions of these transforms.
      
      Before:                              Cortex A7       A8       A9      A53
      vp9_inv_dct_dct_16x16_sub4_add_neon:    2063.4   1516.0   1719.5   1245.1
      vp9_inv_dct_dct_16x16_sub16_add_neon:   3279.3   2454.5   2525.2   1982.3
      vp9_inv_dct_dct_32x32_sub4_add_neon:   10750.0   7955.4   8525.6   6754.2
      vp9_inv_dct_dct_32x32_sub32_add_neon:  18574.0  17108.4  14216.7  12010.2
      
      After:
      vp9_inv_dct_dct_16x16_sub4_add_neon:    2060.8   1608.5   1735.7   1262.0
      vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.2   2443.5   2546.1   1999.5
      vp9_inv_dct_dct_32x32_sub4_add_neon:   10682.0   8043.8   8581.3   6810.1
      vp9_inv_dct_dct_32x32_sub32_add_neon:  18522.4  17277.4  14286.7  12087.9
      
      This is cherrypicked from libav commit 0331c3f5.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: vp9itxfm: Restructure the idct32 store macros · 52c7366c
      Martin Storsjö authored
      This avoids concatenation, which can't be used if the whole macro
      is wrapped within another macro.
      
      This is also arguably more readable.
      
      This is cherrypicked from libav commit 58d87e0f.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • arm: vp9itxfm: Avoid .irp when it doesn't save any lines · 31e41350
      Martin Storsjö authored
      This makes it more readable.
      
      This is cherrypicked from libav commit 3bc5b28d.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • libavfilter/avf_showwaves: make sqrt and cbrt scale option values available to showwavespic by name · 114bbb0b
      Moritz Barsnick authored
      The 'sqrt' and 'cbrt' scalers were added in commit 80262d8c, but their
      symbolic option values were only made available to the showwaves filter,
      not showwavespic, even though the scalers work properly there when
      selected by their numerical option values.
      Signed-off-by: Moritz Barsnick <barsnick@gmx.net>
    • ffprobe: add AVCodecContext help message into ffprobe · 51e35019
      Steven Liu authored
      Because ffprobe can use AVCodecContext parameters.
      Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
    • avcodec/vp56: Reset have_undamaged_frame on resolution changes · 6e913f21
      Michael Niedermayer authored
      Fixes: timeout in 758/clusterfuzz-testcase-4720832028868608
      
      Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
  2. 10 Mar, 2017 2 commits
  3. 09 Mar, 2017 12 commits
  4. 08 Mar, 2017 7 commits
  5. 07 Mar, 2017 4 commits