- 11 Mar, 2017 20 commits
-
-
Martin Storsjö authored
Before: Cortex A7 A8 A9 A53 vp9_put_8tap_smooth_4h_neon: 378.1 273.2 340.7 229.5 After: vp9_put_8tap_smooth_4h_neon: 352.1 222.2 290.5 229.5 This is cherrypicked from libav commit fea92a4b. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
Fold the field lengths into the macro. This makes the macro invocations much more readable, when the lines are shorter. This also makes it easier to use only half the registers within the macro. This is cherrypicked from libav commit 5e0c2158. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This is cherrypicked from libav commit 0c0b87f1. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This is cherrypicked from libav commit 8476eb0d. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This is cherrypicked from libav commit 3dd78272. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
The ld1r is a leftover from the arm version, where this trick is beneficial on some cores. Use a single-lane load where we don't need the semantics of ld1r. This is cherrypicked from libav commit ed8d2933. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This is cherrypicked from libav commit 4da4b2b8. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This is cherrypicked from libav commit 3933b86b. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 14740 bytes to 24292 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1387.4 vp9_inv_dct_dct_16x16_sub16_add_neon: 1387.6 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5198.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 5198.6 vp9_inv_dct_dct_32x32_sub8_add_neon: 5196.3 vp9_inv_dct_dct_32x32_sub12_add_neon: 6183.4 vp9_inv_dct_dct_32x32_sub16_add_neon: 6174.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 7151.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 7145.3 vp9_inv_dct_dct_32x32_sub28_add_neon: 8119.3 vp9_inv_dct_dct_32x32_sub32_add_neon: 8118.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 640.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 639.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 842.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1388.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 1389.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 3685.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 3685.1 vp9_inv_dct_dct_32x32_sub8_add_neon: 3684.4 vp9_inv_dct_dct_32x32_sub12_add_neon: 5312.2 vp9_inv_dct_dct_32x32_sub16_add_neon: 5315.4 vp9_inv_dct_dct_32x32_sub20_add_neon: 7154.9 vp9_inv_dct_dct_32x32_sub24_add_neon: 7154.5 vp9_inv_dct_dct_32x32_sub28_add_neon: 8126.6 vp9_inv_dct_dct_32x32_sub32_add_neon: 8127.2 This is cherrypicked from libav commit a63da451. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 12388 bytes to 19784 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 212.0 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 2102.1 1521.7 1736.2 1265.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 2104.5 1533.0 1736.6 1265.5 vp9_inv_dct_dct_16x16_sub8_add_neon: 2484.8 1828.7 2014.4 1506.5 vp9_inv_dct_dct_16x16_sub12_add_neon: 2851.2 2117.8 2294.8 1753.2 vp9_inv_dct_dct_16x16_sub16_add_neon: 3239.4 2408.3 2543.5 1994.9 vp9_inv_dct_dct_32x32_sub1_add_neon: 758.3 456.7 864.5 553.9 vp9_inv_dct_dct_32x32_sub2_add_neon: 10776.7 7949.8 8567.7 6819.7 vp9_inv_dct_dct_32x32_sub4_add_neon: 10865.6 8131.5 8589.6 6816.3 vp9_inv_dct_dct_32x32_sub8_add_neon: 12053.9 9271.3 9387.7 7564.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 13328.3 10463.2 10217.0 8321.3 vp9_inv_dct_dct_32x32_sub16_add_neon: 14176.4 11509.5 11018.7 9062.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 15301.5 12999.9 11855.1 9828.2 vp9_inv_dct_dct_32x32_sub24_add_neon: 16482.7 14931.5 12650.1 10575.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17589.5 15811.9 13482.8 11333.4 vp9_inv_dct_dct_32x32_sub32_add_neon: 18696.2 17049.2 14355.6 12089.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 1203.5 998.2 1035.3 763.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1203.5 998.1 1035.5 760.8 vp9_inv_dct_dct_16x16_sub8_add_neon: 1926.1 1610.6 1722.1 1271.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 2873.2 2129.7 2285.1 1757.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 3221.4 2520.3 2557.6 2002.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 753.0 457.5 866.6 554.6 vp9_inv_dct_dct_32x32_sub2_add_neon: 7554.6 5652.4 6048.4 4920.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 7549.9 5685.0 6046.9 4925.7 vp9_inv_dct_dct_32x32_sub8_add_neon: 8336.9 6704.5 6604.0 5478.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 10914.0 9777.2 9240.4 7416.9 vp9_inv_dct_dct_32x32_sub16_add_neon: 11859.2 11223.3 9966.3 8095.1 vp9_inv_dct_dct_32x32_sub20_add_neon: 15237.1 13029.4 11838.3 9829.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 16293.2 14379.8 12644.9 10572.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17424.3 15734.7 13473.0 11326.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.3 17457.0 14298.6 12080.0 This is cherrypicked from libav commit 5eb5aec4. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This allows reusing the macro for a separate implementation of the pass2 function. This is cherrypicked from libav commit 79d332eb. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This allows reusing the macro for a separate implementation of the pass2 function. This is cherrypicked from libav commit 47b3c2c1. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from 19496 to 14740 bytes. This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms. Before: vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub32_add_neon: 8095.7 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 1390.1 vp9_inv_dct_dct_32x32_sub4_add_neon: 5199.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8125.8 This is cherrypicked from libav commit 11547601. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from 15324 to 12388 bytes. This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to add more optimized versions of these transforms. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub4_add_neon: 2063.4 1516.0 1719.5 1245.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 3279.3 2454.5 2525.2 1982.3 vp9_inv_dct_dct_32x32_sub4_add_neon: 10750.0 7955.4 8525.6 6754.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 18574.0 17108.4 14216.7 12010.2 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 2060.8 1608.5 1735.7 1262.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.2 2443.5 2546.1 1999.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 10682.0 8043.8 8581.3 6810.1 vp9_inv_dct_dct_32x32_sub32_add_neon: 18522.4 17277.4 14286.7 12087.9 This is cherrypicked from libav commit 0331c3f5. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This avoids concatenation, which can't be used if the whole macro is wrapped within another macro. This is also arguably more readable. This is cherrypicked from libav commit 58d87e0f. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This makes it more readable. This is cherrypicked from libav commit 3bc5b28d. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Moritz Barsnick authored
The 'sqrt' and 'cbrt' scalers were added in commit 80262d8c, but their symbolic option values only made available to the showwaves filter, not showwavespic, despite the scalers working properly by their numerical option values. Signed-off-by: Moritz Barsnick <barsnick@gmx.net>
-
Steven Liu authored
because the ffprobe can use AVCodecContext parameters Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
-
Michael Niedermayer authored
Fixes: timeout in 758/clusterfuzz-testcase-4720832028868608 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpegSigned-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 10 Mar, 2017 2 commits
-
-
Aaron Boxer authored
1. limit to single layer, as there is no current support for setting distortion/quality of multiple layers 2. encoder mode should be kept at default setting (0) 3. remove fixed_alloc parameter from context : seldom if ever used, and no way of properly configuring at the moment 4. add irreversible setting, to allow for lossless encoding. Set to OpenJPEG default (enabled) 5. set numresolution max to 33, which is the maximum number of allowed resolutions according the J2K spec Signed-off-by: Michael Bradshaw <mjbshaw@google.com>
-
Thomas Guilbert authored
Fixes: 447860.webm Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 09 Mar, 2017 12 commits
-
-
Michael Niedermayer authored
Fixes: memleak Fixes: 741/clusterfuzz-testcase-586996200452915 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpegSigned-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
avcodec/mpeg4videodec: Fix runtime error: signed integer overflow: -135088512 * 16 cannot be represented in type 'int' Fixes: 736/clusterfuzz-testcase-5580263943831552 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpegSigned-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Fixes: 734/clusterfuzz-testcase-4821293192970240 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpegSigned-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Fixes: 733/clusterfuzz-testcase-4682158096515072 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpegSigned-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Fixes: undefined shift Fixes: 631/clusterfuzz-testcase-6725491035734016 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpegSigned-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Timo Rothenpieler authored
Overhauled version, original patch by Miroslav Slugeň <thunder.m@email.cz>.
-
James Almer authored
Reviewed-by: Vittorio Giovara <vittorio.giovara@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
-
wm4 authored
Make it clear that there is no timing-dependent behavior. In particular, there is no state in which both input and output are denied, and where you have to wait for a while yourself to make progress (apparently some hardware decoders like to do this). Avoid wording that makes references to time. It shouldn't be mistaken for some kind of asynchronous API (like POSIX read() can return EAGAIN if there is no new input yet). It's a state machine, so try to use appropriate terms. Signed-off-by: Diego Biurrun <diego@biurrun.de> Merges Libav commit 8a60bba0.
-
wm4 authored
-
wm4 authored
-
wm4 authored
Apparently the demuxer outputs the wrong padding for HE-AAC (based on the raw sample rate, or so). aacdec contains a hack to adjust the muxer padding accordingly before it's used to trim the decoder output. This modified the packet side data, which in combination with the old decoding API would change the packet the user passed to the decoder. This is clearly not allowed, and it breaks running some gapless fate tests with "-fflags +keepside" applied (without keepside, the packet metadata is typically newly allocated, essentially making a copy and not modifying the user's input packet). This should probably be fixed in the demuxer (and consequently also the muxer), but for now only fix the immediate problem. Regression since 946ed78f (2012).
-
Muhammad Faiz authored
except filter_length == 1 odd filter_length gives worse frequency response, even when compared with shorter filter_length also makes build_filter simpler Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
-
- 08 Mar, 2017 6 commits
-
-
Michael Niedermayer authored
Fixes: 729/clusterfuzz-testcase-5154831595470848 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpegSigned-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
avcodec/pictordec: Fix runtime error: left shift of 64 by 25 places cannot be represented in type 'int' Fixes: 724/clusterfuzz-testcase-6738249571631104 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpegSigned-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Muhammad Faiz authored
unintentionally changed to 0.01 at '61926b6c' Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
-
Michael Niedermayer authored
Fixes code with qemu ARM Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Thomas Turner authored
The Chen-Shapiro(CS) test was used to test normality for Lagged Fibonacci PRNG. Normality Hypothesis Test: The null hypothesis formally tests if the population the sample represents is normally-distributed. For CS, when the normality hypothesis is True, the distribution of QH will have a mean close to 1. Information on CS can be found here: http://www.stata-journal.com/sjpdf.html?articlenum=st0264 http://www.originlab.com/doc/Origin-Help/NormalityTest-AlgorithmSigned-off-by: Thomas Turner <thomastdt@googlemail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Muhammad Faiz authored
this gives better frequency response update swresample fate and other fates that depend on resampling Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
-