- 14 Jan, 2017 19 commits
-
-
Martin Storsjö authored
This is cherrypicked from libav commit 85ad5ea7. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 65074791. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit c536e5e8. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0

By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3
vp9_inv_dct_dct_16x16_sub2_add_neon: 1036.7
vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7
vp9_inv_dct_dct_16x16_sub8_add_neon: 1036.7
vp9_inv_dct_dct_16x16_sub12_add_neon: 1372.1
vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.1
vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1
vp9_inv_dct_dct_32x32_sub2_add_neon: 5190.2
vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0
vp9_inv_dct_dct_32x32_sub8_add_neon: 5183.1
vp9_inv_dct_dct_32x32_sub12_add_neon: 6161.5
vp9_inv_dct_dct_32x32_sub16_add_neon: 6155.5
vp9_inv_dct_dct_32x32_sub20_add_neon: 7136.3
vp9_inv_dct_dct_32x32_sub24_add_neon: 7128.4
vp9_inv_dct_dct_32x32_sub28_add_neon: 8098.9
vp9_inv_dct_dct_32x32_sub32_add_neon: 8098.8

I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

This is cherrypicked from libav commits cad42fad and a0c443a3. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

Cortex                                       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:     3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:    18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

Cortex                                       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:       274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:      2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:      2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:      2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:     2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:     3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:       756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:     10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:     10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:     11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:    12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:    14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:    15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:    16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:    17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:    18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.

This is cherrypicked from libav commit 9c8bc74c. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
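The slice skipping described in this commit message can be illustrated with a small C sketch. This is not the NEON code from the commit: `toy_transform_1d`, `toy_transform_2d` and the `nonzero_cols` parameter are hypothetical names, and the 1-D transform is a plain running sum standing in for the real IDCT. It assumes only that the transform is linear, so a slice of all-zero input produces all-zero intermediate output and can be replaced by a zero fill.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16  /* block size, as in the 16x16 case */
#define SLICE 4   /* slice width, as in the 4x16 slices mentioned above */

/* Hypothetical stand-in for a 1-D inverse transform (a running sum).
 * It is linear, so all-zero input gives all-zero output.
 * This is NOT the VP9 IDCT. */
static void toy_transform_1d(const int32_t *in, int in_stride,
                             int32_t *out, int out_stride)
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        acc += in[i * in_stride];
        out[i * out_stride] = acc;
    }
}

/* Two-pass 2-D transform. nonzero_cols (hypothetical parameter) is the
 * number of leading columns that may hold nonzero coefficients; real
 * code would derive such a bound from eob. */
static void toy_transform_2d(const int32_t in[N * N], int32_t out[N * N],
                             int nonzero_cols)
{
    int32_t tmp[N * N];

    assert(nonzero_cols >= 0 && nonzero_cols <= N);

    /* Pass 1: transform columns, one slice of SLICE columns at a time. */
    for (int x = 0; x < N; x += SLICE) {
        if (x >= nonzero_cols) {
            /* Remaining slices are known to be zero: fill the rest of
             * the intermediate buffer with zeros and stop. */
            for (int y = 0; y < N; y++)
                memset(&tmp[y * N + x], 0, (N - x) * sizeof(tmp[0]));
            break;
        }
        for (int c = x; c < x + SLICE; c++)
            toy_transform_1d(in + c, N, tmp + c, N);
    }

    /* Pass 2: transform every row of the intermediate buffer. */
    for (int y = 0; y < N; y++)
        toy_transform_1d(tmp + y * N, 1, out + y * N, 1);
}
```

Calling `toy_transform_2d(in, out, 8)` on a block whose coefficients are confined to the upper left 8x8 region gives the same output as `toy_transform_2d(in, out, N)`, while running the first pass on only two of the four slices.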
-
Martin Storsjö authored
This avoids reloading them if they haven't been clobbered, if the first pass also was idct. This is similar to what was done in the aarch64 version. This is cherrypicked from libav commit 3c87039a. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 2f99117f. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
Since the same parameter is used for both input and output, the name inout is more fitting. This matches the naming used below in the dmbutterfly macro. This is cherrypicked from libav commit 79566ec8. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 721bc375. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
The clobbering tests in checkasm are only invoked when testing correctness, so this bug didn't show up when benchmarking the dc-only version. This is cherrypicked from libav commit 4d960a11. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Janne Grunau authored
This is one instruction fewer for thumb, and only 1/2 arm/thumb-specific instructions remain. This is cherrypicked from libav commit e5b0fc17. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Janne Grunau authored
The latter is 1 cycle faster on a cortex-a53, and since the operands are bytewise (or larger) bitmasks (impossible to overflow to zero), both are equivalent. This is cherrypicked from libav commit e7ae8f7a. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Janne Grunau authored
Since aarch64 has enough free general purpose registers, use them to branch to the appropriate storage code. 1-2 cycles faster for the functions using loop_filter 8/16, ... on a cortex-a53. Mixed results (up to 2 cycles faster/slower) on a cortex-a57. This is cherrypicked from libav commit d7595de0. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Bradshaw authored
Signed-off-by: Michael Bradshaw <mjbshaw@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Carl Eugen Hoyos authored
-
Carl Eugen Hoyos authored
Fixes ticket #6068.
-
Martin Vignali authored
The duotone file is interpreted as gray. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Vignali authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 13 Jan, 2017 10 commits
-
-
Matthieu Bouron authored
-
Matthieu Bouron authored
-
Paul B Mahol authored
Fixes part of #5918. Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Steinar H. Gunderson authored
Seemingly ff_clear_block_sse assumed that the block array is aligned, so make sure it is. Fixes ticket #6079. Signed-off-by: James Almer <jamrial@gmail.com>
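The alignment requirement mentioned above can be sketched in standard C11. The names `clear_block_aligned`, `demo_aligned_block` and `BLOCK_ALIGN`, and the value 32, are assumptions for illustration; they are not the FFmpeg code or its actual alignment constant. The idea is only that a routine which assumes an aligned pointer needs its caller to declare the buffer with an explicit alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_ALIGN 32  /* assumed alignment requirement, illustration only */

/* Hypothetical stand-in for a SIMD clear routine that requires an
 * aligned pointer; here the requirement is checked with an assert. */
static void clear_block_aligned(int16_t *block)
{
    assert(((uintptr_t)block % BLOCK_ALIGN) == 0);
    memset(block, 0, 64 * sizeof(*block));
}

/* Declares the block with an explicit alignment, fills it with a
 * nonzero pattern, clears it, and returns 1 if every entry is zero. */
static int demo_aligned_block(void)
{
    alignas(BLOCK_ALIGN) int16_t block[64];

    memset(block, 0x55, sizeof(block));
    clear_block_aligned(block);

    for (int i = 0; i < 64; i++)
        if (block[i] != 0)
            return 0;
    return 1;
}
```

Without the `alignas` specifier, a plain `int16_t block[64]` is only guaranteed 2-byte alignment, so the check in `clear_block_aligned` could fail depending on where the compiler places the array.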
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Several codecs other than huffyuv use them. Signed-off-by: James Almer <jamrial@gmail.com>
-
- 12 Jan, 2017 11 commits
-
-
Steven Liu authored
Because oc already points to hls->avf or hls->vtt_avf here, there is no need to assign it again. Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
-
Steven Liu authored
When hlsenc uses the flags second_level_segment_index, second_level_segment_size and second_level_segment_duration, the rename succeeds but the output filename always uses the old filename, so move the rename operation to after closing the ts file and before opening the new segment. Reported-by: Christian Johannesen <chrisjohannesen@gmail.com> Reviewed-by: Bodecs Bela <bodecsb@vivanet.hu> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
-
Steven Liu authored
CID: 1396852. Check the allocation status of devices_list, and release devices_list when allocating the devices fails. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
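The general pattern this message describes (check each allocation, and free the outer object when an inner allocation fails) can be sketched as follows. `DeviceList`, `device_list_create`, `device_list_free` and the `limited_alloc` fault-injection helper are hypothetical names for illustration; the actual FFmpeg structures and allocator calls touched by the commit are not shown in this log.

```c
#include <stdlib.h>

/* Hypothetical list type, for illustration only. */
typedef struct DeviceList {
    void **devices;
    int    nb_devices;
} DeviceList;

typedef void *(*alloc_fn)(size_t);

/* Fault-injection helper: only allocs_left allocations succeed. */
static int allocs_left;
static void *limited_alloc(size_t size)
{
    if (allocs_left <= 0)
        return NULL;
    allocs_left--;
    return malloc(size);
}

/* Allocates the list and its device array (count must be > 0).
 * Returns NULL on failure without leaking the list itself. */
static DeviceList *device_list_create(int count, alloc_fn alloc)
{
    DeviceList *list = alloc(sizeof(*list));
    if (!list)
        return NULL;                 /* first allocation failed */

    list->devices = alloc((size_t)count * sizeof(*list->devices));
    if (!list->devices) {
        free(list);                  /* release the list on inner failure */
        return NULL;
    }
    list->nb_devices = count;
    return list;
}

static void device_list_free(DeviceList *list)
{
    if (!list)
        return;
    free(list->devices);
    free(list);
}
```

Setting `allocs_left` to 1 before calling `device_list_create(4, limited_alloc)` exercises the path where the inner allocation fails and the already allocated list has to be released.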
-
Thomas Turner authored
Signed-off-by: Thomas Turner <thomastdt@googlemail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
James Almer authored
Fixes compilation with hardcoded tables after eaff1aa0 and e71b8119. Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: James Almer <jamrial@gmail.com>
-
Carl Eugen Hoyos authored
Fixes ticket #6075.
-
Sergey Kudryashov authored
-
Nicolas George authored
Hopefully fix compilation with suncc.
-
Carl Eugen Hoyos authored
Fixes decoding the sample from ticket #6072 with ffmpeg.
-
Nicolas George authored
-
Nicolas George authored
-