- 16 Jan, 2017 10 commits
-
-
Clément Bœsch authored
-
Clément Bœsch authored
It is done unconditionally in ff_h264_field_end()
-
Clément Bœsch authored
-
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Paul B Mahol authored
Fixes #2056. Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Steve Lhomme authored
We can pick the correct slice index directly from the ID3D11VideoDecoderOutputView cast from data[3]. Also added myself as maintainer for DXVA2 and D3D11VA. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Steve Lhomme authored
No need to loop through the known surfaces; we'll use the requested surface anyway. The loop is only done for DXVA2. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Steve Lhomme authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Andreas Cadhalpun authored
This fixes heap-buffer-overflows in libopenmpt caused by interpreting the negative size value as unsigned size_t. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Reviewed-by: Jörn Heusipp <osmanx@problemloesungsmaschine.de> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
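The failure mode named in this message, a negative size reinterpreted as an unsigned size_t, can be sketched in plain C. The helper name `checked_read_size` is hypothetical and is not part of FFmpeg or libopenmpt; the sketch only shows why the sign has to be checked before the conversion, and is not the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an FFmpeg or libopenmpt API.
 * Converts a signed byte count to size_t, rejecting negative values
 * instead of letting them wrap to a huge unsigned number.
 * Returns 0 on success, -1 if the size cannot be represented. */
static int checked_read_size(int64_t size, size_t *out)
{
    assert(out != NULL);
    if (size < 0)
        return -1;                  /* (size_t)size would be enormous */
    if ((uint64_t)size > SIZE_MAX)
        return -1;                  /* does not fit in size_t here */
    *out = (size_t)size;
    return 0;
}
```

Without the first check, a caller passing -1 would hand the consumer a length of SIZE_MAX, which is the kind of value that leads to reads past the end of a heap buffer.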
-
- 15 Jan, 2017 3 commits
-
-
Daniil Cherednik authored
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
-
Daniil Cherednik authored
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
-
Rostislav Pehlivanov authored
When support for this was added the details weren't yet finalized. This is no longer the case. Fixes writing of mkv/webm files with HDR. Reported-by: Kagami Hiiragi <kagami@genshiken.org> Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com> Reviewed-by: James Almer <jamrial@gmail.com>
-
- 14 Jan, 2017 19 commits
-
-
Martin Storsjö authored
This is cherrypicked from libav commit 85ad5ea7. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 65074791. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit c536e5e8. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

vp9_inv_dct_dct_16x16_sub16_add_neon:   1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon:   8089.0

By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     235.3
vp9_inv_dct_dct_16x16_sub2_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub8_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   1372.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     555.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5190.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub8_add_neon:    5183.1
vp9_inv_dct_dct_32x32_sub12_add_neon:   6161.5
vp9_inv_dct_dct_32x32_sub16_add_neon:   6155.5
vp9_inv_dct_dct_32x32_sub20_add_neon:   7136.3
vp9_inv_dct_dct_32x32_sub24_add_neon:   7128.4
vp9_inv_dct_dct_32x32_sub28_add_neon:   8098.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8098.8

I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

This is cherrypicked from libav commits cad42fad and a0c443a3.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

Cortex                                       A7        A8        A9       A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1    2435.4    2499.0    1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7   16582.3   14207.6   12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

Cortex                                       A7        A8        A9       A53
vp9_inv_dct_dct_16x16_sub1_add_neon:      274.6     189.5     211.7     235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2064.0    1534.8    1719.4    1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:     2135.0    1477.2    1736.3    1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2446.7    1828.7    1993.6    1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2832.4    2118.3    2266.5    1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.7    2475.3    2523.5    1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      756.2     456.7     862.0     553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10682.2    8190.4    8539.2    6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10813.5    8014.9    8518.3    6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:    11859.6    9313.0    9347.4    7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:   12946.6   10752.4   10192.2    8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   14074.6   11946.5   11001.4    9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:   15269.9   13662.7   11816.1    9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:   16327.9   14940.1   12626.7   10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17462.7   15776.1   13446.2   11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:   18575.5   17157.0   14249.3   12015.1

I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.

This is cherrypicked from libav commit 9c8bc74c.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
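The slice-skipping idea in the two messages above can be sketched in scalar C. This is not the NEON code from the commits: the function name, the threshold table, and its values are illustrative assumptions. The only point is that the first pass can stop early once the end-of-block index (eob) says the remaining slices hold no coefficients, at the cost of a few extra loads and compares.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative thresholds only (assumed values, not taken from the
 * commits): slice i needs processing when eob > thresholds[i].
 * Entry 0 is 0 so that any nonempty block processes the first slice. */
static const int example_thresholds_16[4] = { 0, 10, 38, 89 };

/* Hypothetical helper: returns how many leading slices the first pass
 * must transform. Assumes thresholds[] is sorted in ascending order. */
static size_t slices_to_process(int eob, const int *thresholds,
                                size_t n_slices)
{
    size_t n = 0;
    assert(thresholds != NULL);
    while (n < n_slices && eob > thresholds[n])
        n++;                    /* one load and one compare per slice */
    return n;
}
```

With a table like this, a block whose coefficients all sit near the top-left corner runs one slice of the first pass instead of all of them, which is consistent with the smaller sub2/sub4 timings listed above.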
-
Martin Storsjö authored
This avoids reloading them if they haven't been clobbered, if the first pass also was idct. This is similar to what was done in the aarch64 version. This is cherrypicked from libav commit 3c87039a. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 2f99117f. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
Since the same parameter is used for both input and output, the name inout is more fitting. This matches the naming used below in the dmbutterfly macro. This is cherrypicked from libav commit 79566ec8. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 721bc375. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
The clobbering tests in checkasm are only invoked when testing correctness, so this bug didn't show up when benchmarking the dc-only version. This is cherrypicked from libav commit 4d960a11. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Janne Grunau authored
This is one instruction fewer for thumb, and there are now only 1/2 arm/thumb-specific instructions. This is cherrypicked from libav commit e5b0fc17. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Janne Grunau authored
The latter is 1 cycle faster on a cortex-a53, and since the operands are bytewise (or larger) bitmasks (impossible to overflow to zero), both are equivalent. This is cherrypicked from libav commit e7ae8f7a. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Janne Grunau authored
Since aarch64 has enough free general purpose registers, use them to branch to the appropriate storage code. 1-2 cycles faster for the functions using loop_filter 8/16, ... on a cortex-a53. Mixed results (up to 2 cycles faster/slower) on a cortex-a57. This is cherrypicked from libav commit d7595de0. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Bradshaw authored
Signed-off-by: Michael Bradshaw <mjbshaw@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Carl Eugen Hoyos authored
-
Carl Eugen Hoyos authored
Fixes ticket #6068.
-
Martin Vignali authored
The duotone file is interpreted as gray. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Vignali authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 13 Jan, 2017 8 commits
-
-
Matthieu Bouron authored
-
Matthieu Bouron authored
-
Paul B Mahol authored
Fixes part of #5918. Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Steinar H. Gunderson authored
Seemingly ff_clear_block_sse assumed that the block array is aligned, so make sure it is. Fixes ticket #6079. Signed-off-by: James Almer <jamrial@gmail.com>
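A minimal C11 sketch of what "make sure it is aligned" can look like. The struct and helper names are hypothetical and do not reflect FFmpeg's actual declarations; the 16-byte figure is an assumption chosen to match 128-bit SSE registers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for a codec context; not FFmpeg's layout. */
typedef struct BlockHolder {
    int other_state;                 /* would otherwise push block[] to offset 4 */
    alignas(16) int16_t block[64];   /* forced onto a 16-byte boundary */
} BlockHolder;

/* Returns nonzero if p is a multiple of alignment (a power of two). */
static int is_aligned(const void *p, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Aligned SIMD stores fault on misaligned addresses, so a routine written with them is only safe when the declaration of the array it clears carries the alignment requirement.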
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-