- 17 Jan, 2017 14 commits
-
-
Anton Khirnov authored
(cherry picked from commit ea8b730d) Signed-off-by: Mark Thompson <sw@jkqxz.net>
-
Mark Thompson authored
(cherry picked from commit ccd0316f)
-
Mark Thompson authored
(cherry picked from commit 520fb772)
-
Mark Thompson authored
(cherry picked from commit 102e13c3)
-
Mark Thompson authored
(cherry picked from commit 2fe93244)
-
Mark Thompson authored
Moves much of the setup logic for VAAPI decoding into lavc; the user now need only provide the hw_frames_ctx. (cherry picked from commit 123ccd07) (cherry picked from commit 5e879b54) (cherry picked from commit 0aec37e6) (cherry picked from commit cfa4eb4f)
-
Mark Thompson authored
The lowest supported VAAPI version is 0.34 (checked at configure time), so this test is no longer needed. (cherry picked from commit 5a667322)
-
Mark Thompson authored
(cherry picked from commit 01d6f84f)
-
Mark Thompson authored
(cherry picked from commit ee906129)
-
Mark Thompson authored
(cherry picked from commit 03adfe91)
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Kacper Michajłow authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Matthieu Bouron authored
* commit 'f450cc7b': h264: eliminate decode_postinit() Also includes fixes from 1f7b4f9a and e344e651. The original patch replaces H264Context.next_output_pic (H264Picture *) with H264Context.output_frame (AVFrame *). That change is discarded here because it is incompatible with the frame reconstruction and motion vector display code, which need the extra information from the H264Picture. Merged-by: Clément Bœsch <u@pkh.me> Merged-by: Matthieu Bouron <matthieu.bouron@gmail.com>
-
Matthieu Bouron authored
-
- 16 Jan, 2017 11 commits
-
-
Carl Eugen Hoyos authored
-
Clément Bœsch authored
-
Clément Bœsch authored
It is done unconditionally in ff_h264_field_end()
-
Clément Bœsch authored
-
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Paul B Mahol authored
Fixes #2056. Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
Steve Lhomme authored
We can pick the correct slice index directly from the ID3D11VideoDecoderOutputView cast from data[3]. Also added myself as maintainer for DXVA2 and D3D11VA. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Steve Lhomme authored
There is no need to loop through the known surfaces; we'll use the requested surface anyway. The loop is only done for DXVA2. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Steve Lhomme authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Andreas Cadhalpun authored
This fixes heap-buffer-overflows in libopenmpt caused by interpreting the negative size value as unsigned size_t. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Reviewed-by: Jörn Heusipp <osmanx@problemloesungsmaschine.de> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
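The failure mode described above can be sketched in C. This is a minimal illustration, not the actual FFmpeg or libopenmpt code: `fake_io_read` and `checked_read` are hypothetical names, and the 4096-byte clamp is an arbitrary assumption. The point is that a negative `int` return value converted to `size_t` becomes a huge positive count unless it is checked first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an I/O layer that returns a byte count or a
 * negative error code. It is not the FFmpeg or libopenmpt API. */
static int fake_io_read(unsigned char *dst, int want, int fail)
{
    if (fail)
        return -1;                  /* error reported as a negative value */
    memset(dst, 0xAB, (size_t)want);
    return want;
}

/* Callback shape that must report the number of bytes read as size_t. */
static size_t checked_read(unsigned char *dst, size_t bytes, int fail)
{
    int ret;

    assert(dst != NULL);
    if (bytes > 4096)               /* arbitrary clamp, keeps the cast to int safe */
        bytes = 4096;

    ret = fake_io_read(dst, (int)bytes, fail);
    if (ret < 0)
        return 0;                   /* without this check, (size_t)-1 would be returned */
    return (size_t)ret;
}
```

Returning 0 on error is one possible policy for such a callback; the caller then sees "nothing read" instead of an enormous byte count.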
-
- 15 Jan, 2017 3 commits
-
-
Daniil Cherednik authored
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
-
Daniil Cherednik authored
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
-
Rostislav Pehlivanov authored
When support for this was added the details weren't yet finalized. This is no longer the case. Fixes writing of mkv/webm files with HDR. Reported-by: Kagami Hiiragi <kagami@genshiken.org> Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com> Reviewed-by: James Almer <jamrial@gmail.com>
-
- 14 Jan, 2017 12 commits
-
-
Martin Storsjö authored
This is cherrypicked from libav commit 85ad5ea7. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 65074791. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit c536e5e8. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

vp9_inv_dct_dct_16x16_sub16_add_neon:     1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon:     8089.0

By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:       235.3
vp9_inv_dct_dct_16x16_sub2_add_neon:      1036.7
vp9_inv_dct_dct_16x16_sub4_add_neon:      1036.7
vp9_inv_dct_dct_16x16_sub8_add_neon:      1036.7
vp9_inv_dct_dct_16x16_sub12_add_neon:     1372.1
vp9_inv_dct_dct_16x16_sub16_add_neon:     1372.1
vp9_inv_dct_dct_32x32_sub1_add_neon:       555.1
vp9_inv_dct_dct_32x32_sub2_add_neon:      5190.2
vp9_inv_dct_dct_32x32_sub4_add_neon:      5180.0
vp9_inv_dct_dct_32x32_sub8_add_neon:      5183.1
vp9_inv_dct_dct_32x32_sub12_add_neon:     6161.5
vp9_inv_dct_dct_32x32_sub16_add_neon:     6155.5
vp9_inv_dct_dct_32x32_sub20_add_neon:     7136.3
vp9_inv_dct_dct_32x32_sub24_add_neon:     7128.4
vp9_inv_dct_dct_32x32_sub28_add_neon:     8098.9
vp9_inv_dct_dct_32x32_sub32_add_neon:     8098.8

I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

This is cherrypicked from libav commits cad42fad and a0c443a3.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

                                     Cortex    A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:      3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:     18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

                                     Cortex    A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:        274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:       2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:       2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:       2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:      2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:      3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:        756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:      10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:      10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:      11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:     12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:     14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:     15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:     16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:     17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:     18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.

This is cherrypicked from libav commit 9c8bc74c.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
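The slice-skipping idea can be sketched in plain C. This is a simplified illustration under stated assumptions, not the NEON implementation: `toy_transform_1d` is a placeholder linear transform (a running sum) rather than the VP9 IDCT, and `nonzero_rows` is a hypothetical bound that a real decoder would derive by comparing eob against per-slice thresholds. The sketch relies only on the fact that a linear 1-D transform maps all-zero input to all-zero output, so first-pass slices known to be zero can be skipped and left zero-filled.

```c
#include <assert.h>
#include <string.h>

#define N     16   /* block size */
#define SLICE 4    /* rows handled per first-pass iteration */

/* Placeholder linear 1-D transform (a running sum), NOT the VP9 IDCT. */
static void toy_transform_1d(const int *in, int in_stride,
                             int *out, int out_stride)
{
    int acc = 0;
    for (int i = 0; i < N; i++) {
        acc += in[i * in_stride];
        out[i * out_stride] = acc;
    }
}

/* Two-pass 2-D transform. Rows at index >= nonzero_rows are assumed to be
 * all zero (hypothetical bound standing in for the eob comparison).
 * Returns the number of first-pass slices actually processed. */
static int transform_2d(const int in[N * N], int out[N * N], int nonzero_rows)
{
    int tmp[N * N];
    int slices_done = 0;

    assert(nonzero_rows >= 0 && nonzero_rows <= N);
    memset(tmp, 0, sizeof(tmp));

    /* First pass: transform rows, one slice of SLICE rows at a time. */
    for (int row = 0; row < N; row += SLICE) {
        if (row >= nonzero_rows)
            break;      /* remaining slices are zero; tmp is already zeroed */
        for (int r = row; r < row + SLICE; r++)
            toy_transform_1d(in + r * N, 1, tmp + r * N, 1);
        slices_done++;
    }

    /* Second pass: transform every column; this pass cannot be skipped. */
    for (int c = 0; c < N; c++)
        toy_transform_1d(tmp + c, N, out + c, N);

    return slices_done;
}
```

With only the upper-left 4x4 coefficients nonzero, calling `transform_2d` with `nonzero_rows` set to 4 processes one first-pass slice instead of four and produces the same output as the full run.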
-
Martin Storsjö authored
This avoids reloading them if they haven't been clobbered, if the first pass also was idct. This is similar to what was done in the aarch64 version. This is cherrypicked from libav commit 3c87039a. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 2f99117f. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
Since the same parameter is used for both input and output, the name inout is more fitting. This matches the naming used below in the dmbutterfly macro. This is cherrypicked from libav commit 79566ec8. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
This is cherrypicked from libav commit 721bc375. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Martin Storsjö authored
The clobbering tests in checkasm are only invoked when testing correctness, so this bug didn't show up when benchmarking the dc-only version. This is cherrypicked from libav commit 4d960a11. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Janne Grunau authored
This is one instruction fewer for thumb, and leaves only 1/2 arm/thumb-specific instructions. This is cherrypicked from libav commit e5b0fc17. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Janne Grunau authored
The latter is 1 cycle faster on a Cortex-A53, and since the operands are bytewise (or larger) bitmasks (impossible to overflow to zero), both are equivalent. This is cherrypicked from libav commit e7ae8f7a. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-