- 10 Nov, 2016 8 commits
-
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google. The speedup for the large horizontal filters is surprisingly big on A7 and A53, while there's a minor slowdown (almost within measurement noise) on A8 and A9. Cortex A7 A8 A9 A53 orig: vp9_put_8tap_smooth_64h_neon: 20270.0 14447.3 19723.9 10910.9 new: vp9_put_8tap_smooth_64h_neon: 20165.8 14466.5 19730.2 10668.8 Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google. These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_avg4_neon: 27.2 23.7 vp9_avg8_neon: 56.5 54.7 vp9_avg16_neon: 169.9 167.4 vp9_avg32_neon: 585.8 585.2 vp9_avg64_neon: 2460.3 2294.7 vp9_avg_8tap_smooth_4h_neon: 132.7 125.2 vp9_avg_8tap_smooth_4hv_neon: 478.8 442.0 vp9_avg_8tap_smooth_4v_neon: 126.0 93.7 vp9_avg_8tap_smooth_8h_neon: 241.7 234.2 vp9_avg_8tap_smooth_8hv_neon: 690.9 646.5 vp9_avg_8tap_smooth_8v_neon: 245.0 205.5 vp9_avg_8tap_smooth_64h_neon: 11273.2 11280.1 vp9_avg_8tap_smooth_64hv_neon: 22980.6 22184.1 vp9_avg_8tap_smooth_64v_neon: 11549.7 10781.1 vp9_put4_neon: 18.0 17.2 vp9_put8_neon: 40.2 37.7 vp9_put16_neon: 97.4 99.5 vp9_put32_neon/armv8: 346.0 307.4 vp9_put64_neon/armv8: 1319.0 1107.5 vp9_put_8tap_smooth_4h_neon: 126.7 118.2 vp9_put_8tap_smooth_4hv_neon: 465.7 434.0 vp9_put_8tap_smooth_4v_neon: 113.0 86.5 vp9_put_8tap_smooth_8h_neon: 229.7 221.6 vp9_put_8tap_smooth_8hv_neon: 658.9 621.3 vp9_put_8tap_smooth_8v_neon: 215.0 187.5 vp9_put_8tap_smooth_64h_neon: 10636.7 10627.8 vp9_put_8tap_smooth_64hv_neon: 21076.8 21026.9 vp9_put_8tap_smooth_64v_neon: 9635.0 9632.4 These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for ther larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
With apple tools, the linker fails with errors like these, if the offset is negative: ld: in section __TEXT,__text reloc 8: symbol index out of range for architecture arm64 Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
Make them aligned, to allow efficient access to them from simd. Signed-off-by: Martin Storsjö <martin@martin.st>
-
James Almer authored
FLAC streams originating from the FLAC encoder send updated and more complete STREAMINFO metadata as part of the last packet, so write that to CodecPrivate instead of the incomplete one available in extradata during init. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
James Almer authored
aac_adtstoasc makes the aac extradata available only after the first packet is filtered, and as packet side data. Assume extradata will be available as part of the first packet if avpriv_mpeg4audio_get_config() fails the first time due to missing extradata and reserve space for the OutputSampleRate element in the Tracks master. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Anton Khirnov authored
-
- 09 Nov, 2016 9 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
This will allow to error out immediately if incompatible options are passed on the command line instead of running time-consuming tests.
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
This avoids unnecessarily linking against xlib.
-
Diego Biurrun authored
Also drop a related and now redundant informative output line.
-
Diego Biurrun authored
Also drop a redundant output line.
-
Diego Biurrun authored
-
- 08 Nov, 2016 13 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
Inspired by a patch from Martin Storsjö <martin@martin.st>.
-
Vittorio Giovara authored
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
-
Vittorio Giovara authored
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
-
Vittorio Giovara authored
-
Vittorio Giovara authored
Avoids a forward-declaration in the following commit. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
-
Vittorio Giovara authored
-
Vittorio Giovara authored
Planes are ordered as the name suggests now. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
-
Diego Biurrun authored
libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
-
Diego Biurrun authored
avplay.c:2928:5: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]
-
Diego Biurrun authored
Fixes several warnings of the type avconv_opt.c:2356:5: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]
-
- 07 Nov, 2016 10 commits
-
-
Andreas Cadhalpun authored
This fixes heap-use-after-free detected by AddressSanitizer. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
-
Luca Barbato authored
-
Luca Barbato authored
-
Anton Khirnov authored
Handle the internal frame requests, which is required by the HEVC encoding plugin. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
-
Anton Khirnov authored
This will allow implementing the allocator more fully, which is needed by the HEVC encoder plugin with video memory input. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
-
Anton Khirnov authored
For encoding, this avoids modifying the input surface, which we are not allowed to do. This will also be useful in the following commits. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
-
Anton Khirnov authored
Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
-
Anton Khirnov authored
Uploading/downloading data through VPP may not work for some formats, in that case we can still try to call av_hwframe_transfer_data() on the child context. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
-
Anton Khirnov authored
Certain pixel formats (e.g. P8) might not be supported for download/upload through VPP operations, but can still be used otherwise. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
-
Anton Khirnov authored
When using GPU surfaces with QSV, one needs to supply a frame allocator, which will be invoked to pass surface pools to libmfx. For encoding, this allocator gets invoked not only for the pool of input frames, but also for a separate pool of (apparently) reconstructed frames and another pool of MFX_FOURCC_P8, which on Windows needs to return D3DFMT_P8 D3D surfaces. Those are probably used to store the encoded bitstream on the GPU. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
-