- 25 Apr, 2014 13 commits
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
* commit '6d69f9f3': vp9: write uveob as 16-bit value for 16x16/32x32 transforms. Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Ronald S. Bultje authored
This fixes make fate-vp9-00-quantizer-01 THREADS=2.
-
James Almer authored
This is needed for future AVX2 implementations Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Y.C. Liu authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
Reviewed-by: Timothy Gu <timothygu99@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Ben Avison authored
The previous implementation of the parser made four passes over each input buffer (reduced to two if the container format already guaranteed the input buffer corresponded to frames, such as with MKV). But these buffers are often 200K in size, certainly enough to flush the data out of L1 cache, and for many CPUs, all the way out to main memory.

The passes were:
1) locate frame boundaries (not needed for MKV etc)
2) copy the data into a contiguous block (not needed for MKV etc)
3) locate the start codes within each frame
4) unescape the data between start codes

After this, the unescaped data was parsed to extract certain header fields, but because the unescape operation was so large, this was usually also effectively operating on uncached memory. Most of the unescaped data was simply thrown away and never processed further. Only step 2 - because it used memcpy - was using prefetch, making things even worse.

This patch reorganises these steps so that, aside from the copying, the operations are performed in parallel, maximising cache utilisation. No more than the worst-case number of bytes needed for header parsing is unescaped. Most of the data is, in practice, only read in order to search for a start code, for which optimised implementations already existed in the H264 codec (notably the ARM version uses prefetch, so we end up doing both remaining passes at maximum speed). For MKV files, we know when we've found the last start code of interest in a given frame, so we are able to avoid doing even that one remaining pass for most of the buffer.

In some use-cases (such as the Raspberry Pi) video decode is handled by the GPU, but the entire elementary stream is still fed through the parser to pick out certain elements of the header which are necessary to manage the decode process. As you might expect, in these cases, the performance of the parser is significant.
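The reorganised flow described above can be illustrated with a minimal Python sketch. It is not the code from the patch: `scan_units`, `unescape_prefix` and `MAX_HEADER_BYTES` are hypothetical names, the bound of 16 bytes is an arbitrary assumption, and `bytes.find` stands in for the optimised start-code search the message mentions. It assumes the usual `00 00 01` start-code prefix and `00 00 03` escape convention.

```python
START_CODE_PREFIX = b"\x00\x00\x01"
MAX_HEADER_BYTES = 16  # hypothetical bound, not the value used by the patch


def unescape_prefix(buf, start, end, limit):
    """Unescape at most `limit` output bytes taken from buf[start:end]."""
    out = bytearray()
    i = start
    while i < end and len(out) < limit:
        # In the sequence 00 00 03 xx with xx < 4, the 03 is an escape byte:
        # skip it and keep xx.
        if (buf[i] == 3 and i - start >= 2
                and buf[i - 1] == 0 and buf[i - 2] == 0
                and i + 1 < end and buf[i + 1] < 4):
            i += 1
        out.append(buf[i])
        i += 1
    return bytes(out)


def scan_units(buf, limit=MAX_HEADER_BYTES):
    """Walk the buffer once, yielding (start code byte, bounded header bytes)."""
    units = []
    pos = buf.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(buf):
        code = buf[pos + 3]
        payload_start = pos + 4
        # The rest of the payload is only read to locate the next start code.
        nxt = buf.find(START_CODE_PREFIX, payload_start)
        payload_end = len(buf) if nxt == -1 else nxt
        units.append((code, unescape_prefix(buf, payload_start, payload_end, limit)))
        pos = nxt
    return units


sample = (b"\x00\x00\x01\x0f" b"\xaa\x00\x00\x03\x01\xbb"
          b"\x00\x00\x01\x0d" b"\x11\x22")
units = scan_units(sample)
```

The point of the sketch is that only a small, fixed number of bytes after each start code is unescaped, while the remainder of the buffer is touched by the start-code search alone.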
To measure parser performance, I used the same VC-1 elementary stream in either an MPEG-2 transport stream or a MKV file, and fed it through ffmpeg with -c:v copy -c:a copy -f null. These are the gperftools counts for those streams, both filtered to only include vc1_parse() and its callees, and unfiltered (to include the whole binary). Lower numbers are better:

                   Before          After
File  Filtered  Mean   StdDev  Mean   StdDev  Confidence  Change
M2TS  No        861.7    8.2   650.5    8.1   100.0%        +32.5%
MKV   No        868.9    7.4   731.7    9.0   100.0%        +18.8%
M2TS  Yes       250.0   11.2    27.2    3.4   100.0%       +817.9%
MKV   Yes       149.0   12.8     1.7    0.8   100.0%      +8526.3%

Yes, that last case shows vc1_parse() running 86 times faster! The M2TS case does show a larger absolute improvement though, since it was worse to begin with.

This patch has been tested with the FATE suite (albeit on x86 for speed).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
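The commit message does not define the Change column. The short sketch below assumes it is the ratio of the before and after means, minus one, expressed as a percentage. Under that assumption the unfiltered rows reproduce from the printed means. The filtered rows come out slightly different, presumably because the published figures were computed from unrounded data. The function names are hypothetical.

```python
def speedup_percent(before_mean, after_mean):
    """Assumed definition of the Change column: (before / after - 1) * 100."""
    return (before_mean / after_mean - 1.0) * 100.0


def speedup_factor(change_percent):
    """Convert a Change percentage back into a 'times faster' factor."""
    return change_percent / 100.0 + 1.0


m2ts_unfiltered = round(speedup_percent(861.7, 650.5), 1)
mkv_unfiltered = round(speedup_percent(868.9, 731.7), 1)
mkv_filtered_factor = speedup_factor(8526.3)
```

With this reading, a Change of +8526.3% corresponds to a factor of roughly 86, which matches the "86 times faster" remark in the message.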
-
Ben Avison authored
Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Ben Avison authored
This permits re-use with parsers for codecs which use similar start codes. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 24 Apr, 2014 21 commits
-
-
Michael Niedermayer authored
* cehoyos/master:
  Enable muxing ac-3 in caf.
  Use correct msvc type specifiers for ptrdiff_t and size_t.
  Fix vf_eq.c and vf_eq2.c compilation with !HAVE_6REGS.
  Fix libpostproc compilation with !HAVE_6REGS.
  Never write 0 as maximum bitrate for asf files.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
* commit '8de77b66': fate: Add fic-in-avi test Conflicts: tests/ref/fate/fic-avi See: d66de500 Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
* commit 'a24a2527': aarch64: NEON optimized FIR audio resampling Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
* commit 'cae8df78': lavr: define ResampleContext in resample.h Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
* commit 'a88e1d1c': lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
James Almer authored
None of the handwritten asm in this function seems to be SSE2 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Derek Buitenhuis authored
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Derek Buitenhuis authored
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Michael Niedermayer authored
* commit '152b797c': flv: Do not mangle dts values for negative cts Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
* commit '5d983fdb': flv: Warn only once Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
* commit '374fdc8c': flv: Improve log messages Conflicts: libavformat/flvdec.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Janne Grunau authored
Optimized for the default filter length 16. 30% faster opus silk decoding.
-
Janne Grunau authored
Required for arch optimized resampling.
-
Janne Grunau authored
-
Carl Eugen Hoyos authored
The files play fine with QuickTime.
-
Carl Eugen Hoyos authored
The Windows runtime aborts if it finds %t or %z. Fixes ticket #3472. Reviewed-by: Ronald Bultje
-
Carl Eugen Hoyos authored
-
Carl Eugen Hoyos authored
-
Carl Eugen Hoyos authored
WMP refuses to play such streams.
-
Michael Niedermayer authored
Fixes: Ticket 3588 Found-by: jeremyhu Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
YuDenzel authored
Provide correct rpath flags to ld when --enable-rpath is provided.
-
- 23 Apr, 2014 6 commits
-
-
Michael Niedermayer authored
Fixes Ticket 3542 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
* commit '7cade8ea': on2avc: change a comment at #endif to match actual define Merged-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
before: 5225 decicycles in IDCT, 32756 runs, 12 skips
after: 5057 decicycles in IDCT, 32765 runs, 3 skips
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
This also tests LINEAR_CORE_FLT_SSE Found-by: jamrial Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
The code was missing 1 bit in the src format Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Luca Barbato authored
Some applications really mean to send negative pts.
-