- 16 Mar, 2014 1 commit
  - Vittorio Giovara authored
- 15 Mar, 2014 2 commits
  - Michael Niedermayer authored
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
  - Michael Niedermayer authored
    avcodec/h264_cabac: move the arm unchecked_bitstream reader special case closer to where the issue is
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 14 Mar, 2014 1 commit
  - Michael Niedermayer authored
    The newly added optimizations do not work with the unchecked reader.
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 25 Jan, 2014 1 commit
  - Janne Grunau authored
    Added libavutil/timer.h include to all files with {START,STOP}_TIMER.
- 23 Jan, 2014 1 commit
  - Michael Niedermayer authored
    Found-by: Dale Curtis <dalecurtis@google.com>
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 03 Nov, 2013 1 commit
  - Michael Niedermayer authored
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 24 Aug, 2013 1 commit
  - Diego Biurrun authored
    This ensures that decode_cabac_residual_internal actually does get inlined, which it otherwise does not, even though it is marked as always_inline.
- 20 Aug, 2013 1 commit
  - Diego Biurrun authored
- 21 Mar, 2013 7 commits
  - Anton Khirnov authored
    This way it does not look like a constant.
  - Anton Khirnov authored
    This way it does not look like a constant.
  - Anton Khirnov authored
    This way it does not look like a constant.
  - Anton Khirnov authored
    This way it does not look like a constant.
  - Anton Khirnov authored
    This way it does not look like a constant.
  - Anton Khirnov authored
    This way it does not look like a constant.
  - Anton Khirnov authored
    This way it does not look like a constant.
- 08 Mar, 2013 1 commit
  - Anton Khirnov authored
- 25 Feb, 2013 1 commit
  - Diego Biurrun authored
- 19 Feb, 2013 1 commit
  - Ronald S. Bultje authored
    Instead, keep them in the bitstream buffer until we read them verbatim. This saves a memcpy() and a subsequent clearing of the target buffer. decode_cabac+decode_mb for a sample file (CAPM3_Sony_D.jsv) goes from 6121.4 to 6095.5 cycles, i.e. 26 cycles faster.
    Signed-off-by: Martin Storsjö <martin@martin.st>
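The change above replaces "copy the raw samples out, then read them" with "remember where they are, then read them in place". A minimal C sketch of that idea, assuming a plain byte buffer; `PcmRef`, `pcm_skip()` and `pcm_put()` are hypothetical names for illustration, not FFmpeg's API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: remember where the raw PCM bytes live inside the
 * bitstream buffer instead of copying them into a scratch buffer. */
typedef struct {
    const uint8_t *data; /* points into the bitstream; nothing is copied */
    size_t size;
} PcmRef;

/* Record n raw bytes starting at *pos and advance past them.
 * Returns 0 on success, -1 if the buffer is too short. */
static int pcm_skip(PcmRef *ref, const uint8_t *buf, size_t buf_size,
                    size_t *pos, size_t n)
{
    if (*pos > buf_size || n > buf_size - *pos)
        return -1;
    ref->data = buf + *pos;
    ref->size = n;
    *pos += n;
    return 0;
}

/* At reconstruction time, read the samples verbatim from the bitstream:
 * one copy straight into the output, no intermediate buffer to clear. */
static void pcm_put(uint8_t *dst, const PcmRef *ref)
{
    memcpy(dst, ref->data, ref->size);
}
```

The saving comes from dropping the intermediate buffer entirely: there is nothing to memcpy() into and nothing to zero afterwards, at the cost of keeping the bitstream buffer alive until reconstruction.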
- 18 Feb, 2013 1 commit
  - Ronald S. Bultje authored
    Instead, keep them in the bitstream buffer until we read them verbatim. This saves a memcpy() and a subsequent clearing of the target buffer. decode_cabac+decode_mb for a sample file (CAPM3_Sony_D.jsv) goes from 6121.4 to 6095.5 cycles, i.e. 26 cycles faster.
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 15 Feb, 2013 1 commit
  - Anton Khirnov authored
    Most of the changes are just trivial replacements of fields from MpegEncContext with equivalent fields in H264Context. Everything in h264* other than h264.c consists of those trivial changes. The nontrivial parts are:
    1) extracting a simplified version of the frame management code from mpegvideo.c. We don't need last/next_picture anymore, since h264 already uses its own, more complex system and those were set only to appease the mpegvideo parts.
    2) some tables that need to be allocated/freed in appropriate places.
    3) hwaccels: mostly trivial replacements. For DXVA, the draw_horiz_band() call is moved from ff_dxva2_common_end_frame() to per-codec end_frame() callbacks, because it's now different for h264 and MpegEncContext-based decoders.
    4) svq3: it does not use the complex h264 reference system, so I just added some very simplistic frame management instead and dropped the use of ff_h264_frame_start(). Because of this I also had to move some initialization code to svq3.
    Additional fixes for chroma format and bit depth changes by Janne Grunau <janne-libav@jannau.net>
    Signed-off-by: Anton Khirnov <anton@khirnov.net>
- 31 Jan, 2013 1 commit
  - Michael Niedermayer authored
    Fix an out-of-array read.
    Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 23 Jan, 2013 1 commit
  - Diego Biurrun authored
    It does not help as an abstraction and adds dsputil dependencies.
    Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
- 18 Nov, 2012 1 commit
  - Michael Niedermayer authored
    Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 05 Oct, 2012 2 commits
  - Ronald S. Bultje authored
    The variable is copied to subsequent threads at the same time, so this may cause wrong ref_count[] values to be copied to subsequent threads. This bug was found using TSAN and Helgrind. Original patch by Ronald, adapted with a local_ref_count by Clément, following the suggestion of Michael Niedermayer.
    Signed-off-by: Clément Bœsch <clement.boesch@smartjog.com>
  - Ronald S. Bultje authored
    The variable is copied to subsequent threads at the same time, so this may cause wrong ref_count[] values to be copied to subsequent threads. This bug was found using TSAN.
    Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
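The fix described above derives per-macroblock reference counts into a local array instead of writing to the shared ref_count[] field that other threads copy. A minimal C sketch of that pattern; `SliceContext` and `compute_local_ref_count()` are hypothetical names, and the doubling for field macroblocks is an illustrative assumption (field macroblocks in an MBAFF frame address two field references per frame reference), not a quote of the patch:

```c
/* Hypothetical, simplified context. ref_count[] is shared state that other
 * frame threads may copy at any moment, so per-macroblock code must treat
 * it as read-only. */
typedef struct {
    int ref_count[2];
} SliceContext;

/* Derive the per-macroblock reference counts into caller-owned storage.
 * Assumption for illustration: a field macroblock in an MBAFF frame sees
 * twice as many (field) references as the frame-level count. */
static void compute_local_ref_count(const SliceContext *sl, int mb_field,
                                    int local_ref_count[2])
{
    for (int list = 0; list < 2; list++)
        local_ref_count[list] = sl->ref_count[list] << (mb_field ? 1 : 0);
}
```

The racy alternative would be to scale ref_count[] in place and restore it afterwards; a thread copying the context between those two writes would pick up the scaled value. Taking the context as `const` makes the read-only contract explicit.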
- 01 Oct, 2012 2 commits
  - Diego Biurrun authored
  - Diego Biurrun authored
- 26 Jul, 2012 1 commit
  - Michael Niedermayer authored
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 23 Jun, 2012 1 commit
  - Mans Rullgard authored
    This removes a dependency on implementation details from generic code and allows easy addition of the equivalent optimisation for architectures other than x86.
    Signed-off-by: Mans Rullgard <mans@mansr.com>
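One common way to keep generic code free of architecture details, while still letting an architecture plug in an optimised routine, is a macro override with a portable fallback. A minimal C sketch of that general pattern, assuming an arch-specific header may define the macro before this point; `decode_significance` and `decode_significance_c` are used here as hypothetical stand-ins, and this is not claimed to be the exact mechanism of the commit above:

```c
/* Hypothetical pattern: an architecture-specific header (included before
 * this point) may #define decode_significance to an optimised routine.
 * Generic code only falls back to the portable C version when none is
 * provided, so it never depends on one architecture's internals. */
#ifndef decode_significance
#define decode_significance decode_significance_c
#endif

/* Portable fallback: store the positions of set flags in out[] and return
 * how many there are (a stand-in for a real CABAC significance map). */
static int decode_significance_c(const int *flags, int n, int *out)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (flags[i])
            out[count++] = i;
    return count;
}
```

Adding an optimisation for another architecture then only requires a new header that defines the macro; the generic call sites stay untouched.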
- 28 Apr, 2012 4 commits
  - Roland Scheidegger authored
    This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works when the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. There is a surprisingly large performance improvement over the C version (more so than the generated assembly seems to suggest) just in get_cabac: I measured roughly 40% faster for get_cabac on a K8. However, overall the difference is not that big; I measured roughly 5% on a test clip on a K8 and a Core2. Hopefully it still compiles on x86 32-bit... Now that only one table is used, there's some chance that even Darwin's as assembles this (apparently the label arithmetic used previously doesn't work if it involves symbols defined in a different file). Thanks to Ronald S. Bultje for helping me with this.
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
  - Roland Scheidegger authored
    The reason is that this is easier for PIC code (in particular on Darwin). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the C code can stay the same (alternatively, the functions needing the tables could use the offsets directly). This should produce the same code as before with non-PIC and better code (confirmed) with PIC. The assembly uses the new table but still won't work for the PIC case.
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
  - Roland Scheidegger authored
    This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works when the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%.
    Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
  - Roland Scheidegger authored
    The reason is that this is easier for PIC code (in particular on Darwin). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the C code can stay the same (alternatively, the functions needing the tables could use the offsets directly). This should produce the same code as before with non-PIC and better code (confirmed) with PIC. The assembly uses the new table but still won't work for the PIC case.
    Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
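The table merge described above can be illustrated with a toy version: several lookup tables concatenated into one array, with the old names kept as static const pointers at constant offsets. A minimal C sketch; the offsets, sizes and contents below are placeholders, not the real CABAC tables, and the names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical layout: three small lookup tables concatenated into one
 * array, so position-independent code needs a single base address.
 * Offsets and contents are placeholders, not the real CABAC tables. */
#define NORM_SHIFT_OFFSET 0
#define LPS_RANGE_OFFSET  4
#define MLPS_STATE_OFFSET 8

static const uint8_t cabac_tables[12] = {
    10, 11, 12, 13, /* norm_shift part */
    20, 21, 22, 23, /* lps_range part  */
    30, 31, 32, 33, /* mlps_state part */
};

/* The old table names survive as static const pointers at constant offsets,
 * so existing C code such as norm_shift[i] compiles unchanged and the
 * compiler can fold each access into base + immediate offset. */
static const uint8_t *const norm_shift = cabac_tables + NORM_SHIFT_OFFSET;
static const uint8_t *const lps_range  = cabac_tables + LPS_RANGE_OFFSET;
static const uint8_t *const mlps_state = cabac_tables + MLPS_STATE_OFFSET;
```

Because each pointer is `static` and initialised with an address constant, the compiler sees it as the single base symbol plus a fixed offset rather than three separate symbols, which is what makes PIC addressing cheaper.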
- 21 Apr, 2012 1 commit
  - Michael Niedermayer authored
    This broke compilation on Darwin; revert until a better solution is found.
    This reverts commit a812b599.
- 20 Apr, 2012 1 commit
  - Roland Scheidegger authored
    This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works when the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. There is a surprisingly large performance improvement over the C version (more so than the generated assembly seems to suggest) just in get_cabac: I measured roughly 40% faster for get_cabac on a K8. However, overall the difference is not that big; I measured roughly 5% on a test clip on a K8 and a Core2. Hopefully it still compiles on x86 32-bit...
    v2: incorporated feedback from Loren Merritt to avoid RIP-relative movs for every table, and got rid of unnecessary @GOTPCREL.
    v3: apply similar fixes to the decode_significance functions, and use the same macro arguments for the non-PIC case.
    v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect the C code to be faster otherwise, since both cmov and sbb suck hard on a Prescott, and the mask can't even be constructed with a 64-bit shift, as that's just as terrible; it's quite difficult to find usable instructions on that chip...). This is tested to work, but not on a P4; in theory it _should_ be fast there.
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
- 05 Apr, 2012 1 commit
  - Diego Biurrun authored
- 28 Mar, 2012 1 commit
  - Ronald S. Bultje authored
- 29 Feb, 2012 1 commit
  - Ronald S. Bultje authored
    Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
    CC: libav-stable@libav.org
- 10 Feb, 2012 1 commit
  - Ronald S. Bultje authored
    Specially crafted bitstreams can cause the luma intra prediction mode to be converted to one of the constrained ("alzheimer") modes. This causes a crash, because we then call a NULL function pointer for 16x16 block intra prediction: constrained intra prediction functions are only implemented for chroma (8x8 blocks).
    Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
    CC: libav-stable@libav.org
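The crash above comes from dispatching through a prediction table whose luma slot for the constrained mode is NULL. A minimal C sketch of a generic defensive check, validating the (possibly converted) mode before the call; the mode indices, `pred16x16[]` and `check_luma_pred_mode()` are hypothetical, and this is not necessarily how the decoder was actually fixed:

```c
#include <stddef.h>

/* Hypothetical mode indices for illustration only. */
enum { MODE_DC = 0, MODE_CONSTRAINED_DC = 1, NB_MODES = 2 };

typedef void (*PredFn)(unsigned char *dst, int stride);

/* Placeholder 16x16 predictor: fills the block with mid-grey. */
static void pred16x16_dc(unsigned char *dst, int stride)
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            dst[y * stride + x] = 128;
}

/* Luma 16x16 table: the constrained mode is only implemented for chroma,
 * so its luma slot is NULL. */
static const PredFn pred16x16[NB_MODES] = { pred16x16_dc, NULL };

/* Validate a (possibly converted) mode before dispatching through the table:
 * returns 0 if it is usable, -1 if the bitstream must be rejected. */
static int check_luma_pred_mode(int mode)
{
    if (mode < 0 || mode >= NB_MODES || pred16x16[mode] == NULL)
        return -1;
    return 0;
}
```

Rejecting the mode at validation time turns a NULL-pointer call on crafted input into an ordinary decode error.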