- 03 Dec, 2014 1 commit
-
-
Reimar Döffinger authored
Fixes fate failure on macosx clang x86-64 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 22 Nov, 2014 1 commit
-
-
Reimar Döffinger authored
11674 -> 10877 decicycles on my Phenom II. Overall speedup was unfortunately within measurement error. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
-
- 07 May, 2014 1 commit
-
-
Matt Oliver authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 10 Apr, 2014 1 commit
-
-
Matt Oliver authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 18 Mar, 2014 1 commit
-
-
Matt Oliver authored
Automatically change MANGLE() into named inline asm operands when direct symbol reference in inline asm are not supported. This is part of the patch-set for intel C inline asm on windows support Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 13 Aug, 2012 1 commit
-
-
Mans Rullgard authored
This fixes two issues preventing suncc from building this code. The undocumented 'a' operand modifier, causing gcc to omit a $ in front of immediate operands (as required in addresses), is not supported by suncc. Luckily, the also undocumented 'c' modifer has the same effect and is supported. On some asm statements with a large number of operands, suncc for no obvious reason fails to correctly substitute some of the operands. Fortunately, some of the operands in these statements are plain numbers which can be inserted directly into the code block instead of passed as operands. With these changes, the code builds correctly with both gcc and suncc. Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 25 Jun, 2012 1 commit
-
-
Ronald S. Bultje authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 23 Jun, 2012 1 commit
-
-
Mans Rullgard authored
This removes a dependency on implementation details from generic code and allows easy addition of the equivalent optimisation for other architectures than x86. Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 28 Apr, 2012 4 commits
-
-
Roland Scheidegger authored
This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. There is a surprisingly large performance improvement over the c version (more so than the generated assembly seems to suggest) just in get_cabac, I measured roughly 40% faster for get_cabac on a K8. However, overall the difference is not that big, I measured roughly 5% on a test clip on a K8 and a Core2. Hopefully it still compiles on x86 32bit... Now that only one table is used, there's some chance even darwin as compiles this (apparently the label arithmetic used previously doesn't work if it involves symbols defined in a different file, thanks to Ronald S. Bultje for helping me with this). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Roland Scheidegger authored
The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Roland Scheidegger authored
This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
Roland Scheidegger authored
The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 21 Apr, 2012 1 commit
-
-
Michael Niedermayer authored
This broke compilation on darwin, revert until a better solution is found. This reverts commit a812b599.
-
- 20 Apr, 2012 1 commit
-
-
Roland Scheidegger authored
This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. There is a surprisingly large performance improvement over the c version (more so than the generated assembly seems to suggest) just in get_cabac, I measured roughly 40% faster for get_cabac on a K8. However, overall the difference is not that big, I measured roughly 5% on a test clip on a K8 and a Core2. Hopefully it still compiles on x86 32bit... v2: incorporated feedback from Loren Merritt to avoid rip-relative movs for every table, and got rid of unnecessary @GOTPCREL. v3: apply similar fixes to the the decode_significance functions, and use same macro arguments for non-pic case. v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect the c code to be faster otherwise since both cmov and sbb suck hard on a Prescott, even can't construct the mask with a 64bit shift as that's just as terrible - it's quite difficult to find usable instructions on that chip...). This is tested to work but not on a P4, in theory it _should_ be fast there. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 28 Mar, 2012 3 commits
-
-
Ronald S. Bultje authored
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- 09 Jan, 2012 1 commit
-
-
Michael Niedermayer authored
This reverts commit c4f237a9. This didnt fix compilation on darwin with current clang.
-
- 06 Jan, 2012 1 commit
-
-
Michael Niedermayer authored
Author: Mans Rullgard <mans@mansr.com> Date: Sun Dec 11 21:41:59 2011 +0000 x86: cabac: replace explicit memory references with "m" operands This replaces the explicit offset(reg) memory references with "m" operands for the same locations. As a result, one fewer register operand is needed for these inline asm statements. This change appears to have broken compilation on darwin, and subsequent fixes by martin (which did not fix compilation) removed the register advantage, thus this change seems not a good idea to keep. See: http://fate.ffmpeg.org/log.cgi?time=20120103122446&log=compile&slot=i386-darwin-llvm-gcc-4.2.1Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 27 Dec, 2011 1 commit
-
-
Martin Storsjö authored
Originally, prior to 8742a4ff, the caller code was compiled within this condition: ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS) Since HAVE_7REGS is defined as (ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE)) the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal to HAVE_7REGS (for 32 bit at least). The correct simplification of the original condition thus is HAVE_7REGS, not HAVE_EBX_AVAILABLE. This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0 and HAVE_EBX_AVAILABLE = 1. Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 21 Dec, 2011 1 commit
-
-
Diego Biurrun authored
On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register is not available, but required to assemble the functions. This reverts commit 8742a4ff to a simplified version of the original constraints.
-
- 11 Dec, 2011 1 commit
-
-
Mans Rullgard authored
This replaces the explicit offset(reg) memory references with "m" operands for the same locations. As a result, one fewer register operand is needed for these inline asm statements. Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 08 Nov, 2011 1 commit
-
-
Diego Biurrun authored
-
- 29 Jul, 2011 1 commit
-
-
Mans Rullgard authored
This fixes build with clang. Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 28 Jul, 2011 2 commits
-
-
Mans Rullgard authored
Inspection of compiled code shows gcc handles these fine on its own. Benchmarking also shows no measurable speed difference. Removing the remaining cases in get_cabac_bypass_sign_x86() does cause more substantial changes to the compiled code with uncertain impact. Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Jason Garrett-Glaser authored
-
- 29 Jun, 2011 1 commit
-
-
Carl Eugen Hoyos authored
-
- 22 Jun, 2011 1 commit
-
-
Carl Eugen Hoyos authored
-
- 20 Jun, 2011 9 commits
-
-
Mans Rullgard authored
Some operands need to be accessed in byte mode, which restricts the available registers in 32-bit mode. Using the 'q' constraint selects a suitable register. Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Only the low-order bits are used here so the type is not important, but this avoids a compiler warning. Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 14 Jun, 2011 1 commit
-
-
Jason Garrett-Glaser authored
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
-
- 13 Jun, 2011 2 commits
-
-
Jason Garrett-Glaser authored
Needs some ARM/PPC asm modifications.
-
Jason Garrett-Glaser authored
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
-