- 06 Jul, 2012 1 commit
-
-
Martin Storsjö authored
The SPLATB_REG macro already adds the 'd' suffix internally. This fixes building on Win64, which has been broken since 878e6690. This worked on Unix, where r2 happened to be rdx in this case, which with the first suffix rdxd was mapped to eax, and eaxd is defined back to eax. On Win64, however, r2 happened to be R8 in this case, and R8d maps to R8D just fine, but there is no mapping from R8Dd to anything. Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 05 Jul, 2012 4 commits
-
-
Diego Biurrun authored
-
Loren Merritt authored
Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
Diego Biurrun authored
-
Martin Storsjö authored
This was missed in the previous commit, 70a1c800. Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 04 Jul, 2012 2 commits
-
-
Martin Storsjö authored
Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
Ronald S. Bultje authored
-
- 30 Jun, 2012 2 commits
-
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
This gets rid of a variable-length array and a for loop in C code. Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 29 Jun, 2012 1 commit
-
-
Mans Rullgard authored
The problem is that the ssse3 psign instruction does the wrong thing here. Commit ea60dfe2 incorrectly removed a macro emulating this instruction for pre-ssse3 code. However, the emulation is incorrect, and the code relies on the behaviour of the macro. Specifically, psign sets destination elements to zero where the corresponding source element is zero, whereas the emulation only negates destination elements where the source is negative. Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus, which is why the original VC-1 code had an additional right shift when using it. Since the psign instruction cannot be used here, skip all the macro hell and use the working instruction sequence directly. None of this was noticed due to a stray return statement in ff_vc1dsp_init_mmx() which meant that only the mmx version of the loop filter was ever used (before being removed in ea60dfe2). Signed-off-by:
Mans Rullgard <mans@mansr.com>
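The difference described in this commit message can be illustrated with a scalar model of a single 16-bit lane. This is only a sketch: the function names `psignw_lane` and `negate_if_negative_lane` are hypothetical and do not appear in the source tree, and the real code operates on whole SIMD registers in assembly.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of the SSSE3 psignw instruction:
 * negate dst where src is negative, zero dst where src is zero,
 * leave dst unchanged where src is positive. */
static int16_t psignw_lane(int16_t dst, int16_t src)
{
    if (src < 0)
        return (int16_t)-dst;
    if (src == 0)
        return 0;
    return dst;
}

/* Hypothetical model of the emulation as the commit message describes it:
 * it only negates dst where src is negative and never zeroes it. */
static int16_t negate_if_negative_lane(int16_t dst, int16_t src)
{
    return src < 0 ? (int16_t)-dst : dst;
}
```

The two models agree whenever the source element is non-zero and differ only when it is zero, which is the case the commit message says the loop filter depends on.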
-
- 27 Jun, 2012 1 commit
-
-
Christophe Gisquet authored
The function call was a mess to handle, and memcpy cannot make the assumptions we do in the new code. Tested on an IMC sample: 430c -> 370c. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 25 Jun, 2012 4 commits
-
-
Mans Rullgard authored
In a 64-bit PIC build, external functions must be called through the PLT. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
-
Mans Rullgard authored
-
Ronald S. Bultje authored
Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 23 Jun, 2012 4 commits
-
-
Mans Rullgard authored
This removes a dependency on implementation details from generic code and allows easy addition of the equivalent optimisation for other architectures than x86. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Diego Biurrun authored
-
Mans Rullgard authored
Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 22 Jun, 2012 1 commit
-
-
Diego Biurrun authored
-
- 17 Jun, 2012 1 commit
-
-
Ronald S. Bultje authored
Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 08 Jun, 2012 1 commit
-
-
Justin Ruggles authored
Move vector_fmul() from DSPContext to AVFloatDSPContext.
-
- 29 May, 2012 1 commit
-
-
Vitor Sessak authored
Signed-off-by:
Janne Grunau <janne-libav@jannau.net>
-
- 22 May, 2012 1 commit
-
-
Justin Ruggles authored
This is needed for older versions of yasm/nasm that do not support AVX. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 21 May, 2012 1 commit
-
-
Kieran Kunhya authored
Signed-off-by:
Justin Ruggles <justin.ruggles@gmail.com>
-
- 15 May, 2012 2 commits
-
-
Michael Kostylev authored
-
Justin Ruggles authored
Simplifies the code by using cpuflags and a new macro. Also fixes the invalid use of the MMX2 pshufw operation in the MMX-only function.
-
- 14 May, 2012 1 commit
-
-
Vitor Sessak authored
Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 12 May, 2012 1 commit
-
-
Michael Kostylev authored
-
- 10 May, 2012 1 commit
-
-
Christophe Gisquet authored
Code mostly inspired by vp8's MC, however:
  - its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy
  - that same coefficient redundancy allows better code for non-SSSE3 versions

Benchmark (rounded to tens of units):

         V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
  C       445   358    985    1785    1559     3280
  MMX*    219   271    478     714     929     1443
  SSE2    131   158    294     425     515      892
  SSSE3   120   122    248     387     390      763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 02 May, 2012 1 commit
-
-
Ronald S. Bultje authored
Fixes a compile error with clang at -O0.
-
- 28 Apr, 2012 5 commits
-
-
Christophe GISQUET authored
Commit 356ee8d7 caused the initial inversion. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
Christophe Gisquet authored
141 cycles down to 51. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
Roland Scheidegger authored
This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
Roland Scheidegger authored
This is easier for PIC code (in particular on Darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the C code can stay the same (alternatively, the functions needing the tables could use the offsets directly). This should produce the same code as before for non-PIC builds and better code (confirmed) for PIC builds. The assembly uses the new table but still won't work in the PIC case. Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
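The approach in this commit message, one merged table with the old names kept as static pointers at fixed offsets, might look roughly like the following. This is a minimal sketch: the table names, sizes, and contents are invented for illustration and are not the actual CABAC tables.

```c
#include <stdint.h>

/* Hypothetical sizes and offsets, for illustration only. */
enum {
    NORM_SHIFT_SIZE   = 4,
    LPS_RANGE_SIZE    = 8,
    NORM_SHIFT_OFFSET = 0,
    LPS_RANGE_OFFSET  = NORM_SHIFT_OFFSET + NORM_SHIFT_SIZE
};

/* One merged table: a single symbol has to be resolved in PIC code,
 * and every sub-table sits at a compile-time constant offset from it. */
static const uint8_t merged_tables[NORM_SHIFT_SIZE + LPS_RANGE_SIZE] = {
    /* first sub-table (dummy values)  */ 3, 2, 1, 0,
    /* second sub-table (dummy values) */ 10, 20, 30, 40, 50, 60, 70, 80,
};

/* The old names survive as static const pointers, so existing C code that
 * indexes them keeps compiling, and the compiler can see that each one is
 * just the merged table's address plus an immediate offset. */
static const uint8_t *const norm_shift_tab = merged_tables + NORM_SHIFT_OFFSET;
static const uint8_t *const lps_range_tab  = merged_tables + LPS_RANGE_OFFSET;
```

Existing callers would continue to write `lps_range_tab[i]`, while assembly could address the merged table with a single base register plus the known offsets.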
-
Roland Scheidegger authored
Signed-off-by:
Ronald S. Bultje <rsbultje@gmail.com>
-
- 21 Apr, 2012 2 commits
-
-
Mans Rullgard authored
This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by:
Mans Rullgard <mans@mansr.com>
-
- 16 Apr, 2012 1 commit
-
-
Ronald S. Bultje authored
Fixes crashes when using biweight on win64.
-
- 13 Apr, 2012 1 commit
-
-
Ronald S. Bultje authored
Recent register allocation changes (x86inc.asm update) changed the register order and thus the opcodes for the inner loops. One of them became larger than 128 bytes, which confuses other parts of this function where it jumps to fixed-offset positions to extend the edge by fixed amounts. A simple register change fixes this.
-