- 01 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 23 Feb, 2014 3 commits
-
-
James Almer authored
Based on x264 code Signed-off-by:
James Almer <jamrial@gmail.com>
-
James Almer authored
Based on x264 code Signed-off-by:
James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by:
James Almer <jamrial@gmail.com>
-
- 20 Feb, 2014 1 commit
-
-
Christophe Gisquet authored
vector_fmul and vector_fmac_scalar are guaranteed to be able to process in batches of 16 elements, but their SSE versions only do 8 at a time. Therefore, unroll them a bit. 299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64. Signed-off-by:
Janne Grunau <janne-libav@jannau.net>
-
- 26 Jan, 2014 1 commit
-
-
Loren Merritt authored
Work around Yasm's inefficiency with handling large numbers of variables in the global scope. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 25 Oct, 2013 1 commit
-
-
Kieran Kunhya authored
Patch based on x264's AVX2 detection Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 14 Oct, 2013 4 commits
-
-
Jason Garrett-Glaser authored
Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Jason Garrett-Glaser authored
Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Derek Buitenhuis authored
This is so we can sync to x264's version of FMA4 support. This partially reverts commit 79687079. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Henrik Gramner authored
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version. This change makes it easier to extend existing code to use AVX2. Also add support for AVX emulation of a few instructions that were missing before. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 09 Oct, 2013 1 commit
-
-
Henrik Gramner authored
The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 07 Oct, 2013 9 commits
-
-
Henrik Gramner authored
Prevents a crash if the misaligned exception mask bit is cleared for some reason. Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule. They also require modifying the MXCSR control register, and by removing those functions we can get rid of that complexity altogether. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Jason Garrett-Glaser authored
Small backports that sneaked into other asm commits in x264. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Derek Buitenhuis authored
This is also a valid value for WIN64. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Henrik Gramner authored
Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Loren Merritt authored
For when we want to mix simd sizes within one function. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Loren Merritt authored
SWAP with >=3 named (rather than numbered) args, and PERMUTE followed by SWAP with 2 named args, used to produce the wrong permutation. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Henrik Gramner authored
Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Henrik Gramner authored
Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Loren Merritt authored
Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition. REP_RET is still needed manually when it's a branch target, but that's much rarer. The implementation involves lots of spurious labels, but that's OK because we strip them. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 21 Sep, 2013 1 commit
-
-
Alex Smith authored
Because of -Werror=implicit-function-declaration the build will fail. Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 29 Aug, 2013 1 commit
-
-
Diego Biurrun authored
-
- 28 Aug, 2013 2 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
- 17 Jul, 2013 1 commit
-
-
Diego Biurrun authored
-
- 02 Jul, 2013 1 commit
-
-
Loren Merritt authored
Fixes build with yasm-1.1 Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
- 30 Jun, 2013 1 commit
-
-
Loren Merritt authored
-
- 29 Jun, 2013 2 commits
-
-
Loren Merritt authored
1.5x-1.8x faster on sandybridge Signed-off-by:
Luca Barbato <lu_zero@gentoo.org>
-
Loren Merritt authored
4x-6x faster on sandybridge Signed-off-by:
Luca Barbato <lu_zero@gentoo.org>
-
- 04 May, 2013 1 commit
-
-
Diego Biurrun authored
-
- 03 May, 2013 1 commit
-
-
Christophe Gisquet authored
97c -> 49c. Some codecs could benefit from more unrolling, but AAC doesn't.
-
- 10 Apr, 2013 1 commit
-
-
Ronald S. Bultje authored
Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 09 Apr, 2013 1 commit
-
-
Christophe Gisquet authored
cmp{p,s}{s,d} instructions do take an imm8 operand. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 27 Mar, 2013 1 commit
-
-
Diego Biurrun authored
-
- 19 Feb, 2013 1 commit
-
-
Ronald S. Bultje authored
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng rng_en ace ace_en) SIGILLs on long nop codes. Signed-off-by:
Martin Storsjö <martin@martin.st>
-
- 14 Feb, 2013 2 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
- 22 Jan, 2013 2 commits
-
-
Ronald S. Bultje authored
This makes the aac decoder and all voice codecs independent of dsputil.
-
Ronald S. Bultje authored
Now, nellymoserenc and aacenc no longer depend on dsputil. Independent of this patch, wmaprodec also does not depend on dsputil, so I removed it from there as well.
-