- 16 May, 2016 4 commits
-
-
Anton Mitrofanov authored
Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
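A minimal Python sketch of the idea, not the actual x86inc macro code: a three-operand `op dst, src1, src2` is lowered to two-operand instructions, and the sources are swapped when dst equals src2 and the operation is commutative. The function name `emulate_3op`, the `COMMUTATIVE` set, and the use of `movaps` for every copy are assumptions made for illustration.

```python
# Hypothetical model of three-operand emulation on a two-operand ISA.
# The set below is an illustrative subset, not x86inc's real table.
COMMUTATIVE = {"addps", "mulps", "pand", "por"}


def emulate_3op(op, dst, src1, src2):
    """Return the two-operand sequence equivalent to `op dst, src1, src2`."""
    if dst == src2 and dst != src1:
        if op not in COMMUTATIVE:
            # Copying src1 into dst would destroy src2 before it is read.
            raise ValueError(f"cannot emulate {op} {dst}, {src1}, {src2}")
        # Commutative: swapping the sources leaves the result unchanged
        # and makes dst coincide with the first source.
        src1, src2 = src2, src1
    seq = []
    if dst != src1:
        # Assumption: a plain movaps stands in for whatever move is appropriate.
        seq.append(f"movaps {dst}, {src1}")
    seq.append(f"{op} {dst}, {src2}")
    return seq


print(emulate_3op("addps", "m0", "m1", "m0"))
print(emulate_3op("addps", "m0", "m1", "m2"))
```

With a non-commutative operation such as `subps`, the same operand pattern raises an error in this sketch, because no swap is possible.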
-
Anton Mitrofanov authored
The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Anton Mitrofanov authored
Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
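To see why swapping operands is unsafe here, the following Python sketch models a scalar add on four-element vectors. `addss` is used as an assumed example of the instructions in question, and the list-based model is purely illustrative.

```python
def addss(a, b):
    """Model of a scalar add: element 0 is a[0] + b[0], the rest come from a."""
    return [a[0] + b[0]] + list(a[1:])


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

# The low element matches either way, but the upper elements follow
# whichever operand comes first, so the operands cannot be swapped.
print(addss(x, y))
print(addss(y, x))
```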
-
- 23 Jan, 2016 8 commits
-
-
Geza Lore authored
Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata; e.g. this fixes `gdb`, but not `perf`. Currently only implemented for ELF. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus. Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
cpuflags is never undefined any more; it's set to 0 instead. Also fix an incorrect comment. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
* Correctly handle FMA instructions with memory operands.
* Print a warning if FMA instructions are used without the correct cpuflag.
* Simplify the instantiation code.
* Clarify documentation.

Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
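The operand reordering can be sketched in Python as follows. This is a model under stated assumptions, not the x86inc macro itself: the function names are hypothetical, only `vfmaddps` is handled, and a memory operand is recognised simply by a leading `[`. The FMA4 form `vfmaddps dst, src1, src2, src3` computes `dst = src1*src2 + src3`; the FMA3 forms are `132` (`a = a*c + b`), `213` (`a = b*a + c`) and `231` (`a = b*c + a`) for operands `a, b, c`.

```python
def is_mem(operand):
    """Assumption: memory operands are written in brackets, e.g. `[r0]`."""
    return operand.startswith("[")


def fma4_to_fma3(dst, src1, src2, src3):
    """Map `vfmaddps dst, src1, src2, src3` to an equivalent FMA3 instruction."""
    if dst == src1:
        if not is_mem(src2):
            # dst = src2*dst + src3; src3 may be a memory operand.
            return f"vfmadd213ps {dst}, {src2}, {src3}"
        # src2 is in memory: swap the multiplicands' roles so it comes last.
        # dst = dst*src2 + src3
        return f"vfmadd132ps {dst}, {src3}, {src2}"
    if dst == src2:
        # dst = src1*dst + src3
        return f"vfmadd213ps {dst}, {src1}, {src3}"
    if dst == src3:
        # dst = src1*src2 + dst
        return f"vfmadd231ps {dst}, {src1}, {src2}"
    # FMA3 is destructive, so dst has to coincide with one of the sources.
    raise ValueError("FMA3 requires dst to equal one of the source operands")


print(fma4_to_fma3("m0", "m0", "m1", "[r0]"))
print(fma4_to_fma3("m0", "m0", "[r0]", "m1"))
print(fma4_to_fma3("m0", "m1", "m2", "m0"))
```

In each returned instruction a memory operand, if present, ends up in the last position.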
-
Henrik Gramner authored
Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
Makes it possible to use them in arithmetic expressions. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
- 13 Aug, 2015 1 commit
-
-
Henrik Gramner authored
Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
- 11 Aug, 2015 6 commits
-
-
Henrik Gramner authored
The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Christophe Gisquet authored
Signed-off-by:
Henrik Gramner <henrik@gramner.com> Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Anton Mitrofanov authored
Signed-off-by:
Henrik Gramner <henrik@gramner.com> Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not. Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
Anton Mitrofanov authored
Emulation requires a temporary register if arguments 1 and 4 are the same; this doesn't obey the semantics of the original instruction, so we can't emulate that in x86inc. Also add pmacsdql emulation. Signed-off-by:
Henrik Gramner <henrik@gramner.com> Signed-off-by:
Anton Khirnov <anton@khirnov.net>
-
- 28 May, 2015 1 commit
-
-
Timothy Gu authored
Silences warning(s) like:

libavcodec/x86/fft.asm:93: warning: section flags ignored on section redeclaration

The cause of this warning is that `struc` and `endstruc` attempt to revert to the previous section state [1]. The section state is stored in the macro `__SECT__`, defined by x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice (once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives, `__SECT__` is predefined as `.text`, which conflicts with the later `SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by:
Luca Barbato <lu_zero@gentoo.org>
-
- 09 Sep, 2014 3 commits
-
-
Henrik Gramner authored
Previously there was a limit of two cpuflags. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
Loren Merritt authored
Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
Henrik Gramner authored
This makes more sense for future implementations of templates with zmm registers. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 01 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 26 Jan, 2014 1 commit
-
-
Loren Merritt authored
Work around Yasm's inefficiency with handling large numbers of variables in the global scope. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-
- 14 Oct, 2013 4 commits
-
-
Jason Garrett-Glaser authored
Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Jason Garrett-Glaser authored
Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Derek Buitenhuis authored
This is so we can sync to x264's version of FMA4 support. This partially reverts commit 79687079. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Henrik Gramner authored
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version. This change makes it easier to extend existing code to use AVX2. Also add support for AVX emulation of a few instructions that were missing before. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 09 Oct, 2013 1 commit
-
-
Henrik Gramner authored
The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 07 Oct, 2013 9 commits
-
-
Henrik Gramner authored
Prevents a crash if the misaligned exception mask bit is cleared for some reason. Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule. They also require modifying the MXCSR control register and by removing those functions we can get rid of that complexity altogether. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Jason Garrett-Glaser authored
Small backports that sneaked into other asm commits in x264. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Derek Buitenhuis authored
This is also a valid value for WIN64. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Henrik Gramner authored
Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Loren Merritt authored
For when we want to mix simd sizes within one function. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Loren Merritt authored
Two cases used to produce the wrong permutation: SWAP with >=3 named (rather than numbered) args, and PERMUTE followed by SWAP with 2 named args. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Henrik Gramner authored
Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Henrik Gramner authored
Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
Loren Merritt authored
Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition. REP_RET is still needed manually when it's a branch target, but that's much rarer. The implementation involves lots of spurious labels, but that's OK because we strip them. Signed-off-by:
Derek Buitenhuis <derek.buitenhuis@gmail.com>
-
- 09 Apr, 2013 1 commit
-
-
Christophe Gisquet authored
cmp{p,s}{s,d} instructions do take an imm8 operand. Signed-off-by:
Diego Biurrun <diego@biurrun.de>
-