- 14 Sep, 2018 1 commit
-
-
James Almer authored
~3x to 5x faster.
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 01 Aug, 2018 2 commits
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Should fix compilation with old yasm/nasm.
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 31 Jul, 2018 1 commit
-
-
Jun Zhao authored
Add ff_pixelutils_sad_32x32_sse2, ff_pixelutils_sad_{a,u}_32x32_sse2, ff_pixelutils_sad_32x32_avx2 and ff_pixelutils_sad_{a,u}_32x32_avx2.
Profiling with perf record/report gives the following instructions:u breakdown for the avx2 sad_32x32:
    72.05% pixelutils pixelutils [.] block_sad_32x32_c
    18.50% pixelutils pixelutils [.] block_sad_16x16_c
     4.78% pixelutils pixelutils [.] block_sad_8x8_c
     2.69% pixelutils pixelutils [.] block_sad_4x4_c
     0.89% pixelutils pixelutils [.] block_sad_2x2_c
     0.16% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_avx2
     0.16% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_avx2
     0.12% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_avx2
The sse2 sad_32x32 instructions:u breakdown looks like:
    71.86% pixelutils pixelutils [.] block_sad_32x32_c
    18.42% pixelutils pixelutils [.] block_sad_16x16_c
     4.81% pixelutils pixelutils [.] block_sad_8x8_c
     2.68% pixelutils pixelutils [.] block_sad_4x4_c
     0.88% pixelutils pixelutils [.] block_sad_2x2_c
     0.29% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_sse2
     0.26% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_sse2
     0.23% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_sse2
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
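For readers unfamiliar with the operation being accelerated here, the sum of absolute differences (SAD) over a 32x32 block can be sketched in scalar form as below. This is an illustrative Python model, not FFmpeg's code: the function name `block_sad`, its parameters, and the flat-buffer-plus-stride layout are assumptions made for the example.

```python
def block_sad(src1, stride1, src2, stride2, width=32, height=32):
    """Sum of absolute differences between two width x height pixel blocks.

    src1 and src2 are flat sequences of 8-bit samples; stride1 and stride2
    are the distances (in samples) between the starts of consecutive rows.
    All names here are hypothetical and chosen for illustration only.
    """
    total = 0
    for y in range(height):
        row1 = y * stride1
        row2 = y * stride2
        for x in range(width):
            total += abs(src1[row1 + x] - src2[row2 + x])
    return total


# Two 32x32 blocks stored with a stride of 32 samples.
a = bytes([10] * (32 * 32))
b = bytes([13] * (32 * 32))
print(block_sad(a, 32, b, 32))  # every sample differs by 3: 3 * 1024 = 3072
```

A SIMD version processes many samples of a row per instruction instead of one at a time, which is presumably why the assembly routines account for such a small share of the profile above.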
-
- 19 Jul, 2018 1 commit
-
-
alexander schmid authored
-
- 11 Jul, 2018 1 commit
-
-
Jun Zhao authored
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
-
- 06 Feb, 2018 1 commit
-
-
Diego Biurrun authored
-
- 20 Jan, 2018 5 commits
-
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
On ELF platforms such symbols need to be flagged as functions with the correct visibility to please certain linkers in some scenarios.
-
Henrik Gramner authored
The standard section for read-only data on Windows is .rdata. Nasm flags non-standard sections as executable by default, which isn't ideal.
-
Henrik Gramner authored
There are 32 pseudo-instructions for each floating-point comparison instruction, but only 8 of them are actually valid in legacy-encoded mode. The remaining 24 require the use of VEX-encoded (v-prefixed) instructions and can therefore be disregarded for this purpose.
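The 8-versus-24 split follows from the comparison predicate being an immediate operand: legacy SSE encodings accept predicates 0 to 7, while the VEX forms accept 0 to 31. A minimal sketch of that split, using `cmpps` as the example instruction; the helper names are hypothetical and the table covers only the eight legacy predicates.

```python
# Predicate suffixes usable with legacy-encoded (non-VEX) SSE comparisons,
# indexed by the value of the immediate operand.
LEGACY_PREDICATES = ["eq", "lt", "le", "unord", "neq", "nlt", "nle", "ord"]


def legacy_pseudo_ops(base="cmp", suffix="ps"):
    """Map each legacy pseudo-instruction name to its predicate immediate."""
    return {f"{base}{pred}{suffix}": imm
            for imm, pred in enumerate(LEGACY_PREDICATES)}


def needs_vex(imm):
    """Predicates 8..31 only exist for the VEX-encoded (v-prefixed) forms."""
    if not 0 <= imm < 32:
        raise ValueError("comparison predicate must be in 0..31")
    return imm >= 8


ops = legacy_pseudo_ops()
print(ops["cmpltps"])                        # 1
print(sum(needs_vex(i) for i in range(32)))  # 24
```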
-
- 25 Dec, 2017 1 commit
-
-
James Almer authored
Fixes compilation of libavresample/x86/audio_mix.asm
Reviewed-by: Gramner
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 24 Dec, 2017 4 commits
-
-
Henrik Gramner authored
AVX-512 consists of a plethora of different extensions, but in order to keep things a bit more manageable we group together the following extensions under a single baseline cpu flag which should cover SKL-X and future CPUs:
 * AVX-512 Foundation (F)
 * AVX-512 Conflict Detection Instructions (CD)
 * AVX-512 Byte and Word Instructions (BW)
 * AVX-512 Doubleword and Quadword Instructions (DQ)
 * AVX-512 Vector Length Extensions (VL)
On x86-64, AVX-512 provides 16 additional vector registers. Prefer using those over existing ones, since it allows us to avoid using `vzeroupper` unless more than 16 vector registers are required. They also happen to be volatile on Windows, which means that we don't need to save and restore existing xmm register contents unless more than 22 vector registers are required.
Big thanks to Intel for their support.
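Treating the five extensions as one baseline flag amounts to requiring all of their feature bits at once. The sketch below assumes the caller has already obtained the EBX value of CPUID leaf 7, subleaf 0; the bit positions are the ones documented for that leaf, while the function name is hypothetical. Real detection must also confirm that the operating system saves the extended register state, which is omitted here.

```python
# Feature bits in CPUID leaf 7, subleaf 0, register EBX.
AVX512F = 1 << 16
AVX512DQ = 1 << 17
AVX512CD = 1 << 28
AVX512BW = 1 << 30
AVX512VL = 1 << 31

# The single baseline flag requires all five extensions together.
BASELINE = AVX512F | AVX512CD | AVX512BW | AVX512DQ | AVX512VL


def has_avx512_baseline(cpuid7_ebx):
    """True only when every extension in the baseline group is reported."""
    return (cpuid7_ebx & BASELINE) == BASELINE


print(has_avx512_baseline(BASELINE))            # True
print(has_avx512_baseline(AVX512F | AVX512CD))  # False: BW, DQ and VL missing
```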
-
James Darnley authored
-
James Darnley authored
-
James Darnley authored
-
- 02 Dec, 2017 1 commit
-
-
Martin Vignali authored
avutil/x86util: add a macro for loading a 128-bit constant into an xmm register or into each 128-bit half of a ymm register, in order to simplify avx2 asm functions
-
- 25 Oct, 2017 1 commit
-
-
Dale Curtis authored
Technically _tzcnt* intrinsics are only available when the BMI instruction set is present. However, the instruction encoding degrades to "rep bsf" on older processors. Clang for Windows debatably restricts the _tzcnt* intrinsics behind the __BMI__ architecture define, so check for its presence or exclude the usage of these intrinsics when clang is present.
See also:
https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.html
https://bugs.llvm.org/show_bug.cgi?id=30506
http://lists.llvm.org/pipermail/cfe-dev/2016-October/051034.html
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Matt Oliver <protogonoi@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
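The reason the "rep bsf" fallback is usually harmless can be sketched with a small model of the two instructions: they agree for every non-zero input and differ only for zero, where tzcnt returns the operand width and bsf leaves its destination undefined. The function names below are hypothetical, and `None` merely stands in for the undefined result.

```python
def tzcnt32(x):
    """Model of 32-bit tzcnt: trailing zero count, 32 for a zero input."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    # x & -x isolates the lowest set bit; its bit length minus one is its index.
    return (x & -x).bit_length() - 1


def bsf32(x):
    """Model of 32-bit bsf: index of the lowest set bit, undefined for zero."""
    x &= 0xFFFFFFFF
    if x == 0:
        return None  # stand-in for "destination undefined"
    return (x & -x).bit_length() - 1


print(tzcnt32(8), bsf32(8))  # 3 3
print(tzcnt32(0), bsf32(0))  # 32 None
```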
-
- 09 Oct, 2017 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
-
- 28 Sep, 2017 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 18 Aug, 2017 1 commit
-
-
Ivan Kalvachev authored
Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
-
- 27 Jun, 2017 1 commit
-
-
James Almer authored
Yasm:
src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'
Nasm:
src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here
Tested-by: Clément Bœsch <u@pkh.me>
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 21 Jun, 2017 1 commit
-
-
Diego Biurrun authored
None of them are specific to the YASM assembler.
(Cherry-picked from libav commit 39e208f4)
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 19 Jun, 2017 1 commit
-
-
James Almer authored
About 2x faster than the C version.
-
- 12 Jun, 2017 1 commit
-
-
Henrik Gramner authored
Simplifies writing assembly code that depends on available instructions.
LZCNT implies SSE2
BMI1 implies AVX+LZCNT
AVX2 implies BMI2
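One way to express such implications is to fold the implied flags into each flag's bitmask, so that a single mask comparison answers "is this instruction set available". The sketch below models only the three implications quoted in the commit message; the bit assignments and the `cpuflag` helper are hypothetical and do not reproduce the actual flag layout.

```python
# Hypothetical bit assignments. Each flag carries the bits of the flags it
# implies, and only the three implications quoted above are modelled.
SSE2 = 1 << 0
AVX = 1 << 1
LZCNT = (1 << 2) | SSE2          # LZCNT implies SSE2
BMI1 = (1 << 3) | AVX | LZCNT    # BMI1 implies AVX+LZCNT
BMI2 = 1 << 4
AVX2 = (1 << 5) | BMI2           # AVX2 implies BMI2


def cpuflag(enabled, flag):
    """True when every bit of `flag`, including implied ones, is enabled."""
    return (enabled & flag) == flag


print(cpuflag(BMI1, LZCNT))  # True
print(cpuflag(BMI1, SSE2))   # True, via LZCNT
print(cpuflag(AVX2, BMI2))   # True
print(cpuflag(LZCNT, AVX))   # False
```

With this layout, code written for BMI1 can use LZCNT and SSE2 instructions without testing for them separately.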
-
- 09 Jun, 2017 4 commits
-
-
Anton Mitrofanov authored
The use of rsp was pretty much hardcoded there and probably didn't work otherwise with stack_size > 0.
-
Henrik Gramner authored
Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13 registers sometimes require an additional byte when used as a base register. r14 and r15 don't have that issue, so prefer using them.
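The peculiarity comes from the low three bits of the register number: the value 100 (rsp, r12) in the r/m field means "a SIB byte follows", and 101 (rbp, r13) with no displacement is reserved for another addressing form, so a zero disp8 has to be emitted instead. A rough model of the byte count for a `[base + disp]` operand without an index register; the function name is hypothetical, and prefix and opcode bytes are not counted.

```python
REG_NUMBERS = {
    "rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6,
    "rdi": 7, "r8": 8, "r9": 9, "r10": 10, "r11": 11, "r12": 12, "r13": 13,
    "r14": 14, "r15": 15,
}


def addressing_bytes(base, disp=0):
    """Bytes used by ModR/M, SIB and displacement for [base + disp]."""
    low3 = REG_NUMBERS[base] & 7
    size = 1                   # the ModR/M byte itself
    if low3 == 4:              # rsp/r12: r/m = 100 selects a SIB byte
        size += 1
    if disp == 0 and low3 != 5:
        return size            # mod = 00, no displacement needed
    if -128 <= disp <= 127:
        return size + 1        # mod = 01, disp8 (forced for rbp/r13 at disp 0)
    return size + 4            # mod = 10, disp32


for reg in ("r12", "r13", "r14", "r15"):
    print(reg, addressing_bytes(reg))  # r12 2, r13 2, r14 1, r15 1
```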
-
Henrik Gramner authored
There's no point in emitting a rep prefix before ret on modern CPUs.
-
Henrik Gramner authored
We overload the `call` instruction with a macro, but it would misbehave when the macro argument wasn't a valid identifier. Fix it by explicitly checking if the argument is an identifier.
-
- 13 May, 2017 1 commit
-
-
James Almer authored
-
- 12 Apr, 2017 1 commit
-
-
James Almer authored
~20% faster than AVX.
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 10 Apr, 2017 1 commit
-
-
James Almer authored
-
- 21 Mar, 2017 1 commit
-
-
James Almer authored
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 14 Mar, 2017 1 commit
-
-
Diego Biurrun authored
Also do some small cosmetic changes: drop the pointless _MMX suffix from the ABSD2 macro name, drop the pointless check for MMX support (we always assume MMX is available in our SIMD code), and fix spelling.
-
- 01 Mar, 2017 1 commit
-
-
Diego Biurrun authored
None of them are specific to the YASM assembler.
-
- 18 Feb, 2017 3 commits
-
-
James Darnley authored
x86-64 only
Yorkfield:
  - sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
  - sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
  - sse2: ~3.10x (370 vs. 119 cycles)
  - avx: ~3.29x (370 vs. 112 cycles)
-
James Darnley authored
Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows.
x86: Avoid some bypass delays and false dependencies
A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.
-
James Darnley authored
-
- 03 Feb, 2017 1 commit
-
-
Diego Biurrun authored
-