Christophe Gisquet authored
vector_fmul and vector_fmac_scalar are guaranteed to be able to process elements in batches of 16, but their SSE versions only do 8 at a time. Therefore, unroll them a bit. This takes vector_fmac_scalar from 299 to 261 cycles for 256 elements on Arrandale/Win64.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
133b3420
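
The unrolling described in the commit message can be illustrated with a portable C sketch. This is not the assembly from float_dsp.asm: the function names `fmac_scalar_ref` and `fmac_scalar_unroll16` are hypothetical, and each group of four statements only stands in for one 4-float SSE register operation. The sketch assumes, as the commit message states, that callers always pass a length that is a multiple of 16.

```c
#include <assert.h>

/* Reference behaviour: dst[i] += src[i] * mul, one element at a time.
 * Hypothetical helper, used only to check the unrolled variant. */
static void fmac_scalar_ref(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

/* Hypothetical unrolled variant: 16 elements per outer iteration.
 * Assumption: len is a multiple of 16, so no tail loop is needed. */
static void fmac_scalar_unroll16(float *dst, const float *src, float mul, int len)
{
    assert(len % 16 == 0);
    for (int i = 0; i < len; i += 16) {
        /* Four groups of four floats; each group models one SSE register. */
        for (int j = 0; j < 16; j += 4) {
            dst[i + j + 0] += src[i + j + 0] * mul;
            dst[i + j + 1] += src[i + j + 1] * mul;
            dst[i + j + 2] += src[i + j + 2] * mul;
            dst[i + j + 3] += src[i + j + 3] * mul;
        }
    }
}
```

Handling 16 elements per iteration instead of 8 halves the number of loop-counter updates and branches for the same amount of data, which is the kind of overhead the commit aims to reduce. The cycle counts quoted in the commit message apply to the real SSE code, not to this sketch.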
Name | Last commit | Last update |
---|---|---|
Makefile | ||
asm.h | ||
bswap.h | ||
cpu.c | ||
cpu.h | ||
cpuid.asm | ||
emms.asm | ||
emms.h | ||
float_dsp.asm | ||
float_dsp_init.c | ||
intreadwrite.h | ||
lls.asm | ||
lls_init.c | ||
timer.h | ||
w64xmmtest.h | ||
x86inc.asm | ||
x86util.asm |