x86/dsputil: fix VECTOR_CLIP_INT32 macro

The inline loop was incrementing and using the value of %%i the wrong way.

Disassembly of ff_vector_clip_int32_sse2 before and after this patch:

movdqa (%rdx),%xmm0       | movdqa (%rdx),%xmm0
movdqa 0x10(%rdx),%xmm1   | movdqa 0x10(%rdx),%xmm1
movdqa 0x20(%rdx),%xmm2   | movdqa 0x20(%rdx),%xmm2
movdqa 0x30(%rdx),%xmm3   | movdqa 0x30(%rdx),%xmm3
[...]                     |
movdqa %xmm0,(%rcx)       | movdqa %xmm0,(%rcx)
movdqa %xmm1,0x10(%rcx)   | movdqa %xmm1,0x10(%rcx)
movdqa %xmm2,0x20(%rcx)   | movdqa %xmm2,0x20(%rcx)
movdqa %xmm3,0x30(%rcx)   | movdqa %xmm3,0x30(%rcx)
movdqa (%rdx),%xmm0       | movdqa 0x40(%rdx),%xmm0
movdqa 0x20(%rdx),%xmm1   | movdqa 0x50(%rdx),%xmm1
movdqa 0x40(%rdx),%xmm2   | movdqa 0x60(%rdx),%xmm2
movdqa 0x60(%rdx),%xmm3   | movdqa 0x70(%rdx),%xmm3
[...]                     |
movdqa %xmm0,(%rcx)       | movdqa %xmm0,0x40(%rcx)
movdqa %xmm1,0x20(%rcx)   | movdqa %xmm1,0x50(%rcx)
movdqa %xmm2,0x40(%rcx)   | movdqa %xmm2,0x60(%rcx)
movdqa %xmm3,0x60(%rcx)   | movdqa %xmm3,0x70(%rcx)
add $0x80,%rdx            | add $0x80,%rdx
add $0x80,%rcx            | add $0x80,%rcx

Other versions were unaffected.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
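
Note: the offsets in the disassembly can be reproduced with a small C sketch. This is an illustration inferred from the listing only, not a quote of the macro source. The function names, the MMSIZE constant, and the exact formulas are assumptions: the "before" column is consistent with the register index being multiplied by a counter that starts at 1 and steps by 1, and the "after" column is consistent with a counter that starts at 0, steps by 4, and is added to the register index.

```c
#include <stddef.h>

enum { MMSIZE = 16 };  /* width of an SSE2 register in bytes */

/* Hypothetical model of the "before" addressing: the register index k
 * (0..3) is multiplied by the unroll counter i (1, 2, ...). */
static size_t offset_before(unsigned i, unsigned k)
{
    return (size_t)MMSIZE * k * i;
}

/* Hypothetical model of the "after" addressing: the unroll counter i
 * (0, 4, ...) is added to the register index k (0..3). */
static size_t offset_after(unsigned i, unsigned k)
{
    return (size_t)MMSIZE * (k + i);
}
```

Under these assumptions, the second unrolled block reads from offset_before(2, k) = 0x0, 0x20, 0x40, 0x60 before the patch, which rereads already processed data and skips half of the block, and from offset_after(4, k) = 0x40, 0x50, 0x60, 0x70 after it, which covers the full 0x80 bytes consumed per loop iteration.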