SBR DSP x86: implement SSE sbr_sum_square_sse
The 32-bit targets were compiled with -mfpmath=sse so that the C reference is a fair comparison.
sbr_sum_square  C  /32bits: 82c (unrolled)/102c
                C  /64bits: 69c (unrolled)/82c
                SSE/32bits: 42c
                SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling the loop to perform 8 operations per iteration costs 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
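
For readers unfamiliar with the routine, the following is a minimal C sketch of the idea described above, not the code from this commit (which is written in assembly in sbrdsp.asm). It assumes the function sums the squared real and imaginary parts of n complex floats, that n is a multiple of 4 so each iteration consumes 8 floats, and that the target is x86 with SSE. The names sum_square_ref and sum_square_sse are hypothetical, and unaligned loads are used only to keep the sketch free of alignment requirements.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar reference: sum of re^2 + im^2 over n complex values. */
static float sum_square_ref(float (*x)[2], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

/* SSE sketch: the loop is unrolled to square 8 floats per iteration
 * using two independent accumulators. Assumes n is a multiple of 4. */
static float sum_square_sse(float (*x)[2], int n)
{
    const float *p = &x[0][0];
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();

    for (int i = 0; i < 2 * n; i += 8) {
        __m128 a = _mm_loadu_ps(p + i);      /* floats i   .. i+3 */
        __m128 b = _mm_loadu_ps(p + i + 4);  /* floats i+4 .. i+7 */
        acc0 = _mm_add_ps(acc0, _mm_mul_ps(a, a));
        acc1 = _mm_add_ps(acc1, _mm_mul_ps(b, b));
    }

    /* Final horizontal sum with plain SSE shuffles (no SSE4.1 dpps). */
    acc0 = _mm_add_ps(acc0, acc1);
    __m128 high = _mm_movehl_ps(acc0, acc0);          /* lanes 2,3 -> 0,1 */
    acc0 = _mm_add_ps(acc0, high);                    /* lane0+lane2, lane1+lane3 */
    __m128 lane1 = _mm_shuffle_ps(acc0, acc0, _MM_SHUFFLE(1, 1, 1, 1));
    acc0 = _mm_add_ss(acc0, lane1);                   /* total in lane 0 */
    return _mm_cvtss_f32(acc0);
}
```

Because the vector version adds the products in a different order than the scalar loop, the two results can differ in the last bits for general floating-point input.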
Files added:
libavcodec/x86/sbrdsp.asm (new file, mode 100644)
libavcodec/x86/sbrdsp_init.c (new file, mode 100644)