libavcodec/sbrdsp.h · a4e359a3f98650dab3d2e93f067658e20fa9c0d7 · Linshizhi / ffmpeg.wasm-core

SBR DSP x86: implement SSE sbr_sum_square_sse · 34454c76

Christophe GISQUET authored Feb 23, 2012

The 32-bit targets were compiled with -mfpmath=sse to give a proper reference.
sbr_sum_square C  /32bits: 82c (unrolled)/102c
               C  /64bits: 69c (unrolled)/82c
               SSE/32bits: 42c
               SSE/64bits: 31c

Using the SSE4.1 dpps instruction for the final sum is slower.
Not unrolling the loop to perform 8 operations per iteration costs 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
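
For readers unfamiliar with the routine being benchmarked, the following is a minimal C sketch of a sum-of-squares over complex samples, in a plain form and in a form unrolled to 8 multiply-adds per iteration. The function names, the {re, im} sample layout, and the requirement that n be a multiple of 4 are assumptions made for illustration. This is not the code from the commit and not the SSE implementation.

```c
#include <stddef.h>

/* Hypothetical plain reference: sum of re^2 + im^2 over n complex samples,
 * each stored as a {re, im} pair. */
static float sum_square_plain(float (*x)[2], size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

/* Hypothetical unrolled variant: 8 multiply-adds per iteration spread over
 * four independent accumulators. Assumes n is a multiple of 4. */
static float sum_square_unrolled(float (*x)[2], size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        s0 += x[i + 0][0] * x[i + 0][0];
        s1 += x[i + 0][1] * x[i + 0][1];
        s2 += x[i + 1][0] * x[i + 1][0];
        s3 += x[i + 1][1] * x[i + 1][1];
        s0 += x[i + 2][0] * x[i + 2][0];
        s1 += x[i + 2][1] * x[i + 2][1];
        s2 += x[i + 3][0] * x[i + 3][0];
        s3 += x[i + 3][1] * x[i + 3][1];
    }
    /* Final horizontal sum of the partial accumulators. */
    return (s0 + s1) + (s2 + s3);
}
```

Because the unrolled variant adds the terms in a different order, its result can differ from the plain loop in the last bits for general floating-point input.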

sbrdsp.h 1.96 KB