Commit f3e08490 authored by Christophe GISQUET's avatar Christophe GISQUET Committed by Ronald S. Bultje

mpegaudio: replace memcpy by SIMD code

By replacing memcpy with an unrolled loop using the alignment knowledge
it has, some speedup can be obtained.

Before (gcc 4.6.1): ~400 cycles
After: ~370 cycles

Overall, around 2% speed increase when decoding a 2400s mp3 to f32le.
Signed-off-by: 's avatarRonald S. Bultje <rsbultje@gmail.com>
parent ae591aee
...@@ -106,7 +106,26 @@ static void apply_window_mp3(float *in, float *win, int *unused, float *out, ...@@ -106,7 +106,26 @@ static void apply_window_mp3(float *in, float *win, int *unused, float *out,
float sum; float sum;
/* copy to avoid wrap */ /* copy to avoid wrap */
memcpy(in + 512, in, 32 * sizeof(*in)); __asm__ volatile(
"movaps 0(%0), %%xmm0 \n\t" \
"movaps 16(%0), %%xmm1 \n\t" \
"movaps 32(%0), %%xmm2 \n\t" \
"movaps 48(%0), %%xmm3 \n\t" \
"movaps %%xmm0, 0(%1) \n\t" \
"movaps %%xmm1, 16(%1) \n\t" \
"movaps %%xmm2, 32(%1) \n\t" \
"movaps %%xmm3, 48(%1) \n\t" \
"movaps 64(%0), %%xmm0 \n\t" \
"movaps 80(%0), %%xmm1 \n\t" \
"movaps 96(%0), %%xmm2 \n\t" \
"movaps 112(%0), %%xmm3 \n\t" \
"movaps %%xmm0, 64(%1) \n\t" \
"movaps %%xmm1, 80(%1) \n\t" \
"movaps %%xmm2, 96(%1) \n\t" \
"movaps %%xmm3, 112(%1) \n\t"
::"r"(in), "r"(in+512)
:"memory"
);
apply_window(in + 16, win , win + 512, suma, sumc, 16); apply_window(in + 16, win , win + 512, suma, sumc, 16);
apply_window(in + 32, win + 48, win + 640, sumb, sumd, 16); apply_window(in + 32, win + 48, win + 640, sumb, sumd, 16);
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment