Commits · 2f02bbcca050936686482453078e83dc25493da0 · Linshizhi / ffmpeg.wasm-core

03 May, 2013 2 commits

sbrdsp: Unroll and use integer operations · 4a7af92c

Christophe Gisquet authored 11 years ago

This patch may be controversial: it assumes floats are IEEE-754, and
particular FPU behaviour may get in the way.
Timings on Arrandale and Win32 (thus, the x87 FPU is used in the reference):

sbr_qmf_pre_shuffle_c: 115 to 76
sbr_neg_odd_64_c: 84 to 55
sbr_qmf_post_shuffle_c: 112 to 83
Signed-off-by: Diego Biurrun <diego@biurrun.de>
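
The "integer operations" idea can be illustrated on the negate-odd-elements case. The following is a minimal sketch, not the committed code: it assumes `float` is a 32-bit IEEE-754 binary32 value, and the function names `neg_odd_64_float` and `neg_odd_64_int` are hypothetical. Flipping bit 31 of the bit pattern negates the value without touching the FPU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference version: negate every odd-indexed element with float arithmetic. */
void neg_odd_64_float(float *x)
{
    for (int i = 1; i < 64; i += 2)
        x[i] = -x[i];
}

/* Integer version (assumes IEEE-754 binary32): flip bit 31, the sign bit.
 * The loop is unrolled by two, so each iteration handles x[i] and x[i + 2]. */
void neg_odd_64_int(float *x)
{
    assert(sizeof(float) == sizeof(uint32_t));
    for (int i = 1; i < 64; i += 4) {
        uint32_t a, b;
        /* memcpy avoids strict-aliasing problems when reading the bits. */
        memcpy(&a, &x[i], sizeof a);
        memcpy(&b, &x[i + 2], sizeof b);
        a ^= UINT32_C(1) << 31;
        b ^= UINT32_C(1) << 31;
        memcpy(&x[i], &a, sizeof a);
        memcpy(&x[i + 2], &b, sizeof b);
    }
}
```

Both functions should produce identical results for ordinary values on an IEEE-754 target; the sketch says nothing about the timings quoted above.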

sbrdsp: Unroll sbr_autocorrelate_c · 8394d9a6

Christophe Gisquet authored 11 years ago

1410 cycles to 1148 on Arrandale/Win64
Signed-off-by: Diego Biurrun <diego@biurrun.de>

08 Oct, 2012 1 commit

x86: call most of the x86 dsp init functions under if (ARCH_X86) · f101eab1

Janne Grunau authored 12 years ago

Rename the called dsp init functions to *_init_x86.

07 Mar, 2012 1 commit

SBR DSP: unroll sum_square · dabf8dd3

Christophe GISQUET authored 12 years ago

The length is even, so some unrolling can be performed. Timings are for x86:
- 32bits: 102c -> 82c
- 64bits:  82c -> 69c
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
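
The unrolling described in this message can be sketched as follows. This is an illustration under assumptions, not the committed code: the names `sum_square_simple` and `sum_square_unrolled` are hypothetical, the input is taken to be `n` complex values stored as `float[2]` pairs, and `n` is assumed even as the message states.

```c
#include <assert.h>

/* Straightforward version: one complex value per iteration. */
float sum_square_simple(float (*x)[2], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

/* Unrolled version: two complex values per iteration, with two
 * independent accumulators so the additions do not form one long chain. */
float sum_square_unrolled(float (*x)[2], int n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    assert(n % 2 == 0);              /* the unrolling relies on an even length */
    for (int i = 0; i < n; i += 2) {
        sum0 += x[i][0]     * x[i][0];
        sum1 += x[i][1]     * x[i][1];
        sum0 += x[i + 1][0] * x[i + 1][0];
        sum1 += x[i + 1][1] * x[i + 1][1];
    }
    return sum0 + sum1;
}
```

Because the two versions add the terms in a different order, their results can differ in the last bits for general floating-point input.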

23 Feb, 2012 2 commits

SBR DSP x86: implement SSE sbr_sum_square_sse · 34454c76

Christophe GISQUET authored 12 years ago

The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C  /32bits: 82c (unrolled)/102c
               C  /64bits: 69c (unrolled)/82c
               SSE/32bits: 42c
               SSE/64bits: 31c

Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
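
For readers unfamiliar with the approach, here is a rough sketch of a sum of squares with SSE, written with C intrinsics rather than the assembly used in the commit. The name `sum_square_sse_sketch` is hypothetical, `n` is assumed even, and the loop handles only 4 floats per iteration (the message indicates the real code handles 8). The final horizontal sum uses shuffles and adds, not `dpps`. A scalar fallback is included for builds without SSE.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Sum of squares over n complex values stored as float[2] pairs. */
float sum_square_sse_sketch(float (*x)[2], int n)
{
    const float *p = (const float *)x;   /* view the data as 2*n floats */
    int len = 2 * n;
    assert(n % 2 == 0);                  /* so that len is a multiple of 4 */
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < len; i += 4) {
        __m128 v = _mm_loadu_ps(p + i);  /* unaligned load of 4 floats */
        acc = _mm_add_ps(acc, _mm_mul_ps(v, v));
    }
    /* Horizontal sum of the four lanes of acc. */
    __m128 hi    = _mm_movehl_ps(acc, acc);   /* lanes 2,3 moved to 0,1 */
    __m128 sum2  = _mm_add_ps(acc, hi);       /* lane0 = a0+a2, lane1 = a1+a3 */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(sum2, lane1));
#else
    /* Scalar fallback when SSE is not available at compile time. */
    float sum = 0.0f;
    for (int i = 0; i < len; i++)
        sum += p[i] * p[i];
    return sum;
#endif
}
```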

SBR DSP: use intptr_t for the ixh parameter. · 2e74a5ab

Christophe GISQUET authored 12 years ago

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

28 Jan, 2012 2 commits

aacsbr: ARM NEON optimised sbrdsp functions · be822d77

Mans Rullgard authored 12 years ago

Overall speedup of HE-AAC decoding 2.3x on Cortex-A8, 1.2x on A9.
Signed-off-by: Mans Rullgard <mans@mansr.com>

aacsbr: move some simdable loops to function pointers · aac46e08

Mans Rullgard authored 12 years ago

This prepares for assembly optimisations by moving the most
time-consuming loops to functions called through pointers
in a new context.
Signed-off-by: Mans Rullgard <mans@mansr.com>
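
The pattern described here, hot loops called through pointers held in a context, can be sketched as below. Everything in this example is hypothetical (`SBRDSPSketch`, `sbrdsp_sketch_init`, and the two C routines); it only illustrates how an init function installs portable C defaults that an architecture-specific init could later replace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context holding pointers to the time-consuming loops. */
typedef struct SBRDSPSketch {
    float (*sum_square)(float (*x)[2], int n);
    void  (*neg_odd_64)(float *x);
} SBRDSPSketch;

/* Portable C implementations used as defaults. */
float ctx_sum_square_c(float (*x)[2], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

void ctx_neg_odd_64_c(float *x)
{
    for (int i = 1; i < 64; i += 2)
        x[i] = -x[i];
}

/* Install the C defaults. An architecture-specific init (NEON, SSE, ...)
 * would be called after this and overwrite the pointers it can accelerate. */
void sbrdsp_sketch_init(SBRDSPSketch *s)
{
    assert(s != NULL);
    s->sum_square = ctx_sum_square_c;
    s->neg_odd_64 = ctx_neg_odd_64_c;
}
```

Callers then go through the context, for example `dsp.sum_square(x, n)`, and never need to know which implementation was selected.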
