Commits · 17c45d4d056d0e10ecb88b424ec9e68be398da5e · Linshizhi / ffmpeg.wasm-core

13 Apr, 2014 1 commit

x86/synth_filter: remove the fma3 version ifdefs · 0f524b6c

James Almer authored 10 years ago

This fixes compilation failures with --disable-fma3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

04 Apr, 2014 3 commits

x86/synth_filter: add synth_filter_fma3 · c74b8669

James Almer authored 10 years ago

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

x86/synth_filter: add synth_filter_avx · 81e02fae

James Almer authored 10 years ago

Sandy Bridge Win64:
180 cycles in ff_synth_filter_inner_sse2
150 cycles in ff_synth_filter_inner_avx

Also switch some instructions to a three operand format to avoid
assembly errors with Yasm 1.1.0 or older.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

x86/synth_filter: add synth_filter_sse · 2025d802

James Almer authored 10 years ago

Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

05 Mar, 2014 1 commit

x86: dcadsp: Fix linking with yasm and optimizations disabled · 3bfdee00

Diego Biurrun authored 10 years ago

Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
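
A minimal sketch of the kind of guard this describes, not the actual dcadsp code. `HAVE_OPT_ASM`, `sum_c`, `sum_opt`, `sum_wrapper` and `select_sum` are hypothetical names invented for illustration. The idea is that a function which references an optimized symbol is compiled out with the preprocessor whenever that symbol is not built, rather than relying on the compiler to drop an unreachable call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build-time switch. In a real build a configure script would
 * set it; it defaults to 0 here so the sketch links without any asm object. */
#ifndef HAVE_OPT_ASM
#define HAVE_OPT_ASM 0
#endif

typedef int (*sum_fn)(const int *v, size_t n);

/* Portable fallback that is always available. */
static int sum_c(const int *v, size_t n)
{
    int total = 0;
    assert(v || n == 0);
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if HAVE_OPT_ASM
/* Hypothetical symbol that only exists when the asm object is built. */
int sum_opt(const int *v, size_t n);

/* This wrapper references sum_opt, so it is compiled out together with it.
 * A runtime `if (0)` around the call would leave the reference in the object
 * file whenever the compiler does not eliminate dead code. */
static int sum_wrapper(const int *v, size_t n)
{
    return sum_opt(v, n);
}
#endif

/* Pick the implementation that is actually linkable in this build. */
static sum_fn select_sum(void)
{
#if HAVE_OPT_ASM
    return sum_wrapper;
#else
    return sum_c;
#endif
}
```

With the switch at 0, `select_sum()` returns the C fallback and no undefined reference to `sum_opt` is emitted, whatever the optimization level.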

28 Feb, 2014 3 commits

dcadec: simplify decoding of VQ high frequencies · 4cb69642

Christophe Gisquet authored 11 years ago

The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.

Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.

The decode_hf implementations have the following timings:

For x86 Arrandale:
        C  SSE SSE2 SSE4
win32: 260 162  119  104
win64: 242 N/A   89   72

The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
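
The restructuring described above can be illustrated with a generic sketch. It does not reproduce the dcadec code: the message does not say which test was moved, so the `enabled` flag and both function names are hypothetical. The point is only that once a loop-invariant test is hoisted, each remaining loop body is branch-free and straightforward to vectorize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before": a loop-invariant flag is tested on every iteration. */
static void scale_test_inside(float *dst, const int8_t *src, size_t n,
                              float scale, int enabled)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++) {
        if (enabled)
            dst[i] = src[i] * scale;
        else
            dst[i] = 0.0f;
    }
}

/* Hypothetical "after": the flag is tested once, outside the loops. */
static void scale_test_hoisted(float *dst, const int8_t *src, size_t n,
                               float scale, int enabled)
{
    assert(dst && src);
    if (!enabled) {
        for (size_t i = 0; i < n; i++)
            dst[i] = 0.0f;
        return;
    }
    /* Straight-line multiply loop: a natural candidate for a SIMD version. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * scale;
}
```

Both functions produce identical output for the same arguments; only the placement of the test differs.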

x86: synth filter float: implement SSE2 version · 08e3ea60

Christophe Gisquet authored 11 years ago

Timings for Arrandale:
          C    SSE
win32:  2108   334
win64:  1152   322

Factoring the inner loop out behind a call/jmp costs more than 15 cycles,
even with the jmp destination aligned.

Unrolling for ARCH_X86_64 gains 20 cycles.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>

x86: dcadsp: implement SSE lfe_dir · ad507d79

Christophe Gisquet authored 11 years ago

Results for Arrandale/Windows:
32: 1670 -> 316
64:  728 -> 298
Signed-off-by: Janne Grunau <janne-libav@jannau.net>

07 Feb, 2014 1 commit

x86: dcadsp: implement int8x8_fmul_int32 · 5b59a9fc

Christophe Gisquet authored 12 years ago

For the callable function (as opposed to the inline one):
         C  SSE  SSE2  SSE4
Win32:  47   42   29    26
Win64:  30   33   25    23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>

29 Aug, 2013 1 commit

x86: avcodec: Use convenience macros to check for CPU flags · 6369ba3c

Diego Biurrun authored 11 years ago

17 Jul, 2013 1 commit

Consistently use "cpu_flags" as variable/parameter name for CPU flags · 3ac7fa81

Diego Biurrun authored 11 years ago

05 Feb, 2013 1 commit

Add av_cold attributes to arch-specific init functions · c9f933b5

Diego Biurrun authored 12 years ago

23 Jan, 2013 1 commit

vorbisdsp: convert x86 simd functions from inline asm to yasm. · 2e4bb99f

Ronald S. Bultje authored 12 years ago

21 Jan, 2013 1 commit

vorbisdsp: change block_size type from int to intptr_t. · 1768e43c

Ronald S. Bultje authored 12 years ago

This saves one instruction in the x86-64 assembly.

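
A small sketch of the signature change, using hypothetical helper names rather than the vorbisdsp functions. The message does not say which instruction is saved. A plausible reading is that on x86-64 a 32-bit `int` argument has to be widened to 64 bits before hand-written assembly can use it in address arithmetic, whereas an `intptr_t` argument already arrives at pointer width. That reading is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper taking a 32-bit length. Assembly implementing this
 * signature cannot rely on the upper half of the 64-bit argument register. */
static void negate_int(float *buf, int block_size)
{
    assert(buf || block_size <= 0);
    for (int i = 0; i < block_size; i++)
        buf[i] = -buf[i];
}

/* Same operation with a pointer-width length, usable directly as an offset. */
static void negate_intptr(float *buf, intptr_t block_size)
{
    assert(buf || block_size <= 0);
    for (intptr_t i = 0; i < block_size; i++)
        buf[i] = -buf[i];
}
```

The C behaviour is identical for any length that fits in an `int`; the difference only matters to an assembly implementation behind the same prototype.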
20 Jan, 2013 1 commit

Move vorbis_inverse_coupling from dsputil to vorbisdspcontext. · fef906c7

Ronald S. Bultje authored 12 years ago

Conveniently (together with Justin's earlier patches), this makes
our vorbis decoder entirely independent of dsputil.
