Commits · 07a7f08b1e4b4679a3f9a60ecc45cae5078bc414 · Linshizhi / ffmpeg.wasm-core

17 Oct, 2016 1 commit

x86: Add missing colons after assembly labels · 6be7944e

Diego Biurrun authored 8 years ago

This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error

6be7944e

05 Jul, 2016 1 commit

x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32 · 645489cf

James Almer authored 8 years ago

About 10% faster.
Signed-off-by: James Almer <jamrial@gmail.com>

645489cf

23 Feb, 2016 1 commit

x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} · 45d3af90

James Almer authored 8 years ago

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>

45d3af90

06 Feb, 2016 1 commit

x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3} · 8ae74479

James Almer authored 9 years ago

Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>

8ae74479

31 Jan, 2016 1 commit

avcodec/dca: remove old decoder · 46089967

foo86 authored 9 years ago

Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.

46089967

25 Jan, 2016 1 commit

avcodec/synth_filter: split off remaining code from dcadec files · 209f50e1

James Almer authored 9 years ago

Signed-off-by: James Almer <jamrial@gmail.com>

209f50e1

24 Dec, 2015 1 commit

dca: remove unused decode_hf function and quant_d tables · 2008f760

Alexandra Hájková authored 9 years ago

They were superseded with their integer equivalents. Rename integer
decode_hf to decode_hf.

2008f760

11 Aug, 2015 2 commits

x86inc: Drop SECTION_TEXT macro · ab43beef

Henrik Gramner authored 9 years ago

The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
Signed-off-by: Anton Khirnov <anton@khirnov.net>

ab43beef

x86: dcadsp: Avoid SSE2 instructions in SSE functions · 4a53c758

Henrik Gramner authored 9 years ago

Signed-off-by: Anton Khirnov <anton@khirnov.net>

4a53c758

04 Aug, 2015 1 commit

x86inc: Drop SECTION_TEXT macro · f0b7882c

Henrik Gramner authored 9 years ago

The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.

f0b7882c

26 Jul, 2015 1 commit

avcodec/x86: add missing colon to labels · 844bef57

James Almer authored 9 years ago

Silences warnings with Nasm
Signed-off-by: James Almer <jamrial@gmail.com>

844bef57

13 Apr, 2014 1 commit

x86/synth_filter: remove the fma3 version ifdefs · 0f524b6c

James Almer authored 10 years ago

This fixes compilation failures with --disable-fma3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

0f524b6c

06 Apr, 2014 1 commit

dcadsp: fix SSE code to not use SSE2 instructions. · fc7e02f0

Hendrik Leppkes authored 10 years ago

movq from SSE register to memory is an SSE2 instruction.
Instead, use SSE movlps, which does the same thing.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

fc7e02f0
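
As a rough C-intrinsics illustration of the point made in fc7e02f0 (this is not the assembly from the commit): `_mm_storel_pi` from `<xmmintrin.h>` is an SSE intrinsic that corresponds to `movlps`, whereas `_mm_storel_epi64` from `<emmintrin.h>` is SSE2 and corresponds to `movq`. Both store the low 64 bits of a register to memory. The helper name `store_low_two_floats` is hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE only; deliberately no <emmintrin.h> (SSE2) */

/* Hypothetical helper: store the low 64 bits (two floats) of v to dst.
 * _mm_storel_pi is an SSE intrinsic, so this stays within the SSE baseline. */
static void store_low_two_floats(float *dst, __m128 v)
{
    assert(dst != NULL);
    _mm_storel_pi((__m64 *)dst, v);
}
```

Called with a vector whose two lowest lanes hold 1.0f and 2.0f, the helper writes exactly those two values to `dst`; the upper lanes are ignored.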

05 Apr, 2014 2 commits

x86/dcadsp: add ff_dca_lfe_fir0_fma3 · a1ac12bd

James Almer authored 10 years ago

~10% faster than the SSE version.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

a1ac12bd

x86/synth_filter: compile avx and fma3 functions unconditionally · 7d2116dd

James Almer authored 10 years ago

Fixes compilation failures with "--disable-{avx,fma3} --disable-optimizations"
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

7d2116dd

04 Apr, 2014 4 commits

x86/synth_filter: remove the main loop when it's not needed · dfd865e5

Christophe Gisquet authored 10 years ago

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

dfd865e5

x86/synth_filter: add synth_filter_fma3 · c74b8669

James Almer authored 10 years ago

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

c74b8669

x86/synth_filter: add synth_filter_avx · 81e02fae

James Almer authored 10 years ago

Sandy Bridge Win64:
180 cycles in ff_synth_filter_inner_sse2
150 cycles in ff_synth_filter_inner_avx

Also switch some instructions to a three operand format to avoid
assembly errors with Yasm 1.1.0 or older.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

81e02fae

x86/synth_filter: add synth_filter_sse · 2025d802

James Almer authored 10 years ago

Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>

2025d802

17 Mar, 2014 1 commit

x86/synth_filter: improve FMA version · aa1f3801

James Almer authored 10 years ago

Replace mulps+subps with fnmaddps, resulting in two less instructions inside the
inner loops.
About 1% faster FMA3 performance.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

aa1f3801

05 Mar, 2014 1 commit

x86/synth_filter: add synth_filter_fma3 · 7fd64e3e

James Almer authored 10 years ago

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

7fd64e3e

02 Mar, 2014 2 commits

x86/synth_filter: Revert the switch to float ops with SSE2 · 884e085d

James Almer authored 10 years ago

This reverts the changes 64672098
and 68c3ed93 did to the SSE2 version,
which generated a hit of about 5 cycles.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

884e085d

x86/synth_filter: add synth_filter_avx · 68c3ed93

James Almer authored 10 years ago

Sandy Bridge Win64:
180 cycles on ff_synth_filter_inner_sse2
150 cycles on ff_synth_filter_inner_avx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

68c3ed93

01 Mar, 2014 1 commit

x86/synth_filter: add synth_filter_sse · 64672098

James Almer authored 10 years ago

Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

64672098

28 Feb, 2014 5 commits

x86: synth filter float: implement SSE2 version · 2cdbcc00

Christophe Gisquet authored 11 years ago

Timings for Arrandale:
          C    SSE
win32:  2108   334
win64:  1152   322

Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.

Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

2cdbcc00

x86: dcadsp: implement SSE lfe_dir · 16924311

Christophe Gisquet authored 11 years ago

Results for Arrandale/Windows:
32: 1670 -> 316
64:  728 -> 298
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

16924311

dcadec: simplify decoding of VQ high frequencies · 4cb69642

Christophe Gisquet authored 11 years ago

The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.

Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.

The decode_hf implementations have following timings:

For x86 Arrandale:
        C  SSE SSE2 SSE4
win32: 260 162  119  104
win64: 242 N/A   89   72

The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.

4cb69642
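
The restructuring described in 4cb69642, moving a test out of a loop so the loop body can be handed to a DSP routine, can be sketched generically in C. This is not the dcadec code: the function names, the `enabled` flag standing in for the hoisted test, and the dequantization formula are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before": the test sits inside the loop body. */
static void dequant_branchy(float *dst, const int8_t *codes, float scale,
                            int enabled, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (enabled)
            dst[i] = codes[i] * scale;
        else
            dst[i] = 0.0f;
    }
}

/* Hypothetical "after": a branch-free kernel that a SIMD version could
 * replace. It may assume len > 0 because the caller filters empty runs. */
static void dequant_kernel(float *dst, const int8_t *codes, float scale,
                           size_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < len; i++)
        dst[i] = codes[i] * scale;
}

/* The caller evaluates the test once, outside the loop. */
static void dequant_hoisted(float *dst, const int8_t *codes, float scale,
                            int enabled, size_t len)
{
    if (len == 0)
        return;
    if (enabled) {
        dequant_kernel(dst, codes, scale, len);
    } else {
        for (size_t i = 0; i < len; i++)
            dst[i] = 0.0f;
    }
}
```

Both variants produce the same output. The difference is that `dequant_kernel` has a fixed, branch-free body and never sees a zero length, which is the property the commit message says the DSP implementation gains.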

x86: synth filter float: implement SSE2 version · 08e3ea60

Christophe Gisquet authored 11 years ago

Timings for Arrandale:
          C    SSE
win32:  2108   334
win64:  1152   322

Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.

Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>

08e3ea60

x86: dcadsp: implement SSE lfe_dir · ad507d79

Christophe Gisquet authored 11 years ago

Results for Arrandale/Windows:
32: 1670 -> 316
64:  728 -> 298
Signed-off-by: Janne Grunau <janne-libav@jannau.net>

ad507d79

07 Feb, 2014 1 commit

x86: dcadsp: implement int8x8_fmul_int32 · 5b59a9fc

Christophe Gisquet authored 12 years ago

For the callable function (as opposed to the inline one):
         C  SSE  SSE2  SSE4
Win32:  47   42   29    26
Win64:  30   33   25    23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>

5b59a9fc