Commits · 3a71bcc213f223428622ac3750fe1a923f2f3ab4 · Linshizhi / ffmpeg.wasm-core

25 Jan, 2015 1 commit
- x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3} · 449b21bf
  James Almer authored 10 years ago
```
2 to 2.5 times faster.
Signed-off-by: James Almer <jamrial@gmail.com>
```
  449b21bf
15 May, 2014 1 commit

x86: sbrdsp: implement SSE qmf_deint_neg · d1310c59

Christophe Gisquet authored 12 years ago

From 133 (unrolled av_intfloat32 C) to 59 cycles on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

d1310c59
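
The baseline quoted above is C code that negates floats by flipping the IEEE-754 sign bit through the `av_intfloat32` integer view. The sketch below shows that trick on a deinterleave-and-negate loop. The index pattern is our reading of FFmpeg's scalar `qmf_deint_neg` and may differ in detail. `flip_sign` and `qmf_deint_neg_ref` are hypothetical names, not FFmpeg API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit IEEE-754 floats");

/* Negate a float by flipping bit 31 through an integer view; memcpy is the
 * portable equivalent of reading through an int/float union. */
static float flip_sign(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    u ^= UINT32_C(1) << 31;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical scalar model of qmf_deint_neg: walk src backwards, sending the
 * odd-indexed samples to the front half of v and the negated even-indexed
 * samples to the back half, in mirrored order. */
static void qmf_deint_neg_ref(float v[64], const float src[64])
{
    assert(v != src); /* outputs would overwrite inputs that are still needed */
    for (int i = 0; i < 32; i++) {
        v[i]      = src[63 - 2 * i];
        v[63 - i] = flip_sign(src[62 - 2 * i]);
    }
}
```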

30 Aug, 2013 1 commit
- Reinstate proper FFmpeg license for all files. · d814a839
  Thilo Borgmann authored 11 years ago
  
  d814a839
17 Jul, 2013 1 commit
- Consistently use "cpu_flags" as variable/parameter name for CPU flags · 3ac7fa81
  Diego Biurrun authored 11 years ago
  
  3ac7fa81
10 May, 2013 1 commit

x86: sbrdsp: implement SSE2 qmf_pre_shuffle · 2c299d41

Christophe Gisquet authored 12 years ago

From 253 to 51 cycles on Arrandale and Win64.
44 cycles on SandyBridge.
Signed-off-by: Anton Khirnov <anton@khirnov.net>

2c299d41

08 May, 2013 1 commit

x86: sbrdsp: force PIC addressing for Win64 · fc37cd43

Christophe Gisquet authored 11 years ago

MSVC complains about the 32-bit addressing, while MinGW/GCC does not.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

fc37cd43

03 May, 2013 1 commit

x86: sbrdsp: Implement SSE2 qmf_deint_bfly · 5a97469a

Christophe Gisquet authored 11 years ago

Sandybridge: 47 cycles

Having a loop counter is a 7-cycle gain.
Unrolling is another 7-cycle gain.
Working in reverse scan order saves another 6 cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

5a97469a
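
For reference, the butterfly being vectorized can be written as a short scalar loop. This sketch follows our understanding of FFmpeg's C `qmf_deint_bfly` (the formula in the comment is an assumption, and `qmf_deint_bfly_ref` is a hypothetical name). It also shows what a down-counting "reverse scan" loop looks like. In assembly, one plausible benefit of such a counter is that the loop can branch on the flags set by the decrement instead of issuing a separate compare.

```c
#include <assert.h>

/* Hypothetical scalar model of qmf_deint_bfly:
 *   v[i]       = src0[i] - src1[63 - i]
 *   v[127 - i] = src0[i] + src1[63 - i]
 * The index runs from 63 down to 0, the scalar analogue of a reverse scan. */
static void qmf_deint_bfly_ref(float v[128], const float src0[64], const float src1[64])
{
    for (int i = 63; i >= 0; i--) {
        float a = src0[i];
        float b = src1[63 - i];
        v[i]       = a - b;  /* difference goes to the front half */
        v[127 - i] = a + b;  /* sum goes to the back half, mirrored */
    }
}
```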

24 Apr, 2013 1 commit

avcodec/x86/sbrdsp_init: disable using the noise code in x86_64 MSVC, Try #2 · fc690333

Michael Niedermayer authored 11 years ago

This should fix building with MSVC until someone can change the
code so it works with MSVC
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

fc690333

23 Apr, 2013 1 commit

avcodec/x86/sbrdsp_init: disable using the noise code in x86_64 MSVC · 7a617d6c

Michael Niedermayer authored 11 years ago

This should fix building with MSVC until someone can change the
code so it works with MSVC
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

7a617d6c

19 Apr, 2013 1 commit

x86: sbrdsp: implement SSE2 hf_apply_noise · 76c72773

Christophe Gisquet authored 11 years ago

From 233 to 105 cycles on Arrandale and Win64.
Replacing the multiplication by s_m[m] with a pand and a pxor using
appropriate vectors is slower. Unrolling is a 15-cycle win.
An SSE version was 4 cycles slower.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

76c72773
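
As we understand FFmpeg's scalar code, each band adds either a sinusoid term `s_m[m] * phi_sign` or a scaled noise-table entry, so a SIMD version has to cover both paths without branching. The sketch below shows one branchless formulation in plain C. It is an illustration under assumptions (flat interleaved re/im arrays, a 512-entry noise table implied by the `0x1ff` mask, hypothetical names), not the SSE2 code from the commit. Like the commit's final version, it keeps the multiplication by `s_m[m]`.

```c
#include <assert.h>

/* Hypothetical scalar model of the SBR noise/sinusoid addition.
 * Y holds m_max interleaved complex values (re, im); noise_tab is assumed to
 * hold 512 interleaved complex entries, hence the 0x1ff index mask. */
static void hf_apply_noise_ref(float *Y, const float *s_m, const float *q_filt,
                               const float *noise_tab, int noise,
                               float phi_sign0, float phi_sign1, int m_max)
{
    assert(m_max >= 0);
    for (int m = 0; m < m_max; m++) {
        noise = (noise + 1) & 0x1ff;
        /* Branchless select: when s_m[m] is zero its product vanishes and the
         * noise term is kept; otherwise the noise term is scaled by zero. */
        float keep_noise = (s_m[m] == 0.0f) ? 1.0f : 0.0f;
        float q = keep_noise * q_filt[m];
        Y[2 * m + 0] += s_m[m] * phi_sign0 + q * noise_tab[2 * noise + 0];
        Y[2 * m + 1] += s_m[m] * phi_sign1 + q * noise_tab[2 * noise + 1];
        phi_sign1 = -phi_sign1; /* imaginary-part sign alternates per band */
    }
}
```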

10 Apr, 2013 1 commit

x86: sbrdsp: implement SSE2 qmf_pre_shuffle · 2383068c

Christophe Gisquet authored 11 years ago

From 253 to 51 cycles on Arrandale and Win64.
44 cycles on SandyBridge.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

2383068c

08 Apr, 2013 1 commit

x86: sbrdsp: implement SSE qmf_deint_bfly · e2946e5c

Christophe Gisquet authored 11 years ago

From 312 to 89/68 (sse/sse2) cycles on Arrandale and Win64.
Sandybridge: 68/47 cycles.

Having a loop counter is a 7-cycle gain.
Unrolling is another 7-cycle gain.
Working in reverse scan order saves another 6 cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

e2946e5c

05 Apr, 2013 2 commits

x86: sbrdsp: Implement SSE neg_odd_64 · f4b0d12f

Christophe Gisquet authored 12 years ago

Timing on Arrandale:
        C   SSE
Win32:  57   44
Win64:  47   38
Unrolling and not storing the mask both save a few cycles.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

f4b0d12f

x86: sbrdsp: implement SSE neg_odd_64 · 37a97083

Christophe Gisquet authored 11 years ago

Timing on Arrandale:
        C   SSE
Win32:  57   44
Win64:  47   38
Unrolling and not storing the mask both save a few cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

37a97083
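
A scalar model helps when reading the timing table: the function flips the sign of every odd-indexed float, which maps naturally onto XOR-ing each group of 4 floats with a constant sign mask. This sketch assumes the 64-float buffer implied by the name, and `neg_odd_64_ref` is a hypothetical name. Unlike the SSE code described in the commit, which avoids storing the mask, it simply reads the mask from a constant array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit IEEE-754 floats");

/* Hypothetical scalar model of neg_odd_64: negate x[1], x[3], ..., x[63].
 * Each group of 4 floats is XORed with the lane mask {0, sign, 0, sign},
 * mimicking what a single 128-bit xorps does to one register. */
static void neg_odd_64_ref(float x[64])
{
    static const uint32_t mask[4] = { 0, 0x80000000u, 0, 0x80000000u };
    for (int i = 0; i < 64; i += 4) {
        uint32_t lane[4];
        memcpy(lane, &x[i], sizeof lane);
        for (int j = 0; j < 4; j++)
            lane[j] ^= mask[j];
        memcpy(&x[i], lane, sizeof lane);
    }
}
```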

05 Feb, 2013 1 commit
- Add av_cold attributes to arch-specific init functions · c9f933b5
  Diego Biurrun authored 12 years ago
  
  c9f933b5
06 Jan, 2013 2 commits

x86: sbrdsp: Implement SSE qmf_post_shuffle · 4f506466

Christophe Gisquet authored 12 years ago

255 to 174 cycles on Arrandale / Win64. Unrolling yields no gain.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

4f506466

x86: sbrdsp: Implement SSE sum64x5 · 44a0036d

Christophe Gisquet authored 12 years ago

From 698 to 174 cycles on Arrandale. Unrolling is a 6-cycle gain.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

44a0036d

07 Dec, 2012 1 commit

SBR DSP x86: implement SSE sbr_hf_gen · 2aef3d66

Christophe Gisquet authored 12 years ago

Start and end indices are multiples of 2, which guarantees aligned access.
It also allows generating 4 floats per loop iteration while keeping the
alignment throughout.

Timing:
- 32 bits: 326c -> 172c
- 64 bits: 323c -> 156c
Signed-off-by: Diego Biurrun <diego@biurrun.de>

2aef3d66
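
The alignment argument can be made concrete: a complex float sample occupies 8 bytes, so when both indices are even, every pair of samples starts on a 16-byte boundary (given 16-byte-aligned arrays) and fills exactly one SSE register. The scalar sketch below mirrors that structure by producing two complex samples (4 floats) per iteration. The recurrence is our reading of FFmpeg's C `sbr_hf_gen`. The flat interleaved layout, the `start >= 2` requirement (this sketch indexes from the start of the array rather than from an offset pointer), and the names are assumptions for illustration.

```c
#include <assert.h>

/* One complex output: X_high[i] = X_low[i] + a0 * X_low[i-1] + a1 * X_low[i-2],
 * with a = {a1.re, a1.im, a0.re, a0.im}. Arrays are interleaved (re, im). */
static void hf_gen_one(float *X_high, const float *X_low, const float a[4], int i)
{
    const float *x2 = X_low + 2 * (i - 2);
    const float *x1 = X_low + 2 * (i - 1);
    const float *x0 = X_low + 2 * i;
    X_high[2 * i + 0] = x2[0] * a[0] - x2[1] * a[1] + x1[0] * a[2] - x1[1] * a[3] + x0[0];
    X_high[2 * i + 1] = x2[1] * a[0] + x2[0] * a[1] + x1[1] * a[2] + x1[0] * a[3] + x0[1];
}

/* Hypothetical scalar model of sbr_hf_gen: even start/end means each
 * iteration covers one 16-byte pair of complex samples. */
static void hf_gen_ref(float *X_high, const float *X_low, const float alpha0[2],
                       const float alpha1[2], float bw, int start, int end)
{
    assert(start >= 2 && start % 2 == 0 && end % 2 == 0);
    const float a[4] = { alpha1[0] * bw * bw, alpha1[1] * bw * bw,
                         alpha0[0] * bw,      alpha0[1] * bw };
    for (int i = start; i < end; i += 2) {  /* 4 floats per iteration */
        hf_gen_one(X_high, X_low, a, i);
        hf_gen_one(X_high, X_low, a, i + 1);
    }
}
```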

08 Sep, 2012 1 commit

x86: Replace checks for CPU extensions and flags by convenience macros · e0c6cce4

Diego Biurrun authored 12 years ago

This separates code relying on inline assembly from code relying on external
assembly and fixes instances where the coalesced check was incorrect.

e0c6cce4
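
The pattern these checks support can be sketched as an init function that installs a portable C default and overrides it only when both the build (was the external assembly assembled at all?) and the running CPU (do the detected `cpu_flags` include SSE?) allow it. Keeping separate macros for inline and external code means one kind is never gated on the other's availability. Every macro, flag, type, and function name below is a hypothetical stand-in, loosely modelled on FFmpeg's `EXTERNAL_*`/`INLINE_*` style but not copied from `libavutil`.

```c
#include <assert.h>

/* Hypothetical stand-ins: configure would normally decide which kinds of SIMD
 * code were built, and runtime detection would supply cpu_flags. */
#define SKETCH_FLAG_SSE          0x1
#define SKETCH_FLAG_SSE2         0x2
#define SKETCH_HAVE_SSE_EXTERNAL 1   /* separately assembled .asm code built */
#define SKETCH_HAVE_SSE_INLINE   0   /* e.g. a compiler without inline asm */

/* One check per kind of code: build-time availability AND runtime CPU support. */
#define SKETCH_EXTERNAL_SSE(cpu_flags) (SKETCH_HAVE_SSE_EXTERNAL && ((cpu_flags) & SKETCH_FLAG_SSE))
#define SKETCH_INLINE_SSE(cpu_flags)   (SKETCH_HAVE_SSE_INLINE   && ((cpu_flags) & SKETCH_FLAG_SSE))

typedef struct {
    const char *(*func)(void);
} SketchDSPContext;

static const char *func_c(void)   { return "c"; }
static const char *func_sse(void) { return "sse"; } /* stands in for an .asm function */

static void sketch_dsp_init(SketchDSPContext *s, int cpu_flags)
{
    s->func = func_c;                 /* portable default */
    if (SKETCH_EXTERNAL_SSE(cpu_flags))
        s->func = func_sse;           /* external-asm override */
}
```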

23 Feb, 2012 2 commits

SBR DSP x86: implement SSE sbr_hf_g_filt · 2784d187

Christophe GISQUET authored 12 years ago

Changing the number of elements processed per iteration of the main loop
from 4 to:
- 8: minor gain of 2 cycles (not worth the extra object size)
- 2: loss of 8 cycles.

Assigning STEP to a register is a loss. The output address (Y) is almost
always unaligned.

Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

2784d187
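
The scalar operation clarifies what the loop touches: a strided read from `X_high` (the distance between bands, presumably what `STEP` holds in the assembly) and a contiguous write to `Y`. This sketch assumes the `[band][40][2]` float layout of FFmpeg's C prototype as we recall it. `XH_STRIDE` and `hf_g_filt_ref` are hypothetical names.

```c
#include <assert.h>

/* Assumed layout: X_high is [bands][40][2] floats, so consecutive bands are
 * 40 complex samples (80 floats) apart. */
#define XH_STRIDE (40 * 2)

/* Hypothetical scalar model of sbr_hf_g_filt: scale time slot ixh of each
 * band by that band's gain. Input is strided, output is contiguous. */
static void hf_g_filt_ref(float *Y, const float *X_high, const float *g_filt,
                          int m_max, int ixh)
{
    assert(ixh >= 0 && ixh < 40);
    for (int m = 0; m < m_max; m++) {
        const float *x = X_high + m * XH_STRIDE + 2 * ixh;
        Y[2 * m + 0] = x[0] * g_filt[m];
        Y[2 * m + 1] = x[1] * g_filt[m];
    }
}
```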

SBR DSP x86: implement SSE sbr_sum_square_sse · 34454c76

Christophe GISQUET authored 12 years ago

The 32-bit targets were compiled with -mfpmath=sse to give a fair reference.
sbr_sum_square C  /32bits: 82c (unrolled)/102c
               C  /64bits: 69c (unrolled)/82c
               SSE/32bits: 42c
               SSE/64bits: 31c

Using SSE4.1 dpps to perform the final sum is slower.
Not unrolling the loop to perform 8 operations per iteration costs 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

34454c76
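
For reference, the computation being timed is a sum of squared magnitudes over complex samples. The sketch below is a plain-C version with two accumulators and a 2x unroll, resembling the "unrolled" C variant in the timings. It assumes `n` is even and a flat interleaved (re, im) layout; `sum_square_ref` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical scalar model of sbr_sum_square: energy of n complex samples
 * stored as interleaved (re, im) floats; n is assumed to be even. */
static float sum_square_ref(const float *x, int n)
{
    assert(n % 2 == 0);
    float sum0 = 0.0f, sum1 = 0.0f;   /* real-part and imaginary-part sums */
    for (int i = 0; i < n; i += 2) {  /* two complex samples per iteration */
        sum0 += x[2 * i + 0] * x[2 * i + 0];
        sum1 += x[2 * i + 1] * x[2 * i + 1];
        sum0 += x[2 * i + 2] * x[2 * i + 2];
        sum1 += x[2 * i + 3] * x[2 * i + 3];
    }
    /* Final reduction: in SIMD code this is the horizontal sum that the
     * commit found slower to do with SSE4.1 dpps. */
    return sum0 + sum1;
}
```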

19 Mar, 2011 1 commit
- Replace FFmpeg with Libav in licence headers · 2912e87a
  Mans Rullgard authored 13 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
  2912e87a
09 Sep, 2010 1 commit

Clean up av_get_cpu_flag() · 9275438a

Måns Rullgård authored 14 years ago

Instead of defining functions in per-arch header files included
by the main cpu.c, define them normally and call them from the
generic one.

Originally committed as revision 25084 to svn://svn.ffmpeg.org/ffmpeg/trunk

9275438a

08 Sep, 2010 1 commit

Move mm_support() from libavcodec to libavutil, make it a public · c6c98d08

Stefano Sabatini authored 14 years ago

function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk

c6c98d08

22 Dec, 2008 1 commit

Rename libavcodec/i386/ --> libavcodec/x86/. · a6493a8f

Diego Biurrun authored 16 years ago

It contains optimizations that are not specific to i386 and
libavutil uses this naming scheme already.

Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk

a6493a8f

12 Aug, 2008 1 commit

split-radix FFT · 5d0ddd1a

Loren Merritt authored 16 years ago

C is 1.9x faster than the previous C (on various x86 CPUs); SSE is 1.6x faster than the previous SSE.

Originally committed as revision 14698 to svn://svn.ffmpeg.org/ffmpeg/trunk

5d0ddd1a

09 May, 2008 1 commit
- Use full path for #includes from another directory. · 245976da
  Diego Biurrun authored 16 years ago
```
Originally committed as revision 13098 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  245976da
08 May, 2008 1 commit

Do not misuse long as the size of a register in x86. · 40d0e665

Ramiro Polla authored 16 years ago

typedef x86_reg as the appropriate size and use it instead.

Originally committed as revision 13081 to svn://svn.ffmpeg.org/ffmpeg/trunk

40d0e665
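
The idea is to derive the register width from the target rather than from `long`. One common motivation is Win64, which follows the LLP64 model: `long` stays 32 bits there while general-purpose registers are 64 bits. A minimal sketch, using compiler-predefined target macros (`__x86_64__`, `_M_X64`, `__i386__`, `_M_IX86`) instead of FFmpeg's configure-generated `ARCH_*` symbols; it is not a copy of FFmpeg's header.

```c
#include <assert.h>
#include <stdint.h>

/* Pick an integer type as wide as a general-purpose register. */
#if defined(__x86_64__) || defined(_M_X64)
typedef int64_t x86_reg;
#elif defined(__i386__) || defined(_M_IX86)
typedef int32_t x86_reg;
#else
typedef intptr_t x86_reg;   /* fallback so this sketch builds on other targets */
#endif

static_assert(sizeof(x86_reg) >= sizeof(void *), "x86_reg must hold a pointer-sized value");
```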

16 May, 2007 1 commit

Add libavcodec to compiler include flags in order to simplify header · b550bfaa

Ronald S. Bultje authored 17 years ago

include paths in the source files.
mostly from a patch by Ronald S. Bultje, rbultje ronald.bitfreak net

Originally committed as revision 9034 to svn://svn.ffmpeg.org/ffmpeg/trunk

b550bfaa

07 Oct, 2006 1 commit
- Change license headers to say 'FFmpeg' instead of 'this program/this library' · b78e7197
  Diego Biurrun authored 18 years ago
```
and fix GPL/LGPL version mismatches.

Originally committed as revision 6577 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  b78e7197
18 Aug, 2006 1 commit

ff_fft_calc_3dn/3dn2/sse: convert intrinsics to inline asm. · 1e4ecf26

Loren Merritt authored 18 years ago

2.5% faster FFT, 0.5% faster Vorbis.

Originally committed as revision 6023 to svn://svn.ffmpeg.org/ffmpeg/trunk

1e4ecf26

08 Mar, 2006 1 commit

3DNow! & Extended 3DNow! versions of FFT · 82eb4b0f

Zuxy Meng authored 18 years ago

Patch by Zuxy Meng, zuxy <<dot>> meng >>at<< gmail <<dot>> com
Minor non-functional diff-related fixes by me.

Originally committed as revision 5125 to svn://svn.ffmpeg.org/ffmpeg/trunk

82eb4b0f