Commits · 8e8347b89291ffa65779f1f8c85eed5d57d80a86 · Linshizhi / ffmpeg.wasm-core

22 Mar, 2014 1 commit
- x86: dsputil: Move hpeldsp-related declarations to a separate header · 82dd1026
  Diego Biurrun authored 11 years ago
29 Aug, 2013 1 commit
- x86: avcodec: Consistently structure CPU extension initialization · e998b563
  Diego Biurrun authored 11 years ago
28 Aug, 2013 1 commit
- x86: rv40dsp: Move inline assembly optimizations out of YASM init section · cd529172
  Diego Biurrun authored 11 years ago
17 Jul, 2013 1 commit
- Consistently use "cpu_flags" as variable/parameter name for CPU flags · 3ac7fa81
  Diego Biurrun authored 11 years ago
12 May, 2013 1 commit
- x86: dsputil: Rename dsputil_mmx.h --> dsputil_x86.h · 1399931d
  Diego Biurrun authored 11 years ago
```
The header is not (anymore) MMX-specific.
```
07 May, 2013 1 commit
- x86: dsputil: Move rv40-specific functions where they belong · 63bac48f
  Diego Biurrun authored 11 years ago
12 Mar, 2013 1 commit
- dsputil: convert remaining functions to use ptrdiff_t strides · a8b60158
  Luca Barbato authored 11 years ago
```
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
```
06 Feb, 2013 1 commit
- rv34: Drop now unnecessary dsputil dependencies · 82bd04b1
  Diego Biurrun authored 12 years ago
05 Feb, 2013 1 commit
- Add av_cold attributes to arch-specific init functions · c9f933b5
  Diego Biurrun authored 12 years ago
13 Nov, 2012 1 commit
- x86: mmx2 ---> mmxext in asm constructs · 26301caa
  Diego Biurrun authored 12 years ago
08 Oct, 2012 1 commit
- x86: call most of the x86 dsp init functions under if (ARCH_X86) · f101eab1
  Janne Grunau authored 12 years ago
```
Rename the called dsp init functions to *_init_x86.
```
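The pattern named in this commit can be sketched as follows. This is a minimal illustration, assuming `ARCH_X86` is a build-time constant that is always defined as 0 or 1; `DemoDSPContext`, `demo_dsp_init` and `demo_dsp_init_x86` are hypothetical names and not the project's actual symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the build system always defines ARCH_X86 as 0 or 1.
 * The fallback below only exists so that this sketch compiles alone. */
#ifndef ARCH_X86
#define ARCH_X86 0
#endif

/* Hypothetical DSP context used only for this illustration. */
typedef struct DemoDSPContext {
    int uses_x86;
} DemoDSPContext;

/* Hypothetical arch-specific init, named with the *_init_x86 suffix. */
static void demo_dsp_init_x86(DemoDSPContext *c)
{
    c->uses_x86 = 1;
}

/* Generic init: the arch-specific call sits under a plain if (), so the
 * call is always parsed and type-checked, and the compiler can drop the
 * branch when ARCH_X86 is 0. */
static void demo_dsp_init(DemoDSPContext *c)
{
    assert(c != NULL);
    c->uses_x86 = 0;
    if (ARCH_X86)
        demo_dsp_init_x86(c);
}
```

Compared with wrapping the call in `#if ARCH_X86 ... #endif`, a plain `if` keeps the x86 call visible to the compiler on every platform, which tends to catch signature mismatches earlier.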
08 Sep, 2012 1 commit
- x86: Replace checks for CPU extensions and flags by convenience macros · e0c6cce4
  Diego Biurrun authored 12 years ago

This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
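
The idea of separate convenience macros for inline and external assembly can be sketched as below. This is an illustration under stated assumptions: the build switches, the flag bits and the macro names here are hypothetical stand-ins chosen for the example, not the definitions used by the project.

```c
#include <assert.h>

/* Hypothetical build-time switches: inline asm available, external asm not. */
#define HAVE_INLINE_ASM   1
#define HAVE_EXTERNAL_ASM 0

/* Hypothetical CPU flag bits. */
#define CPU_FLAG_MMX  0x1
#define CPU_FLAG_SSE2 0x2

/* Each macro couples one kind of assembly support with one CPU flag, so a
 * caller cannot accidentally test the flag without the matching build switch. */
#define INLINE_MMX(flags)    (HAVE_INLINE_ASM   && ((flags) & CPU_FLAG_MMX))
#define EXTERNAL_SSE2(flags) (HAVE_EXTERNAL_ASM && ((flags) & CPU_FLAG_SSE2))

typedef enum { IMPL_C, IMPL_INLINE_MMX, IMPL_EXTERNAL_SSE2 } impl_t;

/* Picks an implementation the way an init function might fill in function
 * pointers: later checks override earlier ones. */
static impl_t select_impl(int cpu_flags)
{
    impl_t impl = IMPL_C;

    assert(cpu_flags >= 0); /* flag masks are non-negative in this sketch */
    if (INLINE_MMX(cpu_flags))
        impl = IMPL_INLINE_MMX;
    if (EXTERNAL_SSE2(cpu_flags))
        impl = IMPL_EXTERNAL_SSE2;
    return impl;
}
```

With external assembly disabled as above, a CPU that reports SSE2 still falls back to the inline MMX or C path, which is the kind of case a single coalesced check could get wrong.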

30 Aug, 2012 2 commits
- x86: Fix linking with some or all of yasm, mmx, optimizations disabled · ec36aa69
  Diego Biurrun authored 12 years ago
```
Some optimized template functions reference optimized symbols, so they
must be explicitly disabled when those symbols are unavailable.
```
- x86: cosmetics: Comment some #endifs for better readability · a886b279
  Diego Biurrun authored 12 years ago
15 Aug, 2012 1 commit
- Don't include common.h from avutil.h · 1d9c2dc8
  Martin Storsjö authored 12 years ago
```
Signed-off-by: Martin Storsjö <martin@martin.st>
```
03 Aug, 2012 1 commit
- x86: build: replace mmx2 by mmxext · 239fdf1b
  Diego Biurrun authored 12 years ago

Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.

25 Jul, 2012 1 commit
- x86/dsputil: put inline asm under HAVE_INLINE_ASM. · 79195ce5
  Ronald S. Bultje authored 12 years ago

This allows compiling with compilers that don't support gcc-style
inline assembly.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

22 Jun, 2012 1 commit
- cosmetics: do not use full path for local headers · a5a93fa8
  Diego Biurrun authored 12 years ago
10 Jun, 2012 1 commit
- libavcodec/x86/rv40dsp_init.c: add missing HAVE_YASM · 3b196bb7
  Michael Niedermayer authored 12 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
15 May, 2012 1 commit
- x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions. · 6797d194
  Michael Kostylev authored 12 years ago
10 May, 2012 1 commit
- rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc
  Christophe Gisquet authored 12 years ago

Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions

Benchmark (rounded to tens of unit):
        V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C        445   358    985    1785    1559     3280
MMX*     219   271    478     714     929     1443
SSE2     131   158    294     425     515      892
SSSE3    120   122    248     387     390      763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

10 Apr, 2012 1 commit
- rv40dsp: implement prescaled versions for biweight. · 272b252c
  Christophe GISQUET authored 12 years ago

Quite often, the original weights are multiple of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.

The x86 code already used that trick.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
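
The arithmetic behind the prescaling can be sketched as follows. This is a minimal illustration, assuming a per-pixel weighting of the general shape "shift each product down by 9, add, round, shift by 5" with two weights that sum to 16384; the function names and the exact rounding constant are assumptions made for the example and may differ from the codec's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed generic form: each product is shifted down by 9 bits before the
 * two terms are summed, rounded and shifted down by another 5 bits. */
static uint8_t weight_generic(uint8_t a, uint8_t b, int w1, int w2)
{
    return (uint8_t)((((w2 * a) >> 9) + ((w1 * b) >> 9) + 0x10) >> 5);
}

/* Same computation for weights that were already divided by 512:
 * no intermediate shift is needed. */
static uint8_t weight_prescaled(uint8_t a, uint8_t b, int w1s, int w2s)
{
    return (uint8_t)((w2s * a + w1s * b + 0x10) >> 5);
}

/* Once-per-frame check: returns 1 and stores the prescaled weights when
 * both inputs are multiples of 512, otherwise returns 0. */
static int try_prescale(int w1, int w2, int *w1s, int *w2s)
{
    assert(w1s != NULL && w2s != NULL);
    if ((w1 | w2) & 0x1FF)
        return 0;
    *w1s = w1 >> 9;
    *w2s = w2 >> 9;
    return 1;
}
```

For a non-negative weight `w = 512 * k`, the product `w * a` is an exact multiple of 512, so `(w * a) >> 9` equals `k * a` and both functions return the same value.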

272b252c

20 Feb, 2012 1 commit
- rv34: change most "int stride" into "ptrdiff_t stride". · 3ab9a2a5
  Ronald S. Bultje authored 13 years ago

This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. Also fixes crashes on systems where we don't do it and
arguments are not in registers, such as Win64 for all weight functions.
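
A function signature using a `ptrdiff_t` stride can be sketched as below. This is an illustration only: `avg_rows8` is a hypothetical helper written for this example and is not one of the functions touched by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: averages an 8-pixel-wide block of src into dst,
 * row by row. Because stride is a ptrdiff_t it already has pointer width,
 * so adding it to dst and src needs no widening from a 32-bit int, and a
 * negative stride (bottom-up layout) works the same way. */
static void avg_rows8(uint8_t *dst, const uint8_t *src,
                      ptrdiff_t stride, int rows)
{
    assert(rows >= 0);
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
        dst += stride;
        src += stride;
    }
}
```

Calling it with a negative stride, starting from the last row, processes the same rows in reverse order.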

3ab9a2a5

30 Jan, 2012 3 commits
- rv40: x86 SIMD for biweight · e5c9de2a
  Christophe Gisquet authored 13 years ago

Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).

*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips

This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

- x86: Give RV40 init file a more suitable name. · 91bafb52
  Diego Biurrun authored 13 years ago
- x86: Place mm_flags variable declaration below the appropriate #ifdef. · c30b1983
  Diego Biurrun authored 13 years ago
```
This fixes some unused variable warnings with YASM disabled.
```
11 Aug, 2011 1 commit
- Move RV3/4-specific DSP functions into their own context · d241f51e
  Kostya Shishkov authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```