Commits · 730734d4f3e0f976b50cae9f94588f55e1845473 · Linshizhi / ffmpeg.wasm-core

04 May, 2016 1 commit
- cosmetics: Fix spelling mistakes · 41ed7ab4
  Vittorio Giovara authored 8 years ago
```
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
  41ed7ab4
01 Jul, 2014 1 commit
- Update Fiona's name in copyright statements. · 79793f83
  Diego Biurrun authored 10 years ago
  
  79793f83
13 Mar, 2014 1 commit
- x86: Make function prototype comments in assembly code consistent · 55519926
  Diego Biurrun authored 11 years ago
```
This helps grepping for functions, among other things.
```
  55519926
04 Nov, 2013 1 commit
- x86: rv40dsp: Use PAVGB instruction macro where appropriate · e2b5b097
  Diego Biurrun authored 11 years ago
  
  e2b5b097
30 Aug, 2013 1 commit
- Reinstate proper FFmpeg license for all files. · d814a839
  Thilo Borgmann authored 11 years ago
  
  d814a839
13 Nov, 2012 1 commit
- x86: mmx2 ---> mmxext in asm constructs · 26301caa
  Diego Biurrun authored 12 years ago
  
  26301caa
30 Oct, 2012 2 commits
- x86: yasm: Use complete source path for macro helper %includes · 04581c8c
  Diego Biurrun authored 12 years ago
```
This is more consistent with the way we handle C #includes and
it simplifies the build system.
```
  04581c8c
- x86: include x86inc.asm in x86util.asm · 6860b408
  Diego Biurrun authored 12 years ago
```
This is necessary to allow refactoring some x86util macros with cpuflags.
```
  6860b408
07 Aug, 2012 1 commit

x86: use 32-bit source registers with movd instruction · 2b140a3d

Mans Rullgard authored 12 years ago

yasm tolerates mismatch between movd/movq and source register size,
adjusting the instruction according to the register.  nasm is more
strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>

2b140a3d

15 May, 2012 1 commit
- x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions. · 6797d194
  Michael Kostylev authored 12 years ago
  
  6797d194
10 May, 2012 1 commit

rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc

Christophe Gisquet authored 12 years ago

Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions

Benchmark (rounded to tens of units):
       V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C       445   358    985    1785    1559     3280
MMX*    219   271    478     714     929     1443
SSE2    131   158    294     425     515      892
SSSE3   120   122    248     387     390      763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

110d0cdc
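
The benchmark figures in this commit message can be turned into per-function speedup ratios with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the helper name `speedup_vs_c` are made up for the example, and the numbers are copied from the table as-is (the commit message does not state their unit).

```python
# Benchmark figures copied from the commit message (unit not stated there).
COLUMNS = ["V8x8", "H8x8", "2D8x8", "V16x16", "H16x16", "2D16x16"]
BENCH = {
    "C":     [445, 358, 985, 1785, 1559, 3280],
    "MMX*":  [219, 271, 478, 714, 929, 1443],
    "SSE2":  [131, 158, 294, 425, 515, 892],
    "SSSE3": [120, 122, 248, 387, 390, 763],
}


def speedup_vs_c(impl):
    """Return {column: C time / impl time} for one implementation.

    Hypothetical helper for this example, not part of FFmpeg.
    """
    return {col: c / t for col, c, t in zip(COLUMNS, BENCH["C"], BENCH[impl])}


if __name__ == "__main__":
    for impl in ("MMX*", "SSE2", "SSSE3"):
        ratios = speedup_vs_c(impl)
        print(impl, " ".join(f"{col}={r:.2f}x" for col, r in ratios.items()))
```

The per-function ratios are far larger than the overall figure of around 15% quoted in the message, which is consistent with its remark that luma MC accounts for only about 6% of decoding time.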

10 Apr, 2012 2 commits

- rv40dsp x86: use only one register, for both increment and loop counter · 2130bd8f
  Christophe GISQUET authored 12 years ago
```
Around 10 cycles faster for luma.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  2130bd8f

rv40dsp: implement prescaled versions for biweight. · 272b252c

Christophe GISQUET authored 12 years ago

Quite often, the original weights are multiples of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.

The x86 code already used that trick.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

272b252c
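
The equivalence this commit relies on can be checked numerically. The sketch below is an illustration under stated assumptions, not the FFmpeg code: the function names, the final `>> 5` with `+ 16` rounding, the 8-bit pixel range, and the sample weights are all chosen for the example. Only the factor 512 (a shift by 9) comes from the commit message.

```python
def biweight_full(a, b, w1, w2):
    """Blend two pixels with full-scale weights.

    Each product is shifted right by 9 (divided by 512) before summing.
    The +16 and >> 5 final rounding step is an assumption for this sketch.
    """
    return (((w2 * a) >> 9) + ((w1 * b) >> 9) + 16) >> 5


def biweight_prescaled(a, b, w1s, w2s):
    """Blend two pixels with weights that were already divided by 512.

    No per-product shift is needed; only the final rounding shift remains.
    """
    return (w2s * a + w1s * b + 16) >> 5


def prescale(w):
    """Return w / 512 if w is an exact multiple of 512, otherwise None."""
    return w >> 9 if w % 512 == 0 else None


if __name__ == "__main__":
    w1, w2 = 5 * 512, 27 * 512            # assumed sample weights
    w1s, w2s = prescale(w1), prescale(w2)
    same = all(
        biweight_full(a, b, w1, w2) == biweight_prescaled(a, b, w1s, w2s)
        for a in range(256)
        for b in range(256)
    )
    print("prescaled path matches full path:", same)
    print("8000 is a multiple of 512:", prescale(8000) is not None)
```

When a weight is `512 * k`, the product `(512 * k * pixel) >> 9` equals `k * pixel` exactly, so the two paths agree for every pixel pair. For weights that are not multiples of 512, the prescaled form would lose precision, which is why a separate path for that case remains necessary.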

03 Feb, 2012 1 commit

Fix NASM compilation. · da1ba4e8

Reimar Döffinger authored 13 years ago

movd needs explicit register size prefix for NASM.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>

da1ba4e8

30 Jan, 2012 1 commit

rv40: x86 SIMD for biweight · e5c9de2a

Christophe Gisquet authored 13 years ago

Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).

*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips

This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

e5c9de2a
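
To compare the variants in the timer report more easily, the decicycle counts can be parsed and divided by the C baseline. This is a rough sketch: the report text is copied verbatim from the commit message, while the regular expression and the names `parse_report` and `speedups` are assumptions made for this example.

```python
import re

# Timer report copied verbatim from the commit message.
REPORT = """\
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips
"""

# Matches "<decicycles> decicycles in <block size>," at the start of a line.
LINE = re.compile(r"(\d+) decicycles in (\d+),")


def parse_report(text):
    """Return {variant: {block_size: decicycles}} from the report text."""
    results = {}
    variant = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(":"):
            variant = line[:-1]          # header line such as "SSE2:"
            results[variant] = {}
        else:
            m = LINE.match(line)
            if m and variant is not None:
                results[variant][int(m.group(2))] = int(m.group(1))
    return results


def speedups(results, baseline="C"):
    """Return {variant: {block_size: baseline cycles / variant cycles}}."""
    base = results[baseline]
    return {
        name: {size: base[size] / cycles for size, cycles in sizes.items()}
        for name, sizes in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, sizes in speedups(parse_report(REPORT)).items():
        print(f"{name}: 16x16 {sizes[16]:.2f}x, 8x8 {sizes[8]:.2f}x")
```

These are per-call ratios for the biweight functions only; the commit message itself puts the overall gain at around 4% for some sequences.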