Commits · 68b9ed83918f96a97ad6dc8860423961b1d4bc0a · Linshizhi / ffmpeg.wasm-core

04 Apr, 2012 1 commit
- vp8dsp x86: perform rounding shift with a single instruction · f9888520
  Christophe GISQUET authored 12 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  f9888520
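
The title of f9888520 does not say which instruction is involved. As an illustration only, the sketch below models one well-known way to fold a rounding right shift, `(x + (1 << (n - 1))) >> n`, into a single SSSE3 `pmulhrsw` by multiplying with `1 << (15 - n)`. The helper names are hypothetical, and the choice of `pmulhrsw` is an assumption rather than something this log confirms.

```python
def pmulhrsw(a: int, b: int) -> int:
    """Scalar model of one signed 16-bit lane of PMULHRSW."""
    assert -32768 <= a <= 32767 and -32768 <= b <= 32767
    result = ((a * b >> 14) + 1) >> 1
    # Wrap to signed 16 bits, as the hardware lane would.
    return (result + 0x8000) % 0x10000 - 0x8000


def rounding_shift_two_step(x: int, n: int) -> int:
    """Reference: add the rounding bias, then arithmetic shift right."""
    return (x + (1 << (n - 1))) >> n


def rounding_shift_single(x: int, n: int) -> int:
    """Same result from one multiply, valid for shift counts 1..15."""
    return pmulhrsw(x, 1 << (15 - n))
```

For a shift by 3 the multiplier would be `1 << 12`, i.e. 4096, and the two helpers agree for every signed 16-bit input in this model.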
10 Mar, 2012 2 commits
- vp8: convert mbedge loopfilter x86 assembly to use named arguments. · a928ed37
  Ronald S. Bultje authored 12 years ago
  
  a928ed37
- vp8: convert inner loopfilter x86 assembly to use named arguments. · bee330e3
  Ronald S. Bultje authored 12 years ago
  
  bee330e3
04 Mar, 2012 5 commits
- vp8: convert simple loopfilter x86 assembly to use named arguments. · b4188f0d
  Ronald S. Bultje authored 12 years ago
  
  b4188f0d
- vp8: convert idct x86 assembly to use named arguments. · 8476ca3b
  Ronald S. Bultje authored 12 years ago
  
  8476ca3b
- vp8: convert mc x86 assembly to use named arguments. · 21ffc78f
  Ronald S. Bultje authored 12 years ago
  
  21ffc78f
- vp8: convert loopfilter x86 assembly to use cpuflags(). · 28170f1a
  Ronald S. Bultje authored 12 years ago
  
  28170f1a
- vp8: convert idct/mc x86 assembly to use cpuflags(). · e25be471
  Ronald S. Bultje authored 12 years ago
  
  e25be471
02 Mar, 2012 1 commit
- vp8: disable mmx functions with sse/sse2 counterparts on x86-64. · 45549339
  Ronald S. Bultje authored 12 years ago
```
x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2
functions will never be used in practice.
```
  45549339
15 Aug, 2011 1 commit
- Fix NASM include directive · cc73511e
  Dave Yeo authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  cc73511e
12 Aug, 2011 2 commits
- Move x86util.asm from libavcodec/ to libavutil/. · b2c08787
  Ronald S. Bultje authored 13 years ago
```
This allows using it in swscale also.
```
  b2c08787
- Move x86inc.asm to libavutil/. · 3a39195b
  Ronald S. Bultje authored 13 years ago
```
This allows using it in libswscale/ also.
```
  3a39195b
17 May, 2011 1 commit

Modify x86util.asm to ease transitioning to 10-bit H.264 assembly. · d0005d34

Daniel Kang authored 13 years ago

Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

d0005d34

14 May, 2011 1 commit
- Fix FSF address copy paste error in some license headers. · 888fa31e
  Diego Biurrun authored 13 years ago
  
  888fa31e
19 Mar, 2011 1 commit
- Replace FFmpeg with Libav in licence headers · 2912e87a
  Mans Rullgard authored 13 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
  2912e87a
05 Sep, 2010 1 commit

Use "d" suffix for general-purpose registers used with movd. · b1c32fb5

Reimar Döffinger authored 14 years ago

This increases compatibility with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.

Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk

b1c32fb5

24 Aug, 2010 1 commit
- Mark xmm registers as clobbered in simple loopfilter. Should fix the last · 3611c45a
  Ronald S. Bultje authored 14 years ago
```
two VP8-related fate failures on Win64.

Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  3611c45a
23 Aug, 2010 1 commit
- Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures). · 684d608b
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  684d608b
02 Aug, 2010 1 commit

VP8: move zeroing of luma DC block into the WHT · 827d43bb

Jason Garrett-Glaser authored 14 years ago

Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.

Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk

827d43bb

31 Jul, 2010 1 commit

Use word-writing instead of dword-writing (with two cached but otherwise · 6341838f

Ronald S. Bultje authored 14 years ago

unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds up even more
(e.g. 25% faster on Core i7).

Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk

6341838f
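
The idea in 6341838f can be pictured with a small model: the simple filter changes only the two pixels next to a vertical edge (p0 and q0), so each row needs a 2-byte store rather than a 4-byte store that also rewrites the unchanged p1 and q1. The sketch assumes a flat `bytearray` frame, an edge between columns `x - 1` and `x`, and already-filtered p0/q0 values; all function names are hypothetical and this is not the assembly from the commit.

```python
def interleave_bytes(p0, q0):
    """Pair p0[i] with q0[i] per row, like a punpcklbw-style byte interleave."""
    return [bytes((a, b)) for a, b in zip(p0, q0)]


def write_words(frame, stride, x, y, words):
    """Store one 2-byte word per row, covering columns x-1 (p0) and x (q0)."""
    for row, word in enumerate(words):
        offset = (y + row) * stride + (x - 1)
        frame[offset:offset + 2] = word


# Tiny 8x4 frame with a vertical edge at column 4.
stride, rows = 8, 4
frame = bytearray(range(stride * rows))
before = bytes(frame)
p0_new = [200, 201, 202, 203]  # made-up filtered values
q0_new = [210, 211, 212, 213]
write_words(frame, stride, 4, 0, interleave_bytes(p0_new, q0_new))
```

Only columns 3 and 4 change; the neighbouring p1 and q1 columns are never touched, so no 4x4 transpose is needed to rebuild whole dwords.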

26 Jul, 2010 6 commits

Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster. · ab4d0318
Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
ab4d0318

VP8: Much faster SSE2 MC · e25dee60

Jason Garrett-Glaser authored 14 years ago

5-10% faster or more on Phenom, Athlon 64, and some others.
Helps some on pre-SSSE3 Intel chips as well, but not as much.

Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk

e25dee60

Enable no-loop memory/register saving for ssse3/sse4 also. · 48adb7e7
Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
48adb7e7

Save a register (or regsize of stackspace for x86-32) for the no-loop · 2a180c69

Ronald S. Bultje authored 14 years ago

mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.

Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk

2a180c69

Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this · bcd4aa64

Ronald S. Bultje authored 14 years ago

construct was always enabled, even for <ssse3 versions).

Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk

bcd4aa64

Split pextrw macro-spaghetti into several opt-specific macros, this will make · 2208053b

Ronald S. Bultje authored 14 years ago

future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used...

Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk

2208053b

25 Jul, 2010 1 commit
- Fix obvious bug in assignment. Somehow, the test vectors don't test this... · 6de5b7c6
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  6de5b7c6
24 Jul, 2010 1 commit

Fix SPLATB_REG mess. Used to be an if/elseif/elseif/elseif spaghetti, so this · e3f7bf77

Ronald S. Bultje authored 14 years ago

splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.

Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk

e3f7bf77

23 Jul, 2010 4 commits

VP8: optimize DC-only chroma case in the same way as luma. · 3ae079a3

Jason Garrett-Glaser authored 14 years ago

Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk

3ae079a3

VP8 asm: cosmetics (spacing) · 51c91564
Jason Garrett-Glaser authored 14 years ago
```
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
51c91564

VP8: 30% faster idct_mb · 8a467b2d

Jason Garrett-Glaser authored 14 years ago

Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?

Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk

8a467b2d

VP8: clear DCT blocks in iDCT instead of using clear_blocks. · c25c7767
Jason Garrett-Glaser authored 14 years ago
```
~0.3% faster overall.

Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
c25c7767
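
The DC-only shortcuts in 8a467b2d and 3ae079a3 above can be sketched in scalar form. When a 4x4 block has only a DC coefficient, the inverse transform reduces to adding one rounded value, `(dc + 4) >> 3`, to all 16 pixels with clamping, and several such blocks can be handled in one call. The function names mirror the commit messages, but the signatures, the flat `bytearray` layout, and the 2x2 arrangement assumed for the chroma variant are illustrative assumptions, not the actual FFmpeg code.

```python
def clip_uint8(value: int) -> int:
    """Clamp to the 0..255 pixel range."""
    return max(0, min(255, value))


def idct_dc_add(dst, stride, offset, dc_coeff):
    """DC-only 4x4 inverse transform: add one rounded value to 16 pixels."""
    dc = (dc_coeff + 4) >> 3
    for y in range(4):
        for x in range(4):
            i = offset + y * stride + x
            dst[i] = clip_uint8(dst[i] + dc)


def idct_dc_add4y(dst, stride, offset, dc_coeffs):
    """Four DC-only luma blocks side by side in one row (assumed layout)."""
    for n, coeff in enumerate(dc_coeffs):
        idct_dc_add(dst, stride, offset + 4 * n, coeff)


def idct_dc_add4uv(dst, stride, offset, dc_coeffs):
    """Four DC-only chroma blocks in a 2x2 arrangement (assumed layout)."""
    for n, coeff in enumerate(dc_coeffs):
        block = offset + (n // 2) * 4 * stride + (n % 2) * 4
        idct_dc_add(dst, stride, block, coeff)
```

A SIMD version would broadcast each rounded DC value across a register and use saturating byte adds instead of the inner loops; the scalar model only shows the arithmetic.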

22 Jul, 2010 2 commits
- Use pextrw for SSE4 mbedge filter result writing, speedup 5-10 cycles on · dc5eec80
  Ronald S. Bultje authored 14 years ago
```
CPUs supporting it.

Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  dc5eec80
- Fix and enable horizontal >=SSE2 mbedge loopfilter. · 003243c3
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  003243c3
21 Jul, 2010 3 commits

Eliminate one instruction in VP8 dc_add_sse4 · 8731dbd8
Jason Garrett-Glaser authored 14 years ago
```
Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
8731dbd8

Various VP8 x86 deblocking speedups · 7dd224a4

Jason Garrett-Glaser authored 14 years ago

SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.

Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk

7dd224a4

Make mmx VP8 WHT faster · b8b231b5

Jason Garrett-Glaser authored 14 years ago

Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.

Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk

b8b231b5

20 Jul, 2010 2 commits
- VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16) · e9e456d8
  Ronald S. Bultje authored 14 years ago
```
and chroma (width=8).

Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  e9e456d8
- Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder. · 268821e7
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  268821e7
19 Jul, 2010 1 commit
- Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's · c60ed66d
  Ronald S. Bultje authored 14 years ago
```
wrong with it tomorrow or so, then re-submit.

Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  c60ed66d