Commits · 97aa092997cab8f8f337548a7c85b7fde09b4cdf · Linshizhi / ffmpeg.wasm-core

15 Aug, 2011 1 commit
- Fix NASM include directive · cc73511e
  Dave Yeo authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  cc73511e
12 Aug, 2011 2 commits
- Move x86util.asm from libavcodec/ to libavutil/. · b2c08787
  Ronald S. Bultje authored 13 years ago
```
This allows using it in swscale also.
```
  b2c08787
- Move x86inc.asm to libavutil/. · 3a39195b
  Ronald S. Bultje authored 13 years ago
```
This allows using it in libswscale/ also.
```
  3a39195b
17 May, 2011 1 commit

Modify x86util.asm to ease transitioning to 10-bit H.264 assembly. · d0005d34

Daniel Kang authored 13 years ago

Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

d0005d34

14 May, 2011 1 commit
- Fix FSF address copy paste error in some license headers. · 888fa31e
  Diego Biurrun authored 13 years ago
  
  888fa31e
19 Mar, 2011 1 commit
- Replace FFmpeg with Libav in licence headers · 2912e87a
  Mans Rullgard authored 13 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
  2912e87a
05 Sep, 2010 1 commit

Use "d" suffix for general-purpose registers used with movd. · b1c32fb5

Reimar Döffinger authored 14 years ago

This increases compatibility with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.

Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk

b1c32fb5

24 Aug, 2010 1 commit
- Mark xmm registers as clobbered in simple loopfilter. Should fix the last · 3611c45a
  Ronald S. Bultje authored 14 years ago
```
two VP8-related fate failures on Win64.

Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  3611c45a
23 Aug, 2010 1 commit
- Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures). · 684d608b
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  684d608b
02 Aug, 2010 1 commit

VP8: move zeroing of luma DC block into the WHT · 827d43bb

Jason Garrett-Glaser authored 14 years ago

Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.

Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk

827d43bb

31 Jul, 2010 1 commit

Use word-writing instead of dword-writing (with two cached but otherwise · 6341838f

Ronald S. Bultje authored 14 years ago

unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds up even more
(e.g. 25% faster on Core i7).

Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk

6341838f

26 Jul, 2010 6 commits

Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster. · ab4d0318
Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
ab4d0318

VP8: Much faster SSE2 MC · e25dee60

Jason Garrett-Glaser authored 14 years ago

5-10% faster or more on Phenom, Athlon 64, and some others.
Helps some on pre-SSSE3 Intel chips as well, but not as much.

Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk

e25dee60

Enable no-loop memory/register saving for ssse3/sse4 also. · 48adb7e7
Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
48adb7e7

Save a register (or regsize of stackspace for x86-32) for the no-loop · 2a180c69

Ronald S. Bultje authored 14 years ago

mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.

Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk

2a180c69

Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this · bcd4aa64

Ronald S. Bultje authored 14 years ago

construct was always enabled, even for <ssse3 versions).

Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk

bcd4aa64

Split pextrw macro-spaghetti into several opt-specific macros, this will make · 2208053b

Ronald S. Bultje authored 14 years ago

future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used...

Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk

2208053b

25 Jul, 2010 1 commit
- Fix obvious bug in assignment. Somehow, the test vectors don't test this... · 6de5b7c6
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  6de5b7c6
24 Jul, 2010 1 commit

Fix SPLATB_REG mess. Used to be an if/elseif/elseif/elseif spaghetti, so this · e3f7bf77

Ronald S. Bultje authored 14 years ago

splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.

Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk

e3f7bf77

23 Jul, 2010 4 commits

VP8: optimize DC-only chroma case in the same way as luma. · 3ae079a3

Jason Garrett-Glaser authored 14 years ago

Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk

3ae079a3

VP8 asm: cosmetics (spacing) · 51c91564
Jason Garrett-Glaser authored 14 years ago
```
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
51c91564

VP8: 30% faster idct_mb · 8a467b2d

Jason Garrett-Glaser authored 14 years ago

Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?

Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk

8a467b2d
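
For readers unfamiliar with the DC-only shortcut mentioned in the commit above, the following is a minimal plain-C sketch of the idea, not the MMX/SSE2 assembly the commit actually adds. It assumes the usual VP8 rounding of `(dc + 4) >> 3` and an arithmetic right shift for negative values; the names `clip_u8`, `idct_dc_add` and `idct_dc_add4` are illustrative and are not taken from the commit.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* DC-only inverse transform for one 4x4 block (hypothetical helper):
 * every output pixel gets the same rounded DC offset, so no full iDCT
 * is needed. The coefficient is cleared here rather than by the caller. */
static void idct_dc_add(uint8_t *dst, int16_t block[16], ptrdiff_t stride)
{
    int dc = (block[0] + 4) >> 3;   /* assumes arithmetic shift for dc < 0 */
    block[0] = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}

/* Four horizontally adjacent DC-only blocks handled in one call
 * (hypothetical helper). A SIMD version can process a whole 16-pixel
 * row per instruction instead of looping block by block. */
static void idct_dc_add4(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride)
{
    for (int i = 0; i < 4; i++)
        idct_dc_add(dst + 4 * i, block[i], stride);
}
```

Grouping four blocks per call mainly matters for the SIMD versions, where one register can hold a full row across all four blocks; the scalar loop here only shows the data layout.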

VP8: clear DCT blocks in iDCT instead of using clear_blocks. · c25c7767
Jason Garrett-Glaser authored 14 years ago
```
~0.3% faster overall.

Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
c25c7767

22 Jul, 2010 2 commits
- Use pextrw for SSE4 mbedge filter result writing, speedup 5-10 cycles on · dc5eec80
  Ronald S. Bultje authored 14 years ago
```
CPUs supporting it.

Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  dc5eec80
- Fix and enable horizontal >=SSE2 mbedge loopfilter. · 003243c3
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  003243c3
21 Jul, 2010 3 commits

Eliminate one instruction in VP8 dc_add_sse4 · 8731dbd8
Jason Garrett-Glaser authored 14 years ago
```
Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
8731dbd8

Various VP8 x86 deblocking speedups · 7dd224a4

Jason Garrett-Glaser authored 14 years ago

SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.

Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk

7dd224a4

Make mmx VP8 WHT faster · b8b231b5

Jason Garrett-Glaser authored 14 years ago

Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.

Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk

b8b231b5

20 Jul, 2010 2 commits
- VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16) · e9e456d8
  Ronald S. Bultje authored 14 years ago
```
and chroma (width=8).

Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  e9e456d8
- Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder. · 268821e7
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  268821e7
19 Jul, 2010 4 commits

Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's · c60ed66d
Ronald S. Bultje authored 14 years ago
```
wrong with it tomorrow or so, then re-submit.

Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
c60ed66d
Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions. · 1878f685
Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
1878f685
Be more efficient with registers or stack memory. Saves 8/16 bytes stack · fb9bdf04
Ronald S. Bultje authored 14 years ago
```
for x86-32, or 2 MM registers on x86-64.

Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
fb9bdf04

Change function prototypes for width=8 inner and mbedge loopfilter functions · 3facfc99

Ronald S. Bultje authored 14 years ago

so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register.

This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.

Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk

3facfc99

16 Jul, 2010 6 commits
- Attempt to fix x86-64 testsuite on fate. · 819b2dd2
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  819b2dd2
- Remove duplicate define. · 6f323f12
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  6f323f12
- Revert 24270, it contained some stuff that shouldn't have been in there. · 889b2c26
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  889b2c26
- Remove duplicate define. · 2356a783
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  2356a783
- Give x86 r%d registers names, this will simplify implementation of the chroma · ede1b966
  Ronald S. Bultje authored 14 years ago
```
inner loopfilter, and it also allows us to save one register on x86-64/sse2.

Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  ede1b966
- Change return statement, the REP_RET is a mistake since the else case (x86-64, · 526e831a
  Ronald S. Bultje authored 14 years ago
```
sse2) doesn't actually loop, so REP_RET isn't necessary.

Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  526e831a