Commits · 12f13ecb2dcddfa3ee930167395370d3c6fff90c · Linshizhi / ffmpeg.wasm-core

13 Mar, 2014 1 commit
- x86: Make function prototype comments in assembly code consistent · 55519926
  Diego Biurrun authored 10 years ago
```
This helps grepping for functions, among other things.
```
07 Oct, 2013 1 commit
- x86: h264_idct: Update comments to match 8/10-bit depth optimization split · 6405ca7d
  Diego Biurrun authored 11 years ago
  
21 Aug, 2013 1 commit
- x86: h264_idct: Remove incorrect comment · 0b45269c
  Diego Biurrun authored 11 years ago
  
10 Apr, 2013 1 commit

h264: Integrate clear_blocks calls with IDCT · 62844c3f

Ronald S. Bultje authored 11 years ago

The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>

23 Jan, 2013 1 commit

Drop DCTELEM typedef · 88bd7fdc

Diego Biurrun authored 11 years ago

It does not help as an abstraction and adds dsputil dependencies.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

27 Nov, 2012 1 commit
- x86: h264_idct: port to cpuflags · 2e89aeed
  Diego Biurrun authored 12 years ago
  
13 Nov, 2012 1 commit
- x86: mmx2 ---> mmxext in asm constructs · 26301caa
  Diego Biurrun authored 12 years ago
  
31 Oct, 2012 1 commit
- x86: MMX2 ---> MMXEXT in macro names · 588fafe7
  Diego Biurrun authored 12 years ago
  
30 Oct, 2012 2 commits
- x86: yasm: Use complete source path for macro helper %includes · 04581c8c
  Diego Biurrun authored 12 years ago
```
This is more consistent with the way we handle C #includes and
it simplifies the build system.
```
- x86: include x86inc.asm in x86util.asm · 6860b408
  Diego Biurrun authored 12 years ago
```
This is necessary to allow refactoring some x86util macros with cpuflags.
```
07 Aug, 2012 1 commit

x86: add colons after labels · a3df4781

Mans Rullgard authored 12 years ago

nasm prints a warning if the colon is missing.
Signed-off-by: Mans Rullgard <mans@mansr.com>

05 Aug, 2012 1 commit
- x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2 · 20968575
  Diego Biurrun authored 12 years ago
  
11 Apr, 2012 1 commit

x86inc improvements for 64-bit · 729f90e2

Henrik Gramner authored 12 years ago

Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>

08 Feb, 2012 1 commit

h264: manually save/restore XMM registers for functions using INIT_MMX. · ce1e250e

Ronald S. Bultje authored 12 years ago

On Win64, these registers are callee-save, so not saving/restoring them
correctly is a violation of ABI and can lead to crashes or corrupt data.

27 Jan, 2012 1 commit
- config.asm: change %ifdef directives to %if directives. · 3b15a6d7
  Ronald S. Bultje authored 12 years ago
```
This allows combining multiple conditionals in a single statement.
```
15 Aug, 2011 1 commit
- Fix NASM include directive · cc73511e
  Dave Yeo authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
12 Aug, 2011 2 commits
- Move x86util.asm from libavcodec/ to libavutil/. · b2c08787
  Ronald S. Bultje authored 13 years ago
```
This allows using it in swscale also.
```
- Move x86inc.asm to libavutil/. · 3a39195b
  Ronald S. Bultje authored 13 years ago
```
This allows using it in libswscale/ also.
```
29 Jul, 2011 1 commit
- H.264: tweak some other x86 asm for Atom · a3bf7b86
  Jason Garrett-Glaser authored 13 years ago
  
14 Jun, 2011 1 commit

4:4:4 H.264 decoding support · c90b9442

Jason Garrett-Glaser authored 13 years ago

Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.

13 Jun, 2011 2 commits
- Roll back 4:4:4 H.264 for now · 504811ba
  Jason Garrett-Glaser authored 13 years ago
```
Needs some ARM/PPC asm modifications.
```
- 4:4:4 H.264 decoding support · c9c49387
  Jason Garrett-Glaser authored 13 years ago
```
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
```
31 May, 2011 1 commit
- Update 8-bit H.264 IDCT function names to reflect bit-depth. · 348493db
  Daniel Kang authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
```
17 May, 2011 1 commit

Modify x86util.asm to ease transitioning to 10-bit H.264 assembly. · d0005d34

Daniel Kang authored 13 years ago

Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

14 May, 2011 1 commit
- Fix FSF address copy paste error in some license headers. · 888fa31e
  Diego Biurrun authored 13 years ago
  
19 Mar, 2011 1 commit
- Replace FFmpeg with Libav in licence headers · 2912e87a
  Mans Rullgard authored 13 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
14 Jan, 2011 1 commit

H.264: split luma dc idct out and implement MMX/SSE2 versions · 19fb234e

Jason Garrett-Glaser authored 13 years ago

About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.

Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk

26 Sep, 2010 1 commit
- Add d suffix to movd target register to make it work with nasm. · 02b424d9
  Reimar Döffinger authored 14 years ago
```
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
24 Sep, 2010 2 commits

Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this · ae112918

Ronald S. Bultje authored 14 years ago

inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk

Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the · 4bca6774

Ronald S. Bultje authored 14 years ago

code directly also and remove loop setup. 20% faster in function, 0.8% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk

14 Sep, 2010 1 commit

Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from · 1d16a1cf

Ronald S. Bultje authored 14 years ago

h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.

Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.

Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
