Commits · fe40dc1cecdf152ffacff6df1d9c5f0c7daced85 · Linshizhi / ffmpeg.wasm-core

03 Aug, 2012 1 commit

x86: build: replace mmx2 by mmxext · 239fdf1b

Diego Biurrun authored 12 years ago

Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.

239fdf1b

02 Aug, 2012 1 commit
- x86: h264dsp: K&R formatting cosmetics · 81905088
  Diego Biurrun authored 12 years ago
  
  81905088
31 Jul, 2012 3 commits
- x86: h264dsp: Remove unused variable ff_pb_3_1 · 6376a3ad
  Diego Biurrun authored 12 years ago
  
  6376a3ad
- x86: h264dsp: Adjust YASM #ifdefs · 8728b381
  Diego Biurrun authored 12 years ago
```
This fixes compilation with YASM disabled.
```
  8728b381
- h264: convert loop filter strength dsp function to yasm. · b829b4ce
  Ronald S. Bultje authored 12 years ago
```
This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
```
  b829b4ce
28 Jul, 2012 1 commit
- h264_loopfilter: port x86 simd to cpuflags. · a5bbb124
  Ronald S. Bultje authored 12 years ago
  
  a5bbb124
23 Jun, 2012 1 commit
- x86: Only use optimizations with cmov if the CPU supports the instruction · fe07c9c6
  Diego Biurrun authored 12 years ago
  
  fe07c9c6
10 Jun, 2012 1 commit
- libavcodec/x86/h264dsp_mmx.c: add forgotten HAVE_YASM · 915ec91e
  Michael Niedermayer authored 12 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  915ec91e
12 Feb, 2012 1 commit

Detect and check for CMOV. · b2230355

Reimar Döffinger authored 13 years ago

Some MMX-only CPUs do not have support for CMOV.
All SSE/MMX2 CPUs should be fine, thus no check was
added to those functions.
See also https://sourceforge.net/tracker/?func=detail&aid=3358347&group_id=205275&atid=992986
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>

b2230355
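
The gating described in the "Detect and check for CMOV" message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the flag constants, the enum, and `pick_impl` are hypothetical names invented here. It only shows the idea that an MMX-only path is taken when CMOV is also reported, while the SSE2 path gets no extra check.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch; not the project's real constants. */
#define SKETCH_FLAG_MMX  0x1u
#define SKETCH_FLAG_SSE2 0x2u
#define SKETCH_FLAG_CMOV 0x4u
#define SKETCH_FLAG_ALL  (SKETCH_FLAG_MMX | SKETCH_FLAG_SSE2 | SKETCH_FLAG_CMOV)

enum sketch_impl { IMPL_C, IMPL_MMX_CMOV, IMPL_SSE2 };

/* Choose an implementation from the detected CPU flags. */
static enum sketch_impl pick_impl(unsigned flags)
{
    /* Reject bits this sketch does not know about. */
    assert((flags & ~SKETCH_FLAG_ALL) == 0);

    /* SSE-class CPUs are assumed to support CMOV, so no extra check here. */
    if (flags & SKETCH_FLAG_SSE2)
        return IMPL_SSE2;

    /* Some MMX-only CPUs lack CMOV, so both flags are required. */
    if ((flags & SKETCH_FLAG_MMX) && (flags & SKETCH_FLAG_CMOV))
        return IMPL_MMX_CMOV;

    /* Fall back to plain C when the MMX path would be unsafe. */
    return IMPL_C;
}
```

With these assumed flags, an MMX-only CPU without CMOV falls back to the C path instead of executing an unsupported instruction.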

21 Oct, 2011 3 commits
- H264: change weight/biweight functions to take a height argument. · c2d33742
  Ronald S. Bultje authored 13 years ago
```
Neon parts by Mans Rullgard <mans@mansr.com>.
```
  c2d33742
- Support for lossless and inter H264 4:2:2. · 229d263c
  Ronald S. Bultje authored 13 years ago
  
  229d263c
- h264: 4:2:2 intra decoding support · 76741b0e
  Baptiste Coudurier authored 13 years ago
```
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  76741b0e
14 Aug, 2011 1 commit
- h264dec: h264: 4:2:2 intra decoding · 231a6df9
  Baptiste Coudurier authored 13 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  231a6df9
11 Jul, 2011 1 commit
- H.264: add filter_mb_fast support for >8-bit decoding · b5bbc84f
  Jason Garrett-Glaser authored 13 years ago
```
Much faster high bit depth deblocking.
```
  b5bbc84f
21 Jun, 2011 1 commit
- h264: Add x86 assembly for 10-bit weight/biweight H.264 functions. · 84e70ef0
  Daniel Kang authored 13 years ago
```
Mainly ported from 8-bit H.264 weight/biweight.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
  84e70ef0
16 Jun, 2011 1 commit
- Fix compilation with old yasm. · 5fb67d80
  Carl Eugen Hoyos authored 13 years ago
  
  5fb67d80
01 Jun, 2011 1 commit

h264/10bit: add HAVE_ALIGNED_STACK checks. · f3aa65af

Daniel Kang authored 13 years ago

Fixes regression in 836f47d3 in ICC-10.x,
since ICC<=11.0 doesn't align stack upon function calls.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

f3aa65af

31 May, 2011 2 commits

Update 8-bit H.264 IDCT function names to reflect bit-depth. · 348493db
Daniel Kang authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
```
348493db

Add IDCT functions for 10-bit H.264. · 836f47d3

Daniel Kang authored 13 years ago

Ports the majority of IDCT functions for 10-bit H.264.

Parts are inspired by 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>

836f47d3

16 May, 2011 1 commit

h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64. · 257de5fb

Gil Pedersen authored 13 years ago

This fixes linking errors due to undefined symbols on x86_64 OS X.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

257de5fb

11 May, 2011 3 commits
- 10-bit H.264 x86 chroma v loopfilter asm · 5705b020
  Jason Garrett-Glaser authored 13 years ago
```
Also delete some unused deblock asm macros.
```
  5705b020
- Port x86 10-bit H.264 deblock asm from x264 · 9f3d6ca4
  Jason Garrett-Glaser authored 13 years ago
  
  9f3d6ca4
- Update x86 H.264 deblock asm · 8ad77b65
  Jason Garrett-Glaser authored 13 years ago
```
Includes AVX versions from x264.
```
  8ad77b65
10 May, 2011 2 commits

h264dsp_mmx: place bracket outside #if/#endif block. · 86b29553
Ronald S. Bultje authored 13 years ago
```
Should fix compile on systems missing yasm/nasm.
```
86b29553

Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. · 19a0729b

Oskar Arvidsson authored 13 years ago

This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).

Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.

Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

19a0729b
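
The naming scheme `<base name>_<bit depth>[_<prefix>]` from the message above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `DEF_CLEAR_BLOCKS` macro, the `DSPSketch` struct, `dsp_sketch_init`, the block size, and the coefficient types are hypothetical and are not taken from the patch. Only the resulting function names follow the scheme the message describes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COEFFS 64 /* assumed block size for this sketch */

/* Generates clear_blocks_<depth>_c. The body depends only on the element
 * size, which is why several bit depths could share one implementation. */
#define DEF_CLEAR_BLOCKS(depth, type)                      \
static void clear_blocks_ ## depth ## _c(void *blocks)     \
{                                                          \
    memset(blocks, 0, SKETCH_COEFFS * sizeof(type));       \
}

DEF_CLEAR_BLOCKS(8,  int16_t) /* assumed coefficient type for 8-bit */
DEF_CLEAR_BLOCKS(9,  int32_t) /* assumed coefficient type for >8-bit */
DEF_CLEAR_BLOCKS(10, int32_t)

/* Hypothetical context holding the selected function pointer. */
typedef struct DSPSketch {
    void (*clear_blocks)(void *blocks);
} DSPSketch;

/* Select functions by bit depth; returns 0 on success, -1 if unsupported. */
static int dsp_sketch_init(DSPSketch *c, int bit_depth)
{
    switch (bit_depth) {
    case 8:  c->clear_blocks = clear_blocks_8_c;  break;
    case 9:  c->clear_blocks = clear_blocks_9_c;  break;
    case 10: c->clear_blocks = clear_blocks_10_c; break;
    default: return -1;
    }
    return 0;
}
```

In this sketch the 9-bit and 10-bit variants are byte-for-byte identical, which mirrors the note that some high bit depth functions depend only on pixel size and leave room for reducing binary size.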

12 Apr, 2011 1 commit
- Fix compilation with --disable-yasm. · 5c006875
  Carl Eugen Hoyos authored 13 years ago
  
  5c006875
10 Apr, 2011 1 commit

Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. · 8dbe5856

Oskar Arvidsson authored 13 years ago

This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).

Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.

Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

8dbe5856

19 Mar, 2011 1 commit
- Replace FFmpeg with Libav in licence headers · 2912e87a
  Mans Rullgard authored 13 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
  2912e87a
14 Jan, 2011 1 commit

H.264: split luma dc idct out and implement MMX/SSE2 versions · 19fb234e

Jason Garrett-Glaser authored 14 years ago

About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.

Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk

19fb234e

29 Sep, 2010 6 commits

Move static inline function to a macro, so that constant propagation in · a52ffc3f

Ronald S. Bultje authored 14 years ago

inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE
breakage after r25254.

Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk

a52ffc3f

Merge b_idx and edge variables, and optimize the ASM to directly load variables · cd17285e

Ronald S. Bultje authored 14 years ago

from memory locations/offsets depending on b_idx plus constants, rather than
having gcc do this. This saves several lea calls and together saves about
10 cycles in h264_loop_filter_strength_mmx2().

Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk

cd17285e

Remove mv_mask variable. Replace the related pand -1/0 instructions by either · 0cc8a5d0

Ronald S. Bultje authored 14 years ago

a pxor, or remove the instruction altogether. Altogether, this saves 1
instruction.

Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk

0cc8a5d0

Remove d_idx as a variable, and instead load it as a constant in the asm. · c0673f2c

Ronald S. Bultje authored 14 years ago

This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.

Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk

c0673f2c

Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid · 2c3135f6

Ronald S. Bultje authored 14 years ago

of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.

Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk

2c3135f6

Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows · 4b81511c

Ronald S. Bultje authored 14 years ago

inlining various constants within the loop code. 20 cycles faster on
cathedral sample.

Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk

4b81511c

24 Sep, 2010 1 commit
- Remove unused variable. · 7e117771
  Ronald S. Bultje authored 14 years ago
```
Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  7e117771
21 Sep, 2010 1 commit
- x86: disable SSE functions using stack when stack is not aligned · c0bc8b9a
  Måns Rullgård authored 14 years ago
```
This fixes crashes with ICC 10.1.

Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  c0bc8b9a
18 Sep, 2010 1 commit
- x86: remove hack disabling sse2 h264 loop filter with 32-bit icc · f41237c9
  Måns Rullgård authored 14 years ago
```
Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  f41237c9
14 Sep, 2010 1 commit

Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from · 1d16a1cf

Ronald S. Bultje authored 14 years ago

h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.

Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.

Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk

1d16a1cf

10 Sep, 2010 1 commit

LGPL SSE2 H.264 iDCT · 8acb554a

Jason Garrett-Glaser authored 14 years ago

This leaves no more GPL-only H.264 decoding asm code.

Approved by Loren.

Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk

8acb554a