Commits · 1d0817d56b66797118880358ea7d7a2acfdca429 · Linshizhi / ffmpeg.wasm-core

15 May, 2017 5 commits
- avcodec/h264: add sse2 versions of previous idct functions · 7aa90b4e
  James Darnley authored 7 years ago
```
Kaby Lake Pentium:
 - ff_h264_idct_add_8_sse2:    ~1.18x faster than mmxext
 - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext
```
  7aa90b4e
- avcodec/h264: add avx 8-bit h264_idct_dc_add · 27460dfe
  James Darnley authored 7 years ago
```
Haswell:
 - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext

Skylake-U:
 - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext
```
  27460dfe
- avcodec/h264: add avx 8-bit h264_idct_add · f61d454c
  James Darnley authored 7 years ago
```
Haswell:
 - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext

Skylake-U:
 - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
```
  f61d454c
- avcodec/h264: use some 3 operand forms · b5325c67
  James Darnley authored 7 years ago
  
  b5325c67
- avcodec/h264: change RETs into REP_RETs where appropriate · 060ba9e5
  James Darnley authored 7 years ago
  
  060ba9e5
14 Mar, 2017 1 commit
- x86: h264: Simplify DEQUANT macro with cpuflags · e9bb77fb
  Diego Biurrun authored 7 years ago
  
  e9bb77fb
30 Nov, 2016 1 commit
- avcodec/h264: mmx 4:2:2 idct add8 function · 1dae7ffa
  James Darnley authored 8 years ago
```
2.87 times faster (1830 vs. 638 cycles)
```
  1dae7ffa
16 Jun, 2016 1 commit
- x86: Add missing movsxd for the int stride parameter · f1a9eee4
  Martin Storsjö authored 8 years ago
```
Signed-off-by: Martin Storsjö <martin@martin.st>
```
  f1a9eee4
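
The stride argument is declared as a 32-bit `int`, but on x86-64 the assembly uses it in 64-bit address arithmetic, where the upper half of a register holding an `int` argument is not guaranteed to be meaningful. `movsxd` sign-extends the value before it is used as an offset. The C sketch below only illustrates that arithmetic; both helper names are hypothetical and do not appear in the commit.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* What movsxd does: widen a signed 32-bit value and keep its sign. */
static int64_t widen_signed(int32_t stride)
{
    return (int64_t)stride;
}

/* What the offset becomes if the upper 32 bits are treated as zero. */
static int64_t widen_zero_upper(int32_t stride)
{
    return (int64_t)(uint32_t)stride;
}

/*
 * With stride = -16:
 *   widen_signed(-16)     == -16
 *   widen_zero_upper(-16) == 4294967280  (2^32 - 16)
 */
```

For positive strides the two results agree, so a missing sign extension only shows up when the stride is negative or the upper register bits happen to contain garbage.
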
13 Mar, 2014 1 commit
- x86: Make function prototype comments in assembly code consistent · 55519926
  Diego Biurrun authored 11 years ago
```
This helps grepping for functions, among other things.
```
  55519926
07 Oct, 2013 1 commit
- x86: h264_idct: Update comments to match 8/10-bit depth optimization split · 6405ca7d
  Diego Biurrun authored 11 years ago
  
  6405ca7d
21 Aug, 2013 1 commit
- x86: h264_idct: Remove incorrect comment · 0b45269c
  Diego Biurrun authored 11 years ago
  
  0b45269c
10 Apr, 2013 1 commit

- h264: Integrate clear_blocks calls with IDCT · 62844c3f
  Ronald S. Bultje authored 12 years ago

  The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
  to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
  (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
  tested (cathedral), i.e. almost 30 cycles per mb faster.

  Signed-off-by: Martin Storsjö <martin@martin.st>

  62844c3f
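
As a rough illustration of the idea, assuming that "integrating" here means each IDCT routine zeroes the coefficients it has just consumed, a DC-only 4x4 add could look like the C sketch below. The function and helper names are hypothetical, and the real change lives in the x86 assembly and the C templates rather than in this form.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical DC-only 4x4 add: applies the rounded DC term to a 4x4
 * block of pixels and clears the coefficient it consumed, so the caller
 * does not need a separate clearing pass afterwards.
 */
static void idct_dc_add_and_clear(uint8_t *dst, int16_t *block, int stride)
{
    int dc = (block[0] + 32) >> 6; /* round, then scale down by 64 */
    block[0] = 0;                  /* clearing folded into the IDCT */

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}
```

Folding the clear into the routine that already has the coefficients loaded avoids a second pass over the block memory, which is consistent with the per-macroblock cycle savings reported in the commit message.
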

19 Feb, 2013 1 commit

- h264: integrate clear_blocks calls with IDCT. · 1acd7d59
  Ronald S. Bultje authored 12 years ago

  The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
  to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
  (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
  tested (cathedral), i.e. almost 30 cycles per mb faster.

  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

  1acd7d59

23 Jan, 2013 1 commit

- Drop DCTELEM typedef · 88bd7fdc
  Diego Biurrun authored 12 years ago

  It does not help as an abstraction and adds dsputil dependencies.

  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

  88bd7fdc

27 Nov, 2012 1 commit
- x86: h264_idct: port to cpuflags · 2e89aeed
  Diego Biurrun authored 12 years ago
  
  2e89aeed
13 Nov, 2012 1 commit
- x86: mmx2 ---> mmxext in asm constructs · 26301caa
  Diego Biurrun authored 12 years ago
  
  26301caa
31 Oct, 2012 1 commit
- x86: MMX2 ---> MMXEXT in macro names · 588fafe7
  Diego Biurrun authored 12 years ago
  
  588fafe7
30 Oct, 2012 2 commits
- x86: yasm: Use complete source path for macro helper %includes · 04581c8c
  Diego Biurrun authored 12 years ago
```
This is more consistent with the way we handle C #includes and
it simplifies the build system.
```
  04581c8c
- x86: include x86inc.asm in x86util.asm · 6860b408
  Diego Biurrun authored 12 years ago
```
This is necessary to allow refactoring some x86util macros with cpuflags.
```
  6860b408
07 Aug, 2012 1 commit

- x86: add colons after labels · a3df4781
  Mans Rullgard authored 12 years ago

  nasm prints a warning if the colon is missing.

  Signed-off-by: Mans Rullgard <mans@mansr.com>

  a3df4781

05 Aug, 2012 1 commit
- x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2 · 20968575
  Diego Biurrun authored 12 years ago
  
  20968575
11 Apr, 2012 1 commit

- x86inc improvements for 64-bit · 729f90e2
  Henrik Gramner authored 12 years ago

  Add support for all x86-64 registers
  Prefer caller-saved register over callee-saved on WIN64
  Support up to 15 function arguments

  Also (by Ronald S. Bultje)
  Fix up our asm to work with new x86inc.asm.

  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
  Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>

  729f90e2

08 Feb, 2012 1 commit

- h264: manually save/restore XMM registers for functions using INIT_MMX. · ce1e250e
  Ronald S. Bultje authored 13 years ago

  On Win64, these registers are callee-save, so not saving/restoring them
  correctly is a violation of ABI and can lead to crashes or corrupt data.

  ce1e250e

27 Jan, 2012 1 commit
- config.asm: change %ifdef directives to %if directives. · 3b15a6d7
  Ronald S. Bultje authored 13 years ago
```
This allows combining multiple conditionals in a single statement.
```
  3b15a6d7
19 Oct, 2011 1 commit
- Move x264asm to libavutil. · b1766c17
  Kieran Kunhya authored 13 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  b1766c17
15 Aug, 2011 1 commit
- Fix NASM include directive · cc73511e
  Dave Yeo authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  cc73511e
12 Aug, 2011 2 commits
- Move x86util.asm from libavcodec/ to libavutil/. · b2c08787
  Ronald S. Bultje authored 13 years ago
```
This allows using it in swscale also.
```
  b2c08787
- Move x86inc.asm to libavutil/. · 3a39195b
  Ronald S. Bultje authored 13 years ago
```
This allows using it in libswscale/ also.
```
  3a39195b
29 Jul, 2011 1 commit
- H.264: tweak some other x86 asm for Atom · a3bf7b86
  Jason Garrett-Glaser authored 13 years ago
  
  a3bf7b86
14 Jun, 2011 1 commit

- 4:4:4 H.264 decoding support · c90b9442
  Jason Garrett-Glaser authored 13 years ago

  Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.

  c90b9442

13 Jun, 2011 2 commits
- Roll back 4:4:4 H.264 for now · 504811ba
  Jason Garrett-Glaser authored 13 years ago
```
Needs some ARM/PPC asm modifications.
```
  504811ba
- 4:4:4 H.264 decoding support · c9c49387
  Jason Garrett-Glaser authored 13 years ago
```
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
```
  c9c49387
31 May, 2011 1 commit
- Update 8-bit H.264 IDCT function names to reflect bit-depth. · 348493db
  Daniel Kang authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
```
  348493db
17 May, 2011 1 commit

- Modify x86util.asm to ease transitioning to 10-bit H.264 assembly. · d0005d34
  Daniel Kang authored 13 years ago

  Arguments for variable size instructions are added to many macros, along
  with other various changes. The x86util.asm code was ported from x264.

  Signed-off-by: Diego Biurrun <diego@biurrun.de>

  d0005d34

14 May, 2011 1 commit
- Fix FSF address copy paste error in some license headers. · 888fa31e
  Diego Biurrun authored 13 years ago
  
  888fa31e
19 Mar, 2011 1 commit
- Replace FFmpeg with Libav in licence headers · 2912e87a
  Mans Rullgard authored 13 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
  2912e87a
14 Jan, 2011 1 commit

- H.264: split luma dc idct out and implement MMX/SSE2 versions · 19fb234e
  Jason Garrett-Glaser authored 14 years ago

  About 2.5x the speed.

  NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
  If x264-style dequant was used (separate shift and qmul values), it might
  be possible to get some extra speed.

  Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk

  19fb234e

26 Sep, 2010 1 commit
- Add d suffix to movd target register to make it work with nasm. · 02b424d9
  Reimar Döffinger authored 14 years ago
```
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
```
  02b424d9
24 Sep, 2010 2 commits

- Unroll loop in h264_idct_add16intra_sse2(). · ae112918
  Ronald S. Bultje authored 14 years ago

  Basically identical to r25171, this inlines scan8[] and removes loop
  setup. 15% faster, 0.4% overall.

  See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

  Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk

  ae112918

- Unroll loop in h264_idct_add8_sse2(). · 4bca6774
  Ronald S. Bultje authored 14 years ago

  This means we can inline scan8[] in the code directly also and remove
  loop setup. 20% faster in function, 0.8% overall.

  See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

  Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk

  4bca6774