Commits · a715e5a276c03e0a3a47531d382106ec3390c756 · Linshizhi / ffmpeg.wasm-core

21 Mar, 2017 1 commit
- avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same · d8962ffb
  James Almer authored 7 years ago
```
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
18 Feb, 2017 3 commits

avcodec/h264: sse2, avx h luma mbaff deblock/loop filter · 53368878

James Darnley authored 7 years ago

x86-64 only

Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx:  ~3.29x (370 vs. 112 cycles)


x86util: import MOVHL macro · 7627df15

James Darnley authored 7 years ago

Originally committed to x264 in 1637239a by Henrik Gramner who has
agreed to re-license it as LGPL. Original commit message follows.

x86: Avoid some bypass delays and false dependencies

A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and float domains, so try to avoid that if possible.


avcodec/x86: deduplicate PASS8ROWS macro · 9d815b74
James Darnley authored 7 years ago


19 Sep, 2016 1 commit
- x86util: Document SBUTTERFLY macro · 07e1f99a
  Alexandra Hájková authored 8 years ago
```
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
```
18 Jul, 2016 1 commit

x86util: Extend SPLATW for avx2 · fd5e6a09

James Almer authored 8 years ago

Integration to Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>


11 Jul, 2016 1 commit

vp9: add 16x16 idct avx2 (8-bit). · f0a2b624

Ronald S. Bultje authored 8 years ago

checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:

nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
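
The ratios quoted in this commit message can be recomputed from the listed timings. The following is a minimal sketch in Python that copies only the AVX and AVX2 rows. The function names are hypothetical, and the optional subtraction of the `nop` baseline is an assumption about how measurement overhead might be corrected for, not something the commit message states.

```python
# Timings copied from the commit message above (sub-IDCT size -> value).
NOP = 24.6
AVX = {1: 661.2, 2: 521.9, 4: 446.1, 8: 440.9, 16: 469.9}
AVX2 = {1: 311.5, 2: 304.3, 4: 297.0, 8: 290.2, 16: 286.4}


def speedup(old, new, overhead=0.0):
    """Return how many times faster `new` is than `old`.

    `overhead` is subtracted from both timings (assumption: the `nop`
    row measures fixed benchmarking overhead).
    """
    return (old - overhead) / (new - overhead)


def avx2_vs_avx(overhead=0.0):
    """Map each sub-IDCT size to the AVX2-over-AVX speedup, rounded to 2 places."""
    return {n: round(speedup(AVX[n], AVX2[n], overhead), 2) for n in sorted(AVX)}


if __name__ == "__main__":
    for n, ratio in avx2_vs_avx().items():
        print(f"sub-IDCT {n:2d}: {ratio:.2f}x")
```

Calling `avx2_vs_avx(overhead=NOP)` gives the same comparison with the baseline removed. Ratios derived from rounded timings may differ slightly from the figure quoted in the message.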


08 Jun, 2016 2 commits
- x86/showcqt: use three operand format for some instructions · 172af208
  James Almer authored 8 years ago
```
Fixes failures with yasm 1.1.0 and older
Signed-off-by: James Almer <jamrial@gmail.com>
```
- avutil/x86util: move haddps sse emulation from showcqt · 99b89948
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
12 Sep, 2015 1 commit

x86: port PSIGNW to cpuflags · d5f8a642

James Almer authored 9 years ago

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>


03 Aug, 2015 1 commit

x86: move XOP emulation code back to x86inc · 5750d6c5

James Almer authored 9 years ago

Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.

This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>


31 Dec, 2014 1 commit

x86/swr: add SSE2/AVX pack_8ch functions · 37b35feb

James Almer authored 10 years ago

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>


05 Dec, 2014 1 commit

v210enc: Add SIMD optimised 8-bit and 10-bit encoders · 9a738c27

Kieran Kunhya authored 10 years ago

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>


26 Nov, 2014 1 commit
- v210enc: Add SIMD optimised 8-bit and 10-bit encoders · 36091742
  Kieran Kunhya authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
03 Aug, 2014 1 commit

x86/hevc_deblock: improve 8bit transpose store macros · d0f56ca0

James Almer authored 10 years ago

Up to four fewer instructions, depending on function and instruction set.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


26 Jul, 2014 1 commit

x86/hevc_idct: replace old and unused idct functions · 1ace9573

James Almer authored 10 years ago

Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).

Benchmarks on an Intel Core i5-4200U:

idct8x8_dc
       SSE2   MMXEXT  C
cycles 22     26      57

idct16x16_dc
       AVX2   SSE2    C
cycles 27     32      249

idct32x32_dc
       AVX2   SSE2    C
cycles 62     126     1375
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


15 Jun, 2014 1 commit

x86util: add and use RSHIFT/LSHIFT macros · 91076128

Christophe Gisquet authored 10 years ago

Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
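
The byte-count convention described in this commit can be illustrated with a small model. The sketch below is Python, not the actual macro source. It assumes the macros select between the MMX quadword shifts (`psllq`/`psrlq`, which count in bits) and the SSE2 whole-register shifts (`pslldq`/`psrldq`, which count in bytes) based on register width. The function name and the `mmsize` parameter are hypothetical.

```python
def shift_instruction(direction, reg, nbytes, mmsize):
    """Return the instruction text a byte-count shift macro could emit.

    mmsize is the register width in bytes: 8 for MMX, 16 for SSE2.
    This is an illustrative model, not the x86util implementation.
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if mmsize > 8:
        # SSE2 whole-register shifts take their count in bytes.
        mnemonic = "pslldq" if direction == "left" else "psrldq"
        return f"{mnemonic} {reg}, {nbytes}"
    # MMX quadword shifts take their count in bits, so scale by 8.
    mnemonic = "psllq" if direction == "left" else "psrlq"
    return f"{mnemonic} {reg}, {8 * nbytes}"


if __name__ == "__main__":
    print(shift_instruction("right", "m0", 2, mmsize=8))
    print(shift_instruction("right", "m0", 2, mmsize=16))
```

The caller passes the same byte count in both cases, and only the emitted operand changes.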


29 May, 2014 1 commit
- x86: hpeldsp: better factorization · 22670039
  Christophe Gisquet authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
28 May, 2014 1 commit
- x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} · 561bfc85
  James Almer authored 10 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
17 Apr, 2014 1 commit

x86: move horizontal add macros to x86util · 76ed71a7

James Almer authored 10 years ago

Also port relevant AVX2/XOP optimizations from x264 with permission
to relicense to LGPL from the corresponding authors
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


24 Feb, 2014 1 commit

x86: Move XOP emulation to x86util · 3f3d748c

James Almer authored 10 years ago

We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
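
The aliasing problem this commit describes can be modelled in a few lines. The sketch below is Python and treats each register as a single integer lane. It assumes the emulation of a multiply-accumulate (`dst = a * b + acc`) is a separate multiply followed by an add. The register names and the function are hypothetical, and the real macros operate on packed SIMD lanes.

```python
def emulate_pmacs(regs, dst, a, b, acc, tmp=None):
    """Model dst = a * b + acc using a separate multiply and add.

    `regs` maps hypothetical register names to integer values (one lane).
    When dst and acc name the same register, multiplying into dst would
    destroy the accumulator, so the product goes into `tmp` instead.
    """
    if dst != acc:
        regs[dst] = regs[a] * regs[b]   # multiply straight into dst
        regs[dst] += regs[acc]          # then add the accumulator
    else:
        if tmp is None:
            raise ValueError("dst aliases the accumulator: a temporary is required")
        regs[tmp] = regs[a] * regs[b]   # keep the product out of dst
        regs[dst] += regs[tmp]          # dst still holds the accumulator
    return regs[dst]


if __name__ == "__main__":
    regs = {"m0": 10, "m1": 3, "m2": 4, "m3": 0}
    # First and fourth operands are the same register, so m3 is the temporary.
    print(emulate_pmacs(regs, "m0", "m1", "m2", "m0", tmp="m3"))
```

Without the temporary, the multiply would overwrite the accumulator before the add could read it. This is why the emulated form cannot always follow the four-operand semantics of the original instruction.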


14 Oct, 2013 2 commits

x86inc: FMA3/4 Support · c6908d6b
Jason Garrett-Glaser authored 12 years ago
```
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
```

x86inc: Remove our FMA4 support · 20689570

Derek Buitenhuis authored 11 years ago

This is so we can sync to x264's version of FMA4 support.

This partially reverts commit 79687079.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>


18 Jan, 2013 2 commits

x86inc: Add cvisible macro for C functions with public prefix · d633d12b
Diego Biurrun authored 11 years ago
```
This allows defining externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```

x86inc: Rename "program_name" to "private_prefix" · ef5d41a5

Diego Biurrun authored 11 years ago

The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>


15 Jan, 2013 3 commits
- x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags · dae1d507
  Diego Biurrun authored 12 years ago
  
- x86: ABSB2: port to cpuflags · 320e1d0d
  Diego Biurrun authored 12 years ago
  
- x86: ABSB: port to cpuflags · 094a7405
  Diego Biurrun authored 12 years ago
  
14 Jan, 2013 1 commit
- x86: ABS2: port to cpuflags · 51969a65
  Diego Biurrun authored 12 years ago
  
06 Jan, 2013 1 commit
- x86: ABS1: port to cpuflags · 5b4dfbff
  Diego Biurrun authored 12 years ago
  
05 Dec, 2012 1 commit
- float_dsp: add vector_dmul_scalar() to multiply a vector of doubles · ac7eb4cb
  Justin Ruggles authored 12 years ago
```
Include x86-optimized versions for SSE2 and AVX.
```
18 Nov, 2012 1 commit
- x86: SPLATD: port to cpuflags · 87af05c5
  Diego Biurrun authored 12 years ago
  
13 Nov, 2012 1 commit
- x86: mmx2 ---> mmxext in asm constructs · 26301caa
  Diego Biurrun authored 12 years ago
  
11 Nov, 2012 1 commit
- x86inc: Set program_name outside of x86inc.asm · f0d124f0
  Diego Biurrun authored 12 years ago
```
This reduces the local difference to the x264 upstream version.
```
09 Nov, 2012 1 commit
- x86: PALIGNR: port to cpuflags · 4b60fac4
  Diego Biurrun authored 12 years ago
  
05 Nov, 2012 1 commit
- x86: PABSW: port to cpuflags · dbb37e77
  Diego Biurrun authored 12 years ago
  
02 Nov, 2012 3 commits
- x86: Refactor PSWAPD fallback implementations and port to cpuflags · 0a7a94f2
  Diego Biurrun authored 12 years ago
  
- x86: PMINUB: port to cpuflags · 26f01bd1
  Diego Biurrun authored 12 years ago
  
- x86util: Add cpuflags_mmxext alias for cpuflags_mmx2 · 61bc2bc7
  Diego Biurrun authored 12 years ago
```
"mmxext" is a more sensible name and more common in outside projects.
```
31 Oct, 2012 1 commit

x86: Fix assembly with NASM · 264f1234

Dave Yeo authored 12 years ago

Unlike YASM, NASM only looks for include files in the current
directory, not in the directory that included files reside in.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
