Commits · e0c973e5bea94dc70baf20d5a36e123b1ca1f901 · Linshizhi / ffmpeg.wasm-core

14 Sep, 2018 1 commit
- x86/float_dsp: add ff_vector_dmul_{sse2,avx} · 9d002d78
  James Almer authored 6 years ago
```
~3x to 5x faster.
Signed-off-by: James Almer <jamrial@gmail.com>
```
  9d002d78
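
For context, the operation these SSE2/AVX routines accelerate is an element-wise product of two arrays of doubles. A minimal scalar sketch follows; the name `vector_dmul_ref` and its signature are illustrative assumptions, not the FFmpeg code.

```c
#include <stddef.h>

/* Hypothetical scalar reference: dst[i] = src0[i] * src1[i].
 * The name and signature are assumptions made for illustration. */
static void vector_dmul_ref(double *dst, const double *src0,
                            const double *src1, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}
```

A SIMD version processes two (SSE2) or four (AVX) doubles per multiply instruction, which is consistent with the speedup range quoted in the message.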
01 Aug, 2018 2 commits
- x86/pixelutils: don't use the AVX2 functions on CPUs known to be slow with them · 481741ec
  James Almer authored 6 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
  481741ec
- x86/pixelutils: add missing preprocessor wrapper to the AVX2 functions · d5b3077e
  James Almer authored 6 years ago
```
Should fix compilation with old yasm/nasm
Signed-off-by: James Almer <jamrial@gmail.com>
```
  d5b3077e
31 Jul, 2018 1 commit

avutil/pixelutils: sad_32x32 sse2/avx2 optimizations. · d36b8394

Jun Zhao authored 6 years ago

Add ff_pixelutils_sad_32x32_sse2, ff_pixelutils_sad_{a,u}_32x32_sse2,
ff_pixelutils_sad_32x32_avx2, ff_pixelutils_sad_{a,u}_32x32_avx2

Profiling with perf record/report (instructions:u) for the avx2 sad_32x32:

  72.05%  pixelutils  pixelutils     [.] block_sad_32x32_c
  18.50%  pixelutils  pixelutils     [.] block_sad_16x16_c
   4.78%  pixelutils  pixelutils     [.] block_sad_8x8_c
   2.69%  pixelutils  pixelutils     [.] block_sad_4x4_c
   0.89%  pixelutils  pixelutils     [.] block_sad_2x2_c
   0.16%  pixelutils  pixelutils     [.] ff_pixelutils_sad_32x32_avx2
   0.16%  pixelutils  pixelutils     [.] ff_pixelutils_sad_u_32x32_avx2
   0.12%  pixelutils  pixelutils     [.] ff_pixelutils_sad_a_32x32_avx2

The same instructions:u profile for the sse2 sad_32x32:

  71.86%  pixelutils  pixelutils     [.] block_sad_32x32_c
  18.42%  pixelutils  pixelutils     [.] block_sad_16x16_c
   4.81%  pixelutils  pixelutils     [.] block_sad_8x8_c
   2.68%  pixelutils  pixelutils     [.] block_sad_4x4_c
   0.88%  pixelutils  pixelutils     [.] block_sad_2x2_c
   0.29%  pixelutils  pixelutils     [.] ff_pixelutils_sad_32x32_sse2
   0.26%  pixelutils  pixelutils     [.] ff_pixelutils_sad_u_32x32_sse2
   0.23%  pixelutils  pixelutils     [.] ff_pixelutils_sad_a_32x32_sse2
Signed-off-by: Jun Zhao <mypopydev@gmail.com>

d36b8394
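
The `block_sad_32x32_c` entry in the profiles above is the scalar baseline: a sum of absolute differences over a 32x32 block of 8-bit pixels. A minimal sketch of that computation, assuming two strided byte planes; the name `sad_32x32_ref` and the signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for a 32x32 sum of absolute differences.
 * stride1/stride2 are the distances in bytes between consecutive rows. */
static int sad_32x32_ref(const uint8_t *src1, ptrdiff_t stride1,
                         const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0;
    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 32; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}
```

The worst case is 32 * 32 * 255 = 261120, so a plain `int` accumulator cannot overflow.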

19 Jul, 2018 1 commit
- lavu/x86/cpu: Fix aesni detection · b23c4a9d
  alexander schmid authored 6 years ago
  
  b23c4a9d
11 Jul, 2018 1 commit
- avutil/pixelutils: correct the function name in comments · 09628cb1
  Jun Zhao authored 6 years ago
```
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
```
  09628cb1
06 Feb, 2018 1 commit
- Drop some unnecessary config.h #includes · 4cf84e25
  Diego Biurrun authored 7 years ago
  
  4cf84e25
20 Jan, 2018 5 commits

x86inc: Drop cpuflags_slowctz · 6f62b0bd
Henrik Gramner authored 7 years ago

6f62b0bd
x86inc: Correctly set mmreg variables · eb5f063e
Henrik Gramner authored 7 years ago

eb5f063e

x86inc: Support creating global symbols from local labels · 6b6edd12

Henrik Gramner authored 7 years ago

On ELF platforms such symbols need to be flagged as functions with the
correct visibility to please certain linkers in some scenarios.

6b6edd12

x86inc: Use .rdata instead of .rodata on Windows · 9e4b3675

Henrik Gramner authored 7 years ago

The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.

9e4b3675

x86inc: Enable AVX emulation for floating-point pseudo-instructions · 3a02cbe3

Henrik Gramner authored 7 years ago

There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 require the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.

3a02cbe3

25 Dec, 2017 1 commit
- x86inc: set the correct amount of simd regs in x86_64 when avx512 is enabled but not used · 90d216cb
  James Almer authored 7 years ago
```
Fixes compilation of libavresample/x86/audio_mix.asm

Reviewed-by: Gramner
Signed-off-by: James Almer <jamrial@gmail.com>
```
  90d216cb
24 Dec, 2017 4 commits

x86inc: AVX-512 support · f7197f68

Henrik Gramner authored 7 years ago

AVX-512 consists of a plethora of different extensions, but in order to keep
things a bit more manageable we group together the following extensions
under a single baseline cpu flag which should cover SKL-X and future CPUs:
 * AVX-512 Foundation (F)
 * AVX-512 Conflict Detection Instructions (CD)
 * AVX-512 Byte and Word Instructions (BW)
 * AVX-512 Doubleword and Quadword Instructions (DQ)
 * AVX-512 Vector Length Extensions (VL)

On x86-64, AVX-512 provides 16 additional vector registers; prefer using
those over existing ones since it allows us to avoid using `vzeroupper`
unless more than 16 vector registers are required. They also happen to
be volatile on Windows which means that we don't need to save and restore
existing xmm register contents unless more than 22 vector registers are
required.

Big thanks to Intel for their support.

f7197f68
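
The single baseline flag described above can be sketched as one mask test over the five feature bits. The bit positions are those reported in EBX by CPUID leaf 7, subleaf 0; the macro and function names are hypothetical, and the sketch leaves out the separate check that the OS has enabled the opmask and ZMM register state, which real detection also needs.

```c
#include <stdint.h>

/* Feature bits in CPUID.(EAX=7,ECX=0):EBX. Macro names are illustrative. */
#define CPUID7_EBX_AVX512F  (UINT32_C(1) << 16)
#define CPUID7_EBX_AVX512DQ (UINT32_C(1) << 17)
#define CPUID7_EBX_AVX512CD (UINT32_C(1) << 28)
#define CPUID7_EBX_AVX512BW (UINT32_C(1) << 30)
#define CPUID7_EBX_AVX512VL (UINT32_C(1) << 31)

/* Hypothetical helper: nonzero only if all five grouped extensions
 * (F, CD, BW, DQ, VL) are reported. OS state support is not checked. */
static int has_avx512_baseline(uint32_t cpuid7_ebx)
{
    const uint32_t need = CPUID7_EBX_AVX512F  | CPUID7_EBX_AVX512DQ |
                          CPUID7_EBX_AVX512CD | CPUID7_EBX_AVX512BW |
                          CPUID7_EBX_AVX512VL;
    return (cpuid7_ebx & need) == need;
}
```

Requiring every bit at once means a CPU that implements only part of the group is treated as having no AVX-512 at all, which keeps the flag usable as a single baseline.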

avutil: add alignment needed for AVX-512 · e2218ed8
James Darnley authored 7 years ago

e2218ed8
avutil: detect when AVX-512 is available · 4783a01c
James Darnley authored 7 years ago

4783a01c
avutil: add AVX-512 flags · 8b81eabe
James Darnley authored 7 years ago

8b81eabe

02 Dec, 2017 1 commit

avutil/x86util: add macro for loading a 128-bit constant in an xmm or in... · b37196ad

Martin Vignali authored 7 years ago

avutil/x86util: add a macro for loading a 128-bit constant into an xmm register, or into each part of a ymm register, to simplify AVX2 asm functions

b37196ad

25 Oct, 2017 1 commit

Don't use _tzcnt intrinsics with clang for windows w/o BMI. · 50e30d9b

Dale Curtis authored 7 years ago

Technically _tzcnt* intrinsics are only available when the BMI
instruction set is present. However the instruction encoding
degrades to "rep bsf" on older processors.

Clang for Windows debatably restricts the _tzcnt* intrinsics behind
the __BMI__ architecture define, so check for its presence or
exclude the usage of these intrinsics when clang is present.

See also:
https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.html
https://bugs.llvm.org/show_bug.cgi?id=30506
http://lists.llvm.org/pipermail/cfe-dev/2016-October/051034.html
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Matt Oliver <protogonoi@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

50e30d9b
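
The guard described in this message can be sketched as a compile-time switch: use the `_tzcnt_u32` intrinsic only when the compiler defines `__BMI__`, and fall back to portable code otherwise. This is a simplified illustration, not the FFmpeg change itself; the wrapper name `ctz32` and the `HAVE_TZCNT` macro are hypothetical, and the MSVC-specific path is not modeled.

```c
#include <stdint.h>

#if defined(__BMI__)
#include <immintrin.h>   /* declares _tzcnt_u32 when BMI is enabled */
#define HAVE_TZCNT 1
#else
#define HAVE_TZCNT 0
#endif

/* Hypothetical wrapper: count trailing zero bits; returns 32 for v == 0,
 * matching what the tzcnt instruction does for a zero input. */
static int ctz32(uint32_t v)
{
#if HAVE_TZCNT
    return (int)_tzcnt_u32(v);
#else
    int n = 0;
    if (v == 0)
        return 32;
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
#endif
}
```

Both branches return the same values, so callers do not need to know which one was compiled in.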

09 Oct, 2017 1 commit
- cpu: split flag checks per arch in av_cpu_max_align() · 3d828c9f
  James Almer authored 7 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
```
  3d828c9f
28 Sep, 2017 1 commit
- avutil/cpu: split flag checks per arch in av_cpu_max_align() · 3b345d38
  James Almer authored 7 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
  3b345d38
18 Aug, 2017 1 commit

Add macros to x86util.asm. · 30ae07d7

Ivan Kalvachev authored 7 years ago

Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>

30ae07d7
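
To make the semantics of two of these macros concrete, here is a scalar sketch of what a horizontal sum placed in all elements and a `blendvps`-style select compute on four floats. It models the instruction-level behaviour only; how the macros emulate it on older instruction sets is not shown in this log, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of HSUMPS: sum four floats, then write the sum
 * to every element. */
static void hsumps_ref(float v[4])
{
    float s = (v[0] + v[1]) + (v[2] + v[3]);
    v[0] = v[1] = v[2] = v[3] = s;
}

/* Hypothetical model of blendvps: for each lane, take src when the sign
 * bit of the mask lane is set, otherwise keep dst. */
static void blendvps_ref(float dst[4], const float src[4], const float mask[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &mask[i], sizeof(bits));  /* inspect raw IEEE-754 bits */
        if (bits >> 31)
            dst[i] = src[i];
    }
}
```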

27 Jun, 2017 1 commit

x86inc: don't use read-only data sections on COFF targets · 4d62ee67

James Almer authored 7 years ago

Yasm:
src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'

Nasm:
src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here
Tested-by: Clément Bœsch <u@pkh.me>
Signed-off-by: James Almer <jamrial@gmail.com>

4d62ee67

21 Jun, 2017 1 commit

build: Generalize yasm/nasm-related variable names · fd502f4f

Diego Biurrun authored 8 years ago

None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4)
Signed-off-by: James Almer <jamrial@gmail.com>

fd502f4f

19 Jun, 2017 1 commit
- x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} · e229df94
  James Almer authored 7 years ago
```
About 2x faster than the c version.
```
  e229df94
12 Jun, 2017 1 commit

x86inc: Add some additional cpuflag relations · aad1b678

Henrik Gramner authored 7 years ago

Simplifies writing assembly code that depends on available instructions.

LZCNT implies SSE2
BMI1 implies AVX+LZCNT
AVX2 implies BMI2

aad1b678
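
One way to picture these relations is to give each flag a value that already contains the bits of everything it implies, so a single mask comparison answers "is this instruction set available?". The sketch below encodes the three relations from the message; the `AVX implies SSE2` and `BMI2 implies BMI1` links, the bit assignments, and all names are assumptions added for illustration.

```c
/* One private bit per feature (values are arbitrary for this sketch). */
#define BIT_SSE2  (1u << 0)
#define BIT_LZCNT (1u << 1)
#define BIT_AVX   (1u << 2)
#define BIT_BMI1  (1u << 3)
#define BIT_BMI2  (1u << 4)
#define BIT_AVX2  (1u << 5)

/* Composite flags: own bit plus every implied flag. */
#define CPU_SSE2  (BIT_SSE2)
#define CPU_LZCNT (BIT_LZCNT | CPU_SSE2)             /* LZCNT implies SSE2 */
#define CPU_AVX   (BIT_AVX   | CPU_SSE2)             /* assumption */
#define CPU_BMI1  (BIT_BMI1  | CPU_AVX | CPU_LZCNT)  /* BMI1 implies AVX+LZCNT */
#define CPU_BMI2  (BIT_BMI2  | CPU_BMI1)             /* assumption */
#define CPU_AVX2  (BIT_AVX2  | CPU_BMI2)             /* AVX2 implies BMI2 */

/* Hypothetical helper: nonzero if every bit of `required` is in `enabled`. */
static int has_flags(unsigned enabled, unsigned required)
{
    return (enabled & required) == required;
}
```

With this layout, code written for AVX2 can use LZCNT or BMI instructions without a second check, which is the simplification the message refers to.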

09 Jun, 2017 4 commits

x86inc: Remove argument from WIN64_RESTORE_XMM · d991b3e8

Anton Mitrofanov authored 7 years ago

The use of rsp was pretty much hardcoded there and probably didn't work
otherwise with stack_size > 0.

d991b3e8

x86inc: Prefer r14/r15 over r12/r13 on x86-64 · cd4ca824

Henrik Gramner authored 7 years ago

Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13
registers sometimes require an additional byte when used as a base register.

r14 and r15 don't have that issue, so prefer using them.

cd4ca824
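
The peculiarity can be sketched by counting addressing bytes. In the ModR/M byte, a low-three-bit base value of 100b means "a SIB byte follows" and 101b with no displacement means "disp32 only", so r12 (like rsp) always needs a SIB byte and r13 (like rbp) needs an explicit zero displacement. The helper below is a hypothetical model covering only `[base]` and `[base+disp8]` with no index register; registers are numbered 0 to 15 by their hardware encoding.

```c
/* Hypothetical model: number of ModR/M, SIB and displacement bytes needed
 * to encode [base] (has_disp8 == 0) or [base+disp8] (has_disp8 != 0). */
static int addr_bytes(int base_reg, int has_disp8)
{
    int low3 = base_reg & 7;   /* only the low 3 bits live in ModR/M */
    int n = 1;                 /* the ModR/M byte itself */

    if (low3 == 4)             /* rsp/r12: a SIB byte is mandatory */
        n++;
    if (has_disp8 || low3 == 5) /* rbp/r13: [base] must be [base+0] */
        n++;
    return n;
}
```

In this model `[r12]` and `[r13]` each take two bytes where `[r14]` and `[r15]` take one, and `[r12+disp8]` takes three where `[r14+disp8]` takes two.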

x86inc: Make REP_RET identical to RET in SSSE3+ functions · 88dcdfad
Henrik Gramner authored 7 years ago
```
There's no point in emitting a rep prefix before ret on modern CPUs.
```
88dcdfad

x86inc: Fix call with memory operands · 406e0ddc

Henrik Gramner authored 7 years ago

We overload the `call` instruction with a macro, but it would misbehave when
the macro argument wasn't a valid identifier. Fix it by explicitly checking
if the argument is an identifier.

406e0ddc

13 May, 2017 1 commit
- x86/float_dsp: remove usage of integer instructions · 0fbc7a21
  James Almer authored 7 years ago
  
  0fbc7a21
12 Apr, 2017 1 commit
- x86/float_dsp: add ff_vector_fmul_reverse_avx2 · f1d80bc6
  James Almer authored 7 years ago
```
~20% faster than AVX.
Signed-off-by: James Almer <jamrial@gmail.com>
```
  f1d80bc6
10 Apr, 2017 1 commit
- x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3} · ed9b25a1
  James Almer authored 7 years ago
  
  ed9b25a1
21 Mar, 2017 1 commit
- avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same · d8962ffb
  James Almer authored 7 years ago
```
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
  d8962ffb
14 Mar, 2017 1 commit

x86util: Port all macros to cpuflags · 994c4bc1

Diego Biurrun authored 12 years ago

Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2
macro name, drop pointless check for MMX support, we always assume MMX is
available in our SIMD code, fix spelling.

994c4bc1

01 Mar, 2017 1 commit
- build: Generalize yasm/nasm-related variable names · 39e208f4
  Diego Biurrun authored 8 years ago
```
None of them are specific to the YASM assembler.
```
  39e208f4
18 Feb, 2017 3 commits

avcodec/h264: sse2, avx h luma mbaff deblock/loop filter · 53368878

James Darnley authored 8 years ago

x86-64 only

Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx:  ~3.29x (370 vs. 112 cycles)

53368878

x86util: import MOVHL macro · 7627df15

James Darnley authored 8 years ago

Originally committed to x264 in 1637239a by Henrik Gramner who has
agreed to re-license it as LGPL. Original commit message follows.

x86: Avoid some bypass delays and false dependencies

A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and float domains, so try to avoid that if possible.

7627df15

avcodec/x86: deduplicate PASS8ROWS macro · 9d815b74
James Darnley authored 8 years ago

9d815b74

03 Feb, 2017 1 commit
- asm: Consistently uppercase SECTION markers · 7abdd026
  Diego Biurrun authored 8 years ago
  
  7abdd026