Commits · ddb9a24a7f11c448263ef9b163e8425832cb089c · Linshizhi / ffmpeg.wasm-core

05 Dec, 2014 1 commit

v210enc: Add SIMD optimised 8-bit and 10-bit encoders · 9a738c27

Kieran Kunhya authored 10 years ago

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>


26 Nov, 2014 1 commit
- v210enc: Add SIMD optimised 8-bit and 10-bit encoders · 36091742
  Kieran Kunhya authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
28 Sep, 2014 1 commit
- avutil/lls: Make unchanged function arguments const · 579a0fdc
  Michael Niedermayer authored 10 years ago
```
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
27 Sep, 2014 1 commit
- avutil/x86/cpu: fix cpuid sub-leaf selection · e58fc446
  lvqcl authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
09 Sep, 2014 3 commits
- x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags · f629705b
  Henrik Gramner authored 10 years ago
```
Previously there was a limit of two cpuflags.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
- x86inc: Free up variable name "n" in global namespace · ec217218
  Loren Merritt authored 10 years ago
```
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
- x86inc: Make ym# behave the same way as xm# · 176a0fca
  Henrik Gramner authored 10 years ago
```
This makes more sense for future implementations of templates with zmm registers.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
05 Sep, 2014 1 commit
- x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags · 428aa14a
  Henrik Gramner authored 10 years ago
```
Previously there was a limit of two cpuflags.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
04 Sep, 2014 2 commits
- x86inc: Make ym# behave the same way as xm# · 720c21d1
  Henrik Gramner authored 10 years ago
```
This makes more sense for future implementations of templates with zmm registers.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
- x86inc: free up variable name "n" in global namespace · a4dbabc8
  Loren Merritt authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
23 Aug, 2014 2 commits

avutil/pixelutils: faster pixelutils_sad_16x16 · 554d8190
Clément Bœsch authored 10 years ago
```
501 to 439 decicycles.

See 45c7f399.
```

avutil/pixelutils: faster pixelutils_sad_[au]_16x16 · 45c7f399

Clément Bœsch authored 10 years ago

~560 → ~500 decicycles

This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html

Using 2 registers for accumulator didn't help. On the other hand,
some re-ordering between the movs and psadbw allowed going ~538 to ~500.
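
For context, the routine being tuned computes a sum of absolute differences (SAD) over a 16x16 block of bytes, which `psadbw` handles 16 bytes at a time. A plain scalar sketch of that computation is given here; the name `sad_16x16_ref` and its signature are illustrative assumptions, not taken from the pixelutils source.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for a 16x16 SAD; not the actual
 * pixelutils implementation. Strides are given in bytes. */
static int sad_16x16_ref(const uint8_t *src1, ptrdiff_t stride1,
                         const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0;

    for (int y = 0; y < 16; y++) {
        /* One row of 16 pixels: the work a single psadbw does on
         * a pair of 128-bit registers. */
        for (int x = 0; x < 16; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}
```

The SIMD versions keep the same row loop but replace the inner loop with a load, a `psadbw`, and an accumulate, which is why the ordering of loads and `psadbw` instructions matters for throughput.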


09 Aug, 2014 1 commit
- drop LLS1, rename LLS2 to LLS · 70b8668f
  Michael Niedermayer authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
05 Aug, 2014 1 commit
- avutil: add pixelutils API · 28a2107a
  Clément Bœsch authored 10 years ago
  
03 Aug, 2014 1 commit

x86/hevc_deblock: improve 8bit transpose store macros · d0f56ca0

James Almer authored 10 years ago

Up to four instructions less depending on function and instruction set.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


26 Jul, 2014 1 commit

x86/hevc_idct: replace old and unused idct functions · 1ace9573

James Almer authored 10 years ago

Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).

Benchmarks on an Intel Core i5-4200U:

idct8x8_dc
       SSE2   MMXEXT  C
cycles 22     26      57

idct16x16_dc
       AVX2   SSE2    C
cycles 27     32      249

idct32x32_dc
       AVX2   SSE2    C
cycles 62     126     1375
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


01 Jul, 2014 1 commit
- Update Fiona's name in copyright statements. · 79793f83
  Diego Biurrun authored 10 years ago
  
15 Jun, 2014 1 commit

x86util: add and use RSHIFT/LSHIFT macros · 91076128

Christophe Gisquet authored 10 years ago

Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


08 Jun, 2014 3 commits

x86/float_dsp: add missing femms · 85065d2a

James Almer authored 10 years ago

It was lost during the port.
Should fix fate on 3dnowext machines.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


x86/float_dsp: port vector_fmul_window to yasm · dcaf9660

James Almer authored 10 years ago

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


x86/vp9: inital AVX2 intra_pred · fc8db12a

James Almer authored 10 years ago

tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz

1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips

3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips

1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips

2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips

3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips

1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


29 May, 2014 1 commit
- x86: hpeldsp: better factorization · 22670039
  Christophe Gisquet authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
28 May, 2014 1 commit
- x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} · 561bfc85
  James Almer authored 10 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
07 May, 2014 1 commit
- inline asm: fix arrays as named constraints. · 1898c2f4
  Matt Oliver authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
19 Apr, 2014 1 commit

x86/float_dsp: remove duplicated code from vector_dmul_scalar · 3b06208a

James Almer authored 10 years ago

Use the xm# and ym# aliases as they remain in sync with m# after a SWAP.
No actual changes to the assembly.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


17 Apr, 2014 1 commit

x86: move horizontal add macros to x86util · 76ed71a7

James Almer authored 10 years ago

Also port relevant AVX2/XOP optimizations from x264 with permission
to relicense to LGPL from the corresponding authors
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


16 Apr, 2014 2 commits

x86/float_dsp: unroll loop in vector_fmac_scalar · 11b36b1e

James Almer authored 10 years ago

~6% faster SSE2 performance. AVX/FMA3 are unaffected.
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


x86/float_dsp: use SWAP in vector_fmac_scalar Win64 · 3b808900

James Almer authored 10 years ago

The mova is unnecessary
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


25 Mar, 2014 1 commit

x86/cpu: check for OS support before enabling AVX2 · 2d9821a2

James Almer authored 11 years ago

AV_CPU_FLAG_AVX is enabled at this point only if there's OS support.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
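
The gating described above can be sketched as follows: AVX2 is only added to the flag set when the AVX flag (which already implies OS support for the wider register state) is present and CPUID leaf 7 reports AVX2 in EBX bit 5. The flag values and the function name are placeholders chosen for this sketch; the real `AV_CPU_FLAG_*` constants are defined in libavutil.

```c
#include <stdint.h>

/* Placeholder bit values for this sketch only; they are NOT the
 * real AV_CPU_FLAG_* constants from libavutil. */
#define SKETCH_FLAG_AVX  (1u << 0)
#define SKETCH_FLAG_AVX2 (1u << 1)

/* leaf7_ebx is the EBX value returned by CPUID with EAX=7, ECX=0.
 * Bit 5 of that register advertises AVX2. */
static unsigned add_avx2_if_usable(unsigned flags, uint32_t leaf7_ebx)
{
    /* Require the AVX flag first: it is only set when the OS
     * saves and restores the YMM state. */
    if ((flags & SKETCH_FLAG_AVX) && (leaf7_ebx & (1u << 5)))
        flags |= SKETCH_FLAG_AVX2;
    return flags;
}
```

With this ordering, a CPU that advertises AVX2 under an OS that does not preserve YMM registers never gets the AVX2 flag.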


18 Mar, 2014 1 commit

Automatically change MANGLE() into named inline asm operands when direct... · 82367475

Matt Oliver authored 11 years ago

Automatically change MANGLE() into named inline asm operands when direct symbol reference in inline asm are not supported.

This is part of the patch-set for intel C inline asm on windows support
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


13 Mar, 2014 1 commit

x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3 · 7d7487e8

James Almer authored 11 years ago

~7% faster than AVX
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


09 Mar, 2014 1 commit
- avutil/timer: Fix units for x86 after c708b540 · 4159f702
  Michael Niedermayer authored 11 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
24 Feb, 2014 1 commit

x86: Move XOP emulation to x86util · 3f3d748c

James Almer authored 11 years ago

We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
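
The aliasing problem mentioned above can be illustrated with a multiply-accumulate of the `pmacsdd` kind (dst = a * b + c per 32-bit lane). When it is emulated with a separate multiply and add, writing the product straight into dst would destroy c if both name the same register, so the product is staged in a temporary. The C model below is only a sketch: the `vec4` type and `pmacsdd_emul` name are made up for illustration, and the choice of `pmacsdd` as the example is an assumption rather than something stated in the commit.

```c
#include <stdint.h>

/* Stand-in for one 128-bit register holding four 32-bit lanes. */
typedef struct { int32_t lane[4]; } vec4;

/* Hypothetical model of an emulated multiply-accumulate:
 * dst = a * b + c, keeping the low 32 bits of each lane.
 * dst may point to the same object as c; tmp is the extra
 * (fifth) argument that makes this safe. */
static void pmacsdd_emul(vec4 *dst, const vec4 *a, const vec4 *b,
                         const vec4 *c, vec4 *tmp)
{
    /* Step 1: multiply into the temporary, leaving c untouched. */
    for (int i = 0; i < 4; i++)
        tmp->lane[i] = (int32_t)((uint32_t)a->lane[i] * (uint32_t)b->lane[i]);

    /* Step 2: add c and write the result to dst. */
    for (int i = 0; i < 4; i++)
        dst->lane[i] = (int32_t)((uint32_t)tmp->lane[i] + (uint32_t)c->lane[i]);
}
```

Because the helper needs an operand the native instruction does not have, it no longer matches the original instruction's semantics, which is the stated reason for keeping it out of x86inc.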


23 Feb, 2014 3 commits
- x86: add detection for Bit Manipulation Instruction sets · d59fcdaf
  James Almer authored 11 years ago
```
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
```
- x86: add detection for FMA3 instruction set · 1b932eb1
  James Almer authored 11 years ago
```
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
```
- x86: add missing XOP checks and macros · 10b0161d
  James Almer authored 11 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
22 Feb, 2014 2 commits

x86: add detection for Bit Manipulation Instruction sets · 0bc3de19

James Almer authored 11 years ago

Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


x86: add detection for FMA3 instruction set · a2af8edd

James Almer authored 11 years ago

Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>


20 Feb, 2014 1 commit

x86: float dsp: unroll SSE versions · 996697e2

Christophe Gisquet authored 11 years ago

vector_fmul and vector_fmac_scalar are guaranteed to be able to process in
batches of 16 elements, but their SSE versions only do 8 at a time.

Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
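
As a rough scalar picture of the unrolling, the sketch below processes 16 floats per loop iteration as four groups of four, each group standing in for one 128-bit SSE register. It assumes the usual multiply-accumulate semantics (`dst[i] += src[i] * mul`); the function name is made up for this sketch and the code is not the yasm implementation.

```c
#include <assert.h>

/* Hypothetical C model of a 16-elements-per-iteration
 * multiply-accumulate: dst[i] += src[i] * mul.
 * len must be a multiple of 16, as the commit message says
 * callers guarantee. */
static void fmac_scalar_unrolled16(float *dst, const float *src,
                                   float mul, int len)
{
    assert(len % 16 == 0);

    for (int i = 0; i < len; i += 16) {
        /* Four groups of four floats, i.e. four SSE registers
         * worth of data handled before the loop branch. */
        for (int v = 0; v < 4; v++) {
            float *d = dst + i + 4 * v;
            const float *s = src + i + 4 * v;

            d[0] += s[0] * mul;
            d[1] += s[1] * mul;
            d[2] += s[2] * mul;
            d[3] += s[3] * mul;
        }
    }
}
```

Doubling the work per iteration halves the number of loop-counter updates and branches, which is where the quoted gain on Arrandale would come from.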


15 Feb, 2014 1 commit

x86: float dsp: unroll SSE versions · 133b3420

Christophe Gisquet authored 11 years ago

vector_fmul and vector_fmac_scalar are guaranteed to be able to process in
batches of 16 elements, but their SSE versions only do 8 at a time.

Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
