Commits · 45838561f2f14339acdf53ffa3adbfe8e6db7514 · Linshizhi / ffmpeg.wasm-core

28 Jul, 2012 1 commit
- h264_chromamc_10bit: port x86 simd to cpuflags. · d07ff3cd
  Ronald S. Bultje authored 12 years ago
  
  d07ff3cd
25 Jul, 2012 2 commits

x86/dsputil: put inline asm under HAVE_INLINE_ASM. · 79195ce5

Ronald S. Bultje authored 12 years ago

This allows compiling with compilers that don't support gcc-style
inline assembly.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

79195ce5
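
The guard described in this commit can be sketched roughly as follows. This is only an illustration, assuming `HAVE_INLINE_ASM` is a 0/1 macro supplied by the build configuration; `add_sketch` is a hypothetical function, not code from dsputil.

```c
/* Assumption: the build system defines HAVE_INLINE_ASM to 0 or 1.
 * Default to 0 here so the sketch compiles with any C compiler. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif

/* Hypothetical example function: adds two ints, using gcc-style inline
 * asm only when the guard allows it, and plain C otherwise. */
static int add_sketch(int a, int b)
{
#if HAVE_INLINE_ASM && (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax: add operand 1 into operand 0. */
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Portable fallback for compilers without gcc-style inline asm. */
    return a + b;
#endif
}
```

With the guard in place, a compiler that cannot parse the `__asm__` statement never sees it, because the preprocessor removes that branch first.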

dsputil_mmx: fix incorrect assembly code · 845e92fd

Yang Wang authored 12 years ago

In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
In the first block (in the unrolled loop), the instructions
"movq 8%3, %%mm1 \n\t", and so forth, have problems.

From the above instruction, it is clear what the programmer wants: a load from
p + 8. But this assembly code doesn’t guarantee that. It only works if the
compiler puts p in a register to produce an instruction like this:
"movq 8(%edi), %mm1". During compiler optimization, it is possible that the
compiler will be able to constant-propagate into p. Suppose p = &x[10000].
Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction
becomes "movq 810000(%edi), %mm1". That is, it will offset by 810000 instead of 8.

This will cause a segmentation fault.

This error was fixed in the second block of the assembly code, but not in
the unrolled loop.

How to reproduce:
    This error is exposed when we build using Intel C++ Compiler, with
    IPO+PGO optimization enabled. Crashed when decoding an MJPEG video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

845e92fd
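
The problem described in this commit is one of textual pasting: the literal `8` in the template is glued in front of whatever operand text the compiler chooses. A minimal sketch in plain C that mimics the substitution with string formatting (no real inline asm is involved, and `paste_displacement` is a hypothetical helper):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: imitates the expansion of the template "movq 8%3, %%mm1".
 * `operand` stands for the text the compiler substitutes for %3. */
static void paste_displacement(char *out, size_t size, const char *operand)
{
    /* The "8" is plain text, so it is concatenated with the operand string
     * rather than added to any displacement the operand already carries. */
    snprintf(out, size, "movq 8%s, %%mm1", operand);
}
```

Passing `"(%edi)"` gives the intended `movq 8(%edi), %mm1`, while passing `"10000(%edi)"` gives `movq 810000(%edi), %mm1`, which is the broken form the commit message describes.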

19 Jul, 2012 1 commit
- x86: dsputil: drop some unused CPU flag debug code · 9f97af26
  Diego Biurrun authored 12 years ago
  
  9f97af26
18 Jul, 2012 1 commit

vp3: move idct and loop filter pointers to new vp3dsp context · 28f9ab70

Mans Rullgard authored 12 years ago

This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context.  There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>

28f9ab70

23 Jun, 2012 2 commits
- x86: Only use optimizations with cmov if the CPU supports the instruction · fe07c9c6
  Diego Biurrun authored 12 years ago
  
  fe07c9c6
- x86: move some inline asm macros to the only places they are used · 685f5438
  Mans Rullgard authored 12 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
  685f5438
08 Jun, 2012 1 commit
- Add a float DSP framework to libavutil · d5a7229b
  Justin Ruggles authored 12 years ago
```
Move vector_fmul() from DSPContext to AVFloatDSPContext.
```
  d5a7229b
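
For orientation, an element-wise multiply of this kind could look like the sketch below. The names `FloatDSPSketch`, `vector_fmul_ref` and `float_dsp_init_sketch` are hypothetical and chosen for illustration; the actual libavutil context and its initialization are not reproduced here.

```c
/* Hypothetical context: holds a function pointer so that an optimized
 * version could replace the C reference at init time. */
typedef struct FloatDSPSketch {
    void (*vector_fmul)(float *dst, const float *src0,
                        const float *src1, int len);
} FloatDSPSketch;

/* Plain C reference: dst[i] = src0[i] * src1[i] for i in [0, len). */
static void vector_fmul_ref(float *dst, const float *src0,
                            const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/* Hypothetical init: installs the C reference implementation. */
static void float_dsp_init_sketch(FloatDSPSketch *ctx)
{
    ctx->vector_fmul = vector_fmul_ref;
}
```

Callers go through the pointer in the context, so they do not need to know which implementation was installed.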
21 May, 2012 1 commit
- Convert vector_fmul range of functions to YASM and add AVX versions · 5ff01259
  Kieran Kunhya authored 12 years ago
```
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
```
  5ff01259
10 May, 2012 1 commit

rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc

Christophe Gisquet authored 12 years ago

Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions

Benchmark (rounded to tens of unit):
       V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C       445   358    985    1785    1559     3280
MMX*    219   271    478     714     929     1443
SSE2    131   158    294     425     515      892
SSSE3   120   122    248     387     390      763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>

110d0cdc

28 Apr, 2012 1 commit
- dsputil x86: revert a test back to its previous value · e75d1d4f
  Christophe GISQUET authored 12 years ago
```
Commit 356ee8d7 caused the initial inversion.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  e75d1d4f
21 Apr, 2012 2 commits

Remove lowres video decoding · 2bcbd984

Mans Rullgard authored 12 years ago

This feature is complex, of questionable utility, and slows down
normal decoding.
Signed-off-by: Mans Rullgard <mans@mansr.com>

2bcbd984

avcodec: remove AVCodecContext.dsp_mask · 95510be8

Mans Rullgard authored 12 years ago

This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump.  It has been superseded
by av_set_cpu_flag_mask() which, unlike this field, works everywhere.
Signed-off-by: Mans Rullgard <mans@mansr.com>

95510be8

04 Apr, 2012 1 commit
- dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype · cd88105f
  Christophe GISQUET authored 12 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  cd88105f
25 Mar, 2012 4 commits
- x86: dsputil: prettyprint gcc inline asm · 62ce9def
  Diego Biurrun authored 12 years ago
  
  62ce9def
- x86: K&R prettyprinting cosmetics for dsputil_mmx.c · 3b549121
  Diego Biurrun authored 12 years ago
  
  3b549121
- x86: conditionally compile H.264 QPEL optimizations · 915a2a0a
  Diego Biurrun authored 13 years ago
  
  915a2a0a
- dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks. · 3816642e
  Diego Biurrun authored 12 years ago
```
This makes them safe to use in non-fully braced if-blocks and similar.
```
  3816642e
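
The idiom named in this commit can be illustrated with a small sketch. `SET_BOTH` and `demo` are hypothetical and not taken from dsputil_mmx.c; the point is only that a multi-statement macro wrapped in `do { } while (0)` behaves like a single statement.

```c
/* Hypothetical two-statement macro. Wrapped in do { } while (0), it
 * expands to exactly one statement once the caller adds the ';'. */
#define SET_BOTH(a, b, v) do { (a) = (v); (b) = (v); } while (0)

static int demo(int cond)
{
    int x = 0, y = 0;

    /* No braces around the branches: with a bare { ... } macro body the
     * ';' after the first call would end the if and orphan the else. */
    if (cond)
        SET_BOTH(x, y, 1);
    else
        SET_BOTH(x, y, 2);

    return x + y;
}
```

Here `demo(1)` sets both variables to 1 and `demo(0)` sets both to 2, so the unbraced `if`/`else` works as written.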
05 Mar, 2012 1 commit

x86: clean up ff_dsputil_init_mmx() · 356ee8d7

Mans Rullgard authored 12 years ago

This splits ff_dsputil_init_mmx() into multiple functions, one for
each MMX/SSE level, somewhat simplifying the nested conditions.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>

356ee8d7
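
The restructuring described here, one init function per instruction-set level, might be sketched as below. All names (`FLAG_*`, `DSPSketch`, `init_*`, `dsp_init_sketch`) are hypothetical stand-ins, and a string replaces the real function pointers to keep the example short.

```c
/* Hypothetical CPU capability bits. */
#define FLAG_MMX  0x1
#define FLAG_SSE  0x2
#define FLAG_SSE2 0x4

/* Hypothetical context: records which implementation was selected. */
typedef struct DSPSketch {
    const char *add_impl;
} DSPSketch;

/* One small init function per level instead of one deeply nested function. */
static void init_mmx(DSPSketch *c)  { c->add_impl = "mmx";  }
static void init_sse(DSPSketch *c)  { c->add_impl = "sse";  }
static void init_sse2(DSPSketch *c) { c->add_impl = "sse2"; }

static void dsp_init_sketch(DSPSketch *c, int cpu_flags)
{
    c->add_impl = "c";                 /* portable default */

    /* Later levels override earlier ones when the CPU supports them. */
    if (cpu_flags & FLAG_MMX)
        init_mmx(c);
    if (cpu_flags & FLAG_SSE)
        init_sse(c);
    if (cpu_flags & FLAG_SSE2)
        init_sse2(c);
}
```

Each level is then a flat sequence of assignments, and the top-level function reduces to a list of independent flag checks.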

15 Feb, 2012 1 commit
- dsputil: Add ff_ prefix to the dsputil*_init* functions · 9cf0841e
  Martin Storsjö authored 13 years ago
```
Signed-off-by: Martin Storsjö <martin@martin.st>
```
  9cf0841e
30 Jan, 2012 1 commit

x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf · 6b039003

Christophe Gisquet authored 13 years ago

While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific codepaths
are provided for this case.

For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2:                        ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads:   ~ 30k cycles
Signed-off-by: Diego Biurrun <diego@biurrun.de>

6b039003
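
As a point of comparison for the SIMD versions, a scalar byte swap over a buffer of 32-bit words could be written as in this sketch. The names `bswap32_sketch` and `bswap_buf_ref` are hypothetical, and the signature is an assumption rather than the dsputil prototype.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit word using shifts and masks. */
static uint32_t bswap32_sketch(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) |
           (x << 24);
}

/* Hypothetical scalar reference: byte-swap `len` words from src into dst. */
static void bswap_buf_ref(uint32_t *dst, const uint32_t *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = bswap32_sketch(src[i]);
}
```

An SSSE3 version can perform the same reordering on four words at once with a single `pshufb`, which is the advantage the commit message refers to.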

29 Jan, 2012 1 commit
- png: move DSP functions to their own DSP context. · e9200351
  Ronald S. Bultje authored 13 years ago
  
  e9200351
25 Jan, 2012 1 commit

dsputil: use vertical component for drawing bottom edge. · c3af52fa

Ronald S. Bultje authored 13 years ago

Current code only writes 8 pixels of vertical edge for YUV422, which
causes MC artifacts when subsequent frames use data from that edge.

c3af52fa

14 Dec, 2011 1 commit
- build: conditionally compile x86 H.264 chroma optimizations · 88b97357
  Diego Biurrun authored 13 years ago
  
  88b97357
22 Nov, 2011 1 commit
- dsputil: use movups instead of movdqu in ff_emu_edge_core_sse() · 395f2e70
  Justin Ruggles authored 13 years ago
```
This allows emulated_edge_mc_sse() and gmc_sse() to be used under
AV_CPU_FLAG_SSE.
```
  395f2e70
11 Nov, 2011 1 commit
- twinvq: add SSE/AVX optimized sum/difference stereo interleaving · 9d06037d
  Justin Ruggles authored 13 years ago
  
  9d06037d
07 Nov, 2011 1 commit
- dsputil: use cpuflags in x86 versions of vector_clip_int32() · b8f02f5b
  Justin Ruggles authored 13 years ago
  
  b8f02f5b
26 Oct, 2011 1 commit
- H.264: Cosmetics to dsputil_mmx.c · ded3e9f0
  Daniel Kang authored 13 years ago
```
Add whitespace.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  ded3e9f0
11 Oct, 2011 1 commit
- prores: idct sse2/sse4 optimizations. · e3f530fe
  Ronald S. Bultje authored 13 years ago
```
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
```
  e3f530fe
15 Aug, 2011 1 commit
- dsputil_mmx: Honor HAVE_AMD3DNOW · 48f7163f
  Alex Converse authored 13 years ago
  
  48f7163f
11 Aug, 2011 1 commit
- Move RV3/4-specific DSP functions into their own context · d241f51e
  Kostya Shishkov authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  d241f51e
29 Jul, 2011 1 commit
- H.264: tweak some other x86 asm for Atom · a3bf7b86
  Jason Garrett-Glaser authored 13 years ago
  
  a3bf7b86
21 Jul, 2011 1 commit
- dsputil: update per-arch init funcs for non-h264 high bit depth · a617c6aa
  Mans Rullgard authored 13 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
  a617c6aa
20 Jul, 2011 1 commit
- simple_idct: add 10-bit version · e7a972e1
  Mans Rullgard authored 13 years ago
```
Signed-off-by: Mans Rullgard <mans@mansr.com>
```
  e7a972e1
18 Jul, 2011 1 commit
- dsputil: remove disabled code · 65083b49
  Diego Biurrun authored 13 years ago
  
  65083b49
10 Jul, 2011 1 commit

dsputil: remove ff_emulated_edge_mc macro used in one place · 710b8df9

Mans Rullgard authored 13 years ago

This macro can cause problems in conjunction with the bitdepth
template expansion.  It was presumably added to keep source
compatibility when high bitdepth support was added.  However,
emulated_edge_mc is a dsputil pointer and should not be called
directly, so there is little reason to keep such a macro.
Signed-off-by: Mans Rullgard <mans@mansr.com>

710b8df9

08 Jul, 2011 1 commit

H.264: Add x86 assembly for 10-bit H.264 predict functions · c0483d0c

Daniel Kang authored 13 years ago

Mainly ported from 8-bit H.264 predict.

Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

c0483d0c

04 Jul, 2011 2 commits
- YASM: Shut up unused variable compiler warning with --disable-yasm. · 3c7c16fd
  Daniel Kang authored 13 years ago
```
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
  3c7c16fd
- Fix build with --disable-yasm. · 58f7aad0
  Daniel Kang authored 13 years ago
```
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
```
  58f7aad0
03 Jul, 2011 1 commit

H.264: Add x86 assembly for 10-bit H.264 qpel functions. · 9bfa5363

Daniel Kang authored 13 years ago

Mainly ported from 8-bit H.264 qpel.

Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

9bfa5363