Commits · a37a28177826f3ee1be1762b96b54012060917ba · Linshizhi / ffmpeg.wasm-core

28 Mar, 2012 7 commits
- cabac: add overread protection to BRANCHLESS_GET_CABAC(). · a9401981
  Ronald S. Bultje authored 12 years ago
```
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
```
  a9401981
- cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC(). · 448dc425
  Ronald S. Bultje authored 12 years ago
  
  448dc425
- cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE(). · 16f6e83f
  Ronald S. Bultje authored 12 years ago
  
  16f6e83f
- cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC(). · 951014e5
  Ronald S. Bultje authored 12 years ago
  
  951014e5
- h264: add overread protection to get_cabac_bypass_sign_x86(). · a0bdcb01
  Ronald S. Bultje authored 12 years ago
  
  a0bdcb01
- h264: reindent get_cabac_bypass_sign_x86(). · 95bfa4ea
  Ronald S. Bultje authored 12 years ago
  
  95bfa4ea
- h264: use struct offsets in get_cabac_bypass_sign_x86(). · db025929
  Ronald S. Bultje authored 12 years ago
  
  db025929
26 Mar, 2012 1 commit
- build: prettyprinting cosmetics · ad0e31f1
  Diego Biurrun authored 13 years ago
  
  ad0e31f1
25 Mar, 2012 4 commits
- x86: dsputil: prettyprint gcc inline asm · 62ce9def
  Diego Biurrun authored 12 years ago
  
  62ce9def
- x86: K&R prettyprinting cosmetics for dsputil_mmx.c · 3b549121
  Diego Biurrun authored 12 years ago
  
  3b549121
- x86: conditionally compile H.264 QPEL optimizations · 915a2a0a
  Diego Biurrun authored 13 years ago
  
  915a2a0a
- dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks. · 3816642e
  Diego Biurrun authored 12 years ago
```
This makes them safe to use in non-fully braced if-blocks and similar.
```
  3816642e
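
To illustrate why this wrapping matters, here is a minimal sketch using a hypothetical two-statement macro (`SWAP_INT` and `sort_pair` are invented for illustration and are not the QPEL macros from dsputil_mmx.c). In the conventional form of the idiom the macro body ends at `while (0)` and the call site supplies the semicolon, so the expansion is a single statement and a following `else` still binds to the right `if`.

```c
#include <assert.h>

/* Hypothetical multi-statement macro. The do { } while (0) wrapper makes
 * the whole expansion behave as ONE statement. */
#define SWAP_INT(a, b)      \
    do {                    \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } while (0)

/* Orders *lo and *hi in place, deliberately using an unbraced if/else.
 * Without the wrapper, only the first statement of the macro would be
 * guarded by the if, and the else would fail to compile. */
static void sort_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else
        (void)0; /* already ordered, nothing to do */
}
```

Calling `sort_pair(&a, &b)` with `a = 5, b = 2` leaves `a = 2, b = 5`; with a bare `{ ... }` block instead of the wrapper, the semicolon after the macro call would terminate the `if` and orphan the `else`.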
24 Mar, 2012 1 commit
- Fix linking without yasm. · 5cddfc58
  Carl Eugen Hoyos authored 12 years ago
  
  5cddfc58
23 Mar, 2012 2 commits

aacsbr: handle m_max values smaller than 4. · 71ea2681

Ronald S. Bultje authored 12 years ago

Prevents a signflip in the counter, and a subsequent crash because of
overreads/overwrites.

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org

71ea2681

VC1: restore optimizations broken in . · adb98a3d

Reimar Döffinger authored 12 years ago

They were moved into code under HAVE_YASM and most of them
even into completely disabled code with no reason given
for that in the commit message.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>

adb98a3d

22 Mar, 2012 1 commit

Replace SSE2 instruction in scalarproduct_float_sse() by SSE equivalent. · f6b78638

ami_stuff authored 12 years ago

Fixes an AAC decoding issue with the sample from ticket #213 on machines
with SSE but without SSE2.
Based on 89411a by Reimar.

f6b78638

21 Mar, 2012 1 commit

Replace SSE2 instruction by SSE equivalent. · 89411ae6

Reimar Döffinger authored 12 years ago

This is even potentially faster in this use-case.
Should fix AAC SBR decoding on machines with SSE but not
SSE2, fixing track issue #1041.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>

89411ae6

17 Mar, 2012 1 commit
- dsp: fix diff_bytes_mmx() with small width · 219a6fb6
  Michael Niedermayer authored 12 years ago
```
Fixes Ticket1068
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  219a6fb6
15 Mar, 2012 2 commits
- dsputil: mark source of diff_bytes as const. · dd2631a6
  Michael Niedermayer authored 12 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  dd2631a6
- dirac: mark some variables const. · 1bc85fb3
  Michael Niedermayer authored 12 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  1bc85fb3
12 Mar, 2012 1 commit

Move struc FFTContext below SECTION_RODATA · 599888a4

Nico Weber authored 12 years ago

Yasm creates an implicit unaligned text section if "struc" is used
outside of any section:
http://tortall.lighthouseapp.com/projects/78676-yasm/tickets/247

Since yasm only honors the "align" annotation on the first declaration
of a section, this implicit text section causes all text section
alignments to be ignored. Also fixes a yasm warning about it ignoring
alignment.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

599888a4

10 Mar, 2012 2 commits
- vp8: convert mbedge loopfilter x86 assembly to use named arguments. · a928ed37
  Ronald S. Bultje authored 12 years ago
  
  a928ed37
- vp8: convert inner loopfilter x86 assembly to use named arguments. · bee330e3
  Ronald S. Bultje authored 12 years ago
  
  bee330e3
07 Mar, 2012 3 commits

sbrdsp.asm: convert all instructions to float/SSE ones. · 6eda85e1

Reimar Döffinger authored 12 years ago

Since the values are floats, using the float operations
makes sense, improves performance on some CPUs and
makes the code SSE compatible instead of needing SSE2.

Based on suggestion by Jason.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

6eda85e1

dsputil: remove shift parameter from scalarproduct_int16 · 7e1ce6a6

Christophe GISQUET authored 12 years ago

There is only one caller, which does not need the shifting. Other use cases
are situations where different roundings would be needed.

The x86 and neon versions are modified accordingly.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

7e1ce6a6
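
As a rough illustration of what a scalar product without a shift parameter looks like in plain C, consider the sketch below. The function name `scalarproduct_int16_sketch` is an assumption made for this example; it is not claimed to match the actual dsputil code or its x86 and neon versions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: sums v1[i] * v2[i] over `order` elements.
 * No per-term shift is applied; a caller that needs scaling or a
 * specific rounding can apply it to the returned sum instead. */
static int32_t scalarproduct_int16_sketch(const int16_t *v1,
                                          const int16_t *v2, int order)
{
    int32_t res = 0;
    for (int i = 0; i < order; i++)
        res += v1[i] * v2[i]; /* int16_t operands promote to int first */
    return res;
}
```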

x86: Remove duplicated AVG_3DNOW_OP / AVG_MMX2_OP macros from h264_qpel_mmx.c. · 1e9d55e4

Diego Biurrun authored 12 years ago

1e9d55e4

06 Mar, 2012 1 commit

SBR DSP: fix SSE code to not use SSE2 instructions. · b5161908

Reimar Döffinger authored 12 years ago

movq from SSE register _to_ memory is an SSE2 instruction.
Use the SSE movlps function instead that does the same thing.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

b5161908

05 Mar, 2012 1 commit

x86: clean up ff_dsputil_init_mmx() · 356ee8d7

Mans Rullgard authored 12 years ago

This splits ff_dsputil_init_mmx() into multiple functions, one for
each MMX/SSE level, somewhat simplifying the nested conditions.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>

356ee8d7

04 Mar, 2012 5 commits
- vp8: convert simple loopfilter x86 assembly to use named arguments. · b4188f0d
  Ronald S. Bultje authored 12 years ago
  
  b4188f0d
- vp8: convert idct x86 assembly to use named arguments. · 8476ca3b
  Ronald S. Bultje authored 12 years ago
  
  8476ca3b
- vp8: convert mc x86 assembly to use named arguments. · 21ffc78f
  Ronald S. Bultje authored 12 years ago
  
  21ffc78f
- vp8: convert loopfilter x86 assembly to use cpuflags(). · 28170f1a
  Ronald S. Bultje authored 12 years ago
  
  28170f1a
- vp8: convert idct/mc x86 assembly to use cpuflags(). · e25be471
  Ronald S. Bultje authored 12 years ago
  
  e25be471
02 Mar, 2012 3 commits

h264: change underread for 10bit QPEL to overread. · 291c9b62

Ronald S. Bultje authored 12 years ago

This prevents us from reading before the start of the buffer, and thus
prevents crashes resulting from this behaviour. Fixes bug 237.

291c9b62

vp8: disable mmx functions with sse/sse2 counterparts on x86-64. · 45549339

Ronald S. Bultje authored 12 years ago

x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2
functions will never be used in practice.

45549339

vp8: change int stride to ptrdiff_t stride. · bd66f073

Ronald S. Bultje authored 12 years ago

On 64bit platforms with 32bit int, this means we won't have to
sign-extend the integer anymore.

bd66f073

27 Feb, 2012 1 commit
- h264: fix mmxext chroma deblock to use correct TC values. · b0c4f043
  Ronald S. Bultje authored 12 years ago
  
  b0c4f043
23 Feb, 2012 2 commits

SBR DSP x86: implement SSE sbr_hf_g_filt · 2784d187

Christophe GISQUET authored 12 years ago

Unrolling the main loop to process, instead of 4 elements:
- 8: minor gain of 2 cycles (not worth the extra object size)
- 2: loss of 8 cycles.

Assigning STEP to a register is a loss. Output address (Y) is almost always
unaligned.

Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

2784d187

SBR DSP x86: implement SSE sbr_sum_square_sse · 34454c76

Christophe GISQUET authored 12 years ago

The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C  /32bits: 82c (unrolled)/102c
               C  /64bits: 69c (unrolled)/82c
               SSE/32bits: 42c
               SSE/64bits: 31c

Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

34454c76

20 Feb, 2012 1 commit

rv34: change most "int stride" into "ptrdiff_t stride". · 3ab9a2a5

Ronald S. Bultje authored 13 years ago

This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. Also fixes crashes on systems where we don't do it and
arguments are not in registers, such as Win64 for all weight functions.

3ab9a2a5
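
The effect of the stride type can be sketched in plain C. The helper below is hypothetical (`sum_first_column` does not exist in the rv34 code); it only shows that a `ptrdiff_t` stride is already pointer-width, so it can take part in address arithmetic directly, including with negative values, without a 32-bit `int` first being widened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums the first byte of each of `rows` rows.
 * `stride` is the distance in bytes between consecutive rows and may be
 * negative (for example when walking an image from the bottom row up). */
static int sum_first_column(const uint8_t *src, ptrdiff_t stride, int rows)
{
    int sum = 0;
    for (ptrdiff_t y = 0; y < rows; y++)
        sum += src[y * stride]; /* offset computed in pointer-width arithmetic */
    return sum;
}
```

With a 12-byte buffer laid out as three 4-byte rows, passing `buf` with stride `4` walks downwards, while passing `buf + 8` with stride `-4` walks upwards over the same rows.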