Commits · 6517177d975b5be7a049c903190eff6ff7c6a864 · Linshizhi / ffmpeg.wasm-core

03 Jul, 2016 1 commit
- avcodec: add missing xmm/neon clobber test wrappers for the new decode API · 293484fa
  James Almer authored 8 years ago
```
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
```
  293484fa
25 Jun, 2016 1 commit
- lavc/neontest: fix constness in arm/aarch64 avcodec_open2() wrappers · dfd0c0f9
  Clément Bœsch authored 8 years ago
  
  dfd0c0f9
11 May, 2016 1 commit
- aarch64/synth_filter: fix compilation · c8c14d0f
  James Almer authored 8 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
  c8c14d0f
04 May, 2016 1 commit
- cosmetics: Fix spelling mistakes · 41ed7ab4
  Vittorio Giovara authored 8 years ago
```
Signed-off-by: Diego Biurrun <diego@biurrun.de>
```
  41ed7ab4
07 Apr, 2016 1 commit

build: miscellaneous cosmetics · 01621202

Diego Biurrun authored 9 years ago

Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.

01621202

26 Mar, 2016 1 commit

aarch64: Make transpose_4x4H do a regular transpose · cdb1665f

Martin Storsjö authored 8 years ago

Previously, ff_h264_idct_add_neon (originally in the arm version) used
a non-regular transpose in order to be able to use more instructions
that deal with registers as 128 bit register pairs. The aarch64
translation doesn't do it to the same extent, but brought along the
same structure since it was a straight translation.

This reshuffles ff_h264_idct_add_neon, bringing it closer to
the C implementation, making the transpose_4x4H macro do a regular
transpose, usable for other algorithms as well.

Previously, the third and fourth output from transpose_4x4H were
swapped, and prior to cc29d96d, the same inputs as well. In
addition to just swapping the outputs, also renumber the intermediate
registers for better readability (making the register order match
transpose_4x8B).

This runs with the same number of cycles as before.
Signed-off-by: Martin Storsjö <martin@martin.st>

cdb1665f
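
As a rough illustration of what "regular transpose" means in the message above, here is a scalar C sketch. It is not the NEON macro: the function names are hypothetical, and the second function is only one reading of the statement that the third and fourth outputs used to be swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the transpose_4x4H macro itself.
 * A regular transpose moves element (row, col) to (col, row). */
static void transpose_4x4_regular(int16_t m[4][4])
{
    assert(m != NULL);
    for (int r = 0; r < 4; r++) {
        for (int c = r + 1; c < 4; c++) {
            int16_t tmp = m[r][c];
            m[r][c] = m[c][r];
            m[c][r] = tmp;
        }
    }
}

/* Assumed model of the earlier, non-regular behaviour: the same
 * transpose, but with the third and fourth output rows exchanged. */
static void transpose_4x4_swapped_tail(int16_t m[4][4])
{
    transpose_4x4_regular(m);
    for (int c = 0; c < 4; c++) {
        int16_t tmp = m[2][c];
        m[2][c] = m[3][c];
        m[3][c] = tmp;
    }
}
```

A caller written against the regular variant can treat output row `i` as input column `i` without special-casing the last two rows, which is presumably what makes the macro "usable for other algorithms as well".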

01 Mar, 2016 1 commit
- fft: Split MDCT bits off from FFT · 1a094af6
  Diego Biurrun authored 9 years ago
  
  1a094af6
26 Feb, 2016 1 commit
- fft: arm: Drop unnecessary #include, add missing ones · 97aec6e7
  Diego Biurrun authored 9 years ago
  
  97aec6e7
31 Jan, 2016 2 commits
- avcodec/dca: add new decoder based on libdcadec · ae5b2c52
  foo86 authored 9 years ago
  
  ae5b2c52
- avcodec/dca: remove old decoder · 46089967
  foo86 authored 9 years ago
```
Remove all files and functions which are not going to be reused,
and temporarily disable all functions and FATE tests which will be.
```
  46089967
25 Jan, 2016 1 commit
- avcodec/synth_filter: split off remaining code from dcadec files · 209f50e1
  James Almer authored 9 years ago
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
  209f50e1
24 Dec, 2015 1 commit
- dca: remove unused decode_hf function and quant_d tables · 2008f760
  Alexandra Hájková authored 9 years ago
```
They were superseded by their integer equivalents. Rename integer
decode_hf to decode_hf.
```
  2008f760
21 Dec, 2015 1 commit
- arm64: fix inverted register order in transpose_4x4H · cc29d96d
  Janne Grunau authored 9 years ago
```
Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
```
  cc29d96d
19 Dec, 2015 1 commit

avcodec/arm64: fix inverted register order in transpose_4x4H · 2dba0407

Janne Grunau authored 9 years ago

Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

2dba0407

17 Dec, 2015 1 commit
- Revert "avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H" · 95b59bfb
  Michael Niedermayer authored 9 years ago
```
The change was not correct and broke H264

This reverts commit cd83f899c94f691b045697d12efa21f83eb2329f.
```
  95b59bfb
14 Dec, 2015 3 commits

arm64: int32_to_float_fmul neon asm · a0fc780a

Janne Grunau authored 9 years ago

3% faster dts decoding on a cortex-a57.

                                 cortex-a57   cortex-a53
int32_to_float_fmul_array8_c:    1270.9       4475.6
int32_to_float_fmul_array8_neon:  328.6        569.2
int32_to_float_fmul_scalar_c:     928.5       4119.6
int32_to_float_fmul_scalar_neon:  309.1        524.1

a0fc780a
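
For readers unfamiliar with the two benchmarked routines, the sketch below shows what their names suggest in plain C: convert 32-bit integers to float and multiply by a scale factor, either one factor for the whole run or one factor per group of eight samples. The `_ref` names and the exact signatures are assumptions made for illustration, not the project's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: scale every converted sample by one factor. */
static void int32_to_float_fmul_scalar_ref(float *dst, const int32_t *src,
                                           float mul, int len)
{
    assert(dst != NULL && src != NULL && len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = (float)src[i] * mul;
}

/* Hypothetical reference: one scale factor per block of eight samples.
 * Assumes len is a multiple of 8 and mul holds len / 8 factors. */
static void int32_to_float_fmul_array8_ref(float *dst, const int32_t *src,
                                           const float *mul, int len)
{
    assert(mul != NULL && len % 8 == 0);
    for (int i = 0; i < len; i += 8)
        int32_to_float_fmul_scalar_ref(dst + i, src + i, *mul++, 8);
}
```

Loops of this shape have no dependency between iterations, which is why they map well onto NEON vector instructions.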

arm64: port synth_filter_float_neon from arm · 705f5e5e

Janne Grunau authored 9 years ago

~25% faster dts decoding overall. The checkasm CPU cycle counts are
not that useful since synth_filter_float() calls FFTContext.imdct_half().

                         cortex-a57   cortex-a53
synth_filter_float_c:    1866.2       3490.9
synth_filter_float_neon:  915.0       1531.5

With fftc.imdct_half forced to imdct_half_neon:
                         cortex-a57   cortex-a53
synth_filter_float_c:    1718.4       3025.3
synth_filter_float_neon:  926.2       1530.1

705f5e5e

arm64: convert dcadsp neon asm from arm · c33c1fa8

Janne Grunau authored 9 years ago

~2% faster dts decoding overall.

                    cortex-a57   cortex-a53
dca_decode_hf_c:    474.8        1659.9
dca_decode_hf_neon: 225.2         301.1
dca_lfe_fir0_c:     913.2        1537.7
dca_lfe_fir0_neon:  286.8         451.9
dca_lfe_fir1_c:     848.7        1711.5
dca_lfe_fir1_neon:  387.1         506.4

c33c1fa8

12 Dec, 2015 1 commit

avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H · c18176bd

zjh8890 authored 9 years ago

transpose_4x4H is wrong: the order of r2 and r3 is incorrect. Finding this bug
cost me a lot of time while writing aarch64 assembly that uses the macro.

c18176bd

20 Jul, 2015 1 commit
- h264: aarch64: intra prediction optimisations · f56d8d8d
  Janne Grunau authored 9 years ago
  
  f56d8d8d
24 Jun, 2015 1 commit
- arm64: constify src in h264qpel dsp function definitions · c2de2cf0
  Janne Grunau authored 9 years ago
  
  c2de2cf0
02 Feb, 2015 1 commit
- opus: Factor out imdct15 into a standalone component · 3d5d4623
  Diego Biurrun authored 10 years ago
```
It will be reused by the AAC decoder.
```
  3d5d4623
31 Jan, 2015 1 commit
- lavc/aarch64: Do not use the neon horizontal chroma loop filter for H.264 4:2:2. · 4faea46b
  Carl Eugen Hoyos authored 10 years ago
  
  4faea46b
09 Dec, 2014 1 commit

aarch64: Use .data.rel.ro for const data with relocations · 780cd20b

Martin Storsjö authored 10 years ago

This reverts commit c00365b4 in addition to using a different section.
Signed-off-by: Martin Storsjö <martin@martin.st>

780cd20b

15 Nov, 2014 1 commit

aarch64: Make the function pointer tables position independent · c00365b4

Martin Storsjö authored 10 years ago

This allows running the code on Android, where 64-bit binaries with
text relocations aren't allowed to be loaded.
Signed-off-by: Martin Storsjö <martin@martin.st>

c00365b4

30 Aug, 2014 1 commit
- avcodec/aarch64/h264qpel_init_aarch64: mark src as const · e16b7338
  Michael Niedermayer authored 10 years ago
```
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
```
  e16b7338
03 Aug, 2014 1 commit

aarch64: add ',' between assembler macro arguments where missing · ac6b95db

Janne Grunau authored 10 years ago

llvm's integrated assembler does not accept spaces as macro argument
delimiters when targeting darwin. Using an explicit delimiter is a good
idea in principle since it makes cases like 'macro 4 -2' vs 'macro 4 - 2'
clear.

ac6b95db

23 Jun, 2014 1 commit
- h264: avoid using uninitialized memory in NEON chroma mc · f23d26a6
  Janne Grunau authored 10 years ago
```
Adapt commit 982b596e for the arm and
aarch64 NEON asm. 5-10% faster on Cortex-A9.
```
  f23d26a6
15 May, 2014 1 commit
- aarch64: opus NEON iMDCT and FFT · d3f5b947
  Janne Grunau authored 10 years ago
```
Opus celt decoding 11% faster and the iMDCT over 2.5 times faster on
Apple's A7.
```
  d3f5b947
13 May, 2014 1 commit
- aarch64: assembler in clang-3.4 ignores the division by two · 9aa45920
  Janne Grunau authored 10 years ago
```
Values are positive powers of two, so just replace it with a right shift.
```
  9aa45920
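
The commit itself changes assembler expressions, but the arithmetic it relies on can be checked in C. The sketch below uses a hypothetical helper name and only demonstrates the identity: for a positive power of two, dividing by two and shifting right by one bit give the same result.

```c
#include <assert.h>

/* Hypothetical helper: halve a value known to be a positive power of two.
 * The assert documents the precondition that makes the shift a safe
 * stand-in for the division. */
static unsigned halve_pow2(unsigned n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return n >> 1;
}
```
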
22 Apr, 2014 4 commits

aarch64: NEON vorbis_inverse_coupling · 3956a5e0

Janne Grunau authored 10 years ago

From the ARMv7 NEON version. 16 times faster than the C version, overall
more than 12% faster vorbis decoding on Apple's A7.

3956a5e0
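
For context, a scalar C sketch of inverse channel coupling, following the four sign cases of the inverse coupling step in the Vorbis I specification, is shown below. The function name and signature are assumptions for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference: rewrite each magnitude/angle pair in
 * place according to the signs of both values. */
static void inverse_coupling_ref(float *mag, float *ang, size_t n)
{
    assert(mag != NULL && ang != NULL);
    for (size_t i = 0; i < n; i++) {
        float m = mag[i];
        float a = ang[i];
        if (m > 0.0f) {
            if (a > 0.0f) {
                ang[i] = m - a;          /* magnitude unchanged */
            } else {
                ang[i] = m;
                mag[i] = m + a;
            }
        } else {
            if (a > 0.0f) {
                ang[i] = m + a;          /* magnitude unchanged */
            } else {
                ang[i] = m;
                mag[i] = m - a;
            }
        }
    }
}
```

The branches depend only on the signs of the current pair, so a SIMD version can replace them with compares and bit selects, which is one plausible source of the large speedup quoted above.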

aarch64: NEON fixed/floating point MPADSP apply_window · 8f9fe6ae

Janne Grunau authored 10 years ago

30%/25% (fixed/float) faster mp3 decoding on Apple's A7. The floating
point decoder is approximately 7% faster.

8f9fe6ae

- aarch64: NEON float (i)MDCT · ee2bc597
  Janne Grunau authored 10 years ago
```
Approximately as fast as the ARM NEON version on Apple's A7.
```
  ee2bc597
- aarch64: NEON float FFT · 650c4300
  Janne Grunau authored 10 years ago
```
Approximately as fast as the ARM NEON version on Apple's A7.
```
  650c4300

06 Apr, 2014 1 commit
- aarch64: implement videodsp.prefetch · d3789eee
  Janne Grunau authored 10 years ago
```
8% faster h264 decoding on Apple A7.
```
  d3789eee
20 Mar, 2014 1 commit
- build: Group general components separate from de/encoders in arch Makefiles · 0e083d7e
  Diego Biurrun authored 11 years ago
```
This is in line with how the top-level libavcodec Makefile is structured.
```
  0e083d7e
08 Mar, 2014 1 commit

aarch64: get_cabac inline asm · dfe224f3

Janne Grunau authored 11 years ago

Based on the x86 branchless get_cabac asm. get_cabac_noinline() gets
approximately 20% faster (no cycle counts available) compared to clang
from Xcode 5.1 beta5. More than 6% faster overall. A part of the overall
speedup might be explained by additional inlining of get_cabac().

dfe224f3

20 Feb, 2014 1 commit
- aarch64: use EXTERN_ASM consistently for exported symbols · 9c029f67
  Janne Grunau authored 11 years ago
```
Based on e3fec3f0 for arm.
```
  9c029f67
15 Jan, 2014 2 commits
- aarch64: port neon clobber test from arm · fe96769b
  Janne Grunau authored 11 years ago
  
  fe96769b
- aarch64: h264 (bi)weight NEON optimizations · f896bca0
  Janne Grunau authored 11 years ago
```
Ported from ARMv7 NEON.
```
  f896bca0