- 03 Jul, 2016 1 commit
-
-
James Almer authored
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 25 Jun, 2016 1 commit
-
-
Clément Bœsch authored
-
- 11 May, 2016 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 04 May, 2016 1 commit
-
-
Vittorio Giovara authored
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 07 Apr, 2016 1 commit
-
-
Diego Biurrun authored
Restore alphabetical order in lists, break overly long lines, do some prettyprinting, add some explanatory section comments, group parts together that belong together logically.
-
- 26 Mar, 2016 1 commit
-
-
Martin Storsjö authored
Previously, ff_h264_idct_add_neon (originally in the arm version) used a non-regular transpose in order to be able to use more instructions that deal with registers as 128 bit register pairs. The aarch64 translation doesn't do it to the same extent, but brought along the same structure since it was a straight translation.

This reshuffles ff_h264_idct_add_neon, bringing it closer to the C implementation, making the transpose_4x4H macro do a regular transpose, usable for other algorithms as well.

Previously, the third and fourth output from transpose_4x4H were swapped, and prior to cc29d96d, the same inputs as well. In addition to just swapping the outputs, also renumber the intermediate registers for better readability (making the register order match transpose_4x8B).

This runs with the same number of cycles as before.

Signed-off-by: Martin Storsjö <martin@martin.st>
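For readers unfamiliar with the term, a "regular transpose" means that output row i holds input column i, with no rows swapped. The scalar C sketch below illustrates that property for a 4x4 block of 16-bit values. It is not the NEON macro itself, and the function name transpose_4x4_i16 is hypothetical.

```c
#include <stdint.h>

/* Illustrative scalar reference only; transpose_4x4_i16 is a hypothetical
 * name and does not appear in the commit. A regular transpose maps
 * element (i, j) to (j, i), so no output rows end up swapped. */
static void transpose_4x4_i16(int16_t m[4][4])
{
    for (int i = 0; i < 4; i++) {
        for (int j = i + 1; j < 4; j++) {
            /* Swap the element above the diagonal with its mirror below. */
            int16_t t = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = t;
        }
    }
}
```

With a regular transpose, the third output row is the third input column, which is the property the swapped outputs described above did not have.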
-
- 01 Mar, 2016 1 commit
-
-
Diego Biurrun authored
-
- 26 Feb, 2016 1 commit
-
-
Diego Biurrun authored
-
- 31 Jan, 2016 2 commits
- 25 Jan, 2016 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 24 Dec, 2015 1 commit
-
-
Alexandra Hájková authored
They were superseded by their integer equivalents. Rename integer decode_hf to decode_hf.
-
- 21 Dec, 2015 1 commit
-
-
Janne Grunau authored
Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
-
- 19 Dec, 2015 1 commit
-
-
Janne Grunau authored
Fix related register order issue in ff_h264_idct_add_neon.
Found-by: zjh8890 <243186085@qq.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 17 Dec, 2015 1 commit
-
-
Michael Niedermayer authored
The change was not correct and broke H264.
This reverts commit cd83f899c94f691b045697d12efa21f83eb2329f.
-
- 14 Dec, 2015 3 commits
-
-
Janne Grunau authored
3% faster dts decoding on a cortex-a57.

                                  cortex-a57   cortex-a53
int32_to_float_fmul_array8_c:         1270.9       4475.6
int32_to_float_fmul_array8_neon:       328.6        569.2
int32_to_float_fmul_scalar_c:          928.5       4119.6
int32_to_float_fmul_scalar_neon:       309.1        524.1
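As a rough illustration of the kind of loop being benchmarked above, here is a minimal scalar sketch of an int32-to-float conversion with a scalar multiply. The function name int32_to_float_fmul_scalar_ref and its signature are assumptions made for this example and are not taken from the commit.

```c
#include <stdint.h>

/* Hypothetical reference routine (name and signature assumed): convert
 * each 32-bit integer sample to float and scale it by a constant factor. */
static void int32_to_float_fmul_scalar_ref(float *dst, const int32_t *src,
                                           float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (float)src[i] * mul;
}
```

Each iteration is independent of the others, which is why a loop of this shape is a natural candidate for SIMD processing of several samples at once.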
-
Janne Grunau authored
~25% faster dts decoding overall. The checkasm CPU cycles numbers are not that useful since synth_filter_float() calls FFTContext.imdct_half().

                            cortex-a57   cortex-a53
synth_filter_float_c:           1866.2       3490.9
synth_filter_float_neon:         915.0       1531.5

With fftc.imdct_half forced to imdct_half_neon:

                            cortex-a57   cortex-a53
synth_filter_float_c:           1718.4       3025.3
synth_filter_float_neon:         926.2       1530.1
-
Janne Grunau authored
~2% faster dts decoding overall.

                       cortex-a57   cortex-a53
dca_decode_hf_c:            474.8       1659.9
dca_decode_hf_neon:         225.2        301.1
dca_lfe_fir0_c:             913.2       1537.7
dca_lfe_fir0_neon:          286.8        451.9
dca_lfe_fir1_c:             848.7       1711.5
dca_lfe_fir1_neon:          387.1        506.4
-
- 12 Dec, 2015 1 commit
-
-
zjh8890 authored
The transpose_4x4H macro is wrong: the order of r2 and r3 is swapped. This bug cost me a lot of time to find while I was writing aarch64 assembly that used the macro.
-
- 20 Jul, 2015 1 commit
-
-
Janne Grunau authored
-
- 24 Jun, 2015 1 commit
-
-
Janne Grunau authored
-
- 02 Feb, 2015 1 commit
-
-
Diego Biurrun authored
It will be reused by the AAC decoder.
-
- 31 Jan, 2015 1 commit
-
-
Carl Eugen Hoyos authored
-
- 09 Dec, 2014 1 commit
-
-
Martin Storsjö authored
This reverts commit c00365b4 in addition to using a different section.
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 15 Nov, 2014 1 commit
-
-
Martin Storsjö authored
This allows running the code on android, where 64 bit binaries with text relocations aren't allowed to be loaded.
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 30 Aug, 2014 1 commit
-
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 03 Aug, 2014 1 commit
-
-
Janne Grunau authored
llvm's integrated assembler does not accept spaces as macro argument delimiters when targeting darwin. Using an explicit delimiter is a good idea in principle since it makes cases like 'macro 4 -2' vs 'macro 4 - 2' clear.
-
- 23 Jun, 2014 1 commit
-
-
Janne Grunau authored
Adapt commit 982b596e for the arm and aarch64 NEON asm. 5-10% faster on Cortex-A9.
-
- 15 May, 2014 1 commit
-
-
Janne Grunau authored
Opus celt decoding 11% faster and the iMDCT over 2.5 times faster on Apple's A7.
-
- 13 May, 2014 1 commit
-
-
Janne Grunau authored
Values are positive powers of two, so just replace it with right shift.
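The reasoning above can be sketched in C, assuming unsigned operands: dividing by a positive power of two gives the same result as shifting right by its base-2 logarithm. The helper names log2_pow2 and div_pow2 are hypothetical and are not taken from the commit, whose affected code is not shown here.

```c
#include <stdint.h>

/* Hypothetical helper: base-2 logarithm of a positive power of two. */
static unsigned log2_pow2(uint32_t d)
{
    unsigned shift = 0;
    while (d > 1) {
        d >>= 1;
        shift++;
    }
    return shift;
}

/* Hypothetical helper: unsigned division by a power of two, written as a
 * right shift. For unsigned x this equals x / d exactly. */
static uint32_t div_pow2(uint32_t x, uint32_t d)
{
    return x >> log2_pow2(d);
}
```

The equivalence is exact for unsigned or non-negative dividends. For negative signed values, C division truncates toward zero, whereas an arithmetic right shift typically rounds toward negative infinity, so the substitution relies on the operand ranges involved.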
-
- 22 Apr, 2014 4 commits
-
-
Janne Grunau authored
From the ARMv7 NEON version. 16 times faster than the C version, overall more than 12% faster vorbis decoding on Apple's A7.
-
Janne Grunau authored
30%/25% (fixed/float) faster mp3 decoding on Apple's A7. The floating point decoder is approximately 7% faster.
-
Janne Grunau authored
Approximately as fast as the ARM NEON version on Apple's A7.
-
Janne Grunau authored
Approximately as fast as the ARM NEON version on Apple's A7.
-
- 06 Apr, 2014 1 commit
-
-
Janne Grunau authored
8% faster h264 decoding on Apple A7.
-
- 20 Mar, 2014 1 commit
-
-
Diego Biurrun authored
This is in line with how the top-level libavcodec Makefile is structured.
-
- 08 Mar, 2014 1 commit
-
-
Janne Grunau authored
Based on the x86 branchless get_cabac asm. get_cabac_noinline() gets approximately 20% faster (no cycle counts available) compared to clang from Xcode 5.1 beta5. More than 6% faster overall. A part of the overall speedup might be explained by additional inlining of get_cabac().
-
- 20 Feb, 2014 1 commit
-
-
Janne Grunau authored
Based on e3fec3f0 for arm.
-
- 15 Jan, 2014 2 commits
-
-
Janne Grunau authored
-
Janne Grunau authored
Ported from ARMv7 NEON.
-