- 17 Oct, 2016 1 commit
-
-
Diego Biurrun authored
This fixes many warnings of the sort:
warning: label alone on a line without a colon might be in error
-
- 05 Jul, 2016 1 commit
-
-
James Almer authored
About 10% faster.
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 23 Feb, 2016 1 commit
-
-
James Almer authored
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 06 Feb, 2016 1 commit
-
-
James Almer authored
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 31 Jan, 2016 1 commit
-
-
foo86 authored
Remove all files and functions that are not going to be reused, and temporarily disable all functions and FATE tests that will be.
-
- 25 Jan, 2016 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 24 Dec, 2015 1 commit
-
-
Alexandra Hájková authored
They were superseded by their integer equivalents. Rename integer decode_hf to decode_hf.
-
- 11 Aug, 2015 2 commits
-
-
Henrik Gramner authored
The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Henrik Gramner authored
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- 04 Aug, 2015 1 commit
-
-
Henrik Gramner authored
The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
-
- 26 Jul, 2015 1 commit
-
-
James Almer authored
Silences warnings with NASM.
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 13 Apr, 2014 1 commit
-
-
James Almer authored
This fixes compilation failures with --disable-fma3.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- 06 Apr, 2014 1 commit
-
-
Hendrik Leppkes authored
movq from SSE register to memory is an SSE2 instruction. Instead, use SSE movlps, which does the same thing.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 05 Apr, 2014 2 commits
-
-
James Almer authored
~10% faster than the SSE version.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Almer authored
Fixes compilation failures with "--disable-{avx,fma3} --disable-optimizations".
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 04 Apr, 2014 4 commits
-
-
Christophe Gisquet authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
James Almer authored
Sandy Bridge Win64:
180 cycles in ff_synth_filter_inner_sse2
150 cycles in ff_synth_filter_inner_avx

Also switch some instructions to a three-operand format to avoid assembly errors with Yasm 1.1.0 or older.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
James Almer authored
Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- 17 Mar, 2014 1 commit
-
-
James Almer authored
Replace mulps+subps with fnmaddps, resulting in two fewer instructions inside the inner loops. About 1% faster FMA3 performance.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 05 Mar, 2014 1 commit
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 02 Mar, 2014 2 commits
-
-
James Almer authored
This reverts the changes that 64672098 and 68c3ed93 made to the SSE2 version, which cost about 5 cycles.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Almer authored
Sandy Bridge Win64:
180 cycles on ff_synth_filter_inner_sse2
150 cycles on ff_synth_filter_inner_avx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 01 Mar, 2014 1 commit
-
-
James Almer authored
Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 28 Feb, 2014 5 commits
-
-
Christophe Gisquet authored
Timings for Arrandale:
          C    SSE
win32: 2108    334
win64: 1152    322

Factorizing the inner loop with a call/jmp costs more than 15 cycles, even with the jmp destination being aligned. Unrolling for ARCH_X86_64 gains 20 cycles.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Christophe Gisquet authored
Results for Arrandale/Windows:
32: 1670 -> 316
64: 728 -> 298
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Christophe Gisquet authored
The vector dequantization has a test in a loop, which prevents an effective SIMD implementation. By moving the test out of the loop, the loop can be DSPized. Therefore, modify the current DSP implementation. In particular, the DSP implementation no longer has to handle null loop sizes.

The decode_hf implementations have the following timings for x86 Arrandale:
         C  SSE  SSE2  SSE4
win32: 260  162   119   104
win64: 242  N/A    89    72

The arm NEON optimizations follow in a later patch as external asm. The now unused check for the y modifier in arm inline asm is removed from configure.
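The idea of moving a test out of a loop can be sketched in plain C. This is only an illustration of the general technique, not the code from this commit: the function names, the `active` flag, and the argument layout are all hypothetical, and the real decode_hf has a different signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before" shape: a loop-invariant flag is tested on every
 * iteration, so the loop body contains a branch. */
static void dequant_branchy(float *dst, const signed char *src,
                            float scale, int active, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (active)
            dst[i] = src[i] * scale;
    }
}

/* Hypothetical "after" kernel: no test inside the loop, and the caller
 * guarantees len > 0, so the kernel never has to handle an empty range. */
static void dequant_kernel(float *dst, const signed char *src,
                           float scale, size_t len)
{
    size_t i = 0;

    assert(len > 0); /* precondition guaranteed by the caller */
    do {
        dst[i] = src[i] * scale;
    } while (++i < len);
}

/* The caller performs the test once and skips empty ranges. */
static void dequant_hoisted(float *dst, const signed char *src,
                            float scale, int active, size_t len)
{
    if (active && len > 0)
        dequant_kernel(dst, src, scale, len);
}
```

With the branch gone and the length known to be non-zero, a kernel of this shape is straight-line multiply-and-store work, which is the kind of loop that is easier to hand over to a SIMD routine.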
-
Christophe Gisquet authored
Timings for Arrandale:
          C    SSE
win32: 2108    334
win64: 1152    322

Factorizing the inner loop with a call/jmp costs more than 15 cycles, even with the jmp destination being aligned. Unrolling for ARCH_X86_64 gains 20 cycles.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
-
Christophe Gisquet authored
Results for Arrandale/Windows:
32: 1670 -> 316
64: 728 -> 298
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
-
- 07 Feb, 2014 1 commit
-
-
Christophe Gisquet authored
For the callable function (as opposed to the inline one):
         C  SSE  SSE2  SSE4
Win32:  47   42    29    26
Win64:  30   33    25    23

The SSE version is neither compiled nor set for ARCH_X86_64, as the inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
-