- 28 Feb, 2014 3 commits
-
-
Janne Grunau authored
-
Christophe Gisquet authored
The vector dequantization has a test in a loop preventing effective SIMD implementation. By moving it out of the loop, this loop can be DSPized. Therefore, modify the current DSP implementation. In particular, the DSP implementation no longer has to handle null loop sizes.

The decode_hf implementations have the following timings on an x86 Arrandale:

          C    SSE   SSE2  SSE4
win32:   260   162   119   104
win64:   242   N/A    89    72

The arm NEON optimizations follow in a later patch as external asm. The now unused check for the y modifier in arm inline asm is removed from configure.
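The restructuring described above can be sketched in a few lines of C. This is only an illustration of hoisting a loop-invariant test so that the inner loop becomes straight-line code; the function names, the `enabled` flag and the data types are hypothetical and are not taken from the actual decode_hf code.

```c
#include <stddef.h>

/* Hypothetical "before": a loop-invariant test is evaluated on every pass,
 * which makes the loop body awkward to replace with a SIMD routine. */
static void dequant_with_test(float *dst, const signed char *vq,
                              float scale, size_t n, int enabled)
{
    for (size_t i = 0; i < n; i++) {
        if (enabled)
            dst[i] = vq[i] * scale;
    }
}

/* Hypothetical "after": the hot loop has no branch and may assume n > 0,
 * so a SIMD version would not need to handle an empty range. */
static void dequant_block(float *dst, const signed char *vq,
                          float scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = vq[i] * scale;
}

/* The caller performs the test and the size check once, outside the loop. */
static void dequant_hoisted(float *dst, const signed char *vq,
                            float scale, size_t n, int enabled)
{
    if (enabled && n > 0)
        dequant_block(dst, vq, scale, n);
}
```

In this sketch `dequant_block` is the part that a DSP context could dispatch to an optimized implementation.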
-
Christophe Gisquet authored
The scaling factor is constant so it is faster to scale the FIR coefficients in the tables during compilation. Signed-off-by: Janne Grunau <janne-libav@jannau.net>
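A minimal sketch of the idea, assuming a made-up four-tap filter and a made-up scale of 1/8 (neither is taken from the actual tables): folding the constant into the table initializers lets the compiler do the multiplication once, so the filter no longer scales its result at run time.

```c
/* Hypothetical constant scale factor and helper macro. */
#define FIR_SCALE (1.0f / 8.0f)
#define SCALED(x) ((x) * FIR_SCALE)

/* Unscaled table: the filter must multiply by FIR_SCALE at run time. */
static const float fir_raw[4] = { 1.0f, 2.0f, 4.0f, 8.0f };

/* Pre-scaled table: each initializer is a constant expression, so the
 * multiplication is evaluated at compile time. */
static const float fir_scaled[4] = {
    SCALED(1.0f), SCALED(2.0f), SCALED(4.0f), SCALED(8.0f)
};

static float fir_runtime_scale(const float *in)
{
    float acc = 0.0f;
    for (int i = 0; i < 4; i++)
        acc += in[i] * fir_raw[i];
    return acc * FIR_SCALE;        /* extra multiply per output sample */
}

static float fir_prescaled(const float *in)
{
    float acc = 0.0f;
    for (int i = 0; i < 4; i++)
        acc += in[i] * fir_scaled[i];
    return acc;                    /* no run-time scaling needed */
}
```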
-
- 15 Feb, 2014 1 commit
-
-
Peter Ross authored
Signed-off-by: Peter Ross <pross@xvid.org> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 13 Feb, 2014 2 commits
-
-
James Darnley authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Michael Niedermayer authored
This assumes the array is sufficiently padded with 0. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 09 Feb, 2014 1 commit
-
-
Martin Storsjö authored
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
-
- 08 Feb, 2014 2 commits
-
-
Janne Grunau authored
-
Christophe Gisquet authored
The x86 runs short on registers because numerous elements are not static. In addition, splitting them allows more optimized code, at least for x86. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 07 Feb, 2014 4 commits
-
-
Christophe Gisquet authored
It is currently declared as a macro that is set to one of several inlinable functions, among them a NEON and a default C implementation. Add a DSP parameter to each inline function, unused except by the default C implementation, which calls a function from the DSP context.

On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
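The pattern described above might look roughly like the following sketch. All names here (`DSPSketch`, `scale_block_fn`, `SKETCH_HAVE_INLINE`, `do_scale_block`) are hypothetical placeholders, not identifiers from the codebase; the point is only that every inline variant shares one signature, and only the default C variant actually uses the DSP context.

```c
/* Hypothetical DSP context holding a function pointer set at init time. */
typedef struct DSPSketch {
    void (*scale_block_fn)(float *dst, const signed char *src, int scale);
} DSPSketch;

/* Plain C routine that the context points to by default. */
static void scale_block_c(float *dst, const signed char *src, int scale)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (float)(src[i] * scale);
}

/* Default inline variant: forwards to whatever the DSP context selected. */
static inline void scale_block_default(DSPSketch *dsp, float *dst,
                                       const signed char *src, int scale)
{
    dsp->scale_block_fn(dst, src, scale);
}

/* Stand-in for a platform-specific inline variant: same signature,
 * but the dsp argument is unused because the code is inlined directly. */
static inline void scale_block_inline(DSPSketch *dsp, float *dst,
                                      const signed char *src, int scale)
{
    (void)dsp;
    for (int i = 0; i < 8; i++)
        dst[i] = (float)(src[i] * scale);
}

/* The macro picks one variant at compile time; call sites stay identical. */
#ifdef SKETCH_HAVE_INLINE
#define do_scale_block scale_block_inline
#else
#define do_scale_block scale_block_default
#endif
```

A call site then reads `do_scale_block(&dsp, dst, src, scale);` regardless of which variant was selected.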
-
Christophe Gisquet authored
The x86 runs short on registers because numerous elements are not static. In addition, splitting them allows more optimized code, at least for x86. Arm asm changes by Janne Grunau. Signed-off-by: Janne Grunau <janne-libav@jannau.net>
-
Christophe Gisquet authored
It is currently declared as a macro that is set to one of several inlinable functions, among them a NEON and a default C implementation. Add a DSP parameter to each inline function, unused except by the default C implementation, which calls a function from the DSP context.

On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
-
Martin Storsjö authored
Don't rely on the fact that an unprefixed label currently exists. Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 06 Feb, 2014 1 commit
-
-
Martin Storsjö authored
Based on a patch by Ronald S. Bultje. Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 13 Jan, 2014 1 commit
-
-
Diego Biurrun authored
-
- 10 Jan, 2014 1 commit
-
-
Martin Storsjö authored
This is pretty much based on the same test for XMM registers. Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 07 Jan, 2014 4 commits
-
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
The function macro always sets .align 2 before declaring the function label (since 5c5e1ea3) and always sets the section to .text (since 278caa6a). The .align 5 before certain functions, added in fc252eba, were added before .text and .align were added to the function macro and thus became useless/unused when the function macro got them. This restores the original intention, to align the loop entry points. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This file no longer uses the pld instruction at all, all such uses have been split into hpeldsp_arm.S. Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 06 Jan, 2014 2 commits
-
-
Diego Biurrun authored
The define does not originate from configure, so it should not have a name that is CONFIG_-prefixed.
-
Anton Khirnov authored
Fixes invalid memory access. Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC:libav-stable@libav.org
-
- 04 Jan, 2014 2 commits
-
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 20 Dec, 2013 3 commits
-
-
Martin Storsjö authored
q4-q7/d8-d15 are supposed to not be clobbered by the callee. CC: libav-stable@libav.org Signed-off-by: Martin Storsjö <martin@martin.st>
-
Mason Carter authored
Apply David Conrad's old patch to the modern codebase. http://ffmpeg.org/pipermail/ffmpeg-devel/2009-April/059877.html Signed-off-by: Martin Storsjö <martin@martin.st>
-
Mason Carter authored
For: ff_vc1_inv_trans_{8,4}x{8,4}_{dc_,}neon ff_put_pixels8x8_neon ff_put_vc1_mspel_mc{0,1,2,3}{0,1,2,3}_neon (except for 00) Based on ARM assembly code in libavcodec/arm by Rob Clark and Mans Rullgard. Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 08 Dec, 2013 1 commit
-
-
Diego Biurrun authored
The (optimized) functions are used nowhere else.
-
- 29 Sep, 2013 1 commit
-
-
Ronald S. Bultje authored
-
- 30 Aug, 2013 1 commit
-
-
Thilo Borgmann authored
-
- 29 Aug, 2013 2 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
- 23 Aug, 2013 2 commits
-
-
Diego Biurrun authored
The functions are used by all codecs that enable the h264chroma component and the file is already compiled conditional on h264chroma being enabled.
-
Diego Biurrun authored
Most of our VP56 optimizations are VP6-only and will stay that way. So avoid compiling them for VP5-only builds.
-
- 08 Aug, 2013 1 commit
-
-
Ben Avison authored
                 Before            After
                 Mean     StdDev   Mean     StdDev   Change
This function     508.8    23.4     185.4     9.0    +174.4%
Overall          3068.5    31.7    2752.1    29.4     +11.5%

In combination with the preceding patch:

                 Before            After
                 Mean     StdDev   Mean     StdDev   Change
Overall          2925.6    26.2    2752.1    29.4      +6.3%

Signed-off-by: Martin Storsjö <martin@martin.st>
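The commit message does not say how the Change column is computed, but the figures are consistent with a speedup of the form before / after - 1, expressed in percent. The helper below is a hypothetical sketch of that arithmetic, inferred from the numbers alone.

```c
/* Hypothetical helper: relative speedup in percent, assuming the "Mean"
 * columns are costs where lower is better (an inference from the data). */
static double speedup_percent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

For example, `speedup_percent(508.8, 185.4)` is roughly 174.4 and `speedup_percent(3068.5, 2752.1)` is roughly 11.5, matching the two rows of the first table.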
-
- 24 Jul, 2013 1 commit
-
-
Martin Storsjö authored
When building for iOS in thumb mode, gas-preprocessor.pl doesn't mark unused labels as thumb functions (as it does for other local labels, where it can figure out that they are functions due to being referenced in branch instructions). This leads to linker warnings for some of those local labels, such as: ld: warning: ARM function not 4-byte aligned: __a_evaluation from libavcodec/libavcodec.a(simple_idct_arm.o) Therefore, comment them out since they don't have any function. They do still have a value in documenting key points in the assembly source though. Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 22 Jul, 2013 4 commits
-
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
Reviewed-by: Kostya Shishkov Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Ben Avison authored
                 Before             After
                 Mean      StdDev   Mean      StdDev   Change
This function     1323.0    98.0      746.2    60.6    +77.3%
Overall          15400.0   336.4    14147.5   288.4     +8.9%

Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
                 Before             After
                 Mean      StdDev   Mean      StdDev   Change
This function     1389.3     4.2      967.8    35.1    +43.6%
Overall          15577.5    83.2    15400.0   336.4     +1.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
-