- 25 Sep, 2014 1 commit
-
-
James Almer authored
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
-
- 23 Sep, 2014 1 commit
-
-
Bernd Kuhls authored
Since these commits
http://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=adf8227cf4e7b4fccb2ad88e1e09b6dc00dd00ed
http://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=db7f1c7c5a1d37e7f4da64a79a97bea1c4b6e9f8
compilation on arm4/arm5 fails:

    libavcodec/libavcodec.so: undefined reference to `ff_startcode_find_candidate_armv6'

Because libavcodec/arm/Makefile contains

    ARMV6-OBJS-$(CONFIG_STARTCODE) += arm/startcode_armv6.o

function ff_startcode_find_candidate_armv6 is not included for older ARM archs.

The bug was found during automatic buildroot builds:
http://autobuild.buildroot.net/results/ec7/ec71e4f16ee9106747dff5f15999cbd17903e76f//build-end.log

Quote from configure summary:

    ARCH                      arm (armv4t)
    big-endian                no
    runtime cpu detection     yes
    ARMv5TE enabled           no
    ARMv6 enabled             no
    ARMv6T2 enabled           no

http://autobuild.buildroot.net/results/be7/be72eb182eaccf0064a32c9dfc2ac1c0d6555506/build-end.log

    ARCH                      arm (armv5te)
    big-endian                no
    runtime cpu detection     yes
    ARMv5TE enabled           yes
    ARMv6 enabled             no
    ARMv6T2 enabled           no

This patch provides the necessary #if clauses as discussed with Michael:
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-September/163329.html

Signed-off-by: Bernd Kuhls <bernd.kuhls@t-online.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
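The kind of #if guard this message describes can be sketched as follows. This is a minimal illustration, not the actual patch: the Demo* and demo_* names are hypothetical, and HAVE_ARMV6 is assumed to be a 0/1 macro of the sort a configure script writes into a generated header. The point is that the ARMv6 symbol is only declared and referenced when the object providing it is actually built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for a configure-generated 0/1 macro.
 * 0 models an armv4t or armv5te build where the ARMv6 object is not linked. */
#ifndef HAVE_ARMV6
#define HAVE_ARMV6 0
#endif

typedef int (*find_candidate_fn)(const uint8_t *buf, int size);

/* Hypothetical context holding the function pointer (not the real struct). */
typedef struct DemoStartcodeContext {
    find_candidate_fn find_candidate;
} DemoStartcodeContext;

/* Portable fallback: index of the first zero byte, or size if there is none. */
static int demo_find_candidate_c(const uint8_t *buf, int size)
{
    int i;
    for (i = 0; i < size; i++)
        if (buf[i] == 0)
            break;
    return i;
}

#if HAVE_ARMV6
/* Declared only when the ARMv6 object file is part of the build. */
int demo_find_candidate_armv6(const uint8_t *buf, int size);
#endif

static void demo_startcode_init(DemoStartcodeContext *c)
{
    assert(c != NULL);
    c->find_candidate = demo_find_candidate_c;
#if HAVE_ARMV6
    /* Without this guard, older ARM builds hit an undefined reference. */
    c->find_candidate = demo_find_candidate_armv6;
#endif
}
```

With HAVE_ARMV6 left at 0, the initialiser never names the ARMv6 routine, so the link succeeds and callers get the C fallback through the same pointer.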
-
- 02 Sep, 2014 1 commit
-
-
Diego Biurrun authored
These function pointers already existed in the ARM code. Adding them globally allows calls to the function pointers to access arch-optimized versions of the functions transparently.
-
- 15 Aug, 2014 2 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
- 12 Aug, 2014 1 commit
-
-
James Almer authored
This reduces code duplication and differences with the fork.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 06 Aug, 2014 1 commit
-
-
Michael Niedermayer authored
Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 04 Aug, 2014 2 commits
-
-
Ben Avison authored
Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
-
Ben Avison authored
This permits re-use with parsers for codecs which use similar start codes.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
-
- 27 Jul, 2014 1 commit
-
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 25 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 21 Jul, 2014 2 commits
-
-
Ben Avison authored
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
Diego Biurrun authored
-
- 20 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 18 Jul, 2014 4 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
Also rename the enum values to be consistent with other DCT permutations.
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 17 Jul, 2014 3 commits
-
-
Ben Avison authored
The previous implementation targeted DTS Coherent Acoustics, which only requires nbits == 4 (fft16()). This case was (and still is) linked directly rather than being indirected through ff_fft_calc_vfp(), but now the full range from radix-4 up to radix-65536 is available. This benefits other codecs such as AAC and AC3.

The implementation is based upon the C version, with each routine larger than radix-16 calling a hierarchy of smaller FFT functions, then performing a post-processing pass. This pass benefits a lot from loop unrolling to counter the long pipelines in the VFP. A relaxed calling standard also reduces the overhead of the call hierarchy, and avoiding the excessive inlining performed by GCC probably helps with I-cache utilisation too.

I benchmarked the result by measuring the number of gperftools samples that hit anywhere in the AAC decoder (starting from aac_decode_frame()) or specifically in the FFT routines (fft4() to fft512() and pass()) for the same sample AAC stream:

                Before          After
                Mean    StdDev  Mean    StdDev  Confidence  Change
Audio decode    2245.5  53.1    1599.6  43.8    100.0%      +40.4%
FFT routines    940.6   22.0    348.1   20.8    100.0%      +170.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
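The Change column in the benchmark above appears to be the ratio of the mean sample counts minus one, expressed as a percentage. That reading is an inference from the published figures rather than something the message states, and the helper name below is hypothetical. A minimal sketch:

```c
#include <assert.h>

/* Hypothetical helper: relative speedup in percent, computed from mean
 * profiler sample counts. Fewer samples after the change means less time
 * spent in the measured code. Assumes Change = before / after - 1. */
static double demo_speedup_percent(double before_mean, double after_mean)
{
    assert(after_mean > 0.0);
    return (before_mean / after_mean - 1.0) * 100.0;
}
```

Under that assumption, demo_speedup_percent(2245.5, 1599.6) comes out at roughly 40.4 and demo_speedup_percent(940.6, 348.1) at roughly 170.2, which agrees with the two rows of the table.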
-
Ben Avison authored
The previous implementation targeted DTS Coherent Acoustics, which only requires mdct_bits == 6. This relatively small size lent itself to unrolling the loops a small number of times, and encoding offsets calculated at assembly time within the load/store instructions of each iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays are used - mdct_bits == [8, 9, 11]. The old method does not scale for these cases, so more integer registers are used with non-unrolled versions of the loops (and with some stack spillage). The postrotation filter loop is still unrolled by a factor of 2 to permit the double-buffering of some VFP registers to facilitate overlap of neighbouring iterations.

I benchmarked the result by measuring the number of gperftools samples that hit anywhere in the AAC decoder (starting from aac_decode_frame()) or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same example AAC stream:

                    Before          After
                    Mean    StdDev  Mean    StdDev  Confidence  Change
aac_decode_frame    2368.1  35.8    2117.2  35.3    100.0%      +11.8%
ff_imdct_half_*     457.5   22.4    251.2   16.2    100.0%      +82.1%

Signed-off-by: Martin Storsjö <martin@martin.st>
-
Diego Biurrun authored
-
- 16 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 13 Jul, 2014 1 commit
-
-
Ben Avison authored
The previous implementation targeted DTS Coherent Acoustics, which only requires mdct_bits == 6. This relatively small size lent itself to unrolling the loops a small number of times, and encoding offsets calculated at assembly time within the load/store instructions of each iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays are used - mdct_bits == [8, 9, 11]. The old method does not scale for these cases, so more integer registers are used with non-unrolled versions of the loops (and with some stack spillage). The postrotation filter loop is still unrolled by a factor of 2 to permit the double-buffering of some VFP registers to facilitate overlap of neighbouring iterations.

I benchmarked the result by measuring the number of gperftools samples that hit anywhere in the AAC decoder (starting from aac_decode_frame()) or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same example AAC stream:

                    Before          After
                    Mean    StdDev  Mean    StdDev  Confidence  Change
aac_decode_frame    2368.1  35.8    2117.2  35.3    100.0%      +11.8%
ff_imdct_half_*     457.5   22.4    251.2   16.2    100.0%      +82.1%

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 11 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 09 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 08 Jul, 2014 1 commit
-
-
Martin Storsjö authored
This instruction is deprecated on ARMv8, and it is serializing on some ARMv7 cores as well [1].

[1] http://article.gmane.org/gmane.linux.ports.arm.kernel/339293

CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 06 Jul, 2014 1 commit
-
-
Diego Biurrun authored
-
- 30 Jun, 2014 1 commit
-
-
Diego Biurrun authored
-
- 23 Jun, 2014 1 commit
-
-
Janne Grunau authored
Adapt commit 982b596e for the arm and aarch64 NEON asm. 5-10% faster on Cortex-A9.
-
- 22 Jun, 2014 1 commit
-
-
Diego Biurrun authored
-
- 19 Jun, 2014 1 commit
-
-
Michael Niedermayer authored
This will pick the "best" simple-idct-compatible idct.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 18 Jun, 2014 1 commit
-
-
Diego Biurrun authored
-
- 05 Jun, 2014 1 commit
-
-
Christophe Gisquet authored
APE is not the sole codec using scalarproduct_and_madd_int16.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 03 Jun, 2014 1 commit
-
-
Janne Grunau authored
Move the GNU as check before the arch-specific asm checks, since the .dn check requires a gas-compatible assembler. Disable the VC-1 motion compensation NEON asm, which is the only part using that directive.

The integrated assembler in the upcoming clang 3.5 does not support .dn/.qn, and there are no plans to change that: it is considered too much effort to implement given how rarely it is used. See http://llvm.org/bugs/show_bug.cgi?id=18199
-
- 29 May, 2014 1 commit
-
-
Diego Biurrun authored
-
- 29 Apr, 2014 1 commit
-
-
Anton Khirnov authored
This should reduce the frequency with which the offsets need to be updated.
-
- 25 Apr, 2014 2 commits
-
-
Ben Avison authored
Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Ben Avison authored
This permits re-use with parsers for codecs which use similar start codes.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 24 Apr, 2014 1 commit
-
-
Janne Grunau authored
-
- 20 Apr, 2014 1 commit
-
-
Michael Niedermayer authored
The original patch seems to be missing a 16x16 function, though.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-