- 24 Jan, 2017 1 commit
-
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

The plain pixel put/copy functions are used from the 8 bit version, for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new copy128 is added.

Compared with the 8 bit version, the filters can no longer use the trick to accumulate in 16 bit with only saturation at the end, but now the accumulators need to be 32 bit. This avoids the need to keep track of which filter index is the largest though, reducing the size of the executable code for these filters.

For the horizontal filters, we only do 4 or 8 pixels wide in parallel (while doing two rows at a time), since we don't have enough register space to filter 16 pixels wide.

For the vertical filters, we still do 4 and 8 pixels in parallel just as in the 8 bit case, but we need to store the output after every 2 rows instead of after every 4 rows.

Examples of relative speedup compared to the C version, from checkasm:

Cortex                                    A7     A8     A9    A53
vp9_avg4_10bpp_neon:                    2.25   2.44   3.05   2.16
vp9_avg8_10bpp_neon:                    3.66   8.48   3.86   3.50
vp9_avg16_10bpp_neon:                   3.39   8.26   3.37   2.72
vp9_avg32_10bpp_neon:                   4.03  10.20   4.07   3.42
vp9_avg64_10bpp_neon:                   4.15  10.01   4.13   3.70
vp9_avg_8tap_smooth_4h_10bpp_neon:      3.38   6.22   3.41   4.75
vp9_avg_8tap_smooth_4hv_10bpp_neon:     3.89   6.39   4.30   5.32
vp9_avg_8tap_smooth_4v_10bpp_neon:      5.32   9.73   6.34   7.31
vp9_avg_8tap_smooth_8h_10bpp_neon:      4.45   9.40   4.68   6.87
vp9_avg_8tap_smooth_8hv_10bpp_neon:     4.64   8.91   5.44   6.47
vp9_avg_8tap_smooth_8v_10bpp_neon:      6.44  13.42   8.68   8.79
vp9_avg_8tap_smooth_64h_10bpp_neon:     4.66   9.02   4.84   7.71
vp9_avg_8tap_smooth_64hv_10bpp_neon:    4.61   9.14   4.92   7.10
vp9_avg_8tap_smooth_64v_10bpp_neon:     6.90  14.13   9.57  10.41
vp9_put4_10bpp_neon:                    1.33   1.46   2.09   1.33
vp9_put8_10bpp_neon:                    1.57   3.42   1.83   1.84
vp9_put16_10bpp_neon:                   1.55   4.78   2.17   1.89
vp9_put32_10bpp_neon:                   2.06   5.35   2.14   2.30
vp9_put64_10bpp_neon:                   3.00   2.41   1.95   1.66
vp9_put_8tap_smooth_4h_10bpp_neon:      3.19   5.81   3.31   4.63
vp9_put_8tap_smooth_4hv_10bpp_neon:     3.86   6.22   4.32   5.21
vp9_put_8tap_smooth_4v_10bpp_neon:      5.40   9.77   6.08   7.21
vp9_put_8tap_smooth_8h_10bpp_neon:      4.22   8.41   4.46   6.63
vp9_put_8tap_smooth_8hv_10bpp_neon:     4.56   8.51   5.39   6.25
vp9_put_8tap_smooth_8v_10bpp_neon:      6.60  12.43   8.17   8.89
vp9_put_8tap_smooth_64h_10bpp_neon:     4.41   8.59   4.54   7.49
vp9_put_8tap_smooth_64hv_10bpp_neon:    4.43   8.58   5.34   6.63
vp9_put_8tap_smooth_64v_10bpp_neon:     7.26  13.92   9.27  10.92

For the larger 8tap filters, the speedup vs C code is around 4-14x.

Signed-off-by: Martin Storsjö <martin@martin.st>
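The move from 16 bit to 32 bit accumulators described in the 24 Jan, 2017 commit message can be illustrated in scalar C. This is only a sketch, not the NEON code from the commit: the function name, the tap values, and the assumption that the taps sum to 128 (7 fractional bits) are all hypothetical and chosen for illustration. With 10 bit input, a flat block of maximum-value pixels already gives 1023 * 128 = 130944 before the final shift, which does not fit in a signed 16 bit value.

```c
#include <stdint.h>

/* Hypothetical 8-tap kernel for illustration only; it is NOT a row of any
 * real filter table. It is assumed to sum to 128 (7 fractional bits). */
static const int16_t sketch_taps[8] = { -1, 3, -10, 72, 72, -10, 3, -1 };

/* Filter one output pixel from src[0..7] at the given bit depth. */
static uint16_t filter8_sketch(const uint16_t *src, int bit_depth)
{
    /* 32 bit accumulator: 10 bit input can reach 1023 * 128 = 130944,
     * which is larger than INT16_MAX (32767). */
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += (int32_t)sketch_taps[i] * src[i];

    /* Negative sums clip to 0 anyway; clamping first keeps the shift
     * below on a non-negative value. */
    if (acc < 0)
        acc = 0;

    /* Round to nearest and drop the 7 fractional bits. */
    acc = (acc + 64) >> 7;

    /* Clip to the valid pixel range for this bit depth. */
    const int32_t max = ((int32_t)1 << bit_depth) - 1;
    if (acc > max)
        acc = max;
    return (uint16_t)acc;
}
```

Because the accumulator is wide enough for any ordering of the partial sums, the taps can be applied in plain index order, which matches the commit message's remark that tracking the largest filter index is no longer needed.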
-
- 10 Mar, 2014 1 commit
-
-
Anton Khirnov authored
The values of {FLT,DBL}_{MAX,MIN} macros on some systems (older musl libc, some BSD flavours) are not exactly representable, i.e.

    (double)DBL_MAX == DBL_MAX

is false. This violates (at least some interpretations of) the C99 standard and breaks code (e.g. in vf_fps) like

    double f = DBL_MAX;
    [...]
    if (f == DBL_MAX) { // f has not been changed yet
        [....]
    }
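A small self-check makes the failure mode in the 10 Mar, 2014 commit message concrete. This is a minimal sketch; the function names `dbl_max_is_exact` and `is_unset` are hypothetical and do not come from the commit. On a system where the macro is exactly representable the first function returns true; on the affected systems described above it would return false, and the sentinel comparison in the second function would silently stop matching.

```c
#include <float.h>
#include <stdbool.h>

/* Returns true when DBL_MAX survives being stored in a double object,
 * i.e. when the macro expands to an exactly representable value. */
static bool dbl_max_is_exact(void)
{
    /* volatile forces an actual store to a double-sized object */
    volatile double stored = DBL_MAX;
    return stored == DBL_MAX;
}

/* Sentinel pattern of the kind the commit message describes:
 * DBL_MAX marks "not set yet". Only reliable if dbl_max_is_exact(). */
static bool is_unset(double f)
{
    return f == DBL_MAX;
}
```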
-
- 07 Jan, 2014 1 commit
-
-
Michael Niedermayer authored
This matches FFT_FLOAT.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 06 Jan, 2014 1 commit
-
-
Diego Biurrun authored
The define does not originate from configure, so it should not have a name that is CONFIG_-prefixed.
-
- 21 Nov, 2013 1 commit
-
-
Diego Biurrun authored
-
- 04 Aug, 2013 1 commit
-
-
Nedeljko Babic authored
Iterative implementation of a 32 bit fixed point split-radix FFT. The maximum FFT size that can currently be calculated is 2^12.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- 31 Mar, 2011 1 commit
-
-
Mans Rullgard authored
-
- 20 Mar, 2011 1 commit
-
-
Mans Rullgard authored
These windows do not really belong in fft/mdct files and were easily confused with the similarly named tables used by rdft.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 19 Mar, 2011 1 commit
-
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 09 Sep, 2010 1 commit
-
-
Måns Rullgård authored
Instead of defining functions in per-arch header files included by the main cpu.c, define them normally and call them from the generic one.
Originally committed as revision 25084 to svn://svn.ffmpeg.org/ffmpeg/trunk
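The structure described in the 09 Sep, 2010 commit message, ordinary per-architecture functions called from one generic entry point, might look roughly like the following. Everything here is hypothetical: the function names, the flag constant, and the dummy return values are placeholders, and the compiler-predefined macros stand in for whatever the build system actually provides. In a real tree each detector would live in its own architecture-specific source file instead of being textually included.

```c
/* Hypothetical flag bit, for illustration only. */
#define SKETCH_CPU_FLAG_SIMD 0x1

/* Per-architecture detectors written as ordinary functions.
 * The return values are dummies; no real detection is performed. */
int sketch_cpu_flags_x86(void) { return SKETCH_CPU_FLAG_SIMD; }
int sketch_cpu_flags_arm(void) { return 0; }

/* Generic entry point: calls the detector matching the build target. */
int sketch_get_cpu_flags(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return sketch_cpu_flags_x86();
#elif defined(__arm__) || defined(__aarch64__)
    return sketch_cpu_flags_arm();
#else
    return 0; /* unknown architecture: report no optional features */
#endif
}
```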
-
- 08 Sep, 2010 1 commit
-
-
Stefano Sabatini authored
function and rename it to av_get_cpu_flags().
Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 22 Dec, 2008 1 commit
-
-
Diego Biurrun authored
It contains optimizations that are not specific to i386, and libavutil uses this naming scheme already.
Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 12 Aug, 2008 1 commit
-
-
Loren Merritt authored
The C version is 1.9x faster than the previous C version (on various x86 CPUs); the SSE version is 1.6x faster than the previous SSE version.
Originally committed as revision 14698 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 09 May, 2008 1 commit
-
-
Diego Biurrun authored
Originally committed as revision 13098 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 08 May, 2008 1 commit
-
-
Ramiro Polla authored
typedef x86_reg as the appropriate size and use it instead.
Originally committed as revision 13081 to svn://svn.ffmpeg.org/ffmpeg/trunk
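The idea of a register-sized integer typedef from the 08 May, 2008 commit message can be sketched as below. This is an assumption-laden illustration: the names `x86_reg_sketch` and `byte_offset` are hypothetical, and the compiler-predefined macros are used here only because the project's own configuration macros are not shown in the message.

```c
#include <stdint.h>

/* Sketch only: pick an integer type as wide as a general-purpose register. */
#if defined(__x86_64__)
typedef int64_t x86_reg_sketch;   /* 64 bit registers */
#elif defined(__i386__)
typedef int32_t x86_reg_sketch;   /* 32 bit registers */
#else
typedef intptr_t x86_reg_sketch;  /* non-x86 fallback so the sketch compiles */
#endif

/* Example use: a byte offset held in a register-sized integer, the kind of
 * value that would be passed as an operand to inline assembly. */
static x86_reg_sketch byte_offset(const uint8_t *base, const uint8_t *p)
{
    return (x86_reg_sketch)(p - base);
}
```

Using a dedicated typedef avoids relying on the width of `long`, which differs between platforms that share the same register size.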
-
- 16 May, 2007 1 commit
-
-
Ronald S. Bultje authored
include paths in the source files.
Mostly from a patch by Ronald S. Bultje, rbultje ronald.bitfreak net
Originally committed as revision 9034 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 07 Oct, 2006 1 commit
-
-
Diego Biurrun authored
and fix GPL/LGPL version mismatches.
Originally committed as revision 6577 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 18 Aug, 2006 1 commit
-
-
Loren Merritt authored
2.5% faster FFT, 0.5% faster Vorbis.
Originally committed as revision 6023 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 08 Mar, 2006 1 commit
-
-
Zuxy Meng authored
Patch by Zuxy Meng, zuxy <<dot>> meng >>at<< gmail <<dot>> com
Minor non-functional diff-related fixes by me.
Originally committed as revision 5125 to svn://svn.ffmpeg.org/ffmpeg/trunk
-