- 28 Mar, 2017 1 commit
-
-
Ronald S. Bultje authored
The advantage here is that the internal software decoder interface is not exposed to the DSP functions or the hardware accelerations.
-
- 27 Mar, 2017 1 commit
-
-
Clément Bœsch authored
This is following Libav layout to ease merges.
-
- 24 Jan, 2017 2 commits
-
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

This has mostly got the same differences to the 8 bit version as in the arm version. For the horizontal filters, we do 16 pixels in parallel as well. For the 8 pixel wide vertical filters, we can accumulate 4 rows before storing, just as in the 8 bit version.

Examples of runtimes vs the 32 bit version, on a Cortex A53:

                                           ARM   AArch64
vp9_avg4_10bpp_neon:                      35.7      30.7
vp9_avg8_10bpp_neon:                      93.5      84.7
vp9_avg16_10bpp_neon:                    324.4     296.6
vp9_avg32_10bpp_neon:                   1236.5    1148.2
vp9_avg64_10bpp_neon:                   4639.6    4571.1
vp9_avg_8tap_smooth_4h_10bpp_neon:       130.0     128.0
vp9_avg_8tap_smooth_4hv_10bpp_neon:      440.0     440.5
vp9_avg_8tap_smooth_4v_10bpp_neon:       114.0     105.5
vp9_avg_8tap_smooth_8h_10bpp_neon:       327.0     314.0
vp9_avg_8tap_smooth_8hv_10bpp_neon:      918.7     865.4
vp9_avg_8tap_smooth_8v_10bpp_neon:       330.0     300.2
vp9_avg_8tap_smooth_16h_10bpp_neon:     1187.5    1155.5
vp9_avg_8tap_smooth_16hv_10bpp_neon:    2663.1    2591.0
vp9_avg_8tap_smooth_16v_10bpp_neon:     1107.4    1078.3
vp9_avg_8tap_smooth_64h_10bpp_neon:    17754.6   17454.7
vp9_avg_8tap_smooth_64hv_10bpp_neon:   33285.2   33001.5
vp9_avg_8tap_smooth_64v_10bpp_neon:    16066.9   16048.6
vp9_put4_10bpp_neon:                      25.5      21.7
vp9_put8_10bpp_neon:                      56.0      52.0
vp9_put16_10bpp_neon/armv8:              183.0     163.1
vp9_put32_10bpp_neon/armv8:              678.6     563.1
vp9_put64_10bpp_neon/armv8:             2679.9    2195.8
vp9_put_8tap_smooth_4h_10bpp_neon:       120.0     118.0
vp9_put_8tap_smooth_4hv_10bpp_neon:      435.2     435.0
vp9_put_8tap_smooth_4v_10bpp_neon:       107.0      98.2
vp9_put_8tap_smooth_8h_10bpp_neon:       303.0     290.0
vp9_put_8tap_smooth_8hv_10bpp_neon:      893.7     828.7
vp9_put_8tap_smooth_8v_10bpp_neon:       305.5     263.5
vp9_put_8tap_smooth_16h_10bpp_neon:     1089.1    1059.2
vp9_put_8tap_smooth_16hv_10bpp_neon:    2578.8    2452.4
vp9_put_8tap_smooth_16v_10bpp_neon:     1009.5     933.5
vp9_put_8tap_smooth_64h_10bpp_neon:    16223.4   15918.6
vp9_put_8tap_smooth_64hv_10bpp_neon:   32153.0   31016.2
vp9_put_8tap_smooth_64v_10bpp_neon:    14516.5   13748.1

These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster.

The speedup vs C code is around 4-9x.

Signed-off-by: Martin Storsjö <martin@martin.st>
-
Martin Storsjö authored
This work is sponsored by, and copyright, Google.

The plain pixel put/copy functions are used from the 8 bit version, for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new copy128 is added.

Compared with the 8 bit version, the filters can no longer use the trick to accumulate in 16 bit with only saturation at the end, but now the accumulators need to be 32 bit. This avoids the need to keep track of which filter index is the largest though, reducing the size of the executable code for these filters.

For the horizontal filters, we only do 4 or 8 pixels wide in parallel (while doing two rows at a time), since we don't have enough register space to filter 16 pixels wide.

For the vertical filters, we still do 4 and 8 pixels in parallel just as in the 8 bit case, but we need to store the output after every 2 rows instead of after every 4 rows.

Examples of relative speedup compared to the C version, from checkasm:

                               Cortex    A7     A8     A9    A53
vp9_avg4_10bpp_neon:                   2.25   2.44   3.05   2.16
vp9_avg8_10bpp_neon:                   3.66   8.48   3.86   3.50
vp9_avg16_10bpp_neon:                  3.39   8.26   3.37   2.72
vp9_avg32_10bpp_neon:                  4.03  10.20   4.07   3.42
vp9_avg64_10bpp_neon:                  4.15  10.01   4.13   3.70
vp9_avg_8tap_smooth_4h_10bpp_neon:     3.38   6.22   3.41   4.75
vp9_avg_8tap_smooth_4hv_10bpp_neon:    3.89   6.39   4.30   5.32
vp9_avg_8tap_smooth_4v_10bpp_neon:     5.32   9.73   6.34   7.31
vp9_avg_8tap_smooth_8h_10bpp_neon:     4.45   9.40   4.68   6.87
vp9_avg_8tap_smooth_8hv_10bpp_neon:    4.64   8.91   5.44   6.47
vp9_avg_8tap_smooth_8v_10bpp_neon:     6.44  13.42   8.68   8.79
vp9_avg_8tap_smooth_64h_10bpp_neon:    4.66   9.02   4.84   7.71
vp9_avg_8tap_smooth_64hv_10bpp_neon:   4.61   9.14   4.92   7.10
vp9_avg_8tap_smooth_64v_10bpp_neon:    6.90  14.13   9.57  10.41
vp9_put4_10bpp_neon:                   1.33   1.46   2.09   1.33
vp9_put8_10bpp_neon:                   1.57   3.42   1.83   1.84
vp9_put16_10bpp_neon:                  1.55   4.78   2.17   1.89
vp9_put32_10bpp_neon:                  2.06   5.35   2.14   2.30
vp9_put64_10bpp_neon:                  3.00   2.41   1.95   1.66
vp9_put_8tap_smooth_4h_10bpp_neon:     3.19   5.81   3.31   4.63
vp9_put_8tap_smooth_4hv_10bpp_neon:    3.86   6.22   4.32   5.21
vp9_put_8tap_smooth_4v_10bpp_neon:     5.40   9.77   6.08   7.21
vp9_put_8tap_smooth_8h_10bpp_neon:     4.22   8.41   4.46   6.63
vp9_put_8tap_smooth_8hv_10bpp_neon:    4.56   8.51   5.39   6.25
vp9_put_8tap_smooth_8v_10bpp_neon:     6.60  12.43   8.17   8.89
vp9_put_8tap_smooth_64h_10bpp_neon:    4.41   8.59   4.54   7.49
vp9_put_8tap_smooth_64hv_10bpp_neon:   4.43   8.58   5.34   6.63
vp9_put_8tap_smooth_64v_10bpp_neon:    7.26  13.92   9.27  10.92

For the larger 8tap filters, the speedup vs C code is around 4-14x.

Signed-off-by: Martin Storsjö <martin@martin.st>
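The accumulator-width point in the message above can be illustrated with a scalar C sketch. This is not the NEON code from the commit; it only shows why a 16 bit accumulator is too narrow once pixels are 10 or 12 bits wide. The function names, the coefficient set, and the rounding convention (7 bit coefficients summing to 128, rounded with `+ 64` and `>> 7`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-tap coefficient set that sums to 128 (illustration only). */
static const int16_t sketch_filter[8] = { -3, -1, 32, 64, 38, 1, -3, 0 };

/* Clamp a rounded filter result to the valid range for the given bit depth. */
static uint16_t clip_pixel_sketch(int32_t v, int bpp)
{
    int32_t max = (1 << bpp) - 1;
    return (uint16_t)(v < 0 ? 0 : (v > max ? max : v));
}

/*
 * Scalar horizontal 8-tap filter for high bit depth pixels.
 * src must provide w + 7 readable samples; output x uses src[x..x+7].
 * The sum is kept in 32 bits: a single 10 bit pixel (1023) times a
 * coefficient of 64 is already 65472, which does not fit in int16_t.
 */
static void filter_8tap_h_sketch(uint16_t *dst, const uint16_t *src,
                                 size_t w, const int16_t filter[8], int bpp)
{
    assert(bpp == 10 || bpp == 12);
    for (size_t x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < 8; k++)
            sum += (int32_t)src[x + k] * filter[k];
        /* Round to nearest; negative sums clamp to 0 without shifting. */
        int32_t rounded = sum + 64;
        rounded = rounded < 0 ? 0 : rounded >> 7;
        dst[x] = clip_pixel_sketch(rounded, bpp);
    }
}
```

With a constant input of 1023 the unrounded sum is 1023 * 128 = 130944, well beyond the 16 bit signed range, and the filtered output is again 1023.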
-
- 05 May, 2013 1 commit
-
-
Carl Eugen Hoyos authored
Fixes ticket #2533.
-
- 19 Mar, 2011 1 commit
-
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 21 Mar, 2009 1 commit
-
-
Justin Ruggles authored
decoder Originally committed as revision 18089 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 26 Feb, 2009 1 commit
-
-
Justin Ruggles authored
muxer. Originally committed as revision 17606 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 17 Feb, 2009 1 commit
-
-
Aurelien Jacobs authored
Originally committed as revision 17396 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 31 Aug, 2008 1 commit
-
-
Stefano Sabatini authored
Consistently apply this rule: the guard name is obtained from the filename by stripping the leading "lib", converting '/' and '.' to '_' and uppercasing the resulting name. Guard names in the root directory have to be prefixed by "FFMPEG_". Originally committed as revision 15120 to svn://svn.ffmpeg.org/ffmpeg/trunk
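The guard naming rule above can be sketched as a small C helper. The function name `guard_name_sketch` and the example file names are hypothetical. The sketch also assumes that "in the root directory" means the path contains no '/', and that the leading "lib" is stripped only for paths inside a library directory.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Derive an include guard name from a header path, following the rule
 * stated in the commit message. Returns 0 on success, -1 if the output
 * buffer is too small.
 */
static int guard_name_sketch(const char *path, char *out, size_t out_size)
{
    const char *p = path;
    size_t n = 0;

    if (!strchr(path, '/')) {
        /* Root directory header: prefix with "FFMPEG_". */
        const char *prefix = "FFMPEG_";
        size_t len = strlen(prefix);
        if (len >= out_size)
            return -1;
        memcpy(out, prefix, len);
        n = len;
    } else if (strncmp(path, "lib", 3) == 0) {
        /* Library header: strip the leading "lib". */
        p += 3;
    }

    for (; *p; p++) {
        if (n + 1 >= out_size)
            return -1;
        /* '/' and '.' become '_', everything else is uppercased. */
        out[n++] = (*p == '/' || *p == '.')
                 ? '_'
                 : (char)toupper((unsigned char)*p);
    }
    if (n >= out_size)
        return -1;
    out[n] = '\0';
    return 0;
}
```

Under these assumptions, `libavcodec/dsputil.h` maps to `AVCODEC_DSPUTIL_H` and a root-level `cmdutils.h` maps to `FFMPEG_CMDUTILS_H`.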
-
- 23 Aug, 2008 2 commits
-
-
Vladimir Voroshilov authored
Originally committed as revision 14916 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Vladimir Voroshilov authored
Originally committed as revision 14915 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 17 Aug, 2008 1 commit
-
-
Vladimir Voroshilov authored
(just skeleton, contains only parts, explicitly ok'ed by Michael) Originally committed as revision 14800 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 30 Oct, 2007 1 commit
-
-
Luca Abeni authored
Originally committed as revision 10877 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 17 Oct, 2007 1 commit
-
-
Diego Biurrun authored
Originally committed as revision 10765 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 17 Jun, 2007 2 commits
-
-
Guillaume Poirier authored
Originally committed as revision 9356 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Måns Rullgård authored
Originally committed as revision 9345 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 16 Jun, 2007 1 commit
-
-
Måns Rullgård authored
Originally committed as revision 9344 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 19 Mar, 2007 1 commit
-
-
Luca Barbato authored
Originally committed as revision 8448 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 28 Feb, 2007 1 commit
-
-
Luca Barbato authored
Originally committed as revision 8158 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 07 Oct, 2006 1 commit
-
-
Diego Biurrun authored
and fix GPL/LGPL version mismatches. Originally committed as revision 6577 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 12 Jan, 2006 1 commit
-
-
Diego Biurrun authored
Originally committed as revision 4842 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 17 Dec, 2005 1 commit
-
-
Diego Biurrun authored
Originally committed as revision 4749 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 25 Oct, 2003 1 commit
-
-
Roman Shaposhnik authored
Originally committed as revision 2430 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 23 Oct, 2003 1 commit
-
-
Michael Niedermayer authored
Originally committed as revision 2420 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 22 Oct, 2003 1 commit
-
-
Michael Niedermayer authored
Originally committed as revision 2415 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 03 Mar, 2003 1 commit
-
-
Michael Niedermayer authored
bitexact cleanup Originally committed as revision 1617 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 11 Feb, 2003 1 commit
-
-
Zdenek Kabelac authored
Originally committed as revision 1578 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 20 Nov, 2002 1 commit
-
-
Zdenek Kabelac authored
Originally committed as revision 1249 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 19 Nov, 2002 2 commits
-
-
Zdenek Kabelac authored
- put FF_LIBMPEG2_IDCT_PERM into CVS - so it will work for now Originally committed as revision 1227 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
Zdenek Kabelac authored
Originally committed as revision 1225 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 25 Oct, 2002 1 commit
-
-
Michael Niedermayer authored
Originally committed as revision 1071 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 06 Oct, 2002 1 commit
-
-
Michael Niedermayer authored
Originally committed as revision 1006 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 25 May, 2002 1 commit
-
-
Fabrice Bellard authored
Originally committed as revision 599 to svn://svn.ffmpeg.org/ffmpeg/trunk
-
- 13 Aug, 2001 1 commit
-
-
Fabrice Bellard authored
Originally committed as revision 79 to svn://svn.ffmpeg.org/ffmpeg/trunk
-