Commits · 78b86c30d3860135042505dd4a9cbd95c4e6257d · Linshizhi / ffmpeg.wasm-core

24 Jan, 2017 1 commit

arm: Add NEON optimizations for 10 and 12 bit vp9 MC · a4d4bad7

Martin Storsjö authored 8 years ago

This work is sponsored by, and copyright, Google.

The plain pixel put/copy functions reuse the 8 bit versions at twice
the width (e.g. put16 uses ff_vp9_copy32_neon), and a new copy128 is
added for put64.
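
10 and 12 bit pixels are stored in 16-bit units, so a row of N pixels occupies 2*N bytes. That is why a pixel copy of width N can reuse the 8 bit byte copy of width 2*N. Below is a minimal C sketch of that mapping. `copy_block_bytes` and `put_block_hbd` are hypothetical names and do not correspond to the NEON functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-oriented block copy, as an 8 bit "copyN" does: w_bytes per row, h rows.
 * Strides are in bytes. */
static void copy_block_bytes(uint8_t *dst, ptrdiff_t dst_stride,
                             const uint8_t *src, ptrdiff_t src_stride,
                             int w_bytes, int h)
{
    for (int y = 0; y < h; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, (size_t)w_bytes);
}

/* High-bitdepth "putN": each pixel is a uint16_t, so N pixels are 2*N bytes.
 * put16 therefore maps to a 32-byte copy, and put64 to a 128-byte copy. */
static void put_block_hbd(uint8_t *dst, ptrdiff_t dst_stride,
                          const uint8_t *src, ptrdiff_t src_stride,
                          int w_pixels, int h)
{
    copy_block_bytes(dst, dst_stride, src, src_stride,
                     w_pixels * (int)sizeof(uint16_t), h);
}
```

For example, `put_block_hbd(dst, stride, src, stride, 16, h)` moves 32 bytes per row, the same amount of data as the 8 bit copy32.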

Compared with the 8 bit version, the filters can no longer use the
trick of accumulating in 16 bit and saturating only at the end; the
accumulators now need to be 32 bit. On the other hand, this avoids the
need to keep track of which filter index is the largest, which reduces
the size of the executable code for these filters.
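
Here is a scalar C sketch of one row of a high-bitdepth 8-tap filter that uses a 32-bit accumulator. `example_taps` is a half-pel kernel chosen for illustration. Its taps sum to 128, so each result is rounded and shifted right by 7. The function names are hypothetical. `worst_case_sum` shows why 16 bits are not enough: with 12 bit pixels, the positive taps alone can reach 4095 * 168 = 687960, which is far above INT16_MAX but easily fits in 32 bits.

```c
#include <stdint.h>

/* Example kernel assumed for illustration: taps sum to 128 (7 fractional bits). */
static const int16_t example_taps[8] = { -1, 6, -19, 78, 78, -19, 6, -1 };

/* Filter one row of w high-bitdepth pixels horizontally.
 * src must be readable from src[-3] to src[w + 3]. */
static void filter8_h_row(uint16_t *dst, const uint16_t *src, int w,
                          const int16_t taps[8], int bpp)
{
    const int32_t max = (1 << bpp) - 1;
    for (int x = 0; x < w; x++) {
        int32_t sum = 0; /* 32-bit accumulator: no intermediate saturation needed */
        for (int k = 0; k < 8; k++)
            sum += taps[k] * src[x + k - 3];
        /* Round, drop the 7 fractional bits, clamp to [0, max]. */
        int32_t v = sum <= 0 ? 0 : (sum + 64) >> 7;
        dst[x] = (uint16_t)(v > max ? max : v);
    }
}

/* Worst-case positive accumulator value: every positive tap hits a max pixel. */
static int32_t worst_case_sum(const int16_t taps[8], int bpp)
{
    int32_t pos = 0;
    for (int k = 0; k < 8; k++)
        if (taps[k] > 0)
            pos += taps[k];
    return pos * ((1 << bpp) - 1);
}
```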

For the horizontal filters, we only filter 4 or 8 pixels in parallel
(while doing two rows at a time), since there is not enough register
space to filter 16 pixels wide.
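
The loop structure can be modelled in scalar C. Each step covers a tile two rows high and 4 or 8 pixels wide, and the tile's 32-bit sums are held in a small array that stands in for the NEON registers. This hedged sketch shows the blocking only, not the assembly. The function name, and the example kernel used to check it, are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the horizontal blocking: each step filters a tile two rows
 * high and tw (8 or 4) pixels wide, keeping the 2*tw 32-bit sums in a small
 * array that stands in for NEON registers. Assumes w is a multiple of 4 and
 * h is even; strides are in pixels; src must be readable 3 pixels left and
 * 4 pixels right of each row. */
static void hfilter8_tiled(uint16_t *dst, ptrdiff_t dst_stride,
                           const uint16_t *src, ptrdiff_t src_stride,
                           int w, int h, const int16_t taps[8], int bpp)
{
    const int32_t max = (1 << bpp) - 1;
    const int tw = (w % 8 == 0) ? 8 : 4;
    for (int y = 0; y < h; y += 2) {
        for (int x0 = 0; x0 < w; x0 += tw) {
            int32_t acc[2][8] = { { 0 } };
            for (int r = 0; r < 2; r++) {
                const uint16_t *s = src + (y + r) * src_stride + x0;
                for (int x = 0; x < tw; x++)
                    for (int k = 0; k < 8; k++)
                        acc[r][x] += taps[k] * s[x + k - 3];
            }
            /* Round, drop the 7 fractional bits, clamp, and store both rows. */
            for (int r = 0; r < 2; r++)
                for (int x = 0; x < tw; x++) {
                    int32_t v = acc[r][x] <= 0 ? 0 : (acc[r][x] + 64) >> 7;
                    dst[(y + r) * dst_stride + x0 + x] = (uint16_t)(v > max ? max : v);
                }
        }
    }
}
```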

For the vertical filters, we still do 4 and 8 pixels in parallel just
as in the 8 bit case, but we need to store the output after every 2
rows instead of after every 4 rows.
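
The same idea can be sketched in scalar C for the vertical filter. For each strip of 4 or 8 columns, two output rows are computed and stored per iteration, whereas the 8 bit version stores after every 4 rows. The function name is hypothetical. The sketch models only the loop order and store cadence, not the register-level reuse of loaded rows.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the vertical filter loop: for each strip of tw (8 or 4)
 * columns, compute two output rows and store them before moving on, i.e.
 * a store every 2 rows. Assumes w is a multiple of 4 and h is even; strides
 * are in pixels; src must be readable from 3 rows above to 4 rows below. */
static void vfilter8_2rows(uint16_t *dst, ptrdiff_t dst_stride,
                           const uint16_t *src, ptrdiff_t src_stride,
                           int w, int h, const int16_t taps[8], int bpp)
{
    const int32_t max = (1 << bpp) - 1;
    const int tw = (w % 8 == 0) ? 8 : 4;
    for (int x0 = 0; x0 < w; x0 += tw) {
        for (int y = 0; y < h; y += 2) {
            int32_t acc[2][8] = { { 0 } };
            for (int r = 0; r < 2; r++)
                for (int k = 0; k < 8; k++) {
                    const uint16_t *s = src + (y + r + k - 3) * src_stride + x0;
                    for (int x = 0; x < tw; x++)
                        acc[r][x] += taps[k] * s[x];
                }
            /* Store the two finished rows of this strip. */
            for (int r = 0; r < 2; r++)
                for (int x = 0; x < tw; x++) {
                    int32_t v = acc[r][x] <= 0 ? 0 : (acc[r][x] + 64) >> 7;
                    dst[(y + r) * dst_stride + x0 + x] = (uint16_t)(v > max ? max : v);
                }
        }
    }
}
```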

Examples of relative speedup compared to the C version, from checkasm:
Cortex                                   A7     A8     A9    A53
vp9_avg4_10bpp_neon:                   2.25   2.44   3.05   2.16
vp9_avg8_10bpp_neon:                   3.66   8.48   3.86   3.50
vp9_avg16_10bpp_neon:                  3.39   8.26   3.37   2.72
vp9_avg32_10bpp_neon:                  4.03  10.20   4.07   3.42
vp9_avg64_10bpp_neon:                  4.15  10.01   4.13   3.70
vp9_avg_8tap_smooth_4h_10bpp_neon:     3.38   6.22   3.41   4.75
vp9_avg_8tap_smooth_4hv_10bpp_neon:    3.89   6.39   4.30   5.32
vp9_avg_8tap_smooth_4v_10bpp_neon:     5.32   9.73   6.34   7.31
vp9_avg_8tap_smooth_8h_10bpp_neon:     4.45   9.40   4.68   6.87
vp9_avg_8tap_smooth_8hv_10bpp_neon:    4.64   8.91   5.44   6.47
vp9_avg_8tap_smooth_8v_10bpp_neon:     6.44  13.42   8.68   8.79
vp9_avg_8tap_smooth_64h_10bpp_neon:    4.66   9.02   4.84   7.71
vp9_avg_8tap_smooth_64hv_10bpp_neon:   4.61   9.14   4.92   7.10
vp9_avg_8tap_smooth_64v_10bpp_neon:    6.90  14.13   9.57  10.41
vp9_put4_10bpp_neon:                   1.33   1.46   2.09   1.33
vp9_put8_10bpp_neon:                   1.57   3.42   1.83   1.84
vp9_put16_10bpp_neon:                  1.55   4.78   2.17   1.89
vp9_put32_10bpp_neon:                  2.06   5.35   2.14   2.30
vp9_put64_10bpp_neon:                  3.00   2.41   1.95   1.66
vp9_put_8tap_smooth_4h_10bpp_neon:     3.19   5.81   3.31   4.63
vp9_put_8tap_smooth_4hv_10bpp_neon:    3.86   6.22   4.32   5.21
vp9_put_8tap_smooth_4v_10bpp_neon:     5.40   9.77   6.08   7.21
vp9_put_8tap_smooth_8h_10bpp_neon:     4.22   8.41   4.46   6.63
vp9_put_8tap_smooth_8hv_10bpp_neon:    4.56   8.51   5.39   6.25
vp9_put_8tap_smooth_8v_10bpp_neon:     6.60  12.43   8.17   8.89
vp9_put_8tap_smooth_64h_10bpp_neon:    4.41   8.59   4.54   7.49
vp9_put_8tap_smooth_64hv_10bpp_neon:   4.43   8.58   5.34   6.63
vp9_put_8tap_smooth_64v_10bpp_neon:    7.26  13.92   9.27  10.92

For the larger 8tap filters, the speedup vs C code is around 4-14x.
Signed-off-by: Martin Storsjö <martin@martin.st>

10 Mar, 2014 1 commit

Work around broken floating point limits on some systems. · e854b8f9

Anton Khirnov authored 11 years ago

The values of {FLT,DBL}_{MAX,MIN} macros on some systems (older musl
libc, some BSD flavours) are not exactly representable, i.e.
(double)DBL_MAX == DBL_MAX is false.
This violates (at least some interpretations of) the C99 standard and
breaks code (e.g. in vf_fps) like
    double f = DBL_MAX;
    [...]
    if (f == DBL_MAX) { // f has not been changed yet
        [...]
    }
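
The failure mode can be checked directly. This minimal C sketch uses hypothetical function names. It stores each limit in a `volatile` variable of its own type and compares the stored value with the macro. On a system whose limits are exactly representable, both checks return 1. On the systems described above, the comparison can be false, and that is what breaks the sentinel pattern.

```c
#include <float.h>

/* The sentinel pattern from the commit message: DBL_MAX marks "not set yet".
 * It only works if storing DBL_MAX in a double and comparing it back is true.
 * volatile forces an actual store and reload. */
static int dbl_max_sentinel_works(void)
{
    volatile double f = DBL_MAX;
    return f == DBL_MAX;
}

/* Same round-trip check for FLT_MAX through a float variable. */
static int flt_max_sentinel_works(void)
{
    volatile float f = FLT_MAX;
    return f == FLT_MAX;
}
```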

07 Jan, 2014 1 commit

rename CONFIG_FFT_FIXED_32 -> FFT_FIXED_32 · 99b6357f

Michael Niedermayer authored 11 years ago

This matches FFT_FLOAT
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

06 Jan, 2014 1 commit

Rename CONFIG_FFT_FLOAT ---> FFT_FLOAT · 794fcf79

Diego Biurrun authored 11 years ago

The define does not originate from configure, so it should not
have a name that is CONFIG_-prefixed.

21 Nov, 2013 1 commit

dct/fft: Give consistent names to fixed/float template files · ac0e03ba

Diego Biurrun authored 11 years ago

04 Aug, 2013 1 commit

libavcodec: Implementation of 32 bit fixed point FFT · 18d7074b

Nedeljko Babic authored 11 years ago

Iterative implementation of 32 bit fixed point split-radix FFT.
Max FFT that can be calculated currently is 2^12.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
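
The message shows no code. To illustrate the arithmetic a 32 bit fixed-point FFT is built from, here is a sketch of Q31 complex multiplication and a radix-2 butterfly with per-stage halving. The type and function names, the Q31 format and the scaling strategy are all assumptions. None of them are taken from the committed implementation.

```c
#include <stdint.h>

/* Hypothetical Q31 complex type: values in [-1, 1) scaled by 2^31. */
typedef struct { int32_t re, im; } cq31;

/* Rounded Q31 product, computed in 64 bits. Assumes arithmetic right shift of
 * negative values (true on mainstream compilers) and that the operands are not
 * both -1.0, whose product 1.0 is not representable in Q31. */
static int32_t mul_q31(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b + (INT64_C(1) << 30)) >> 31);
}

/* Complex product; assumes |a|, |w| <= 1 so the result stays in range. */
static cq31 cmul_q31(cq31 a, cq31 w)
{
    cq31 r;
    r.re = mul_q31(a.re, w.re) - mul_q31(a.im, w.im);
    r.im = mul_q31(a.re, w.im) + mul_q31(a.im, w.re);
    return r;
}

/* Butterfly a' = (a + w*b)/2, b' = (a - w*b)/2. Halving each stage is one
 * common way to keep a fixed-point FFT from overflowing (assumed here); the
 * sums are formed in 64 bits so they cannot overflow before the shift. */
static void butterfly_q31(cq31 *a, cq31 *b, cq31 w)
{
    cq31 t = cmul_q31(*b, w);
    int64_t sr = (int64_t)a->re + t.re, si = (int64_t)a->im + t.im;
    int64_t dr = (int64_t)a->re - t.re, di = (int64_t)a->im - t.im;
    a->re = (int32_t)(sr >> 1);
    a->im = (int32_t)(si >> 1);
    b->re = (int32_t)(dr >> 1);
    b->im = (int32_t)(di >> 1);
}
```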

31 Mar, 2011 1 commit

Fixed-point FFT and MDCT · 7087ce08

Mans Rullgard authored 13 years ago

20 Mar, 2011 1 commit

Move sine windows to a separate file · 4538729a

Mans Rullgard authored 13 years ago

These windows do not really belong in fft/mdct files and were
easily confused with the similarly named tables used by rdft.
Signed-off-by: Mans Rullgard <mans@mansr.com>

19 Mar, 2011 1 commit

Replace FFmpeg with Libav in licence headers · 2912e87a

Mans Rullgard authored 13 years ago

Signed-off-by: Mans Rullgard <mans@mansr.com>

09 Sep, 2010 1 commit

Clean up av_get_cpu_flag() · 9275438a

Måns Rullgård authored 14 years ago

Instead of defining functions in per-arch header files included
by the main cpu.c, define them normally and call them from the
generic one.

Originally committed as revision 25084 to svn://svn.ffmpeg.org/ffmpeg/trunk
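
The sketch below assumes one plausible shape for this structure. A generic function tests 0/1 architecture constants and calls per-arch detectors, each defined as an ordinary function. All names are hypothetical, and the detectors are stubs that return a fixed flag instead of querying the CPU.

```c
/* Hypothetical flag bits and per-arch detectors (stubs). In the layout the
 * commit describes, each detector would be an ordinary function in its own
 * arch directory rather than code in a header #included into cpu.c.
 * A real detector would query the CPU instead of returning a constant. */
#define SKETCH_CPU_FLAG_ARM 0x1
#define SKETCH_CPU_FLAG_PPC 0x2
#define SKETCH_CPU_FLAG_X86 0x4

static int sketch_get_cpu_flags_arm(void) { return SKETCH_CPU_FLAG_ARM; }
static int sketch_get_cpu_flags_ppc(void) { return SKETCH_CPU_FLAG_PPC; }
static int sketch_get_cpu_flags_x86(void) { return SKETCH_CPU_FLAG_X86; }

/* 0/1 architecture constants derived from compiler-predefined macros
 * (an assumption made for this sketch). */
#if defined(__arm__) || defined(__aarch64__)
#define SKETCH_ARCH_ARM 1
#else
#define SKETCH_ARCH_ARM 0
#endif
#if defined(__powerpc__) || defined(__ppc__)
#define SKETCH_ARCH_PPC 1
#else
#define SKETCH_ARCH_PPC 0
#endif
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define SKETCH_ARCH_X86 1
#else
#define SKETCH_ARCH_X86 0
#endif

/* Generic entry point: plain calls into the per-arch functions. */
static int sketch_get_cpu_flags(void)
{
    if (SKETCH_ARCH_ARM) return sketch_get_cpu_flags_arm();
    if (SKETCH_ARCH_PPC) return sketch_get_cpu_flags_ppc();
    if (SKETCH_ARCH_X86) return sketch_get_cpu_flags_x86();
    return 0;
}
```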

08 Sep, 2010 1 commit

Move mm_support() from libavcodec to libavutil, make it a public · c6c98d08

Stefano Sabatini authored 14 years ago

function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk

22 Dec, 2008 1 commit

Rename libavcodec/i386/ --> libavcodec/x86/. · a6493a8f

Diego Biurrun authored 16 years ago

It contains optimizations that are not specific to i386 and
libavutil uses this naming scheme already.

Originally committed as revision 16270 to svn://svn.ffmpeg.org/ffmpeg/trunk

12 Aug, 2008 1 commit

split-radix FFT · 5d0ddd1a

Loren Merritt authored 16 years ago

The C version is 1.9x faster than the previous C version (on various x86 CPUs), and the SSE version is 1.6x faster than the previous SSE version.

Originally committed as revision 14698 to svn://svn.ffmpeg.org/ffmpeg/trunk

09 May, 2008 1 commit

Use full path for #includes from another directory. · 245976da

Diego Biurrun authored 16 years ago

Originally committed as revision 13098 to svn://svn.ffmpeg.org/ffmpeg/trunk

08 May, 2008 1 commit

Do not misuse long as the size of a register in x86. · 40d0e665

Ramiro Polla authored 16 years ago

typedef x86_reg as the appropriate size and use it instead.

Originally committed as revision 13081 to svn://svn.ffmpeg.org/ffmpeg/trunk
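
Here is a minimal sketch of such a typedef. It assumes compiler-predefined architecture macros; FFmpeg's build presumably keys off its own configure-generated macros instead. `reg_add` is a hypothetical helper that shows why the width matters. With GCC-style inline asm, the `"r"` constraint picks a register matching the operand's C type. On Win64, `long` is 32 bits, so using it there would yield a 32-bit register.

```c
#include <stdint.h>

/* An integer type as wide as an x86 general-purpose register. `long` is not
 * a safe substitute: on Win64 it is 32 bits although registers are 64 bits. */
#if defined(__x86_64__) || defined(_M_X64)
typedef int64_t x86_reg;
#elif defined(__i386__) || defined(_M_IX86)
typedef int32_t x86_reg;
#else
typedef intptr_t x86_reg; /* non-x86 fallback, only so the sketch compiles */
#endif

/* Adds b to a; on x86 with GCC/Clang, the add uses full-width registers. */
static x86_reg reg_add(x86_reg a, x86_reg b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("add %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b;
#endif
    return a;
}
```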

16 May, 2007 1 commit

Add libavcodec to compiler include flags in order to simplify header · b550bfaa

Ronald S. Bultje authored 17 years ago

include paths in the source files.
mostly from a patch by Ronald S. Bultje, rbultje ronald.bitfreak net

Originally committed as revision 9034 to svn://svn.ffmpeg.org/ffmpeg/trunk

07 Oct, 2006 1 commit

Change license headers to say 'FFmpeg' instead of 'this program/this library' · b78e7197

Diego Biurrun authored 18 years ago

and fix GPL/LGPL version mismatches.

Originally committed as revision 6577 to svn://svn.ffmpeg.org/ffmpeg/trunk

18 Aug, 2006 1 commit

ff_fft_calc_3dn/3dn2/sse: convert intrinsics to inline asm. · 1e4ecf26

Loren Merritt authored 18 years ago

2.5% faster fft, 0.5% faster vorbis.

Originally committed as revision 6023 to svn://svn.ffmpeg.org/ffmpeg/trunk

08 Mar, 2006 1 commit

3DNow! & Extended 3DNow! versions of FFT · 82eb4b0f

Zuxy Meng authored 18 years ago

Patch by Zuxy Meng, zuxy <<dot>> meng >>at<< gmail <<dot>> com
Minor non-functional diff-related fixes by me.

Originally committed as revision 5125 to svn://svn.ffmpeg.org/ffmpeg/trunk
