Commits · 94cdf82d53fe9be260dc6106634a9f9b218211bd · Linshizhi / ffmpeg.wasm-core

06 Jan, 2020 1 commit

Silence "string-plus-int" warning shown by clang. · 96fab29e

libswscale/utils.c:89:42: warning: adding 'unsigned long' to a string does not append to the string [-Wstring-plus-int]

96fab29e
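
A minimal sketch of the pattern behind this warning, assuming hypothetical macro names and strings (`DEMO_PREFIX`, `DEMO_LICENSE`) rather than the actual code at libswscale/utils.c:89. Adding an integer to a string literal is ordinary pointer arithmetic, but clang flags it because it can be mistaken for concatenation. Indexing the literal and taking the address is one way to state the intent explicitly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical strings, not the real libswscale values. */
#define DEMO_PREFIX  "libdemo license: "
#define DEMO_LICENSE "LGPL version 2.1 or later"

/* Returns the license text without its prefix.
 *
 * The flagged form would be:
 *     return DEMO_PREFIX DEMO_LICENSE + sizeof(DEMO_PREFIX) - 1;
 * The form below computes the same pointer through array indexing. */
static const char *demo_license(void)
{
    /* The offset must stay inside the concatenated literal. */
    assert(sizeof(DEMO_PREFIX) - 1 < sizeof(DEMO_PREFIX DEMO_LICENSE));
    return &(DEMO_PREFIX DEMO_LICENSE)[sizeof(DEMO_PREFIX) - 1];
}
```

Calling `demo_license()` yields a pointer to the text after the prefix. Whether the commit used exactly this spelling is not shown in the log.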

04 Jan, 2020 1 commit

swscale/aarch64: use multiply accumulate and shift-right narrow · c3a17fff

Sebastian Pop authored 5 years ago

This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
horizontal adds by using fused multiply adds. The patch also uses ld1r to load
one element and replicate it across all lanes of the vector. The patch also
improves the clipping code by removing the shift right instructions and
performing the shift with the shift-right narrow instructions.

I see an 8% improvement on an m6g instance with Neoverse-N1 CPUs:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
after:  t:0.012985 avg:0.013013 max:0.013996 min:0.012818

Tested with `make check` on aarch64-linux.
Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

c3a17fff
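
For readers unfamiliar with the loop being vectorized, here is a scalar sketch of the arithmetic described above: multiply-accumulate over the filter taps, then a single right shift combined with saturation to 8 bits. The function name `plane_x_8_sketch`, the 12-bit dither pre-shift, and the shift by 19 are assumptions chosen for illustration. The NEON code in the patch performs the accumulate and the narrowing shift on whole vectors.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate an int to the 0..255 range. */
static uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar model of a vertical scaling loop.
 * src[j][i] is sample i of input line j, filter[j] is the weight of line j. */
static void plane_x_8_sketch(const int16_t *filter, int filter_size,
                             const int16_t *const *src, uint8_t *dest,
                             int dst_w, const uint8_t *dither, int offset)
{
    assert(filter_size > 0);
    for (int i = 0; i < dst_w; i++) {
        /* Start from the dither value so rounding is folded into the sum. */
        int val = dither[(i + offset) & 7] << 12;

        /* Multiply-accumulate: one fused operation per tap in SIMD form. */
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];

        /* Shift and narrow in one step. Right-shifting a negative int is
         * implementation-defined in C; common compilers shift arithmetically. */
        dest[i] = clip_uint8(val >> 19);
    }
}
```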

31 Dec, 2019 1 commit
- swscale/utils: remove access of AV_PIX_FMT_NB · 1e3e547a
  Zhao Zhili authored 5 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  1e3e547a
17 Dec, 2019 1 commit

swscale/aarch64: use multiply accumulate and increase vector factor to 4 · bd831912

Sebastian Pop authored 5 years ago

This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
and bumps the vectorization factor from 2 to 4.
The speedup is 25% on Graviton1 A1 instances based on Cortex-A72 CPUs:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146

The speedup is 39% on Graviton2 m6g instances based on Neoverse-N1 CPUs:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971

Tested with `make check` on aarch64-linux.
Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

bd831912
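
The idea of raising the vectorization factor can be illustrated in plain C: compute four output samples per iteration with four independent accumulators, then handle the remainder one at a time. This is only a scalar sketch under assumed conventions (14-bit filter coefficients, a final shift by 7, clamping to 15 bits, and the hypothetical name `hscale_8_to_15_sketch`). It is not the NEON assembly from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Shift the accumulator down and clamp it to the 15-bit maximum. */
static int16_t narrow15(int val)
{
    int v = val >> 7;
    return (int16_t)(v > 32767 ? 32767 : v);
}

/* Hypothetical horizontal scaler: each output i reads filter_size input
 * samples starting at filter_pos[i], weighted by its own filter row. */
static void hscale_8_to_15_sketch(int16_t *dst, int dst_w, const uint8_t *src,
                                  const int16_t *filter,
                                  const int32_t *filter_pos, int filter_size)
{
    int i = 0;
    assert(filter_size > 0);

    /* Main loop: four outputs per iteration, four independent sums. */
    for (; i + 4 <= dst_w; i += 4) {
        int acc[4] = { 0, 0, 0, 0 };
        for (int j = 0; j < filter_size; j++)
            for (int k = 0; k < 4; k++)
                acc[k] += src[filter_pos[i + k] + j] *
                          filter[filter_size * (i + k) + j];
        for (int k = 0; k < 4; k++)
            dst[i + k] = narrow15(acc[k]);
    }

    /* Tail loop for widths that are not a multiple of four. */
    for (; i < dst_w; i++) {
        int acc = 0;
        for (int j = 0; j < filter_size; j++)
            acc += src[filter_pos[i] + j] * filter[filter_size * i + j];
        dst[i] = narrow15(acc);
    }
}
```

Keeping the four sums independent is what lets a SIMD unit hold them in separate lanes and issue one multiply-accumulate per tap.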

10 Dec, 2019 1 commit
- swscale/swscale_unscaled: add AV_PIX_FMT_GBRAP10 for LE and BE conversion wrapper · 8558c231
  Limin Wang authored 5 years ago
```
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  8558c231
06 Dec, 2019 1 commit

libswscale/swscale_unscaled.c: remove redundant code · 039a0ebe

Ting Fu authored 5 years ago

Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

039a0ebe

01 Nov, 2019 1 commit

swscale/swscale_unscaled: fix gbrap10be md5 different on big endian system · a5e24be5

Limin Wang authored 5 years ago

You can reproduce it with the command below:
./ffmpeg -f lavfi -i "testsrc=duration=1:rate=30" -vf format=gbrap10 -vcodec rawvideo \
    -pix_fmt gbrap10le -flags +bitexact -sws_flags +accurate_rnd+bitexact -fflags +bitexact  \
    -frames:v 1 -f nut md5:

little-endian:
f91e2edd8098276579c1929e5e160416
big-endian:
ba4d011dbbdc78ccbf6cc7d698630929
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

a5e24be5
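
A checksum mismatch of this kind typically appears when 16-bit samples are written in host byte order instead of the byte order the pixel format names. The sketch below shows one host-independent way to store 10-bit samples as little-endian or big-endian bytes. The helper names (`store_le16`, `store_be16`, `pack_plane10`) are hypothetical and the code is not taken from the fix itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value low byte first, regardless of host endianness. */
static void store_le16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)(v >> 8);
}

/* Store a 16-bit value high byte first, regardless of host endianness. */
static void store_be16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v >> 8);
    dst[1] = (uint8_t)(v & 0xff);
}

/* Pack n 10-bit samples into 2*n bytes in the requested byte order. */
static void pack_plane10(uint8_t *dst, const uint16_t *samples, size_t n,
                         int big_endian)
{
    for (size_t i = 0; i < n; i++) {
        assert(samples[i] < 1024); /* 10-bit range */
        if (big_endian)
            store_be16(dst + 2 * i, samples[i]);
        else
            store_le16(dst + 2 * i, samples[i]);
    }
}
```

Because the bytes are produced by shifts and masks rather than by copying a `uint16_t` from memory, the output is identical on little-endian and big-endian hosts.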

16 Oct, 2019 3 commits
- swscale/output: Avoid 64bit in Alpha in yuv2ya16_X_c_template() · d2606210
  Michael Niedermayer authored 5 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  d2606210
- swscale/output: Correct Alpha in yuv2ya16_X_c_template() · 3e668293
  Michael Niedermayer authored 5 years ago
```
Untested, no testcase
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  3e668293
- swscale/output: Implement Luma computation from yuv2ya16_X_c_template() without 64bit · 4f4ca675
  Michael Niedermayer authored 5 years ago
```
This also reverts 21838cad
The revert is in this commit to avoid 2 fate updates
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  4f4ca675
04 Oct, 2019 2 commits

swscale: Fix AltiVec/VSX build with recent GCC · e6625ca4

Daniel Kolesa authored 5 years ago

The argument to vec_splat_u16 must be a literal. By making the
function always inline and marking the arguments const, gcc can
turn those into literals, and avoid build errors like:

swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal

Fixes #7861.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>

e6625ca4

swscale: Replace illegal vector keyword usage in altivec code · 1bdb47b7

Daniel Kolesa authored 5 years ago

While this technically compiles in current ffmpeg, this is only
because ffmpeg is compiled in strict ISO C mode, which disables
the builtin 'vector' keyword for AltiVec/VSX. Instead, altivec.h
defines vector as a macro that expands to __vector, which accepts
arbitrary types.

Normally, the vector keyword should be used only with plain
scalar non-typedef types, such as unsigned int. But we have the
vec_(s|u)(8|16|32) macros, which can be used in a portable manner,
in util_altivec.h in libavutil.

This is also consistent with other AltiVec/VSX code elsewhere in
the tree.

Fixes #7861.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>

1bdb47b7

28 Sep, 2019 2 commits

swscale/utils: Fix invalid left shifts of negative numbers · e2646e23

Andreas Rheinhardt authored 5 years ago

Affected the FATE-tests vsynth_lena-dv-411, vsynth1-dv-411,
vsynth2-dv-411 and hevc-paramchange-yuv420p.yuv420p10.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

e2646e23

swscale/x86/swscale: Fix undefined left shifts of negative numbers · 736c7c20

Andreas Rheinhardt authored 5 years ago

This affected many FATE-tests: The number of failing tests went down
from 663 to 344. (Both numbers exclude tests that failed because of
unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

736c7c20
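
Both commits above deal with the same C rule: left-shifting a negative signed value is undefined behavior. One common replacement, shown here as a sketch with the hypothetical helper `scale_pow2`, is to multiply by a power of two instead. The log does not show which replacement the commits actually used.

```c
#include <assert.h>
#include <limits.h>

/* Multiply value by 2^bits without shifting a negative operand.
 * `value << bits` is undefined for negative value; the multiplication is
 * well defined as long as the result fits in int. */
static int scale_pow2(int value, unsigned bits)
{
    /* Keep 1 << bits itself in range for a signed int. */
    assert(bits < sizeof(int) * CHAR_BIT - 1);
    return value * (1 << bits);
}
```

Compilers generally turn the multiplication back into a shift instruction, so the change removes the undefined behavior without a runtime cost. Overflow of the product remains the caller's responsibility.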

27 Sep, 2019 1 commit

swscale/swscale: cosmetics · cde1d70a

Limin Wang authored 5 years ago

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

cde1d70a

26 Sep, 2019 1 commit
- swscale/output: fix signed integer overflow for ya16 · 21838cad
  Paul B Mahol authored 5 years ago
```
Fixes #7666.
```
  21838cad
09 Sep, 2019 1 commit

swscale/swscale: delete unwanted assignments · 29bde4b3

Limin Wang authored 5 years ago

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

29bde4b3

06 Sep, 2019 1 commit

swscale/output: fix some code indentations · ef134265

Linjie Fu authored 5 years ago

Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

ef134265

13 Aug, 2019 1 commit

lsws/ppc/yuv2rgb_altivec: Replace vec_lvsl/vec_perm with vec_xl · 3a557c5d

Chip Kerchner authored 5 years ago

gcc 6.x and 7.x generate wrong code on little-endian machines
for the vec_lvsl/vec_perm instruction combos in some cases.
The bug was fixed in version 8.x.
If these instructions are replaced with vec_xl, the problem goes
away for all versions of the compiler.

Fixes ticket #7124.

3a557c5d

21 Jul, 2019 2 commits
- Bump minor versions again on master to keep 4.2 versions separate from master · 80bb65fa
  Michael Niedermayer authored 5 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  80bb65fa
- Bump minor versions to separate 4.2 from master · 22db337a
  Michael Niedermayer authored 5 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  22db337a
13 May, 2019 2 commits

swscale/tests/swscale: Lengthen pixfmt name buffer to 21 bytes · 9d269301
Michael Niedermayer authored 5 years ago
```
Some formats have names longer than 12 characters.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
9d269301

libswscale: Fix possible string overflow in test. · b8ed4930

Adam Richter authored 5 years ago

In libswscale/tests/swscale.c, the function fileTest() calls sscanf
with a "%12s" conversion on the character arrays srcStr[] and dstStr[],
which are only 12 bytes long.  So, if the input string is 12 characters,
a terminating null byte can be written past the end of these arrays.

This bug was found by cppcheck.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

b8ed4930
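
The rule behind this bug: a `%Ns` conversion stores up to N characters plus a terminating null byte, so the destination needs N + 1 bytes. The sketch below uses a simplified, hypothetical input format and the hypothetical names `FMT_NAME_CHARS` and `parse_formats`. It is not the actual fileTest() code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Maximum name length accepted by the "%12s" conversions below. */
#define FMT_NAME_CHARS 12

/* Parse "<src> -> <dst>". Each buffer holds FMT_NAME_CHARS characters plus
 * the terminating null byte that sscanf always appends. */
static int parse_formats(const char *line,
                         char src[FMT_NAME_CHARS + 1],
                         char dst[FMT_NAME_CHARS + 1])
{
    return sscanf(line, "%12s -> %12s", src, dst) == 2;
}
```

With 12-byte buffers the same call would write a thirteenth byte whenever a name is exactly 12 characters long, which is the overflow cppcheck reported.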

12 May, 2019 2 commits

swscale: Add test for isSemiPlanarYUV to pixdesc_query · 4fa4f1d7

Philip Langdale authored 5 years ago

Lauri had asked me what the semi planar formats were and that reminded
me that we could add it to pixdesc_query so we know exactly what the
list is.

4fa4f1d7

swscale: Add support for NV24 and NV42 · cd483180

Philip Langdale authored 5 years ago

The implementation is pretty straight-forward. Most of the existing
NV12 codepaths work regardless of subsampling and are re-used as is.
Where necessary I wrote the slightly different NV24 versions.

Finally, the one thing that confused me for a long time was the
asm specific x86 path that did an explicit exclusion check for NV12.
I replaced that with a semi-planar check and also updated the
equivalent PPC code, which Lauri kindly checked.

cd483180
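
As background for why the NV12 code paths carry over: NV24 and NV42 store a full-resolution luma plane followed by one plane of interleaved chroma bytes, with no subsampling. NV24 orders each pair as U then V, and NV42 swaps them. A sketch of splitting one chroma row into separate planes, using the hypothetical helper `split_chroma_444`:

```c
#include <assert.h>
#include <stdint.h>

/* Split one row of interleaved 4:4:4 chroma into separate U and V rows.
 * width   number of pixels in the row (the row holds 2 * width bytes)
 * v_first 0 for NV24 byte order (U, V), nonzero for NV42 (V, U) */
static void split_chroma_444(const uint8_t *uv, int width,
                             uint8_t *u, uint8_t *v, int v_first)
{
    assert(width >= 0);
    for (int x = 0; x < width; x++) {
        uint8_t first  = uv[2 * x];
        uint8_t second = uv[2 * x + 1];
        u[x] = v_first ? second : first;
        v[x] = v_first ? first : second;
    }
}
```

The same loop serves NV12 rows if `width` is the subsampled chroma width, which is why a generic semi-planar check can replace a format-specific NV12 test.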

07 May, 2019 4 commits

swscale/ppc: Shorten power8 tests via a var · e25bddf5
Lauri Kasanen authored 5 years ago

e25bddf5

swscale/ppc: VSX-optimize hScale16To* · a2a16206

Lauri Kasanen authored 5 years ago

./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw

32-bit mul, power8 only

2x speedup for hScale16To19_vsx (x86 SSE2 is 2.37):
  30896 UNITS in hscale,    8192 runs,      0 skips
  63956 UNITS in hscale,    8192 runs,      0 skips

2.06 for hScale16To15_vsx:
  30531 UNITS in hscale,    8192 runs,      0 skips
  63161 UNITS in hscale,    8192 runs,      0 skips

a2a16206

swscale/ppc: Indent · 3437111f
Lauri Kasanen authored 5 years ago

3437111f

swscale/ppc: VSX-optimize hScale8To19 · 9456adc2

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

2.26 speedup (x86 SSE2 is 2.32):
  23772 UNITS in hscale,    4096 runs,      0 skips
  53862 UNITS in hscale,    4096 runs,      0 skips

9456adc2

30 Apr, 2019 1 commit

swscale/ppc: VSX-optimize hscale_fast · d0e4d042

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw

4.27 speedup for hyscale_fast:
  24796 UNITS in hyscale_fast,    4096 runs,      0 skips
   5797 UNITS in hyscale_fast,    4096 runs,      0 skips

4.48 speedup for hcscale_fast:
  19911 UNITS in hcscale_fast,    4095 runs,      1 skips
   4437 UNITS in hcscale_fast,    4096 runs,      0 skips

d0e4d042
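
The speedup figures quoted in these PPC commits are the ratio of the two UNITS counts reported for the same function. A small sketch of that arithmetic, with the hypothetical helper name `speedup`:

```c
#include <assert.h>

/* Ratio of the slower measurement to the faster one.
 * Both arguments are UNITS counts for the same function. */
static double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

For the hyscale_fast numbers above, `speedup(24796, 5797)` falls between 4.27 and 4.28, which matches the figure quoted in the message.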

11 Apr, 2019 1 commit

swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2 · ce92ee4b

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

~2x speedup:

rgb24
  24431 UNITS in yuv2packed2,   16384 runs,      0 skips
  13783 UNITS in yuv2packed2,   16383 runs,      1 skips
bgr24
  24396 UNITS in yuv2packed2,   16384 runs,      0 skips
  14059 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  26815 UNITS in yuv2packed2,   16383 runs,      1 skips
  12797 UNITS in yuv2packed2,   16383 runs,      1 skips
bgra
  27060 UNITS in yuv2packed2,   16384 runs,      0 skips
  13138 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  26998 UNITS in yuv2packed2,   16384 runs,      0 skips
  12728 UNITS in yuv2packed2,   16381 runs,      3 skips
bgra
  26651 UNITS in yuv2packed2,   16384 runs,      0 skips
  13124 UNITS in yuv2packed2,   16384 runs,      0 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.

ce92ee4b

07 Apr, 2019 3 commits

swscale/ppc: VSX-optimize yuv2rgb_full_X · 8607e29f

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

32-bit mul, power8 only.

~6.4x speedup:

rgb24
 214278 UNITS in yuv2packedX,   16384 runs,      0 skips
  33249 UNITS in yuv2packedX,   16384 runs,      0 skips
bgr24
 214616 UNITS in yuv2packedX,   16384 runs,      0 skips
  33233 UNITS in yuv2packedX,   16384 runs,      0 skips
rgba
 214517 UNITS in yuv2packedX,   16384 runs,      0 skips
  33271 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214973 UNITS in yuv2packedX,   16384 runs,      0 skips
  33397 UNITS in yuv2packedX,   16384 runs,      0 skips
argb
 214613 UNITS in yuv2packedX,   16384 runs,      0 skips
  33310 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214637 UNITS in yuv2packedX,   16384 runs,      0 skips
  33330 UNITS in yuv2packedX,   16384 runs,      0 skips

8607e29f

swscale/ppc: VSX-optimize yuv2rgb_full_2 · 3256e949

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
            -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

32-bit mul, power8 only.

~4x speedup:

rgb24
  52763 UNITS in yuv2packed2,   16384 runs,      0 skips
  13453 UNITS in yuv2packed2,   16384 runs,      0 skips
bgr24
  53144 UNITS in yuv2packed2,   16384 runs,      0 skips
  13616 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  52796 UNITS in yuv2packed2,   16384 runs,      0 skips
  12904 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52732 UNITS in yuv2packed2,   16384 runs,      0 skips
  13262 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  52661 UNITS in yuv2packed2,   16384 runs,      0 skips
  12879 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52662 UNITS in yuv2packed2,   16384 runs,      0 skips
  12932 UNITS in yuv2packed2,   16384 runs,      0 skips

3256e949

swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1 · 50e672bc

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

1.8-2.3x speedup:

rgb24
  18192 UNITS in yuv2packed1,   32767 runs,      1 skips
   9983 UNITS in yuv2packed1,   32760 runs,      8 skips
bgr24
  18665 UNITS in yuv2packed1,   32766 runs,      2 skips
   9925 UNITS in yuv2packed1,   32763 runs,      5 skips
rgba
  20239 UNITS in yuv2packed1,   32767 runs,      1 skips
   8794 UNITS in yuv2packed1,   32759 runs,      9 skips
bgra
  20354 UNITS in yuv2packed1,   32768 runs,      0 skips
   8770 UNITS in yuv2packed1,   32761 runs,      7 skips
argb
  20185 UNITS in yuv2packed1,   32768 runs,      0 skips
   8761 UNITS in yuv2packed1,   32761 runs,      7 skips
bgra
  20360 UNITS in yuv2packed1,   32766 runs,      2 skips
   8759 UNITS in yuv2packed1,   32764 runs,      4 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.

50e672bc

31 Mar, 2019 3 commits

swscale/ppc: VSX-optimize yuv2422_X · 7adce3e6

Lauri Kasanen authored 6 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
          -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
          -cpuflags 0 -v error -

7.2x speedup:

yuyv422
 126354 UNITS in yuv2packedX,   16384 runs,      0 skips
  16383 UNITS in yuv2packedX,   16382 runs,      2 skips
yvyu422
 117669 UNITS in yuv2packedX,   16384 runs,      0 skips
  16271 UNITS in yuv2packedX,   16379 runs,      5 skips
uyvy422
 117310 UNITS in yuv2packedX,   16384 runs,      0 skips
  16226 UNITS in yuv2packedX,   16382 runs,      2 skips

7adce3e6

swscale/ppc: VSX-optimize yuv2422_2 · 9a2db4dc

Lauri Kasanen authored 6 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

5.1x speedup:

yuyv422
  19339 UNITS in yuv2packed2,   16384 runs,      0 skips
   3718 UNITS in yuv2packed2,   16383 runs,      1 skips
yvyu422
  19438 UNITS in yuv2packed2,   16384 runs,      0 skips
   3800 UNITS in yuv2packed2,   16380 runs,      4 skips
uyvy422
  19128 UNITS in yuv2packed2,   16384 runs,      0 skips
   3721 UNITS in yuv2packed2,   16380 runs,      4 skips

9a2db4dc

swscale/ppc: VSX-optimize yuv2422_1 · a6a31ca3

Lauri Kasanen authored 6 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
            -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

15.3x speedup:

yuyv422
  14513 UNITS in yuv2packed1,   32768 runs,      0 skips
    949 UNITS in yuv2packed1,   32767 runs,      1 skips
yvyu422
  14516 UNITS in yuv2packed1,   32767 runs,      1 skips
    943 UNITS in yuv2packed1,   32767 runs,      1 skips
uyvy422
  14530 UNITS in yuv2packed1,   32767 runs,      1 skips
    941 UNITS in yuv2packed1,   32766 runs,      2 skips

a6a31ca3

28 Mar, 2019 2 commits

swscale/swscale_unscaled: Fix chroma slice height · 8865ae95
Michael Niedermayer authored 6 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
8865ae95

swscale/swscale_unscaled: fixed the issue that when width/height is not... · c47fada2

Dong, Jerry authored 6 years ago

swscale/swscale_unscaled: fix incomplete conversion of nv12 to separate u/v planes when width/height is not a multiple of 2.
Signed-off-by: Dong, Jerry <jerry.dong@intel.com>
Signed-off-by: Decai Lin <decai.lin@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

c47fada2
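
Problems with odd dimensions usually come from computing the chroma size with a truncating shift, which drops the last column or row. A sketch of the rounding-up form, using the hypothetical helpers `ceil_rshift` and `nv12_chroma_bytes`. The actual change in the commit is not shown in this log.

```c
#include <assert.h>

/* Divide a by 2^b, rounding up instead of truncating. */
static int ceil_rshift(int a, int b)
{
    assert(a >= 0 && b >= 0 && b < 16);
    return (a + (1 << b) - 1) >> b;
}

/* Bytes in the interleaved chroma plane of a 4:2:0 semi-planar image,
 * ignoring row padding: two bytes (U and V) per chroma sample. */
static int nv12_chroma_bytes(int width, int height)
{
    return 2 * ceil_rshift(width, 1) * ceil_rshift(height, 1);
}
```

For a 5x3 image, `width >> 1` would give 2 chroma columns and leave the fifth pixel without chroma, while the rounded form gives 3.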

27 Mar, 2019 1 commit

swscale/ppc: VSX-optimize yuv2rgb_full · 681957b8

Lauri Kasanen authored 6 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

This uses 32-bit mul, so POWER8 only.

The following output formats get about 4.5x speedup:

rgb24
  39980 UNITS in yuv2packed1,   32768 runs,      0 skips
   8774 UNITS in yuv2packed1,   32768 runs,      0 skips
bgr24
  40069 UNITS in yuv2packed1,   32768 runs,      0 skips
   8772 UNITS in yuv2packed1,   32766 runs,      2 skips
rgba
  39759 UNITS in yuv2packed1,   32768 runs,      0 skips
   8681 UNITS in yuv2packed1,   32767 runs,      1 skips
bgra
  39729 UNITS in yuv2packed1,   32768 runs,      0 skips
   8696 UNITS in yuv2packed1,   32766 runs,      2 skips
argb
  39766 UNITS in yuv2packed1,   32768 runs,      0 skips
   8672 UNITS in yuv2packed1,   32766 runs,      2 skips
bgra
  39784 UNITS in yuv2packed1,   32768 runs,      0 skips
   8659 UNITS in yuv2packed1,   32767 runs,      1 skips

681957b8