Commits · 67114195694df1057f34d9bb056d6fbac52b4017 · Linshizhi / ffmpeg.wasm-core

21 Jul, 2019 2 commits
- Bump minor versions again on master to keep 4.2 versions separate from master · 80bb65fa
  Michael Niedermayer authored 5 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  80bb65fa
- Bump minor versions to separate 4.2 from master · 22db337a
  Michael Niedermayer authored 5 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  22db337a
13 May, 2019 2 commits

swscale/tests/swscale: Lengthen pixfmt name buffer to 21 bytes · 9d269301
Michael Niedermayer authored 5 years ago
```
Some formats use longer names than 12.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
9d269301

libswscale: Fix possible string overflow in test. · b8ed4930

Adam Richter authored 5 years ago

In libswscale/tests/swscale.c, the function fileTest() calls sscanf with
an argument of "%12s" on the character arrays srcStr[] and dstStr[], which
are only 12 bytes.  So, if the input string is 12 characters or longer, a
terminating null byte can be written past the end of these arrays.

This bug was found by cppcheck.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

b8ed4930
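
The off-by-one described above can be sketched in isolation. This is a minimal illustration, not the code from the test file: `parse_fmt_name`, `NAME_MAX_CHARS` and `NAME_BUF_SIZE` are hypothetical names. The only fact it relies on is that a `%12s` conversion stores up to 12 characters plus a terminating null byte, so the destination needs at least 13 bytes.

```c
#include <stdio.h>

/* Hypothetical sizes for illustration: "%12s" may store up to 12 characters
 * plus a terminating null byte, so the buffer needs 13 bytes. */
#define NAME_MAX_CHARS 12
#define NAME_BUF_SIZE  (NAME_MAX_CHARS + 1)

/* Reads one whitespace-delimited token of at most 12 characters from `line`
 * into `dst`, which must hold NAME_BUF_SIZE bytes. Returns 1 on success. */
static int parse_fmt_name(const char *line, char dst[NAME_BUF_SIZE])
{
    return sscanf(line, "%12s", dst) == 1;
}
```

A 12-byte destination would be overrun by the null terminator whenever the token has 12 or more characters, which is the case cppcheck reported.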

12 May, 2019 2 commits

swscale: Add test for isSemiPlanarYUV to pixdesc_query · 4fa4f1d7

Philip Langdale authored 5 years ago

Lauri had asked me what the semi planar formats were and that reminded
me that we could add it to pixdesc_query so we know exactly what the
list is.

4fa4f1d7
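
As a rough illustration of what a semi-planar test checks, the sketch below assumes the usual meaning of the term: luma in its own plane and both chroma components interleaved in one shared plane (as in NV12). The `fmt_desc` struct and `is_semi_planar_yuv` are hypothetical stand-ins written for this note, not the swscale helper or the libavutil descriptor.

```c
#include <stdbool.h>

/* Hypothetical, simplified pixel format descriptor: plane[i] is the index
 * of the data plane that stores component i (0 = Y, 1 = U, 2 = V). */
struct fmt_desc {
    bool is_planar_yuv;
    int  plane[3];
};

/* A format counts as semi-planar YUV here when it is planar YUV and
 * both chroma components live in the same plane. */
static bool is_semi_planar_yuv(const struct fmt_desc *d)
{
    return d->is_planar_yuv && d->plane[1] == d->plane[2];
}
```

With this definition an NV12-like layout (planes 0, 1, 1) qualifies, while a fully planar layout such as yuv420p (planes 0, 1, 2) does not.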

swscale: Add support for NV24 and NV42 · cd483180

Philip Langdale authored 5 years ago

The implementation is pretty straightforward. Most of the existing
NV12 codepaths work regardless of subsampling and are re-used as is.
Where necessary I wrote the slightly different NV24 versions.

Finally, the one thing that confused me for a long time was the
asm specific x86 path that did an explicit exclusion check for NV12.
I replaced that with a semi-planar check and also updated the
equivalent PPC code, which Lauri kindly checked.

cd483180
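
To show why most NV12 code can be reused regardless of subsampling, here is a minimal sketch of splitting one interleaved chroma row. It assumes the standard layouts: NV24 stores a full-resolution UV-interleaved plane, NV42 stores the same with V first, and NV12 differs only in the chroma width and height. `deinterleave_chroma_row` and its `swap_uv` parameter are hypothetical, not functions from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Splits one row of interleaved chroma into separate U and V rows.
 * `width` is the number of chroma samples per row: equal to the luma width
 * for 4:4:4 layouts such as NV24/NV42, about half of it for NV12.
 * `swap_uv` selects V-first ordering (NV42-style) instead of U-first. */
static void deinterleave_chroma_row(const uint8_t *src, uint8_t *dst_u,
                                    uint8_t *dst_v, size_t width, int swap_uv)
{
    for (size_t i = 0; i < width; i++) {
        uint8_t first  = src[2 * i];
        uint8_t second = src[2 * i + 1];
        dst_u[i] = swap_uv ? second : first;
        dst_v[i] = swap_uv ? first  : second;
    }
}
```

The loop body does not depend on the subsampling at all; only the `width` passed in (and the number of rows processed by the caller) changes between NV12 and NV24.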

07 May, 2019 4 commits

swscale/ppc: Shorten power8 tests via a var · e25bddf5
Lauri Kasanen authored 5 years ago

e25bddf5

swscale/ppc: VSX-optimize hScale16To* · a2a16206

Lauri Kasanen authored 5 years ago

./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw

32-bit mul, power8 only.

2.07 speedup for hScale16To19_vsx (x86 SSE2 is 2.37):
  30896 UNITS in hscale,    8192 runs,      0 skips
  63956 UNITS in hscale,    8192 runs,      0 skips

2.06 for hScale16To15_vsx:
  30531 UNITS in hscale,    8192 runs,      0 skips
  63161 UNITS in hscale,    8192 runs,      0 skips

a2a16206

swscale/ppc: Indent · 3437111f
Lauri Kasanen authored 5 years ago

3437111f

swscale/ppc: VSX-optimize hScale8To19 · 9456adc2

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

2.26 speedup (x86 SSE2 is 2.32):
  23772 UNITS in hscale,    4096 runs,      0 skips
  53862 UNITS in hscale,    4096 runs,      0 skips

9456adc2

30 Apr, 2019 1 commit

swscale/ppc: VSX-optimize hscale_fast · d0e4d042

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw

4.27 speedup for hyscale_fast:
  24796 UNITS in hyscale_fast,    4096 runs,      0 skips
   5797 UNITS in hyscale_fast,    4096 runs,      0 skips

4.48 speedup for hcscale_fast:
  19911 UNITS in hcscale_fast,    4095 runs,      1 skips
   4437 UNITS in hcscale_fast,    4096 runs,      0 skips

d0e4d042

11 Apr, 2019 1 commit

swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2 · ce92ee4b

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

~2x speedup:

rgb24
  24431 UNITS in yuv2packed2,   16384 runs,      0 skips
  13783 UNITS in yuv2packed2,   16383 runs,      1 skips
bgr24
  24396 UNITS in yuv2packed2,   16384 runs,      0 skips
  14059 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  26815 UNITS in yuv2packed2,   16383 runs,      1 skips
  12797 UNITS in yuv2packed2,   16383 runs,      1 skips
bgra
  27060 UNITS in yuv2packed2,   16384 runs,      0 skips
  13138 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  26998 UNITS in yuv2packed2,   16384 runs,      0 skips
  12728 UNITS in yuv2packed2,   16381 runs,      3 skips
bgra
  26651 UNITS in yuv2packed2,   16384 runs,      0 skips
  13124 UNITS in yuv2packed2,   16384 runs,      0 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.

ce92ee4b

07 Apr, 2019 3 commits

swscale/ppc: VSX-optimize yuv2rgb_full_X · 8607e29f

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

32-bit mul, power8 only.

~6.4x speedup:

rgb24
 214278 UNITS in yuv2packedX,   16384 runs,      0 skips
  33249 UNITS in yuv2packedX,   16384 runs,      0 skips
bgr24
 214616 UNITS in yuv2packedX,   16384 runs,      0 skips
  33233 UNITS in yuv2packedX,   16384 runs,      0 skips
rgba
 214517 UNITS in yuv2packedX,   16384 runs,      0 skips
  33271 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214973 UNITS in yuv2packedX,   16384 runs,      0 skips
  33397 UNITS in yuv2packedX,   16384 runs,      0 skips
argb
 214613 UNITS in yuv2packedX,   16384 runs,      0 skips
  33310 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214637 UNITS in yuv2packedX,   16384 runs,      0 skips
  33330 UNITS in yuv2packedX,   16384 runs,      0 skips

8607e29f

swscale/ppc: VSX-optimize yuv2rgb_full_2 · 3256e949

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
            -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

32-bit mul, power8 only.

~4x speedup:

rgb24
  52763 UNITS in yuv2packed2,   16384 runs,      0 skips
  13453 UNITS in yuv2packed2,   16384 runs,      0 skips
bgr24
  53144 UNITS in yuv2packed2,   16384 runs,      0 skips
  13616 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  52796 UNITS in yuv2packed2,   16384 runs,      0 skips
  12904 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52732 UNITS in yuv2packed2,   16384 runs,      0 skips
  13262 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  52661 UNITS in yuv2packed2,   16384 runs,      0 skips
  12879 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52662 UNITS in yuv2packed2,   16384 runs,      0 skips
  12932 UNITS in yuv2packed2,   16384 runs,      0 skips

3256e949

swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1 · 50e672bc

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

1.8-2.3x speedup:

rgb24
  18192 UNITS in yuv2packed1,   32767 runs,      1 skips
   9983 UNITS in yuv2packed1,   32760 runs,      8 skips
bgr24
  18665 UNITS in yuv2packed1,   32766 runs,      2 skips
   9925 UNITS in yuv2packed1,   32763 runs,      5 skips
rgba
  20239 UNITS in yuv2packed1,   32767 runs,      1 skips
   8794 UNITS in yuv2packed1,   32759 runs,      9 skips
bgra
  20354 UNITS in yuv2packed1,   32768 runs,      0 skips
   8770 UNITS in yuv2packed1,   32761 runs,      7 skips
argb
  20185 UNITS in yuv2packed1,   32768 runs,      0 skips
   8761 UNITS in yuv2packed1,   32761 runs,      7 skips
bgra
  20360 UNITS in yuv2packed1,   32766 runs,      2 skips
   8759 UNITS in yuv2packed1,   32764 runs,      4 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.

50e672bc

31 Mar, 2019 3 commits

swscale/ppc: VSX-optimize yuv2422_X · 7adce3e6

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
          -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
          -cpuflags 0 -v error -

7.2x speedup:

yuyv422
 126354 UNITS in yuv2packedX,   16384 runs,      0 skips
  16383 UNITS in yuv2packedX,   16382 runs,      2 skips
yvyu422
 117669 UNITS in yuv2packedX,   16384 runs,      0 skips
  16271 UNITS in yuv2packedX,   16379 runs,      5 skips
uyvy422
 117310 UNITS in yuv2packedX,   16384 runs,      0 skips
  16226 UNITS in yuv2packedX,   16382 runs,      2 skips

7adce3e6

swscale/ppc: VSX-optimize yuv2422_2 · 9a2db4dc

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

5.1x speedup:

yuyv422
  19339 UNITS in yuv2packed2,   16384 runs,      0 skips
   3718 UNITS in yuv2packed2,   16383 runs,      1 skips
yvyu422
  19438 UNITS in yuv2packed2,   16384 runs,      0 skips
   3800 UNITS in yuv2packed2,   16380 runs,      4 skips
uyvy422
  19128 UNITS in yuv2packed2,   16384 runs,      0 skips
   3721 UNITS in yuv2packed2,   16380 runs,      4 skips

9a2db4dc

swscale/ppc: VSX-optimize yuv2422_1 · a6a31ca3

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
            -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

15.3x speedup:

yuyv422
  14513 UNITS in yuv2packed1,   32768 runs,      0 skips
    949 UNITS in yuv2packed1,   32767 runs,      1 skips
yvyu422
  14516 UNITS in yuv2packed1,   32767 runs,      1 skips
    943 UNITS in yuv2packed1,   32767 runs,      1 skips
uyvy422
  14530 UNITS in yuv2packed1,   32767 runs,      1 skips
    941 UNITS in yuv2packed1,   32766 runs,      2 skips

a6a31ca3

28 Mar, 2019 2 commits

swscale/swscale_unscaled: Fix chroma slice height · 8865ae95
Michael Niedermayer authored 5 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
8865ae95

swscale/swscale_unscaled: fixed the issue that when width/height is not... · c47fada2

Dong, Jerry authored 5 years ago

swscale/swscale_unscaled: Fix the incomplete conversion of NV12 to separate U/V planes when the width or height is not a multiple of 2.
Signed-off-by: Dong, Jerry <jerry.dong@intel.com>
Signed-off-by: Decai Lin <decai.lin@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

c47fada2
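
The odd-dimension problem can be illustrated with a small sketch. It assumes 2x chroma subsampling and shows one common way to size the chroma plane so that the last row or column is not dropped; `ceil_rshift` and `chroma_dim_420` are hypothetical helpers and this is not a copy of the fix itself.

```c
/* Ceiling right shift: divides by 2^shift and rounds up.
 * Assumes size >= 0 and a small non-negative shift. */
static int ceil_rshift(int size, int shift)
{
    return (size + (1 << shift) - 1) >> shift;
}

/* Chroma plane dimension for a luma dimension under 2x subsampling. */
static int chroma_dim_420(int luma_dim)
{
    return ceil_rshift(luma_dim, 1);
}
```

For a luma width of 1921, a plain `1921 >> 1` yields 960 and leaves the final chroma column unconverted, while the rounded-up value is 961.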

27 Mar, 2019 2 commits

swscale/ppc: VSX-optimize yuv2rgb_full · 681957b8

Lauri Kasanen authored 5 years ago

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

This uses 32-bit mul, so POWER8 only.

The following output formats get about 4.5x speedup:

rgb24
  39980 UNITS in yuv2packed1,   32768 runs,      0 skips
   8774 UNITS in yuv2packed1,   32768 runs,      0 skips
bgr24
  40069 UNITS in yuv2packed1,   32768 runs,      0 skips
   8772 UNITS in yuv2packed1,   32766 runs,      2 skips
rgba
  39759 UNITS in yuv2packed1,   32768 runs,      0 skips
   8681 UNITS in yuv2packed1,   32767 runs,      1 skips
bgra
  39729 UNITS in yuv2packed1,   32768 runs,      0 skips
   8696 UNITS in yuv2packed1,   32766 runs,      2 skips
argb
  39766 UNITS in yuv2packed1,   32768 runs,      0 skips
   8672 UNITS in yuv2packed1,   32766 runs,      2 skips
bgra
  39784 UNITS in yuv2packed1,   32768 runs,      0 skips
   8659 UNITS in yuv2packed1,   32767 runs,      1 skips

681957b8

swscale: Remove duplicated code · 81a4719d

Lauri Kasanen authored 5 years ago

In this function, the exact same clamping happens both in the if and unconditionally.

81a4719d

20 Mar, 2019 2 commits
- swscale/ppc: Add av_unused to template vars only used in one includer · 6b5ea90e
  Lauri Kasanen authored 5 years ago
  
  6b5ea90e
- swscale/ppc: Clean up some mixed decl warnings · ac3062f1
  Lauri Kasanen authored 5 years ago
  
  ac3062f1
05 Feb, 2019 1 commit

libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX · 8522d219

Lauri Kasanen authored 6 years ago

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -

9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.

With TIMER_REPORT skips disabled:
yuv420p9le
  12412 UNITS in planarX,  131072 runs,      0 skips
  73136 UNITS in planarX,  131072 runs,      0 skips
yuv420p9be
  12481 UNITS in planarX,  131072 runs,      0 skips
  73410 UNITS in planarX,  131072 runs,      0 skips
yuv420p10le
  12322 UNITS in planarX,  131072 runs,      0 skips
  72546 UNITS in planarX,  131072 runs,      0 skips
yuv420p10be
  12291 UNITS in planarX,  131072 runs,      0 skips
  72935 UNITS in planarX,  131072 runs,      0 skips
yuv420p12le
  12316 UNITS in planarX,  131072 runs,      0 skips
  72708 UNITS in planarX,  131072 runs,      0 skips
yuv420p12be
  12319 UNITS in planarX,  131072 runs,      0 skips
  72577 UNITS in planarX,  131072 runs,      0 skips
yuv420p14le
  12259 UNITS in planarX,  131072 runs,      0 skips
  72516 UNITS in planarX,  131072 runs,      0 skips
yuv420p14be
  12440 UNITS in planarX,  131072 runs,      0 skips
  72962 UNITS in planarX,  131072 runs,      0 skips
yuv420p16le
  10548 UNITS in planarX,  131072 runs,      0 skips
  73429 UNITS in planarX,  131072 runs,      0 skips
yuv420p16be
  10634 UNITS in planarX,  131072 runs,      0 skips
 150959 UNITS in planarX,  131072 runs,      0 skips
Signed-off-by: Lauri Kasanen <cand@gmx.com>

8522d219
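
A scalar sketch may help explain why the 16-bit path depends on 32-bit multiplies. It assumes, as an illustration only, that the vertical filter multiplies 32-bit intermediate samples by 16-bit coefficients; `vfilter_row_16`, `clip_u16` and the `shift` parameter are hypothetical and do not reproduce the swscale function or its rounding and bias terms.

```c
#include <stdint.h>

/* Clips v to the unsigned 16-bit range. */
static uint16_t clip_u16(int64_t v)
{
    return v < 0 ? 0 : v > 65535 ? 65535 : (uint16_t)v;
}

/* Hypothetical scalar vertical filter: each output sample is a weighted sum
 * of the same column across `taps` source rows. Samples are 32-bit
 * intermediates, so each product needs a multiplier wider than 16 bits. */
static void vfilter_row_16(const int16_t *filter, int taps,
                           const int32_t *const *src, uint16_t *dst,
                           int width, int shift)
{
    for (int i = 0; i < width; i++) {
        int64_t acc = (int64_t)1 << (shift - 1);   /* rounding term */
        for (int j = 0; j < taps; j++)
            acc += (int64_t)src[j][i] * filter[j];
        dst[i] = acc < 0 ? 0 : clip_u16(acc >> shift);
    }
}
```

In a vectorized version the inner product is the part that needs a 32-bit vector multiply, which matches the commit's note that POWER7 is excluded from the 16-bit function.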

01 Jan, 2019 1 commit
- swscale/yuv2rgb: Return a more specific error code from ff_yuv2rgb_c_init_tables() · fe17f9b9
  Michael Niedermayer authored 6 years ago
```
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  fe17f9b9
26 Dec, 2018 1 commit

swscale/output: Altivec-optimize float yuv2plane1 · 8dd9df9e

Lauri Kasanen authored 6 years ago

This function wouldn't benefit from VSX instructions, so I put it
under altivec.

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
-f null -vframes 100 -v error -nostats -

3743 UNITS in planar1,   65495 runs,     41 skips

-cpuflags 0

23511 UNITS in planar1,   65530 runs,      6 skips

grayf32be

4647 UNITS in planar1,   65449 runs,     87 skips

-cpuflags 0

28608 UNITS in planar1,   65530 runs,      6 skips

The native speedup is 6.28133, and the bswapping one 6.15623.
Fate passes, each format tested with an image to video conversion.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

8dd9df9e
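
The speedup figures quoted in these messages follow from the paired timer lines. A minimal sketch, assuming each UNITS value is a per-run cost where lower is faster; `speedup` is a hypothetical helper, not part of FFmpeg.

```c
/* Speedup of an optimized routine relative to a baseline, computed from the
 * UNITS figures printed by the timer (lower is faster). */
static double speedup(double baseline_units, double optimized_units)
{
    return baseline_units / optimized_units;
}
```

Applied to the numbers above, `speedup(23511, 3743)` gives roughly 6.2813 for the native-endian case and `speedup(28608, 4647)` roughly 6.1562 for the byte-swapping case, in line with the values stated in the message.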

14 Dec, 2018 1 commit

swscale/output: VSX-optimize 16-bit yuv2plane1 · b4c8c03b

Lauri Kasanen authored 6 years ago

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
-f null -vframes 100 -v error -nostats -

2120 UNITS in planar1,   65393 runs,    143 skips

-cpuflags 0

19157 UNITS in planar1,   65512 runs,     24 skips

9.03632 speedup, 16be similarly.

Fate passes, each format tested with an image to video conversion.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

b4c8c03b

12 Dec, 2018 1 commit

swscale/output: VSX-optimize nbps yuv2plane1 · 1046cba2

Lauri Kasanen authored 6 years ago

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
-f null -vframes 100 -v error -nostats -

Speedups:
yuv2plane1_9BE_vsx	11.2042
yuv2plane1_9LE_vsx	11.156
yuv2plane1_10BE_vsx	9.89428
yuv2plane1_10LE_vsx	10.3637
yuv2plane1_12BE_vsx	9.71923
yuv2plane1_12LE_vsx	11.0404
yuv2plane1_14BE_vsx	10.1763
yuv2plane1_14LE_vsx	11.2728

Fate passes, each format tested with an image to video conversion.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

1046cba2

04 Dec, 2018 1 commit

swscale/ppc: Move VSX-using code to its own file · 78c7ff7d

Lauri Kasanen authored 6 years ago

Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied).
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Tested-by: Michael Kostylev on BE
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

78c7ff7d

26 Nov, 2018 1 commit

swscale/output: Altivec-optimize yuv2plane1_8 · 46c5693e

Lauri Kasanen authored 6 years ago

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -

1158 UNITS in planar1,   65528 runs,      8 skips

-cpuflags 0

19082 UNITS in planar1,   65533 runs,      3 skips

16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
takes as many cycles as the x86 SSE2 version, yikes it's fast.

Note that this function uses VSX instructions, but is not marked so.
This is because several existing functions also make that mistake.
I'll submit a patch moving them once this is reviewed.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

46c5693e

24 Nov, 2018 1 commit
- swscale : add support for YUVA444P12 and YUVA422P12 · 86e6f0db
  Martin Vignali authored 6 years ago
  
  86e6f0db
06 Nov, 2018 1 commit

swscale: Add GRAY10 · f149a4a5

Carl Eugen Hoyos authored 6 years ago

Based on ab839054 by Luca Barbato.
Signed-off-by: James Almer <jamrial@gmail.com>

f149a4a5

01 Nov, 2018 2 commits
- Bump minor version for master after 4.1 branchpoint · 517573a6
  Michael Niedermayer authored 6 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  517573a6
- Bump minor versions for branching 4.1 · 780d5e30
  Michael Niedermayer authored 6 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  780d5e30
24 Oct, 2018 3 commits
- swscale/swscale_unscaled : rename packed_16bpc_bswap · 156120fc
  Martin Vignali authored 6 years ago
```
The function is used for both packed and planar formats.
```
  156120fc
- swscale/unscaled : add grayf32 le to be · 26bf4a40
  Martin Vignali authored 6 years ago
  
  26bf4a40
- swscale/utils : simplify unscaled initial test for float pixfmt · 3db33b44
  Martin Vignali authored 6 years ago
  
  3db33b44
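
The rename above reflects that a 16-bit byte swap does not care how the samples are laid out. A minimal sketch under that assumption; `bswap16_row` is a hypothetical helper, not the swscale function.

```c
#include <stddef.h>
#include <stdint.h>

/* Swaps the two bytes of every 16-bit sample in a row. The operation is
 * the same whether the row comes from a packed buffer or a single plane. */
static void bswap16_row(const uint16_t *src, uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = (uint16_t)((src[i] >> 8) | (src[i] << 8));
}
```

A caller would run it once per row for packed data, or once per row of each plane for planar data.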
18 Oct, 2018 2 commits
- swscale : add YA16 LE/BE output · db4771af
  Martin Vignali authored 6 years ago
  
  db4771af
- swscale/x86/rgb2rgb.asm : add Ivo Van Poorten name to the top of the file · 658bbc00
  Martin Vignali authored 6 years ago
```
suggested by Carl Eugen Hoyos
```
  658bbc00