Commits · d6fc5dc24aa09e026c6271a7565e63798dfe46f3 · Linshizhi / ffmpeg.wasm-core

05 Feb, 2019 1 commit

libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX · 8522d219

Lauri Kasanen authored 6 years ago

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -

9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.

With TIMER_REPORT skips disabled:
yuv420p9le
  12412 UNITS in planarX,  131072 runs,      0 skips
  73136 UNITS in planarX,  131072 runs,      0 skips
yuv420p9be
  12481 UNITS in planarX,  131072 runs,      0 skips
  73410 UNITS in planarX,  131072 runs,      0 skips
yuv420p10le
  12322 UNITS in planarX,  131072 runs,      0 skips
  72546 UNITS in planarX,  131072 runs,      0 skips
yuv420p10be
  12291 UNITS in planarX,  131072 runs,      0 skips
  72935 UNITS in planarX,  131072 runs,      0 skips
yuv420p12le
  12316 UNITS in planarX,  131072 runs,      0 skips
  72708 UNITS in planarX,  131072 runs,      0 skips
yuv420p12be
  12319 UNITS in planarX,  131072 runs,      0 skips
  72577 UNITS in planarX,  131072 runs,      0 skips
yuv420p14le
  12259 UNITS in planarX,  131072 runs,      0 skips
  72516 UNITS in planarX,  131072 runs,      0 skips
yuv420p14be
  12440 UNITS in planarX,  131072 runs,      0 skips
  72962 UNITS in planarX,  131072 runs,      0 skips
yuv420p16le
  10548 UNITS in planarX,  131072 runs,      0 skips
  73429 UNITS in planarX,  131072 runs,      0 skips
yuv420p16be
  10634 UNITS in planarX,  131072 runs,      0 skips
 150959 UNITS in planarX,  131072 runs,      0 skips
Signed-off-by: Lauri Kasanen <cand@gmx.com>

8522d219

01 Jan, 2019 1 commit
- swscale/yuv2rgb: Return a more specific error code from ff_yuv2rgb_c_init_tables() · fe17f9b9
  Michael Niedermayer authored 6 years ago
```
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  fe17f9b9
26 Dec, 2018 1 commit

swscale/output: Altivec-optimize float yuv2plane1 · 8dd9df9e

Lauri Kasanen authored 6 years ago

This function wouldn't benefit from VSX instructions, so I put it
under altivec.

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
-f null -vframes 100 -v error -nostats -

3743 UNITS in planar1,   65495 runs,     41 skips

-cpuflags 0

23511 UNITS in planar1,   65530 runs,      6 skips

grayf32be

4647 UNITS in planar1,   65449 runs,     87 skips

-cpuflags 0

28608 UNITS in planar1,   65530 runs,      6 skips

The native speedup is 6.28133, and the bswapping one 6.15623.
Fate passes, each format tested with an image to video conversion.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

8dd9df9e
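
The speedup figures quoted in the commit above can be reproduced from the TIMER report lines. A minimal sketch, assuming the speedup is simply the `-cpuflags 0` UNITS count divided by the optimized UNITS count; `timer_speedup` is a hypothetical helper name, not part of FFmpeg:

```c
/* Hypothetical helper (not an FFmpeg function): ratio of the baseline
 * cost to the optimized cost, both given in TIMER "UNITS". */
static double timer_speedup(unsigned baseline_units, unsigned optimized_units)
{
    return (double)baseline_units / (double)optimized_units;
}
```

With the numbers reported above, `timer_speedup(23511, 3743)` gives about 6.2813 for grayf32le and `timer_speedup(28608, 4647)` about 6.1562 for grayf32be, which agrees with the quoted 6.28133 and 6.15623.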

14 Dec, 2018 1 commit

swscale/output: VSX-optimize 16-bit yuv2plane1 · b4c8c03b

Lauri Kasanen authored 6 years ago

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
-f null -vframes 100 -v error -nostats -

2120 UNITS in planar1,   65393 runs,    143 skips

-cpuflags 0

19157 UNITS in planar1,   65512 runs,     24 skips

9.03632 speedup, 16be similarly.

Fate passes, each format tested with an image to video conversion.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

b4c8c03b

12 Dec, 2018 1 commit

swscale/output: VSX-optimize nbps yuv2plane1 · 1046cba2

Lauri Kasanen authored 6 years ago

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
-f null -vframes 100 -v error -nostats -

Speedups:
yuv2plane1_9BE_vsx	11.2042
yuv2plane1_9LE_vsx	11.156
yuv2plane1_10BE_vsx	9.89428
yuv2plane1_10LE_vsx	10.3637
yuv2plane1_12BE_vsx	9.71923
yuv2plane1_12LE_vsx	11.0404
yuv2plane1_14BE_vsx	10.1763
yuv2plane1_14LE_vsx	11.2728

Fate passes, each format tested with an image to video conversion.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

1046cba2

04 Dec, 2018 1 commit

swscale/ppc: Move VSX-using code to its own file · 78c7ff7d

Lauri Kasanen authored 6 years ago

Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied).
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Tested-by: Michael Kostylev on BE
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

78c7ff7d

26 Nov, 2018 1 commit

swscale/output: Altivec-optimize yuv2plane1_8 · 46c5693e

Lauri Kasanen authored 6 years ago

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -

1158 UNITS in planar1,   65528 runs,      8 skips

-cpuflags 0

19082 UNITS in planar1,   65533 runs,      3 skips

16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
takes as many cycles as the x86 SSE2 version, yikes it's fast.

Note that this function uses VSX instructions, but is not marked so.
This is because several existing functions also make that mistake.
I'll submit a patch moving them once this is reviewed.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

46c5693e

24 Nov, 2018 1 commit
- swscale : add support for YUVA444P12 and YUVA422P12 · 86e6f0db
  Martin Vignali authored 6 years ago
  
  86e6f0db
01 Nov, 2018 2 commits
- Bump minor version for master after 4.1 branchpoint · 517573a6
  Michael Niedermayer authored 6 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  517573a6
- Bump minor versions for branching 4.1 · 780d5e30
  Michael Niedermayer authored 6 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  780d5e30
24 Oct, 2018 3 commits
- swscale/swscale_unscaled : rename packed_16bpc_bswap · 156120fc
  Martin Vignali authored 6 years ago
```
It is used for both packed and planar formats.
```
  156120fc
- swscale/unscaled : add grayf32 le to be · 26bf4a40
  Martin Vignali authored 6 years ago
  
  26bf4a40
- swscale/utils : simplify unscaled initial test for float pixfmt · 3db33b44
  Martin Vignali authored 6 years ago
  
  3db33b44
18 Oct, 2018 2 commits
- swscale : add YA16 LE/BE output · db4771af
  Martin Vignali authored 6 years ago
  
  db4771af
- swscale/x86/rgb2rgb.asm : add Ivo Van Poorten name to the top of the file · 658bbc00
  Martin Vignali authored 6 years ago
```
suggested by Carl Eugen Hoyos
```
  658bbc00
13 Oct, 2018 2 commits
- swscale/x86/rgb2rgb : port shuffle 2103 mmxext to external asm and remove inline asm version · 296609f8
  Martin Vignali authored 6 years ago
  
  296609f8
- swscale/x86/rgb2rgb : remove mmx version for shuffle2103 · 04afdbb5
  Martin Vignali authored 6 years ago
  
  04afdbb5
09 Sep, 2018 1 commit
- swscale/swscale_unscaled: add gbrap -> packed rgb path · 931e7c05
  Paul B Mahol authored 6 years ago
  
  931e7c05
22 Aug, 2018 2 commits

swscale/swscale : small cosmetic · bdd67546
Martin Vignali authored 6 years ago

bdd67546

swscale : treat float input data as uint 16bpc · 3af1c4ea

Martin Vignali authored 6 years ago

Currently, float input is converted to 16-bit uint in the input stage,
but hScale16To19 and hScale16to15 still use the source depth (32 bits),
which results in an invalid shift of the data.

So when the input is float, shift the value as for 16 bpc uint.

3af1c4ea

14 Aug, 2018 1 commit
- libswscale: Adds conversions from/to float gray format. · 582bc5a3
  Sergey Lavrushkin authored 6 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  582bc5a3
10 Jun, 2018 1 commit

lsws/rgb2rgb_template: Do not compile unneeded shuffle functions on big-endian. · 3a56ade1

Carl Eugen Hoyos authored 6 years ago

Fixes the following warnings:
In file included from libswscale/rgb2rgb.c:128:0:
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3210_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3012_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_1230_c' defined but not used

3a56ade1

05 May, 2018 1 commit
- swscale: add gray14 support · b9dd058f
  Paul B Mahol authored 6 years ago
```
Signed-off-by: Paul B Mahol <onemda@gmail.com>
```
  b9dd058f
22 Apr, 2018 1 commit
- swscale/swscale_unscaled : add X86_64 (SSE2 and AVX) for uyvyto422 · 07a566e7
  Martin Vignali authored 6 years ago
```
and checkasm test
```
  07a566e7
16 Apr, 2018 2 commits
- Bump minor versions after release/4.0 branching · 3c1ecb05
  Michael Niedermayer authored 6 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  3c1ecb05
- Bump minor versions for branching release/4.0 · 7e3a070d
  Michael Niedermayer authored 6 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  7e3a070d
03 Apr, 2018 1 commit

avutil/pixdesc: deprecate AV_PIX_FMT_FLAG_PSEUDOPAL · d6fc031c

wm4 authored 7 years ago

PSEUDOPAL pixel formats are not paletted, but carried a palette with the
intention of allowing code to treat unpaletted formats as paletted. The
palette simply mapped the byte values to the resulting RGB values,
making it some sort of LUT for RGB conversion.

It was used for 1 byte formats only: RGB4_BYTE, BGR4_BYTE, RGB8, BGR8,
GRAY8. The first 4 are awfully obscure, used only by some ancient bitmap
formats. The last one, GRAY8, is more common, but its treatment is
grossly incorrect. It considers full range GRAY8 only, so GRAY8 coming
from typical Y video planes was not mapped to the correct RGB values.
This cannot be fixed, because AVFrame.color_range can be freely changed
at runtime, and there is nothing to ensure the pseudo palette is
updated.

Also, nothing actually used the PSEUDOPAL palette data, except xwdenc
(trivially changed in the previous commit). All other code had to treat
it as a special case, just to ignore or to propagate palette data.

In conclusion, this was just a very strange old mechanism that has no
real justification to exist anymore (although it may have been nice and
useful in the past). Now it's an artifact that makes the API harder to
use: API users who allocate their own pixel data have to be aware that
they need to allocate the palette, or FFmpeg will crash on them in
_some_ situations. On top of this, there was no API to allocate the
pseudo palette outside of av_frame_get_buffer().

This patch not only deprecates AV_PIX_FMT_FLAG_PSEUDOPAL, but also makes
the pseudo palette optional. Nothing accesses it anymore, though if it's
set, it's propagated. It's still allocated and initialized for
compatibility with API users that rely on this feature. But new API
users do not need to allocate it. This was an explicit goal of this
patch.

Most changes replace AV_PIX_FMT_FLAG_PSEUDOPAL with FF_PSEUDOPAL. I
first tried #ifdefing all code, but it was a mess. The FF_PSEUDOPAL
macro reduces the mess, and still allows defining FF_API_PSEUDOPAL to 0.

Passes FATE with FF_API_PSEUDOPAL enabled and disabled. In addition,
FATE passes with FF_API_PSEUDOPAL set to 1, but with allocation
functions manually changed to not allocate a palette.

d6fc031c
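
To illustrate the GRAY8 problem described in the commit above, here is a minimal sketch of a full-range gray pseudo palette next to a limited-range luma expansion. Both function names are hypothetical, the ARGB packing is an assumption, and the 16..235 range and truncating integer math are illustrative choices rather than FFmpeg's actual code:

```c
#include <stdint.h>

/* Hypothetical helper: fill a 256-entry ARGB lookup table that maps each
 * gray byte value to the same value in R, G and B (full-range assumption). */
static void sketch_fill_gray8_pal(uint32_t pal[256])
{
    for (int i = 0; i < 256; i++)
        pal[i] = 0xFF000000u | ((uint32_t)i * 0x010101u);
}

/* Hypothetical helper: expand limited-range luma (16..235) to full range
 * (0..255), truncating and clamping. */
static uint8_t sketch_limited_to_full(uint8_t y)
{
    int v = ((int)y - 16) * 255 / 219;
    if (v < 0)
        v = 0;
    if (v > 255)
        v = 255;
    return (uint8_t)v;
}
```

In this sketch the table maps the byte 16 to the dark gray 0xFF101010, while limited-range video black (Y = 16) should come out as 0. A table built once cannot follow a color range that changes at runtime, which is the mismatch the commit message points out.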

31 Mar, 2018 1 commit

arm: swscale: Only compile the rgb2yuv asm if .dn aliases are supported · f33f7284

Martin Storsjö authored 7 years ago

Vanilla clang supports altmacro since clang 5.0, and thus doesn't
require gas-preprocessor for building the arm assembly any longer.

However, the built-in assembler doesn't support .dn directives.

This re-adds the checks that were removed in d7320ca3, when
the last usage of .dn directives within libav was removed.

Alternatively, the assembly could be rewritten to not use the
.dn directive, making it available to clang users.
Signed-off-by: Martin Storsjö <martin@martin.st>

f33f7284

24 Mar, 2018 4 commits
- swscale/rgb2rgb : cosmetic, move shuffle_bytes func declaration · 5f6126ea
  Martin Vignali authored 7 years ago
```
move shuffle_bytes_1230, 3012, 3210 with the other shuffle_byte
declaration
```
  5f6126ea
- swscale/rgb : add X86 SIMD (SSSE3), for shuffle_bytes_1230, shuffle_bytes_3012, shuffle_bytes_3210 · 1ba5ca2d
  Martin Vignali authored 7 years ago
  
  1ba5ca2d
- swscale/rgb : move shuffle func shuffle_bytes_1230, shuffle_bytes_3012,... · d4f66408
  Martin Vignali authored 7 years ago
```
swscale/rgb : move shuffle func shuffle_bytes_1230, shuffle_bytes_3012, shuffle_bytes_3210 in order to add SIMD
```
  d4f66408
- swscale/rgb : add X86 SIMD (SSSE3) for shuffle_bytes_2103 and shuffle_bytes_0321 · 923a3241
  Martin Vignali authored 7 years ago
  
  923a3241
03 Mar, 2018 1 commit
- swscale: Introduce a helper to identify semi-planar formats · dd3f1e3a
  Philip Langdale authored 7 years ago
```
This cleans up the ever-more-unreadable list of semi-planar
exclusions for selecting the planar copy wrapper.
```
  dd3f1e3a
02 Mar, 2018 1 commit

swscale: Add p016 output support and generalise yuv420p1x to p010 · 9d5aff09

Philip Langdale authored 7 years ago

To make the best use of existing code, I generalised the wrapper
that currently does yuv420p10 to p010 to support any mixture of
input and output sizes between 10 and 16 bits. This had the side
effect of yielding a working code path for all yuv420p1x formats
to p01x.

9d5aff09
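
A minimal sketch of the kind of generalised chroma path the commit above describes, assuming the P010/P016 layout keeps each sample in the most significant bits of a 16-bit word and interleaves U and V. The function name is hypothetical, samples are handled in host byte order, and this is not the actual swscale wrapper:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave planar U and V samples of src_depth bits
 * (10..16) into a UVUV... plane, left-justified in 16-bit words. */
static void sketch_planar_to_p01x_chroma(const uint16_t *u, const uint16_t *v,
                                         uint16_t *uv, size_t n, int src_depth)
{
    const int shift = 16 - src_depth; /* move the sample into the top bits */

    for (size_t i = 0; i < n; i++) {
        uv[2 * i]     = (uint16_t)(u[i] << shift);
        uv[2 * i + 1] = (uint16_t)(v[i] << shift);
    }
}
```

Because the shift depends only on the input depth, the same loop covers 10-, 12-, 14- and 16-bit planar input, which is how one wrapper can serve every yuv420p1x to p01x combination in this sketch.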

13 Nov, 2017 1 commit

Fix missing used attribute for inline assembly variables · 43171a2a

Thomas Köppe authored 7 years ago

Variables used in inline assembly need to be marked with __attribute__((used)).
Static constants already were, via the define of DECLARE_ASM_CONST.
But DECLARE_ALIGNED does not add this attribute, and some of the variables
defined with it are const only used in inline assembly, and therefore
appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks
variables as used.

This change makes FFmpeg work with Clang's ThinLTO.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

43171a2a
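
A minimal sketch of a macro along the lines the commit above describes, assuming a GCC- or Clang-compatible compiler. `SKETCH_ASM_ALIGNED` and `sketch_mask` are hypothetical names chosen for illustration; this is not FFmpeg's actual definition of DECLARE_ASM_ALIGNED:

```c
#include <stdint.h>

/* Hypothetical macro: declare variable v of type t with n-byte alignment
 * and mark it as used, so the compiler and LTO keep it even when the only
 * reference comes from inline assembly. */
#define SKETCH_ASM_ALIGNED(n, t, v) \
    t __attribute__((used)) __attribute__((aligned(n))) v

/* Example constant that only inline assembly would reference. */
SKETCH_ASM_ALIGNED(16, static const uint64_t, sketch_mask)[2] = {
    0x00FF00FF00FF00FFULL, 0x00FF00FF00FF00FFULL
};
```

Without the `used` attribute, a static constant that no C code references may be discarded as dead, and the inline assembly would then refer to a symbol that no longer exists.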

29 Oct, 2017 1 commit
- lsws/yuv2rgb: Fix yuva2rgb32 on big endian hardware. · 9b0510a8
  Carl Eugen Hoyos authored 7 years ago
  
  9b0510a8
25 Oct, 2017 1 commit

swscale: use dithering in DITHER_COPY only if not set -sws_dither none · 50ce2960

Mateusz authored 7 years ago

This patch uses dithering in the DITHER_COPY macro only if the option
'-sws_dither none' was not given.
With '-sws_dither none', it uses a downshift instead.

For the human eye dithering is OK, for video codecs not necessarily.
If the user doesn't want dithering, we should respect that.
Signed-off-by: Mateusz Brzostek <mateuszb@poczta.onet.pl>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

50ce2960
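
To make the difference between the two modes in the commit above concrete, here is a minimal sketch of a 16-bit to 8-bit reduction with and without dithering. It uses a 2x2 ordered-dither table as a stand-in and hypothetical function names; it is not the DITHER_COPY macro or swscale's dither tables:

```c
#include <stdint.h>

/* Plain downshift: drop the low 8 bits. */
static uint8_t sketch_reduce_downshift(uint16_t v)
{
    return (uint8_t)(v >> 8);
}

/* Ordered dither (2x2 stand-in): add a position-dependent offset smaller
 * than one output step before shifting, then clamp to 8 bits. */
static uint8_t sketch_reduce_dither(uint16_t v, unsigned x, unsigned y)
{
    static const uint16_t offsets[2][2] = { { 0, 128 }, { 192, 64 } };
    unsigned r = ((unsigned)v + offsets[y & 1][x & 1]) >> 8;

    return (uint8_t)(r > 255 ? 255 : r);
}
```

For the input 0x0180 (1.5 output steps), the downshift always returns 1, while the dithered version returns 1, 2, 2, 1 over a 2x2 block, averaging 1.5. That spatial noise looks fine to the eye but adds detail a video encoder then has to spend bits on, which is the trade-off the commit message mentions.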

23 Oct, 2017 1 commit
- swscale: more accurate DITHER_COPY macro for full and limited range · f192f2f0
  Mateusz authored 7 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  f192f2f0
11 Oct, 2017 1 commit
- Bump version for master after 3.4 branchpoint · 80154b1b
  Michael Niedermayer authored 7 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  80154b1b
10 Oct, 2017 1 commit
- Bump minor versions for branching 3.4 · e1de9eab
  Michael Niedermayer authored 7 years ago
```
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
```
  e1de9eab