Commits · dbacecd347599aa421be94ad5e16521aa51f7014 · Linshizhi / ffmpeg.wasm-core

15 May, 2020 3 commits

swscale: aarch64: Add a NEON implementation of interleaveBytes · e0604d50

Martin Storsjö authored 4 years ago

This allows speeding up format conversions from yuv420 to nv12.

Benchmark timings (lower is better):
                             Cortex-A53      A72      A73
interleave_bytes_c:             86077.5  51433.0  66972.0
interleave_bytes_neon:          19701.7  23019.2  15859.2
interleave_bytes_aligned_c:     86603.0  52017.2  67484.2
interleave_bytes_aligned_neon:   9061.0   7623.0   6309.0
Signed-off-by: Martin Storsjö <martin@martin.st>
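yuv420p keeps U and V in separate planes, while nv12 stores them interleaved in one plane. The conversion therefore comes down to zipping two byte rows together. Below is a minimal plain-C sketch of that loop. `interleave_bytes_ref` and its parameter order are assumptions loosely modeled on swscale's interleaveBytes; this is not its actual code.

```c
#include <stdint.h>

/* Hypothetical scalar reference: interleave two planes byte by byte,
 * e.g. the U and V planes of yuv420p into the UV plane of nv12.
 * Strides are in bytes; dst needs room for 2 * width bytes per row. */
static void interleave_bytes_ref(const uint8_t *src1, const uint8_t *src2,
                                 uint8_t *dst, int width, int height,
                                 int src1_stride, int src2_stride,
                                 int dst_stride)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            dst[2 * x]     = src1[x];   /* even bytes from the first plane */
            dst[2 * x + 1] = src2[x];   /* odd bytes from the second plane */
        }
        src1 += src1_stride;
        src2 += src2_stride;
        dst  += dst_stride;
    }
}
```

For nv12, src1 would be the U plane, src2 the V plane, and width the chroma width. A SIMD version can handle many pixels per iteration while keeping the same semantics.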

swscale: arm: fix NEON hscale init · 70b14cc8

Josh de Kock authored 4 years ago

The NEON hscale function only supports filter sizes that are a multiple
of 8 (X8) and should only be selected when such sizes are used. At the
moment filterAlign is set to 8, but when extra NEON assembly for specific
sizes is added in the future, it will need checks here too.

The immediate use case for this change is to simplify the hscale
checkasm test by removing NEON-specific edge cases (x86 already has
these guards).

This applies the same fix as 718c8f9a to the 32-bit ARM version of
the function, fixing fate-checkasm-sw_scale there.
Signed-off-by: Martin Storsjö <martin@martin.st>

swscale: fix NEON hscale init · 718c8f9a

Josh de Kock authored 4 years ago

The NEON hscale function only supports filter sizes that are a multiple
of 8 (X8) and should only be selected when such sizes are used. At the
moment filterAlign is set to 8, but when extra NEON assembly for specific
sizes is added in the future, it will need checks here too.

The immediate use case for this change is to simplify the hscale
checkasm test by removing NEON-specific edge cases (x86 already has
these guards).
Signed-off-by: Josh de Kock <josh@itanimul.li>
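Here is a minimal sketch of the kind of guard described above. It assumes a simplified selection function: the NEON path is taken only when the CPU has NEON and the filter size is a multiple of 8, and every other case falls back to C. The enum and `select_hscale` are hypothetical and are not swscale's actual init code.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the available horizontal scalers. */
enum hscale_impl { HSCALE_C, HSCALE_NEON_X8 };

/* Pick the NEON "X8" scaler only if NEON is available and the filter
 * size is a multiple of 8; any other size must use the C fallback. */
static enum hscale_impl select_hscale(bool have_neon, int filter_size)
{
    if (have_neon && filter_size % 8 == 0)
        return HSCALE_NEON_X8;
    return HSCALE_C;
}
```

Adding NEON code for another size later would mean adding another explicit branch here, which matches the "they will need to have checks here too" remark.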

11 May, 2020 1 commit

libswscale: fix for floating point formats, require full chroma · fabeef22

Mark Reid authored 4 years ago

Upon more floating-point testing, it looks like I missed adding this bit.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

05 May, 2020 2 commits

libswscale: add output support for AV_PIX_FMT_GBRAPF32 · b4967fc7

Mark Reid authored 4 years ago

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

libswscale: add input support AV_PIX_FMT_GBRAPF32 · ba5d0515

Mark Reid authored 4 years ago

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

27 Apr, 2020 1 commit

swscale/vscale: Increase type strictness · 2fae0009

Andreas Rheinhardt authored 4 years ago

libswscale/vscale.c makes extensive use of function pointers, and in
doing so it converts these function pointers to and from pointers to
void. Yet this is actually against the C standard:
C90 only guarantees that a pointer to any incomplete or object type
can be converted to void* and back, with the result comparing equal
to the original; this makes pointers to void generic pointers to
incomplete or object types. But C90 lacks a generic function pointer
type.
C99 additionally guarantees that a pointer to a function of one type may
be converted to a pointer to a function of another type, with the result
comparing equal to the original when converted back.
This makes any function pointer type a generic function pointer type.
Yet even this does not make pointers to void generic function pointers.

Both GCC and Clang emit warnings for this when in pedantic mode.

This commit fixes this by using a union that can hold one member of any
of the required function pointer types to store the function pointer.
This works even for C90.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
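The sketch below shows the union technique described above with hypothetical callback types. Each function pointer type gets its own union member, so no value is ever converted to or from void *. The names are illustrative and are not taken from vscale.c. The loops use C99 declarations for brevity, but as the commit notes, the union itself works in C90.

```c
#include <stdint.h>

/* Two unrelated function pointer types; hypothetical stand-ins for the
 * per-format callbacks a vertical scaler might keep around. */
typedef void (*plane_fn)(const int16_t *src, uint8_t *dst, int width);
typedef int  (*sum_fn)(const int16_t *src, int width);

/* Instead of storing a callback in a void * (a conversion ISO C does not
 * define for function pointers), keep it in a union with one member per
 * function pointer type and read back the member that was written. */
typedef union {
    plane_fn plane;
    sum_fn   sum;
} any_fn;

/* Example callbacks used to populate the union. */
static void shift_plane(const int16_t *src, uint8_t *dst, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = (uint8_t)(src[i] >> 7);
}

static int sum_plane(const int16_t *src, int width)
{
    int s = 0;
    for (int i = 0; i < width; i++)
        s += src[i];
    return s;
}
```

The caller has to know which member is currently valid. In practice the surrounding code already knows this, because it picked the callback.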

21 Apr, 2020 1 commit

swscale: aarch64: Don't clobber callee-saved registers v8-v15 · 9025d5c5

Martin Storsjö authored 4 years ago

Signed-off-by: Martin Storsjö <martin@martin.st>

19 Apr, 2020 1 commit

swscale: aarch64: Avoid using the x18 register · 872790b1

Martin Storsjö authored 4 years ago

The x18 register is a reserved platform register on Darwin and Windows.

x8/w8 seem to be unused in this function, though (and the same goes for
x10 and x14), so there's really no reason to use x18 here: just change
the uses of x18/w18 into x8/w8 without any further rewrites.
Signed-off-by: Martin Storsjö <martin@martin.st>

12 Apr, 2020 1 commit

swscale/yuv2rgb: Fix vertical dither offset with slices · be3c29e3

Michael Niedermayer authored 4 years ago

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

04 Apr, 2020 2 commits

swscale/output: Fix integer overflow in yuv2rgb_write_full() with out of range input · e057e83a

Michael Niedermayer authored 5 years ago

Fixes: signed integer overflow: 1169365504 + 981452800 cannot be represented in type 'int'
Fixes: ticket8293

Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

swscale/output: Fix integer overflow in alpha computation in yuv2gbrp16_full_X_c() · 49ba1879

Michael Niedermayer authored 5 years ago

Fixes: signed integer overflow: 524280 * 4432 cannot be represented in type 'int'
Fixes: ticket8322

Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
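You can check the values quoted in the two overflow reports above by redoing the arithmetic in 64 bits. This sketch assumes a 32-bit int. It only demonstrates that the overflow happens and does not show how either commit fixed it. The helper names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Redo an int addition in 64 bits and report whether the exact result
 * falls outside the range of int. */
static bool add_overflows_int(int a, int b)
{
    int64_t r = (int64_t)a + b;
    return r > INT_MAX || r < INT_MIN;
}

/* Same check for multiplication; the product of two 32-bit ints always
 * fits in int64_t, so the 64-bit computation itself cannot overflow. */
static bool mul_overflows_int(int a, int b)
{
    int64_t r = (int64_t)a * b;
    return r > INT_MAX || r < INT_MIN;
}
```

With a 32-bit int, 1169365504 + 981452800 = 2150818304 and 524280 * 4432 = 2323608960. Both exceed INT_MAX (2147483647). Common remedies are to clip the inputs first or to do the arithmetic in a wider or unsigned type.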

02 Apr, 2020 1 commit

swscale/swscale: remove useless code · 4700f7d6

Ruiling Song authored 4 years ago

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

11 Mar, 2020 1 commit

lsws/input: Do not change transparency range. · 5f8c3834

Carl Eugen Hoyos authored 5 years ago

Fixes ticket #8509.

26 Feb, 2020 1 commit

libswscale/x86/yuv2rgb: Fix Segmentation Fault when load unaligned data · 828f7db5

Ting Fu authored 5 years ago

Fixes ticket #8532
Signed-off-by: Ting Fu <ting.fu@intel.com>

24 Feb, 2020 1 commit

swscale: Add swscale input support for Y210LE · d2aa1fbf

Linjie Fu authored 5 years ago

Add swscale input support for Y210LE. Output support and a FATE
test could be added later if software CSC to this packed format
is ever needed.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
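As I understand FFmpeg's pixel format description, Y210LE packs each pair of pixels into four little-endian 16-bit words (Y0, U, Y1, V), with each 10-bit sample in the high bits of its word. The sketch below shows how such input could be unpacked into separate Y, U and V rows. The layout is an assumption on my part, the function names are hypothetical, and this is not the committed input code.

```c
#include <stdint.h>

/* Read a little-endian 16-bit word. */
static uint16_t rl16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical unpacker for one row of width pixels (width even):
 * bytes 0-1 Y0, 2-3 U, 4-5 Y1, 6-7 V per pixel pair, with each 10-bit
 * sample stored in the top 10 bits of its word, hence the >> 6. */
static void y210le_unpack(const uint8_t *src, int width,
                          uint16_t *y, uint16_t *u, uint16_t *v)
{
    for (int i = 0; i < width; i++)
        y[i] = rl16(src + 4 * i) >> 6;
    for (int i = 0; i < width / 2; i++) {
        u[i] = rl16(src + 8 * i + 2) >> 6;
        v[i] = rl16(src + 8 * i + 6) >> 6;
    }
}
```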

10 Feb, 2020 1 commit

libswscale/x86/yuv2rgb: add ssse3 version · fc6a5883

Ting Fu authored 5 years ago

Tested using this command:
./ffmpeg -pix_fmt yuv420p -s 1920x1080 -i ArashRawYuv420.yuv \
-vcodec rawvideo -s 1920x1080 -pix_fmt rgb24 -f null /dev/null

The fps increases from 389 to 640 on an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz.
Signed-off-by: Ting Fu <ting.fu@intel.com>

09 Feb, 2020 1 commit

libswscale/utils.c: Fix bug #8255 · da399e21

Gautam Ramakrishnan authored 5 years ago

Bug #8255 points out a double free error in the libswscale/utils.c file.
The double free occurs because the pointer to the cascaded_context of an
sw_context is not set to NULL after it is freed. When the sw_context is
later freed, sws_freeContext is called on the cascaded_context again,
causing a double free.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
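The bug follows a general pattern: one owner frees a sub-object but keeps the dangling pointer, so a later cleanup frees it again. The sketch below shows the free-then-NULL remedy with a hypothetical `ctx` type that only loosely mirrors a scaler context owning a cascaded context. FFmpeg's own av_freep() packages the same free-and-clear step.

```c
#include <stdlib.h>

/* Hypothetical context that owns an optional sub-context, loosely
 * mirroring a scaler context that owns a cascaded context. */
typedef struct ctx {
    struct ctx *cascaded;
} ctx;

/* Free a context and, recursively, anything it still owns. */
static void ctx_free(ctx *c)
{
    if (!c)
        return;
    ctx_free(c->cascaded);
    free(c);
}

/* Free only the sub-context. Resetting the pointer to NULL is the fix:
 * without it, a later ctx_free(c) would free the same block again. */
static void ctx_drop_cascaded(ctx *c)
{
    ctx_free(c->cascaded);
    c->cascaded = NULL;
}
```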

05 Feb, 2020 1 commit

libswscale/x86/yuv2rgb: Change inline assembly into nasm code · e934194b

Ting Fu authored 5 years ago

The original inline assembly and the NASM code reach the same fps when invoked from the command line;
the NASM conversion has almost no impact on performance.
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

22 Jan, 2020 3 commits

swscale/input: Fix several invalid shifts related to rgb2yuv constants · d48e5101

Michael Niedermayer authored 5 years ago

Fixes: Invalid shifts
Fixes: #8140
Fixes: #8146
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

swscale/output: Fix several invalid shifts in yuv2rgb_full_1_c_template() · 7b7f9753

Michael Niedermayer authored 5 years ago

Fixes: Invalid shifts
Fixes: #8320
Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

swscale/swscale: Fix several invalid shifts related to vChrDrop · a6ca22c1

Michael Niedermayer authored 5 years ago

Fixes: Invalid shifts
Fixes: #8166
Fixes: filter-crop_scale_vflip FATE-test
Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

06 Jan, 2020 1 commit

Silence "string-plus-int" warning shown by clang. · 96fab29e

Carl Eugen Hoyos authored 5 years ago

libswscale/utils.c:89:42: warning: adding 'unsigned long' to a string does not append to the string [-Wstring-plus-int]
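Clang emits this warning for `"literal" + n`, because that expression looks like an attempt to append to a string. Indexing the literal and taking the address yields the same pointer without the warning. The macros and function below are hypothetical stand-ins for the prefix-plus-license string pattern. I have not checked them against the exact line in utils.c.

```c
/* Hypothetical stand-ins for a "prefix + license" string literal. */
#define LICENSE_PREFIX  "libswscale license: "
#define PROJECT_LICENSE "LGPL-2.1-or-later"

/* Return the license part of the concatenated literal. The form
 *     LICENSE_PREFIX PROJECT_LICENSE + sizeof(LICENSE_PREFIX) - 1
 * is valid pointer arithmetic but triggers clang's -Wstring-plus-int;
 * indexing and taking the address yields the same pointer silently. */
static const char *license_text(void)
{
    return &LICENSE_PREFIX PROJECT_LICENSE[sizeof(LICENSE_PREFIX) - 1];
}
```

Adjacent string literals are concatenated before the expression is parsed, so `[...]` applies to the whole combined literal.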

04 Jan, 2020 1 commit

swscale/aarch64: use multiply accumulate and shift-right narrow · c3a17fff

Sebastian Pop authored 5 years ago

This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
horizontal adds by using fused multiply adds. The patch also uses ld1r to load
one element and replicate it across all lanes of the vector. The patch also
improves the clipping code by removing the shift right instructions and
performing the shift with the shift-right narrow instructions.

I see an 8% speedup on an m6g instance with Neoverse-N1 CPUs:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
after:  t:0.012985 avg:0.013013 max:0.013996 min:0.012818

Tested with `make check` on aarch64-linux.
Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
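To make the description concrete, here is a scalar C sketch of what a yuv2planeX-style routine computes for 8-bit output. It performs a vertical multiply-accumulate over filter_size input rows, then a right shift that narrows the result, then a clip to 0..255. It is modeled on my understanding of libswscale's C fallback. Treat `yuv2plane_x_ref`, `clip_u8`, and the dither and shift constants (12 and 19) as assumptions. Per the commit, the NEON version computes the same sums with fused multiply-accumulate across vector lanes and folds the shift into shift-right-narrow instructions.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Vertical filter: for each output pixel, multiply-accumulate over
 * filter_size 15-bit input rows, then shift right, narrow and clip. */
static void yuv2plane_x_ref(const int16_t *filter, int filter_size,
                            const int16_t **src, uint8_t *dest, int dst_w,
                            const uint8_t *dither, int offset)
{
    for (int i = 0; i < dst_w; i++) {
        int val = dither[(i + offset) & 7] << 12;  /* dither/rounding bias */
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];          /* multiply-accumulate */
        dest[i] = clip_u8(val >> 19);              /* shift, narrow, clip */
    }
}
```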

31 Dec, 2019 1 commit

swscale/utils: remove access of AV_PIX_FMT_NB · 1e3e547a

Zhao Zhili authored 5 years ago

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

17 Dec, 2019 1 commit

swscale/aarch64: use multiply accumulate and increase vector factor to 4 · bd831912

Sebastian Pop authored 5 years ago

This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
and bumps the vectorization factor from 2 to 4.
The speedup is 25% on Graviton1 A1 instances based on Cortex-A72 CPUs:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146

The speedup is 39% on Graviton2 m6g instances based on Neoverse-N1 CPUs:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971

Tested with `make check` on aarch64-linux.
Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
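Below is a scalar sketch of the horizontal 8-bit to 15-bit filter that ff_hscale_8_to_15_neon vectorizes. Each output sample is a dot product of filter_size coefficients with the input pixels starting at filter_pos[i], shifted right by 7 and capped at 2^15 - 1. The function name and the exact shift and cap follow my reading of libswscale's C version, so treat them as assumptions. The NEON patch evaluates several of these dot products at once, with a vectorization factor of 4 after this change.

```c
#include <stdint.h>

/* Horizontal filter: each output is a dot product of filter_size taps
 * with src[filter_pos[i] .. filter_pos[i] + filter_size - 1], scaled
 * down to 15 bits and capped at 32767. */
static void hscale_8_to_15_ref(int16_t *dst, int dst_w, const uint8_t *src,
                               const int16_t *filter,
                               const int32_t *filter_pos, int filter_size)
{
    for (int i = 0; i < dst_w; i++) {
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[filter_pos[i] + j] * filter[filter_size * i + j];
        val >>= 7;
        dst[i] = (int16_t)(val > (1 << 15) - 1 ? (1 << 15) - 1 : val);
    }
}
```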

10 Dec, 2019 1 commit

swscale/swscale_unscaled: add AV_PIX_FMT_GBRAP10 for LE and BE conversion wrapper · 8558c231

Limin Wang authored 5 years ago

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

06 Dec, 2019 1 commit

libswscale/swscale_unscaled.c: remove redundant code · 039a0ebe

Ting Fu authored 5 years ago

Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

01 Nov, 2019 1 commit

swscale/swscale_unscaled: fix gbrap10be md5 different on big endian system · a5e24be5

Limin Wang authored 5 years ago

You can reproduce it with the following command:
./ffmpeg -f lavfi -i "testsrc=duration=1:rate=30" -vf format=gbrap10 -vcodec rawvideo \
    -pix_fmt gbrap10le -flags +bitexact -sws_flags +accurate_rnd+bitexact -fflags +bitexact  \
    -frames:v 1 -f nut md5:

little-endian:
f91e2edd8098276579c1929e5e160416
big-endian:
ba4d011dbbdc78ccbf6cc7d698630929
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
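The bug above is output that differs between little- and big-endian hosts. One general way to make such output bit-exact is to write each sample's bytes in an explicit order instead of copying native 16-bit words. The helpers below are a hypothetical sketch of that idea, not the code from either GBRAP10 commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Store one 16-bit sample in an explicit byte order, independent of the
 * host's native order (big_endian != 0 selects big-endian). */
static void store16(uint8_t *p, uint16_t v, int big_endian)
{
    if (big_endian) {
        p[0] = (uint8_t)(v >> 8);
        p[1] = (uint8_t)(v & 0xFF);
    } else {
        p[0] = (uint8_t)(v & 0xFF);
        p[1] = (uint8_t)(v >> 8);
    }
}

/* Write a row of 10-bit samples (held in uint16_t) as gbrap10le or
 * gbrap10be bytes. Since the byte order is spelled out rather than
 * inherited from the host, the bytes (and any MD5 over them) are the
 * same on little- and big-endian machines. */
static void write_row16(uint8_t *dst, const uint16_t *src, size_t n,
                        int big_endian)
{
    for (size_t i = 0; i < n; i++)
        store16(dst + 2 * i, src[i], big_endian);
}
```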

16 Oct, 2019 3 commits

swscale/output: Avoid 64bit in Alpha in yuv2ya16_X_c_template() · d2606210

Michael Niedermayer authored 5 years ago

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

swscale/output: Correct Alpha in yuv2ya16_X_c_template() · 3e668293

Michael Niedermayer authored 5 years ago

Untested, no testcase
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

swscale/output: Implement Luma computation from yuv2ya16_X_c_template() without 64bit · 4f4ca675

Michael Niedermayer authored 5 years ago

This also reverts 21838cad.
The revert is included in this commit to avoid two FATE updates.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

04 Oct, 2019 2 commits

swscale: Fix AltiVec/VSX build with recent GCC · e6625ca4

Daniel Kolesa authored 5 years ago

The argument to vec_splat_u16 must be a literal. By making the
function always inline and marking the arguments const, GCC can
turn those into literals and avoid build errors like:

swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal

Fixes #7861.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>

swscale: Replace illegal vector keyword usage in altivec code · 1bdb47b7

Daniel Kolesa authored 5 years ago

While this technically compiles in current FFmpeg, that is only
because FFmpeg is compiled in strict ISO C mode, which disables
the builtin 'vector' keyword for AltiVec/VSX. Instead, the keyword
gets replaced with a macro inside altivec.h that defines vector as
__vector, and __vector accepts arbitrary types.

Normally, the vector keyword should be used only with plain
scalar non-typedef types, such as unsigned int. But libavutil's
util_altivec.h provides the vec_(s|u)(8|16|32) macros, which can
be used in a portable manner.

This is also consistent with other AltiVec/VSX code elsewhere in
the tree.

Fixes #7861.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>

28 Sep, 2019 2 commits

swscale/utils: Fix invalid left shifts of negative numbers · e2646e23

Andreas Rheinhardt authored 5 years ago

Affected the FATE-tests vsynth_lena-dv-411, vsynth1-dv-411,
vsynth2-dv-411 and hevc-paramchange-yuv420p.yuv420p10.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

swscale/x86/swscale: Fix undefined left shifts of negative numbers · 736c7c20

Andreas Rheinhardt authored 5 years ago

This affected many FATE-tests: The number of failing tests went down
from 663 to 344. (Both numbers exclude tests that failed because of
unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
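Left-shifting a negative signed value is undefined behaviour in C, even though most compilers happen to produce the arithmetically expected result. One common rewrite is to multiply by a power of two instead. The helper below is a hypothetical illustration of that idiom. The two commits above do not show the exact rewrite they used, so it may differ.

```c
/* Left-shifting a negative signed int is undefined behaviour in C
 * (C99 6.5.7p4). Multiplying by a power of two is well defined as long
 * as the result fits in an int. */
static int scale_pow2(int x, int n)
{
    /* instead of: return x << n; */
    return x * (1 << n);   /* requires 0 <= n < 31 with a 32-bit int */
}
```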

27 Sep, 2019 1 commit

swscale/swscale: cosmetics · cde1d70a

Limin Wang authored 5 years ago

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

26 Sep, 2019 1 commit

swscale/output: fix signed integer overflow for ya16 · 21838cad

Paul B Mahol authored 5 years ago

Fixes #7666.

09 Sep, 2019 1 commit

swscale/swscale: delete unwanted assignments · 29bde4b3

Limin Wang authored 5 years ago

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

06 Sep, 2019 1 commit

swscale/output: fix some code indentations · ef134265

Linjie Fu authored 5 years ago

Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
