- 05 Feb, 2019 1 commit
Lauri Kasanen authored
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
    -s 1920x1728 -f null -vframes 100 -v error -nostats -

The 9-14 bit funcs get about a 6x speedup; the 16-bit func gets about 7x for LE and about 14x for BE output. Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out of the 16-bit function. This includes the vec_mulo/mule functions too, not just vmuluwm.

With TIMER_REPORT skips disabled (UNITS in planarX, 131072 runs, 0 skips for every entry):

format        optimized        C
yuv420p9le        12412    73136
yuv420p9be        12481    73410
yuv420p10le       12322    72546
yuv420p10be       12291    72935
yuv420p12le       12316    72708
yuv420p12be       12319    72577
yuv420p14le       12259    72516
yuv420p14be       12440    72962
yuv420p16le       10548    73429
yuv420p16be       10634   150959

Signed-off-by: Lauri Kasanen <cand@gmx.com>
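The speedups quoted in this message are the C time divided by the optimized time. The sketch below recomputes them from the timer numbers, assuming (as the stated ~6x figure suggests) that the first value per format is the optimized path and the second the C path; the struct and function names are illustrative only.

```c
/* Timer readings copied from the commit message (TIMER UNITS, lower is
 * faster). Assumption: first value = optimized path, second = C path. */
struct timing {
    const char *fmt;
    double optimized;
    double c;
};

static const struct timing timings[] = {
    { "yuv420p9le",  12412,  73136 },
    { "yuv420p9be",  12481,  73410 },
    { "yuv420p10le", 12322,  72546 },
    { "yuv420p10be", 12291,  72935 },
    { "yuv420p12le", 12316,  72708 },
    { "yuv420p12be", 12319,  72577 },
    { "yuv420p14le", 12259,  72516 },
    { "yuv420p14be", 12440,  72962 },
    { "yuv420p16le", 10548,  73429 },
    { "yuv420p16be", 10634, 150959 },
};

/* Speedup ratio = C time / optimized time. */
static double speedup(const struct timing *t)
{
    return t->c / t->optimized;
}
```

With these numbers the 9-14 bit formats come out at roughly 5.9x, yuv420p16le at about 7x and yuv420p16be at about 14x.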
-
- 01 Jan, 2019 1 commit
Michael Niedermayer authored
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 26 Dec, 2018 1 commit
Lauri Kasanen authored
This function wouldn't benefit from VSX instructions, so I put it under altivec.

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
    -f null -vframes 100 -v error -nostats -

grayf32le:
    3743 UNITS in planar1, 65495 runs, 41 skips
    -cpuflags 0: 23511 UNITS in planar1, 65530 runs, 6 skips
grayf32be:
    4647 UNITS in planar1, 65449 runs, 87 skips
    -cpuflags 0: 28608 UNITS in planar1, 65530 runs, 6 skips

The native speedup is 6.28133, and the bswapping one 6.15623. Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
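For context, a scalar float plane writer of the kind being optimized here can be sketched as follows. This is a hypothetical sketch, not the libswscale code: it assumes 19-bit intermediate samples (as used for 16-bit output), a rounding shift of 3, clipping to 16 bits and normalization by 1/65535. The byte-swapping variant corresponds to the non-native ("bswapping") case in the message and stores the same float with its four bytes reversed.

```c
#include <stdint.h>
#include <string.h>

/* Clip an int to the uint16 range. */
static uint16_t clip_uint16(int v)
{
    return v < 0 ? 0 : v > 65535 ? 65535 : (uint16_t)v;
}

/* Reverse the byte order of a 32-bit word. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Native-endian output: round, clip to 16 bits, scale into [0, 1]. */
static void plane1_float_native(const int32_t *src, float *dst, int w)
{
    const int shift = 3;                 /* assumed 19-bit -> 16-bit */
    const float mult = 1.0f / 65535.0f;
    for (int i = 0; i < w; i++) {
        uint16_t v = clip_uint16((src[i] + (1 << (shift - 1))) >> shift);
        dst[i] = mult * (float)v;
    }
}

/* Byte-swapped output: same value, stored with its 4 bytes reversed. */
static void plane1_float_bswap(const int32_t *src, uint32_t *dst, int w)
{
    const int shift = 3;
    const float mult = 1.0f / 65535.0f;
    for (int i = 0; i < w; i++) {
        uint16_t v = clip_uint16((src[i] + (1 << (shift - 1))) >> shift);
        float f = mult * (float)v;
        uint32_t bits;
        memcpy(&bits, &f, sizeof(bits)); /* type-pun without aliasing UB */
        dst[i] = bswap32(bits);
    }
}
```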
-
- 14 Dec, 2018 1 commit
Lauri Kasanen authored
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
    -f null -vframes 100 -v error -nostats -

2120 UNITS in planar1, 65393 runs, 143 skips
-cpuflags 0: 19157 UNITS in planar1, 65512 runs, 24 skips

9.03632 speedup, 16be similarly. Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 12 Dec, 2018 1 commit
Lauri Kasanen authored
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
    -f null -vframes 100 -v error -nostats -

Speedups:
yuv2plane1_9BE_vsx   11.2042
yuv2plane1_9LE_vsx   11.156
yuv2plane1_10BE_vsx  9.89428
yuv2plane1_10LE_vsx  10.3637
yuv2plane1_12BE_vsx  9.71923
yuv2plane1_12LE_vsx  11.0404
yuv2plane1_14BE_vsx  10.1763
yuv2plane1_14LE_vsx  11.2728

Fate passes, each format tested with an image to video conversion.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
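Judging by their names, the functions listed above differ in output depth and byte order. A hypothetical scalar sketch of one such writer, assuming 15-bit intermediate samples so that the right shift is 15 - output_bits, with rounding and clipping before a LE or BE 16-bit store (names are illustrative, not libswscale's):

```c
#include <stdint.h>

/* Store a 16-bit value in the requested byte order. */
static void store16(uint8_t *p, uint16_t v, int big_endian)
{
    if (big_endian) {
        p[0] = (uint8_t)(v >> 8);
        p[1] = (uint8_t)(v & 0xFF);
    } else {
        p[0] = (uint8_t)(v & 0xFF);
        p[1] = (uint8_t)(v >> 8);
    }
}

/* Write one row of 9..14-bit samples from assumed 15-bit intermediates. */
static void plane1_hbd(const int16_t *src, uint8_t *dst, int w,
                       int output_bits, int big_endian)
{
    const int shift = 15 - output_bits;         /* 6 for 9-bit, 1 for 14-bit */
    const int maxval = (1 << output_bits) - 1;
    for (int i = 0; i < w; i++) {
        int val = (src[i] + (1 << (shift - 1))) >> shift;  /* round */
        if (val < 0)
            val = 0;
        if (val > maxval)
            val = maxval;
        store16(dst + 2 * i, (uint16_t)val, big_endian);
    }
}
```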
-
- 04 Dec, 2018 1 commit
Lauri Kasanen authored
Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied).

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Tested-by: Michael Kostylev on BE
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 26 Nov, 2018 1 commit
Lauri Kasanen authored
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
    -f null -vframes 100 -v error -nostats -

1158 UNITS in planar1, 65528 runs, 8 skips
-cpuflags 0: 19082 UNITS in planar1, 65533 runs, 3 skips

16.48 speedup ratio. On x86, the SSE2 ratio is ~7. Curiously, the Power C version takes as many cycles as the x86 SSE2 version; yikes, it's fast.

Note that this function uses VSX instructions but is not marked as such. This is because several existing functions make the same mistake. I'll submit a patch moving them once this is reviewed.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
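A scalar reference for this kind of 8-bit plane writer can be sketched as below. It is a sketch under stated assumptions, not the libswscale function: it assumes 15-bit intermediates, an 8-entry dither pattern indexed by (i + offset) & 7, and a 7-bit downshift with clipping to 0..255. Every output pixel is computed independently, which is what makes such a loop a good fit for vectorization.

```c
#include <stdint.h>

/* Clip an int to the uint8 range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical: 15-bit intermediates -> 8-bit output with ordered dither. */
static void plane1_8(const int16_t *src, uint8_t *dst, int w,
                     const uint8_t dither[8], int offset)
{
    for (int i = 0; i < w; i++) {
        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
        dst[i] = clip_uint8(val);
    }
}
```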
-
- 24 Nov, 2018 1 commit
Martin Vignali authored
-
- 01 Nov, 2018 2 commits
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 24 Oct, 2018 3 commits
Martin Vignali authored
is used for packed and planar format
-
Martin Vignali authored
-
Martin Vignali authored
-
- 18 Oct, 2018 2 commits
Martin Vignali authored
-
Martin Vignali authored
suggested by Carl Eugen Hoyos
-
- 13 Oct, 2018 2 commits
Martin Vignali authored
-
Martin Vignali authored
-
- 09 Sep, 2018 1 commit
Paul B Mahol authored
-
- 22 Aug, 2018 2 commits
Martin Vignali authored
-
Martin Vignali authored
Currently, float input is converted to 16-bit uint in the input stage, but hScale16To19 and hScale16To15 use the source depth (32 bits), which makes an invalid shift of the data. So shift the value for float input the same way as for 16 bpc uint input.
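The shift problem can be illustrated with a small hypothetical helper. It assumes the horizontal scaler derives its right shift as (depth - 1) - 4 and that the filter taps sum to 1 << 14; neither the helper names nor these constants are taken from the message.

```c
#include <stdbool.h>

/* Hypothetical: right shift applied after horizontal filtering.
 * Assumption: shift = (depth - 1) - 4 for 16-bit-style input. */
static int hscale_shift(int src_depth, bool is_float)
{
    /* Float input has already been converted to 16-bit uint, so its
     * nominal 32-bit depth must not drive the shift. */
    int effective_depth = is_float ? 16 : src_depth;
    return (effective_depth - 1) - 4;
}

/* Apply a unit-gain filter (taps assumed to sum to 1 << 14), then shift. */
static int scaled_sample(int sample, int shift)
{
    long long val = (long long)sample * (1 << 14);
    return (int)(val >> shift);
}
```

With a 32-bit nominal depth the shift becomes 27 and a full-scale 65535 input collapses to 7, whereas treating float input like 16 bpc uint keeps the full 19-bit value 524280.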
-
- 14 Aug, 2018 1 commit
Sergey Lavrushkin authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 10 Jun, 2018 1 commit
Carl Eugen Hoyos authored
Fixes the following warnings:
In file included from libswscale/rgb2rgb.c:128:0:
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3210_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3012_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_1230_c' defined but not used
-
- 05 May, 2018 1 commit
Paul B Mahol authored
Signed-off-by: Paul B Mahol <onemda@gmail.com>
-
- 22 Apr, 2018 1 commit
Martin Vignali authored
and checkasm test
-
- 16 Apr, 2018 2 commits
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 03 Apr, 2018 1 commit
wm4 authored
PSEUDOPAL pixel formats are not paletted, but carried a palette with the intention of allowing code to treat unpaletted formats as paletted. The palette simply mapped the byte values to the resulting RGB values, making it some sort of LUT for RGB conversion.

It was used for 1-byte formats only: RGB4_BYTE, BGR4_BYTE, RGB8, BGR8, GRAY8. The first 4 are awfully obscure, used only by some ancient bitmap formats. The last one, GRAY8, is more common, but its treatment is grossly incorrect. It considers full range GRAY8 only, so GRAY8 coming from typical Y video planes was not mapped to the correct RGB values. This cannot be fixed, because AVFrame.color_range can be freely changed at runtime, and there is nothing to ensure the pseudo palette is updated.

Also, nothing actually used the PSEUDOPAL palette data, except xwdenc (trivially changed in the previous commit). All other code had to treat it as a special case, just to ignore or to propagate palette data.

In conclusion, this was just a very strange old mechanism that has no real justification to exist anymore (although it may have been nice and useful in the past). Now it's an artifact that makes the API harder to use: API users who allocate their own pixel data have to be aware that they need to allocate the palette, or FFmpeg will crash on them in _some_ situations. On top of this, there was no API to allocate the pseudo palette outside of av_frame_get_buffer().

This patch not only deprecates AV_PIX_FMT_FLAG_PSEUDOPAL, but also makes the pseudo palette optional. Nothing accesses it anymore, though if it's set, it's propagated. It's still allocated and initialized for compatibility with API users that rely on this feature. But new API users do not need to allocate it. This was an explicit goal of this patch.

Most changes replace AV_PIX_FMT_FLAG_PSEUDOPAL with FF_PSEUDOPAL. I first tried #ifdefing all code, but it was a mess. The FF_PSEUDOPAL macro reduces the mess, and still allows defining FF_API_PSEUDOPAL to 0.

Passes FATE with FF_API_PSEUDOPAL enabled and disabled. In addition, FATE passes with FF_API_PSEUDOPAL set to 1, but with allocation functions manually changed to not allocate a palette.
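The GRAY8 problem described above can be made concrete. The sketch below assumes the 0xAARRGGBB layout of FFmpeg palette entries and builds a full-range gray pseudo palette, next to the limited-range (16..235) expansion a typical Y plane would need; both helper names are hypothetical.

```c
#include <stdint.h>

/* Full-range gray pseudo palette: byte i -> opaque ARGB with R = G = B = i. */
static void gray8_pseudo_palette(uint32_t pal[256])
{
    for (int i = 0; i < 256; i++)
        pal[i] = 0xFF000000u | ((uint32_t)i << 16) |
                 ((uint32_t)i << 8) | (uint32_t)i;
}

/* Limited-range Y (16..235) expanded to 0..255 with rounding and clipping. */
static uint8_t limited_y_to_full(int y)
{
    int v = ((y - 16) * 255 + 219 / 2) / 219;
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}
```

For example, Y = 235 maps to palette RGB 235, while the limited-range expansion gives 255; a single static palette cannot serve both interpretations.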
-
- 31 Mar, 2018 1 commit
Martin Storsjö authored
Vanilla clang supports altmacro since clang 5.0, and thus doesn't require gas-preprocessor for building the arm assembly any longer. However, the built-in assembler doesn't support .dn directives.

This re-adds checks that were removed in d7320ca3, when the last uses of .dn directives within libav were removed.

Alternatively, the assembly could be rewritten to not use the .dn directive, making it available to clang users.

Signed-off-by: Martin Storsjö <martin@martin.st>
-
- 24 Mar, 2018 4 commits
Martin Vignali authored
Move shuffle_bytes_1230, 3012 and 3210 next to the other shuffle_bytes declarations
-
Martin Vignali authored
-
Martin Vignali authored
swscale/rgb: move the shuffle functions shuffle_bytes_1230, shuffle_bytes_3012 and shuffle_bytes_3210 in order to add SIMD
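The scalar versions of these shuffles are simple per-pixel byte permutations. A hedged sketch, assuming the digits in each name give the source byte that ends up in destination bytes 0 to 3 of every 4-byte pixel; the macro below is illustrative rather than copied from rgb2rgb_template.c.

```c
#include <stdint.h>

/* dst byte k of each 4-byte pixel takes src byte (a, b, c, d)[k].
 * src and dst must not overlap; a trailing partial pixel is skipped. */
#define DEFINE_SHUFFLE_BYTES(name, a, b, c, d)                          \
static void shuffle_bytes_##name(const uint8_t *src, uint8_t *dst,     \
                                 int src_size)                          \
{                                                                       \
    for (int i = 0; i + 3 < src_size; i += 4) {                         \
        dst[i + 0] = src[i + a];                                        \
        dst[i + 1] = src[i + b];                                        \
        dst[i + 2] = src[i + c];                                        \
        dst[i + 3] = src[i + d];                                        \
    }                                                                   \
}

DEFINE_SHUFFLE_BYTES(1230, 1, 2, 3, 0)
DEFINE_SHUFFLE_BYTES(3012, 3, 0, 1, 2)
DEFINE_SHUFFLE_BYTES(3210, 3, 2, 1, 0)
```

Because each pixel is an independent 4-byte permutation, SIMD versions can apply one byte-shuffle mask per vector.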
-
Martin Vignali authored
-
- 03 Mar, 2018 1 commit
Philip Langdale authored
This cleans up the ever-more-unreadable list of semi-planar exclusions for selecting the planar copy wrapper.
-
- 02 Mar, 2018 1 commit
Philip Langdale authored
To make the best use of existing code, I generalised the wrapper that currently does yuv420p10 to p010 to support any mixture of input and output sizes between 10 and 16 bits. This had the side effect of yielding a working code path for all yuv420p1x formats to p01x.
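The core of such a conversion is a shift to MSB alignment plus interleaving of U and V into one plane. A hypothetical sketch, assuming native-endian 16-bit samples, P01x samples stored in the top bits of each 16-bit word with the low bits zero, and plain truncation when the input is deeper than the output (the actual wrapper may behave differently).

```c
#include <stdint.h>

/* Convert one LSB-aligned sample of src_depth bits (10..16) into an
 * MSB-aligned P01x sample with dst_depth bits. */
static uint16_t to_p01x(uint16_t s, int src_depth, int dst_depth)
{
    uint16_t v = (uint16_t)(s << (16 - src_depth));          /* MSB-align */
    uint16_t keep = (uint16_t)(0xFFFFu << (16 - dst_depth)); /* top bits */
    return v & keep;  /* truncates when src_depth > dst_depth */
}

/* Luma: one sample per pixel. */
static void luma_row_to_p01x(const uint16_t *y, uint16_t *dst, int w,
                             int src_depth, int dst_depth)
{
    for (int i = 0; i < w; i++)
        dst[i] = to_p01x(y[i], src_depth, dst_depth);
}

/* Chroma: interleave the separate U and V planes into one UV plane. */
static void chroma_row_to_p01x(const uint16_t *u, const uint16_t *v,
                               uint16_t *uv, int cw,
                               int src_depth, int dst_depth)
{
    for (int i = 0; i < cw; i++) {
        uv[2 * i]     = to_p01x(u[i], src_depth, dst_depth);
        uv[2 * i + 1] = to_p01x(v[i], src_depth, dst_depth);
    }
}
```

Parameterizing both depths is what lets one code path serve every yuv420p1x to p01x combination.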
-
- 13 Nov, 2017 1 commit
Thomas Köppe authored
Variables used in inline assembly need to be marked with attribute((used)). Static constants already were, via the define of DECLARE_ASM_CONST. But DECLARE_ALIGNED does not add this attribute, and some of the variables defined with it are constants used only in inline assembly, and therefore appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks variables as used.

This change makes FFmpeg work with Clang's ThinLTO.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
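A minimal sketch of the idea with GCC/Clang attributes follows; the macro name is hypothetical and the inline assembly that would read the constant is omitted. Marking the object used keeps it in the output even though no C code references it, and the aligned attribute provides the alignment that DECLARE_ALIGNED is for.

```c
#include <stdint.h>

/* Hypothetical macro (GCC/Clang): aligned AND marked used, so the
 * compiler and LTO keep the object even if only inline asm reads it. */
#define MY_DECLARE_ASM_ALIGNED(n, t, v) \
    t __attribute__((used)) __attribute__((aligned(n))) v

/* Expands to:
 * static const uint64_t __attribute__((used))
 *     __attribute__((aligned(16))) mask_lo[2] = { ... }; */
MY_DECLARE_ASM_ALIGNED(16, static const uint64_t, mask_lo)[2] = {
    0x00FF00FF00FF00FFULL, 0x00FF00FF00FF00FFULL
};
```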
-
- 29 Oct, 2017 1 commit
Carl Eugen Hoyos authored
-
- 25 Oct, 2017 1 commit
Mateusz authored
This patch uses dithering in the DITHER_COPY macro only if the option '-sws_dither none' was not used. With '-sws_dither none' it uses a plain downshift. For the human eye dithering is OK, but for video codecs not necessarily. If the user doesn't want dithering, we should respect that.

Signed-off-by: Mateusz Brzostek <mateuszb@poczta.onet.pl>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
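The two behaviors can be sketched with a hypothetical row reducer: passing a dither pattern adds it before the downshift, while passing NULL gives the plain downshift that '-sws_dither none' selects. The 8-entry pattern indexing and the clipping are assumptions, not the DITHER_COPY macro itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce one sample by `shift` bits, optionally adding a dither value. */
static uint16_t reduce_sample(uint16_t s, int shift, int maxval,
                              const uint8_t *dither, int i)
{
    int v = s;
    if (dither != NULL)         /* dithering requested */
        v += dither[i & 7];     /* pattern values assumed < (1 << shift) */
    v >>= shift;                /* plain downshift when dither == NULL */
    return (uint16_t)(v > maxval ? maxval : v);
}

/* Hypothetical: reduce a row from src_bits to dst_bits per sample. */
static void reduce_row(const uint16_t *src, uint16_t *dst, int w,
                       int src_bits, int dst_bits, const uint8_t *dither)
{
    const int shift = src_bits - dst_bits;
    const int maxval = (1 << dst_bits) - 1;
    for (int i = 0; i < w; i++)
        dst[i] = reduce_sample(src[i], shift, maxval, dither, i);
}
```

With a flat 10-bit input of 513, the downshift gives 128 everywhere, while the dithered version occasionally rounds up to 129; that spatial noise is harmless to the eye but can cost bits in a video encoder.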
-
- 23 Oct, 2017 1 commit
Mateusz authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 11 Oct, 2017 1 commit
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-
- 10 Oct, 2017 1 commit
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
-