Commit 50e672bc authored by Lauri Kasanen's avatar Lauri Kasanen

swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

1.8-2.3x speedup:

rgb24
  18192 UNITS in yuv2packed1,   32767 runs,      1 skips
   9983 UNITS in yuv2packed1,   32760 runs,      8 skips
bgr24
  18665 UNITS in yuv2packed1,   32766 runs,      2 skips
   9925 UNITS in yuv2packed1,   32763 runs,      5 skips
rgba
  20239 UNITS in yuv2packed1,   32767 runs,      1 skips
   8794 UNITS in yuv2packed1,   32759 runs,      9 skips
bgra
  20354 UNITS in yuv2packed1,   32768 runs,      0 skips
   8770 UNITS in yuv2packed1,   32761 runs,      7 skips
argb
  20185 UNITS in yuv2packed1,   32768 runs,      0 skips
   8761 UNITS in yuv2packed1,   32761 runs,      7 skips
bgra
  20360 UNITS in yuv2packed1,   32766 runs,      2 skips
   8759 UNITS in yuv2packed1,   32764 runs,      4 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
parent 7c187514
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment