swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1

./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
-s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -

32-bit mul, power8 only.

1.8-2.3x speedup:

rgb24
  18192 UNITS in yuv2packed1,   32767 runs,      1 skips
   9983 UNITS in yuv2packed1,   32760 runs,      8 skips
bgr24
  18665 UNITS in yuv2packed1,   32766 runs,      2 skips
   9925 UNITS in yuv2packed1,   32763 runs,      5 skips
rgba
  20239 UNITS in yuv2packed1,   32767 runs,      1 skips
   8794 UNITS in yuv2packed1,   32759 runs,      9 skips
bgra
  20354 UNITS in yuv2packed1,   32768 runs,      0 skips
   8770 UNITS in yuv2packed1,   32761 runs,      7 skips
argb
  20185 UNITS in yuv2packed1,   32768 runs,      0 skips
   8761 UNITS in yuv2packed1,   32761 runs,      7 skips
bgra
  20360 UNITS in yuv2packed1,   32766 runs,      2 skips
   8759 UNITS in yuv2packed1,   32764 runs,      4 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx
version is also heavily inaccurate, while the vsx version has high accuracy.
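
The quoted 1.8-2.3x range can be cross-checked from the timings above. The following is a minimal sketch, not part of the patch: it assumes the first figure of each pair is the baseline and the second is the VSX run, and the names `timings` and `speedup` are hypothetical helpers introduced only for illustration.

```python
# UNITS figures copied from the benchmark output in this message.
# Assumption: first value of each pair = before the patch, second = with VSX.
# The source lists "bgra" twice; the second entry is kept under a distinct
# label here purely so the two rows do not collide.
timings = [
    ("rgb24", 18192, 9983),
    ("bgr24", 18665, 9925),
    ("rgba", 20239, 8794),
    ("bgra", 20354, 8770),
    ("argb", 20185, 8761),
    ("bgra (second entry)", 20360, 8759),
]


def speedup(before, after):
    """Ratio of baseline cost to optimized cost (higher is better)."""
    return before / after


# Per-format speedup, rounded to two decimals for display.
ratios = {name: round(speedup(before, after), 2) for name, before, after in timings}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")

print(f"range: {min(ratios.values()):.2f}x to {max(ratios.values()):.2f}x")
```

Under that assumption the 24-bit formats land near the low end of the range and the 32-bit formats near the high end, which matches the stated summary.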