    swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2 · ce92ee4b
    Lauri Kasanen authored
    ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
            -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -
    
    Uses 32-bit multiplies, so this is POWER8 only.
    
    ~2x speedup (about 1.7x to 2.1x; first line of each pair is before, second is after):
    
    rgb24
      24431 UNITS in yuv2packed2,   16384 runs,      0 skips
      13783 UNITS in yuv2packed2,   16383 runs,      1 skips
    bgr24
      24396 UNITS in yuv2packed2,   16384 runs,      0 skips
      14059 UNITS in yuv2packed2,   16384 runs,      0 skips
    rgba
      26815 UNITS in yuv2packed2,   16383 runs,      1 skips
      12797 UNITS in yuv2packed2,   16383 runs,      1 skips
    bgra
      27060 UNITS in yuv2packed2,   16384 runs,      0 skips
      13138 UNITS in yuv2packed2,   16384 runs,      0 skips
    argb
      26998 UNITS in yuv2packed2,   16384 runs,      0 skips
      12728 UNITS in yuv2packed2,   16381 runs,      3 skips
    bgra
      26651 UNITS in yuv2packed2,   16384 runs,      0 skips
      13124 UNITS in yuv2packed2,   16384 runs,      0 skips
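
The per-format speedup can be checked from the figures above. This is a minimal sketch, assuming the first line of each pair is the run before the patch and the second is the run with the VSX code. The `results` list and `speedup` helper are illustrative names, not part of FFmpeg.

```python
# (format label, UNITS before, UNITS after), in the order listed above.
# Assumption: first line of each pair = before the patch, second = after.
# Labels are copied as they appear; "bgra" is listed twice in the source.
results = [
    ("rgb24", 24431, 13783),
    ("bgr24", 24396, 14059),
    ("rgba", 26815, 12797),
    ("bgra", 27060, 13138),
    ("argb", 26998, 12728),
    ("bgra", 26651, 13124),
]


def speedup(before, after):
    """Ratio of timer units before the patch to timer units after it."""
    return before / after


speedups = [(label, speedup(before, after)) for label, before, after in results]

for label, ratio in speedups:
    print(f"{label:6s} {ratio:.2f}x")
```

The 24-bit formats come out a little below 2x and the 32-bit formats a little above it, which is consistent with the "~2x" summary.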
    
    This is a modest speedup, but the x86 MMX version also only reaches ~2x. The MMX
    version is also quite inaccurate, whereas the VSX version has high accuracy.
swscale_vsx.c 71.3 KB