    libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX · 8522d219
    Lauri Kasanen authored
    ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
    -s 1920x1728 -f null -vframes 100 -v error -nostats -
    
    9-14 bit funcs get about 6x speedup; 16-bit gets about 7x (le) to about 14x (be).
    Fate passes, each format tested with an image to video conversion.
    
    32-bit vector multiplies are only available from POWER8 on, so POWER7
    is locked out of the 16-bit function. This applies to the 32-bit
    vec_mulo/vec_mule variants too, not just vmuluwm.
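
To illustrate why only the 16-bit function depends on 32-bit multiplies, here is a minimal scalar sketch of the two vertical filters. It is not the code from this commit: the function names, the shift values and the 64-bit accumulator in the 16-bit path are assumptions made for the illustration, and byte-order handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v to the unsigned range [0, 2^bits - 1]. */
static uint16_t clip_uintp2(int64_t v, int bits)
{
    int64_t max = ((int64_t)1 << bits) - 1;
    return (uint16_t)(v < 0 ? 0 : (v > max ? max : v));
}

/*
 * 9-14 bit output (hypothetical helper): samples and coefficients are
 * both 16-bit, so every product is a 16x16 -> 32 bit multiply.
 */
static void planex_9_14_sketch(const int16_t *filter, int filter_size,
                               const int16_t **src, uint16_t *dest,
                               int dst_w, int output_bits)
{
    int shift = 11 + 16 - output_bits;   /* assumed fixed-point scaling */

    assert(output_bits >= 9 && output_bits <= 14);

    for (int i = 0; i < dst_w; i++) {
        int val = 1 << (shift - 1);      /* rounding term */

        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];

        dest[i] = clip_uintp2(val >> shift, output_bits);
    }
}

/*
 * 16-bit output (hypothetical helper): samples are 32-bit, so every
 * product has a 32-bit operand. This is the multiply that has no
 * vector form before POWER8. The 64-bit accumulator is a
 * simplification for this sketch only.
 */
static void planex_16_sketch(const int16_t *filter, int filter_size,
                             const int32_t **src, uint16_t *dest,
                             int dst_w)
{
    const int shift = 15;                /* assumed fixed-point scaling */

    for (int i = 0; i < dst_w; i++) {
        int64_t val = (int64_t)1 << (shift - 1);   /* rounding term */

        for (int j = 0; j < filter_size; j++)
            val += (int64_t)src[j][i] * filter[j];

        dest[i] = clip_uintp2(val >> shift, 16);
    }
}
```

In the first function a vectorized version can get by with 16-bit even/odd multiplies, while the second has to multiply 32-bit lanes, which is where the POWER8 requirement comes from.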
    
    With TIMER_REPORT skips disabled (for each format, the first line is
    the VSX version and the second line the C version):
    yuv420p9le
      12412 UNITS in planarX,  131072 runs,      0 skips
      73136 UNITS in planarX,  131072 runs,      0 skips
    yuv420p9be
      12481 UNITS in planarX,  131072 runs,      0 skips
      73410 UNITS in planarX,  131072 runs,      0 skips
    yuv420p10le
      12322 UNITS in planarX,  131072 runs,      0 skips
      72546 UNITS in planarX,  131072 runs,      0 skips
    yuv420p10be
      12291 UNITS in planarX,  131072 runs,      0 skips
      72935 UNITS in planarX,  131072 runs,      0 skips
    yuv420p12le
      12316 UNITS in planarX,  131072 runs,      0 skips
      72708 UNITS in planarX,  131072 runs,      0 skips
    yuv420p12be
      12319 UNITS in planarX,  131072 runs,      0 skips
      72577 UNITS in planarX,  131072 runs,      0 skips
    yuv420p14le
      12259 UNITS in planarX,  131072 runs,      0 skips
      72516 UNITS in planarX,  131072 runs,      0 skips
    yuv420p14be
      12440 UNITS in planarX,  131072 runs,      0 skips
      72962 UNITS in planarX,  131072 runs,      0 skips
    yuv420p16le
      10548 UNITS in planarX,  131072 runs,      0 skips
      73429 UNITS in planarX,  131072 runs,      0 skips
    yuv420p16be
      10634 UNITS in planarX,  131072 runs,      0 skips
     150959 UNITS in planarX,  131072 runs,      0 skips
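
The speedup figures can be re-derived from the timer output above. The sketch below assumes that, for each format, the lower figure is the VSX run and the higher figure is the C baseline; the struct, the function names and the choice of three sample formats are made up for the illustration.

```c
#include <stdio.h>

/* One pair of timer readings, copied from the report above. */
struct timing {
    const char *format;
    double vsx_units;   /* assumed: the lower of the two readings */
    double c_units;     /* assumed: the higher of the two readings */
};

/* Speedup is the baseline cost divided by the optimized cost. */
static double speedup(double baseline_units, double optimized_units)
{
    return baseline_units / optimized_units;
}

/* Print the speedup for a few representative formats. */
static void report_speedups(void)
{
    static const struct timing samples[] = {
        { "yuv420p9le",  12412.0,  73136.0 },
        { "yuv420p16le", 10548.0,  73429.0 },
        { "yuv420p16be", 10634.0, 150959.0 },
    };

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("%-12s %.1fx\n", samples[i].format,
               speedup(samples[i].c_units, samples[i].vsx_units));
}
```

With these inputs the ratios come out at roughly 5.9x for yuv420p9le, 7.0x for yuv420p16le and 14.2x for yuv420p16be.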
    
    Signed-off-by: Lauri Kasanen <cand@gmx.com>
swscale_ppc_template.c 9.14 KB