libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -
9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.
Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.
With TIMER_REPORT skips disabled:
yuv420p9le
12412 UNITS in planarX, 131072 runs, 0 skips
73136 UNITS in planarX, 131072 runs, 0 skips
yuv420p9be
12481 UNITS in planarX, 131072 runs, 0 skips
73410 UNITS in planarX, 131072 runs, 0 skips
yuv420p10le
12322 UNITS in planarX, 131072 runs, 0 skips
72546 UNITS in planarX, 131072 runs, 0 skips
yuv420p10be
12291 UNITS in planarX, 131072 runs, 0 skips
72935 UNITS in planarX, 131072 runs, 0 skips
yuv420p12le
12316 UNITS in planarX, 131072 runs, 0 skips
72708 UNITS in planarX, 131072 runs, 0 skips
yuv420p12be
12319 UNITS in planarX, 131072 runs, 0 skips
72577 UNITS in planarX, 131072 runs, 0 skips
yuv420p14le
12259 UNITS in planarX, 131072 runs, 0 skips
72516 UNITS in planarX, 131072 runs, 0 skips
yuv420p14be
12440 UNITS in planarX, 131072 runs, 0 skips
72962 UNITS in planarX, 131072 runs, 0 skips
yuv420p16le
10548 UNITS in planarX, 131072 runs, 0 skips
73429 UNITS in planarX, 131072 runs, 0 skips
yuv420p16be
10634 UNITS in planarX, 131072 runs, 0 skips
150959 UNITS in planarX, 131072 runs, 0 skips
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Showing
Please
register
or
sign in
to comment