1. 04 Oct, 2019 2 commits
    • swscale: Fix AltiVec/VSX build with recent GCC · e6625ca4
      Daniel Kolesa authored
      The argument to vec_splat_u16 must be a literal. Making the
      function always inline and marking its arguments const lets GCC
      turn them into literals and avoids build errors like:
      
      swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal
      
      Fixes #7861.
      Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
      Signed-off-by: Lauri Kasanen <cand@gmx.com>
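The pattern described in this commit message can be sketched as follows. This is a minimal illustration, not the code from swscale_vsx.c: the helper name `splat_u16_sketch` is hypothetical, and the non-AltiVec branch is only a stand-in so the sketch also builds off PowerPC. Whether the constant is actually folded into a literal depends on the compiler and optimisation level.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper for illustration; not the function touched by the commit. */
static inline __attribute__((always_inline))
void splat_u16_sketch(uint16_t dst[8], const int value)
{
#if defined(__ALTIVEC__)
    /* Once the call is inlined, a constant argument at the call site can be
     * folded into the 5-bit signed literal (-16..15) that vec_splat_u16 needs. */
    const __vector unsigned short v = vec_splat_u16(value);
    memcpy(dst, &v, sizeof(v));
#else
    /* Portable stand-in (assumption): same result, no vector unit required. */
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)value;
#endif
}
```

A call such as `splat_u16_sketch(buf, 3)` fills all eight lanes with 3; passing a run-time variable instead of a constant would bring the original build error back on AltiVec targets.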
    • swscale: Replace illegal vector keyword usage in altivec code · 1bdb47b7
      Daniel Kolesa authored
      While this technically compiles in current FFmpeg, it does so only
      because FFmpeg is compiled in strict ISO C mode, which disables
      the builtin 'vector' keyword for AltiVec/VSX. In that mode altivec.h
      instead defines vector as a macro for __vector, which accepts
      arbitrary types.
      
      Normally, the vector keyword should be used only with plain
      scalar non-typedef types, such as unsigned int. But we have the
      vec_(s|u)(8|16|32) macros, which can be used in a portable manner,
      in util_altivec.h in libavutil.
      
      This is also consistent with other AltiVec/VSX code elsewhere in
      the tree.
      
      Fixes #7861.
      Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
      Signed-off-by: Lauri Kasanen <cand@gmx.com>
  2. 12 May, 2019 1 commit
    • swscale: Add support for NV24 and NV42 · cd483180
      Philip Langdale authored
      The implementation is pretty straightforward. Most of the existing
      NV12 codepaths work regardless of subsampling and are re-used as is.
      Where necessary I wrote the slightly different NV24 versions.
      
      Finally, the one thing that confused me for a long time was the
      asm-specific x86 path that did an explicit exclusion check for NV12.
      I replaced that with a semi-planar check and also updated the
      equivalent PPC code, which Lauri kindly checked.
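To illustrate why most NV12 code paths carry over, here is a small sketch of semi-planar chroma handling. It is not taken from swscale: both function names and the `swap_uv` / `log2_chroma_w` parameters are hypothetical. It relies only on the format layouts: NV12 and NV24 store chroma as interleaved U,V pairs, NV21 and NV42 as V,U pairs, and NV24/NV42 have no chroma subsampling.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of chroma pairs in one row: log2_chroma_w is 1 for NV12/NV21
 * (half-width chroma) and 0 for NV24/NV42 (full-width chroma). */
static size_t chroma_pairs_per_row(size_t luma_width, int log2_chroma_w)
{
    return (luma_width + ((size_t)1 << log2_chroma_w) - 1) >> log2_chroma_w;
}

/* Split one row of interleaved chroma into separate U and V rows.
 * swap_uv is 0 for U-first formats (NV12, NV24) and 1 for V-first
 * formats (NV21, NV42). The loop itself does not care about subsampling. */
static void split_interleaved_chroma(const uint8_t *uv, uint8_t *u, uint8_t *v,
                                     size_t pairs, int swap_uv)
{
    for (size_t i = 0; i < pairs; i++) {
        const uint8_t first  = uv[2 * i];
        const uint8_t second = uv[2 * i + 1];
        u[i] = swap_uv ? second : first;
        v[i] = swap_uv ? first  : second;
    }
}
```

Only the pair count changes between NV12 and NV24, which is consistent with the commit's remark that the existing paths mostly work regardless of subsampling.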
  3. 07 May, 2019 4 commits
    • e25bddf5
    • swscale/ppc: VSX-optimize hScale16To* · a2a16206
      Lauri Kasanen authored
      ./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
          -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw
      
      ./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
          -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw
      
      32-bit mul, power8 only
      
      2x speedup for hScale8To19_vsx (x86 SSE2 is 2.37):
        30896 UNITS in hscale,    8192 runs,      0 skips
        63956 UNITS in hscale,    8192 runs,      0 skips
      
      2.06 for hScale16To15_vsx:
        30531 UNITS in hscale,    8192 runs,      0 skips
        63161 UNITS in hscale,    8192 runs,      0 skips
    • swscale/ppc: Indent · 3437111f
      Lauri Kasanen authored
    • swscale/ppc: VSX-optimize hScale8To19 · 9456adc2
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
          -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw
      
      2.26 speedup (x86 SSE2 is 2.32):
        23772 UNITS in hscale,    4096 runs,      0 skips
        53862 UNITS in hscale,    4096 runs,      0 skips
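For orientation, a scalar model of the kind of loop this commit vectorizes might look like the sketch below. It is a simplified illustration, not the FFmpeg source: the function name is hypothetical, and the 14-bit filter scale and the final shift by 3 are assumptions chosen so that 8-bit input lands in a 19-bit result.

```c
#include <stdint.h>

/* Horizontal FIR scaling of 8-bit samples to 19-bit intermediates.
 * filter holds filterSize coefficients per output pixel, assumed to be
 * 14-bit fixed point (summing to about 1 << 14). */
static void hscale8to19_sketch(int32_t *dst, int dstW, const uint8_t *src,
                               const int16_t *filter, const int32_t *filterPos,
                               int filterSize)
{
    const int32_t max = (1 << 19) - 1;

    for (int i = 0; i < dstW; i++) {
        const int srcPos = filterPos[i];
        int32_t val = 0;

        for (int j = 0; j < filterSize; j++)
            val += (int32_t)src[srcPos + j] * filter[filterSize * i + j];

        /* 8 + 14 = 22 bits; dropping 3 leaves 19. Only the upper bound is
         * clipped; an arithmetic right shift is assumed for negative sums. */
        val >>= 3;
        dst[i] = val > max ? max : val;
    }
}
```

The inner multiply-accumulate is the part that maps naturally onto vector multiply-sum instructions.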
  4. 30 Apr, 2019 1 commit
    • swscale/ppc: VSX-optimize hscale_fast · d0e4d042
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
              -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw
      
      4.27 speedup for hyscale_fast:
        24796 UNITS in hyscale_fast,    4096 runs,      0 skips
         5797 UNITS in hyscale_fast,    4096 runs,      0 skips
      
      4.48 speedup for hcscale_fast:
        19911 UNITS in hcscale_fast,    4095 runs,      1 skips
         4437 UNITS in hcscale_fast,    4096 runs,      0 skips
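The speedup figures in these messages appear to be the ratio of the two UNITS values in each pair. A small helper makes that arithmetic explicit. The function name is hypothetical, and because the logs do not label which line is the baseline (the order differs between commits), the sketch assumes the larger value is the unoptimized timing and that both values are positive.

```c
/* Ratio of the slower to the faster timing, regardless of listing order.
 * Assumes both values are positive. */
static double speedup_from_units(long units_a, long units_b)
{
    const long slow = units_a > units_b ? units_a : units_b;
    const long fast = units_a > units_b ? units_b : units_a;
    return (double)slow / (double)fast;
}
```

For the hyscale_fast pair above, `speedup_from_units(24796, 5797)` gives roughly 4.277, matching the quoted 4.27.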
  5. 11 Apr, 2019 1 commit
    • swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2 · ce92ee4b
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
              -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
              -cpuflags 0 -v error -
      
      32-bit mul, power8 only.
      
      ~2x speedup:
      
      rgb24
        24431 UNITS in yuv2packed2,   16384 runs,      0 skips
        13783 UNITS in yuv2packed2,   16383 runs,      1 skips
      bgr24
        24396 UNITS in yuv2packed2,   16384 runs,      0 skips
        14059 UNITS in yuv2packed2,   16384 runs,      0 skips
      rgba
        26815 UNITS in yuv2packed2,   16383 runs,      1 skips
        12797 UNITS in yuv2packed2,   16383 runs,      1 skips
      bgra
        27060 UNITS in yuv2packed2,   16384 runs,      0 skips
        13138 UNITS in yuv2packed2,   16384 runs,      0 skips
      argb
        26998 UNITS in yuv2packed2,   16384 runs,      0 skips
        12728 UNITS in yuv2packed2,   16381 runs,      3 skips
      bgra
        26651 UNITS in yuv2packed2,   16384 runs,      0 skips
        13124 UNITS in yuv2packed2,   16384 runs,      0 skips
      
      This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
      is also heavily inaccurate, while the vsx version has high accuracy.
  6. 07 Apr, 2019 3 commits
    • swscale/ppc: VSX-optimize yuv2rgb_full_X · 8607e29f
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                      -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                      -cpuflags 0 -v error -
      
      32-bit mul, power8 only.
      
      ~6.4x speedup:
      
      rgb24
       214278 UNITS in yuv2packedX,   16384 runs,      0 skips
        33249 UNITS in yuv2packedX,   16384 runs,      0 skips
      bgr24
       214616 UNITS in yuv2packedX,   16384 runs,      0 skips
        33233 UNITS in yuv2packedX,   16384 runs,      0 skips
      rgba
       214517 UNITS in yuv2packedX,   16384 runs,      0 skips
        33271 UNITS in yuv2packedX,   16384 runs,      0 skips
      bgra
       214973 UNITS in yuv2packedX,   16384 runs,      0 skips
        33397 UNITS in yuv2packedX,   16384 runs,      0 skips
      argb
       214613 UNITS in yuv2packedX,   16384 runs,      0 skips
        33310 UNITS in yuv2packedX,   16384 runs,      0 skips
      bgra
       214637 UNITS in yuv2packedX,   16384 runs,      0 skips
        33330 UNITS in yuv2packedX,   16384 runs,      0 skips
    • swscale/ppc: VSX-optimize yuv2rgb_full_2 · 3256e949
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                  -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                  -cpuflags 0 -v error -
      
      32-bit mul, power8 only.
      
      ~4x speedup:
      
      rgb24
        52763 UNITS in yuv2packed2,   16384 runs,      0 skips
        13453 UNITS in yuv2packed2,   16384 runs,      0 skips
      bgr24
        53144 UNITS in yuv2packed2,   16384 runs,      0 skips
        13616 UNITS in yuv2packed2,   16384 runs,      0 skips
      rgba
        52796 UNITS in yuv2packed2,   16384 runs,      0 skips
        12904 UNITS in yuv2packed2,   16384 runs,      0 skips
      bgra
        52732 UNITS in yuv2packed2,   16384 runs,      0 skips
        13262 UNITS in yuv2packed2,   16384 runs,      0 skips
      argb
        52661 UNITS in yuv2packed2,   16384 runs,      0 skips
        12879 UNITS in yuv2packed2,   16384 runs,      0 skips
      bgra
        52662 UNITS in yuv2packed2,   16384 runs,      0 skips
        12932 UNITS in yuv2packed2,   16384 runs,      0 skips
    • swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1 · 50e672bc
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
              -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
              -cpuflags 0 -v error -
      
      32-bit mul, power8 only.
      
      1.8-2.3x speedup:
      
      rgb24
        18192 UNITS in yuv2packed1,   32767 runs,      1 skips
         9983 UNITS in yuv2packed1,   32760 runs,      8 skips
      bgr24
        18665 UNITS in yuv2packed1,   32766 runs,      2 skips
         9925 UNITS in yuv2packed1,   32763 runs,      5 skips
      rgba
        20239 UNITS in yuv2packed1,   32767 runs,      1 skips
         8794 UNITS in yuv2packed1,   32759 runs,      9 skips
      bgra
        20354 UNITS in yuv2packed1,   32768 runs,      0 skips
         8770 UNITS in yuv2packed1,   32761 runs,      7 skips
      argb
        20185 UNITS in yuv2packed1,   32768 runs,      0 skips
         8761 UNITS in yuv2packed1,   32761 runs,      7 skips
      bgra
        20360 UNITS in yuv2packed1,   32766 runs,      2 skips
         8759 UNITS in yuv2packed1,   32764 runs,      4 skips
      
      This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
      is also heavily inaccurate, while the vsx version has high accuracy.
  7. 31 Mar, 2019 3 commits
    • swscale/ppc: VSX-optimize yuv2422_X · 7adce3e6
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -
      
      7.2x speedup:
      
      yuyv422
       126354 UNITS in yuv2packedX,   16384 runs,      0 skips
        16383 UNITS in yuv2packedX,   16382 runs,      2 skips
      yvyu422
       117669 UNITS in yuv2packedX,   16384 runs,      0 skips
        16271 UNITS in yuv2packedX,   16379 runs,      5 skips
      uyvy422
       117310 UNITS in yuv2packedX,   16384 runs,      0 skips
        16226 UNITS in yuv2packedX,   16382 runs,      2 skips
    • swscale/ppc: VSX-optimize yuv2422_2 · 9a2db4dc
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                      -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                      -cpuflags 0 -v error -
      
      5.1x speedup:
      
      yuyv422
        19339 UNITS in yuv2packed2,   16384 runs,      0 skips
         3718 UNITS in yuv2packed2,   16383 runs,      1 skips
      yvyu422
        19438 UNITS in yuv2packed2,   16384 runs,      0 skips
         3800 UNITS in yuv2packed2,   16380 runs,      4 skips
      uyvy422
        19128 UNITS in yuv2packed2,   16384 runs,      0 skips
         3721 UNITS in yuv2packed2,   16380 runs,      4 skips
    • swscale/ppc: VSX-optimize yuv2422_1 · a6a31ca3
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                  -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
                  -cpuflags 0 -v error -
      
      15.3x speedup:
      
      yuyv422
        14513 UNITS in yuv2packed1,   32768 runs,      0 skips
          949 UNITS in yuv2packed1,   32767 runs,      1 skips
      yvyu422
        14516 UNITS in yuv2packed1,   32767 runs,      1 skips
          943 UNITS in yuv2packed1,   32767 runs,      1 skips
      uyvy422
        14530 UNITS in yuv2packed1,   32767 runs,      1 skips
          941 UNITS in yuv2packed1,   32766 runs,      2 skips
  8. 27 Mar, 2019 1 commit
    • swscale/ppc: VSX-optimize yuv2rgb_full · 681957b8
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
              -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
              -cpuflags 0 -v error -
      
      This uses 32-bit mul, so POWER8 only.
      
      The following output formats get about 4.5x speedup:
      
      rgb24
        39980 UNITS in yuv2packed1,   32768 runs,      0 skips
         8774 UNITS in yuv2packed1,   32768 runs,      0 skips
      bgr24
        40069 UNITS in yuv2packed1,   32768 runs,      0 skips
         8772 UNITS in yuv2packed1,   32766 runs,      2 skips
      rgba
        39759 UNITS in yuv2packed1,   32768 runs,      0 skips
         8681 UNITS in yuv2packed1,   32767 runs,      1 skips
      bgra
        39729 UNITS in yuv2packed1,   32768 runs,      0 skips
         8696 UNITS in yuv2packed1,   32766 runs,      2 skips
      argb
        39766 UNITS in yuv2packed1,   32768 runs,      0 skips
         8672 UNITS in yuv2packed1,   32766 runs,      2 skips
      bgra
        39784 UNITS in yuv2packed1,   32768 runs,      0 skips
         8659 UNITS in yuv2packed1,   32767 runs,      1 skips
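A generic fixed-point YUV to RGB conversion shows why 32-bit multiplies matter for this kind of kernel. The sketch below uses the widely published integer approximation for limited-range BT.601; these are not swscale's coefficients, which are computed at run time from the conversion context, and the function names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_u8(int32_t x)
{
    return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x;
}

/* Limited-range BT.601 YUV -> RGB with 8 fractional bits.
 * Intermediate sums such as 298 * c + 516 * d exceed the 16-bit range,
 * so the arithmetic has to be carried out in 32 bits. */
static void yuv_to_rgb_sketch(uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3])
{
    const int32_t c = (int32_t)y - 16;
    const int32_t d = (int32_t)u - 128;
    const int32_t e = (int32_t)v - 128;

    /* Division instead of a shift keeps negative sums portable; they are
     * clipped to 0 either way. */
    rgb[0] = clip_u8((298 * c           + 409 * e + 128) / 256);
    rgb[1] = clip_u8((298 * c - 100 * d - 208 * e + 128) / 256);
    rgb[2] = clip_u8((298 * c + 516 * d           + 128) / 256);
}
```

With these coefficients, video black (16, 128, 128) maps to (0, 0, 0) and video white (235, 128, 128) to (255, 255, 255).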
  9. 20 Mar, 2019 1 commit
  10. 05 Feb, 2019 1 commit
    • libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX · 8522d219
      Lauri Kasanen authored
      ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
      -s 1920x1728 -f null -vframes 100 -v error -nostats -
      
      9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
      Fate passes, each format tested with an image to video conversion.
      
      Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
      of the 16-bit function. This includes the vec_mulo/mule functions too,
      not just vmuluwm.
      
      With TIMER_REPORT skips disabled:
      yuv420p9le
        12412 UNITS in planarX,  131072 runs,      0 skips
        73136 UNITS in planarX,  131072 runs,      0 skips
      yuv420p9be
        12481 UNITS in planarX,  131072 runs,      0 skips
        73410 UNITS in planarX,  131072 runs,      0 skips
      yuv420p10le
        12322 UNITS in planarX,  131072 runs,      0 skips
        72546 UNITS in planarX,  131072 runs,      0 skips
      yuv420p10be
        12291 UNITS in planarX,  131072 runs,      0 skips
        72935 UNITS in planarX,  131072 runs,      0 skips
      yuv420p12le
        12316 UNITS in planarX,  131072 runs,      0 skips
        72708 UNITS in planarX,  131072 runs,      0 skips
      yuv420p12be
        12319 UNITS in planarX,  131072 runs,      0 skips
        72577 UNITS in planarX,  131072 runs,      0 skips
      yuv420p14le
        12259 UNITS in planarX,  131072 runs,      0 skips
        72516 UNITS in planarX,  131072 runs,      0 skips
      yuv420p14be
        12440 UNITS in planarX,  131072 runs,      0 skips
        72962 UNITS in planarX,  131072 runs,      0 skips
      yuv420p16le
        10548 UNITS in planarX,  131072 runs,      0 skips
        73429 UNITS in planarX,  131072 runs,      0 skips
      yuv420p16be
        10634 UNITS in planarX,  131072 runs,      0 skips
       150959 UNITS in planarX,  131072 runs,      0 skips
      Signed-off-by: Lauri Kasanen <cand@gmx.com>
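As a rough scalar model of the 9 to 14 bit case, vertical scaling is a multi-tap weighted sum over input rows followed by rounding, shifting and clipping. The sketch below is an illustration under stated assumptions, not the FFmpeg source: the function names are hypothetical, the inputs are assumed to be 15-bit intermediates with a 12-bit filter scale (hence the shift of 27 minus the output depth), and the result is written in native byte order. The 16-bit variant, which presumably works on wider intermediates and is where the 32-bit vector multiplies come in, is not modeled.

```c
#include <stdint.h>

static uint16_t clip_uintp2(int32_t x, int bits)
{
    const int32_t max = ((int32_t)1 << bits) - 1;
    return (uint16_t)(x < 0 ? 0 : x > max ? max : x);
}

/* Vertical multi-tap filter for 9..14-bit output.
 * src[j] is the j-th input row; filter[j] is its weight. */
static void yuv2planeX_nbps_sketch(const int16_t *filter, int filterSize,
                                   const int16_t *const *src, uint16_t *dest,
                                   int dstW, int output_bits)
{
    const int shift = 11 + 16 - output_bits; /* assumed fixed-point scale */

    for (int i = 0; i < dstW; i++) {
        int32_t val = (int32_t)1 << (shift - 1); /* rounding term */

        for (int j = 0; j < filterSize; j++)
            val += (int32_t)src[j][i] * filter[j];

        /* Negative sums are clipped before the shift to stay portable. */
        dest[i] = clip_uintp2(val < 0 ? 0 : val >> shift, output_bits);
    }
}
```

Each output pixel is independent of its neighbours, so several pixels can be processed per vector register.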
  11. 14 Dec, 2018 1 commit
  12. 12 Dec, 2018 1 commit
    • swscale/output: VSX-optimize nbps yuv2plane1 · 1046cba2
      Lauri Kasanen authored
      ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
      -f null -vframes 100 -v error -nostats -
      
      Speedups:
      yuv2plane1_9BE_vsx	11.2042
      yuv2plane1_9LE_vsx	11.156
      yuv2plane1_10BE_vsx	9.89428
      yuv2plane1_10LE_vsx	10.3637
      yuv2plane1_12BE_vsx	9.71923
      yuv2plane1_12LE_vsx	11.0404
      yuv2plane1_14BE_vsx	10.1763
      yuv2plane1_14LE_vsx	11.2728
      
      Fate passes, each format tested with an image to video conversion.
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
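The BE and LE variants listed above differ only in the byte order of the stored sample. A scalar sketch of the unfiltered path, assuming 15-bit intermediates that are rounded and shifted down to the output depth, could look like this. The function name and the `big_endian` flag are hypothetical and the code is not the FFmpeg source.

```c
#include <stdint.h>

/* Convert 15-bit intermediates to output_bits (9..14) and store each
 * sample as two bytes in the requested byte order. */
static void yuv2plane1_nbps_sketch(const int16_t *src, uint8_t *dest, int dstW,
                                   int output_bits, int big_endian)
{
    const int shift = 15 - output_bits; /* assumed intermediate depth */
    const int32_t max = ((int32_t)1 << output_bits) - 1;

    for (int i = 0; i < dstW; i++) {
        int32_t val = src[i] + (1 << (shift - 1)); /* round to nearest */

        val = val < 0 ? 0 : val >> shift;
        if (val > max)
            val = max;

        if (big_endian) {
            dest[2 * i]     = (uint8_t)(val >> 8);
            dest[2 * i + 1] = (uint8_t)(val & 0xFF);
        } else {
            dest[2 * i]     = (uint8_t)(val & 0xFF);
            dest[2 * i + 1] = (uint8_t)(val >> 8);
        }
    }
}
```

A vectorized version can do the add, shift and clip on eight samples at once and handle the byte order with a single swap per vector.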
  13. 04 Dec, 2018 1 commit