1. 21 Jul, 2019 2 commits
  2. 13 May, 2019 2 commits
  3. 12 May, 2019 2 commits
    • swscale: Add test for isSemiPlanarYUV to pixdesc_query · 4fa4f1d7
      Philip Langdale authored
      Lauri asked me what the semi-planar formats were, which reminded me
      that we could add the check to pixdesc_query so that we know exactly
      what the list is.
    • swscale: Add support for NV24 and NV42 · cd483180
      Philip Langdale authored
      The implementation is fairly straightforward. Most of the existing
      NV12 codepaths work regardless of subsampling and are reused as is.
      Where necessary, I wrote slightly different NV24 versions.
      
      Finally, the one thing that confused me for a long time was the
      asm-specific x86 path that did an explicit exclusion check for NV12.
      I replaced that with a semi-planar check and also updated the
      equivalent PPC code, which Lauri kindly checked.
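As a rough illustration of why NV12 codepaths can be reused for NV24, the sketch below models the four semi-planar layouts: one luma plane followed by a single plane of interleaved chroma bytes. Only the chroma plane size depends on subsampling; the de-interleaving step does not. The names `SemiPlanarLayout`, `plane_sizes` and `split_chroma` are hypothetical and are not swscale APIs, and the sizes assume 8-bit samples with no row padding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemiPlanarLayout:
    chroma_w_shift: int  # log2 of horizontal chroma subsampling
    chroma_h_shift: int  # log2 of vertical chroma subsampling
    order: str           # "UV" or "VU": byte order inside the chroma plane


# Hypothetical table for illustration; not taken from the swscale sources.
LAYOUTS = {
    "nv12": SemiPlanarLayout(1, 1, "UV"),  # 4:2:0, U first
    "nv21": SemiPlanarLayout(1, 1, "VU"),  # 4:2:0, V first
    "nv24": SemiPlanarLayout(0, 0, "UV"),  # 4:4:4, U first
    "nv42": SemiPlanarLayout(0, 0, "VU"),  # 4:4:4, V first
}


def plane_sizes(fmt, width, height):
    """Return (luma_bytes, chroma_bytes) for an unpadded 8-bit frame."""
    lay = LAYOUTS[fmt]
    chroma_w = -(-width >> lay.chroma_w_shift)   # ceiling division
    chroma_h = -(-height >> lay.chroma_h_shift)
    return width * height, 2 * chroma_w * chroma_h


def split_chroma(fmt, chroma):
    """De-interleave a chroma plane into (U, V); independent of subsampling."""
    first, second = chroma[0::2], chroma[1::2]
    return (first, second) if LAYOUTS[fmt].order == "UV" else (second, first)


if __name__ == "__main__":
    for name in LAYOUTS:
        print(name, plane_sizes(name, 1920, 1080))
```

`split_chroma` is the same for all four formats, which mirrors the commit's remark that most codepaths do not care about subsampling; only code that walks the chroma plane dimensions needs an NV24-specific variant.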
  4. 07 May, 2019 4 commits
    • e25bddf5
    • swscale/ppc: VSX-optimize hScale16To* · a2a16206
      Lauri Kasanen authored
      ./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
          -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw
      
      ./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
          -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw
      
      32-bit mul, power8 only
      
      2x speedup for hScale16To19_vsx (x86 SSE2 is 2.37):
        30896 UNITS in hscale,    8192 runs,      0 skips
        63956 UNITS in hscale,    8192 runs,      0 skips
      
      2.06 for hScale16To15_vsx:
        30531 UNITS in hscale,    8192 runs,      0 skips
        63161 UNITS in hscale,    8192 runs,      0 skips
    • swscale/ppc: Indent · 3437111f
      Lauri Kasanen authored
    • swscale/ppc: VSX-optimize hScale8To19 · 9456adc2
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
          -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw
      
      2.26 speedup (x86 SSE2 is 2.32):
        23772 UNITS in hscale,    4096 runs,      0 skips
        53862 UNITS in hscale,    4096 runs,      0 skips
  5. 30 Apr, 2019 1 commit
    • swscale/ppc: VSX-optimize hscale_fast · d0e4d042
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
              -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw
      
      4.27 speedup for hyscale_fast:
        24796 UNITS in hyscale_fast,    4096 runs,      0 skips
         5797 UNITS in hyscale_fast,    4096 runs,      0 skips
      
      4.48 speedup for hcscale_fast:
        19911 UNITS in hcscale_fast,    4095 runs,      1 skips
         4437 UNITS in hcscale_fast,    4096 runs,      0 skips
  6. 11 Apr, 2019 1 commit
    • swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2 · ce92ee4b
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
              -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
              -cpuflags 0 -v error -
      
      32-bit mul, power8 only.
      
      ~2x speedup:
      
      rgb24
        24431 UNITS in yuv2packed2,   16384 runs,      0 skips
        13783 UNITS in yuv2packed2,   16383 runs,      1 skips
      bgr24
        24396 UNITS in yuv2packed2,   16384 runs,      0 skips
        14059 UNITS in yuv2packed2,   16384 runs,      0 skips
      rgba
        26815 UNITS in yuv2packed2,   16383 runs,      1 skips
        12797 UNITS in yuv2packed2,   16383 runs,      1 skips
      bgra
        27060 UNITS in yuv2packed2,   16384 runs,      0 skips
        13138 UNITS in yuv2packed2,   16384 runs,      0 skips
      argb
        26998 UNITS in yuv2packed2,   16384 runs,      0 skips
        12728 UNITS in yuv2packed2,   16381 runs,      3 skips
      bgra
        26651 UNITS in yuv2packed2,   16384 runs,      0 skips
        13124 UNITS in yuv2packed2,   16384 runs,      0 skips
      
      This is a low speedup, but the x86 MMX version also gets only ~2x. The MMX version
      is also heavily inaccurate, while the VSX version has high accuracy.
  7. 07 Apr, 2019 3 commits
    • swscale/ppc: VSX-optimize yuv2rgb_full_X · 8607e29f
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                      -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                      -cpuflags 0 -v error -
      
      32-bit mul, power8 only.
      
      ~6.4x speedup:
      
      rgb24
       214278 UNITS in yuv2packedX,   16384 runs,      0 skips
        33249 UNITS in yuv2packedX,   16384 runs,      0 skips
      bgr24
       214616 UNITS in yuv2packedX,   16384 runs,      0 skips
        33233 UNITS in yuv2packedX,   16384 runs,      0 skips
      rgba
       214517 UNITS in yuv2packedX,   16384 runs,      0 skips
        33271 UNITS in yuv2packedX,   16384 runs,      0 skips
      bgra
       214973 UNITS in yuv2packedX,   16384 runs,      0 skips
        33397 UNITS in yuv2packedX,   16384 runs,      0 skips
      argb
       214613 UNITS in yuv2packedX,   16384 runs,      0 skips
        33310 UNITS in yuv2packedX,   16384 runs,      0 skips
      bgra
       214637 UNITS in yuv2packedX,   16384 runs,      0 skips
        33330 UNITS in yuv2packedX,   16384 runs,      0 skips
    • swscale/ppc: VSX-optimize yuv2rgb_full_2 · 3256e949
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                  -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                  -cpuflags 0 -v error -
      
      32-bit mul, power8 only.
      
      ~4x speedup:
      
      rgb24
        52763 UNITS in yuv2packed2,   16384 runs,      0 skips
        13453 UNITS in yuv2packed2,   16384 runs,      0 skips
      bgr24
        53144 UNITS in yuv2packed2,   16384 runs,      0 skips
        13616 UNITS in yuv2packed2,   16384 runs,      0 skips
      rgba
        52796 UNITS in yuv2packed2,   16384 runs,      0 skips
        12904 UNITS in yuv2packed2,   16384 runs,      0 skips
      bgra
        52732 UNITS in yuv2packed2,   16384 runs,      0 skips
        13262 UNITS in yuv2packed2,   16384 runs,      0 skips
      argb
        52661 UNITS in yuv2packed2,   16384 runs,      0 skips
        12879 UNITS in yuv2packed2,   16384 runs,      0 skips
      bgra
        52662 UNITS in yuv2packed2,   16384 runs,      0 skips
        12932 UNITS in yuv2packed2,   16384 runs,      0 skips
    • swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1 · 50e672bc
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
              -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
              -cpuflags 0 -v error -
      
      32-bit mul, power8 only.
      
      1.8-2.3x speedup:
      
      rgb24
        18192 UNITS in yuv2packed1,   32767 runs,      1 skips
         9983 UNITS in yuv2packed1,   32760 runs,      8 skips
      bgr24
        18665 UNITS in yuv2packed1,   32766 runs,      2 skips
         9925 UNITS in yuv2packed1,   32763 runs,      5 skips
      rgba
        20239 UNITS in yuv2packed1,   32767 runs,      1 skips
         8794 UNITS in yuv2packed1,   32759 runs,      9 skips
      bgra
        20354 UNITS in yuv2packed1,   32768 runs,      0 skips
         8770 UNITS in yuv2packed1,   32761 runs,      7 skips
      argb
        20185 UNITS in yuv2packed1,   32768 runs,      0 skips
         8761 UNITS in yuv2packed1,   32761 runs,      7 skips
      bgra
        20360 UNITS in yuv2packed1,   32766 runs,      2 skips
         8759 UNITS in yuv2packed1,   32764 runs,      4 skips
      
      This is a low speedup, but the x86 MMX version also gets only ~2x. The MMX version
      is also heavily inaccurate, while the VSX version has high accuracy.
  8. 31 Mar, 2019 3 commits
    • swscale/ppc: VSX-optimize yuv2422_X · 7adce3e6
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -
      
      7.2x speedup:
      
      yuyv422
       126354 UNITS in yuv2packedX,   16384 runs,      0 skips
        16383 UNITS in yuv2packedX,   16382 runs,      2 skips
      yvyu422
       117669 UNITS in yuv2packedX,   16384 runs,      0 skips
        16271 UNITS in yuv2packedX,   16379 runs,      5 skips
      uyvy422
       117310 UNITS in yuv2packedX,   16384 runs,      0 skips
        16226 UNITS in yuv2packedX,   16382 runs,      2 skips
    • swscale/ppc: VSX-optimize yuv2422_2 · 9a2db4dc
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                      -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                      -cpuflags 0 -v error -
      
      5.1x speedup:
      
      yuyv422
        19339 UNITS in yuv2packed2,   16384 runs,      0 skips
         3718 UNITS in yuv2packed2,   16383 runs,      1 skips
      yvyu422
        19438 UNITS in yuv2packed2,   16384 runs,      0 skips
         3800 UNITS in yuv2packed2,   16380 runs,      4 skips
      uyvy422
        19128 UNITS in yuv2packed2,   16384 runs,      0 skips
         3721 UNITS in yuv2packed2,   16380 runs,      4 skips
    • swscale/ppc: VSX-optimize yuv2422_1 · a6a31ca3
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                  -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
                  -cpuflags 0 -v error -
      
      15.3x speedup:
      
      yuyv422
        14513 UNITS in yuv2packed1,   32768 runs,      0 skips
          949 UNITS in yuv2packed1,   32767 runs,      1 skips
      yvyu422
        14516 UNITS in yuv2packed1,   32767 runs,      1 skips
          943 UNITS in yuv2packed1,   32767 runs,      1 skips
      uyvy422
        14530 UNITS in yuv2packed1,   32767 runs,      1 skips
          941 UNITS in yuv2packed1,   32766 runs,      2 skips
  9. 28 Mar, 2019 2 commits
  10. 27 Mar, 2019 2 commits
    • swscale/ppc: VSX-optimize yuv2rgb_full · 681957b8
      Lauri Kasanen authored
      ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
              -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
              -cpuflags 0 -v error -
      
      This uses 32-bit mul, so POWER8 only.
      
      The following output formats get about 4.5x speedup:
      
      rgb24
        39980 UNITS in yuv2packed1,   32768 runs,      0 skips
         8774 UNITS in yuv2packed1,   32768 runs,      0 skips
      bgr24
        40069 UNITS in yuv2packed1,   32768 runs,      0 skips
         8772 UNITS in yuv2packed1,   32766 runs,      2 skips
      rgba
        39759 UNITS in yuv2packed1,   32768 runs,      0 skips
         8681 UNITS in yuv2packed1,   32767 runs,      1 skips
      bgra
        39729 UNITS in yuv2packed1,   32768 runs,      0 skips
         8696 UNITS in yuv2packed1,   32766 runs,      2 skips
      argb
        39766 UNITS in yuv2packed1,   32768 runs,      0 skips
         8672 UNITS in yuv2packed1,   32766 runs,      2 skips
      bgra
        39784 UNITS in yuv2packed1,   32768 runs,      0 skips
         8659 UNITS in yuv2packed1,   32767 runs,      1 skips
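The `$i` in the command above suggests the benchmark was run once per output pixel format. A minimal sketch of generating those invocations follows. It only builds argument lists and does not run anything. The flags are copied from the command in the commit message; the helper name `benchmark_command`, the `FORMATS` list (the distinct labels from the listing) and the idea that the optimized run simply omits `-cpuflags 0` are assumptions.

```python
import shlex

# Distinct format labels that appear in the timing listing above.
FORMATS = ["rgb24", "bgr24", "rgba", "bgra", "argb"]


def benchmark_command(pix_fmt, ffmpeg="./ffmpeg", disable_simd=True):
    """Build the argv for one benchmark run (hypothetical helper).

    disable_simd=True adds "-cpuflags 0", as in the quoted command;
    dropping it for the optimized run is an assumption.
    """
    cmd = [
        ffmpeg,
        "-f", "lavfi",
        "-i", "yuvtestsrc=duration=1:size=1200x1440",
        "-s", "1200x1440",
        "-f", "null",
        "-vframes", "100",
        "-pix_fmt", pix_fmt,
        "-nostats",
    ]
    if disable_simd:
        cmd += ["-cpuflags", "0"]
    cmd += ["-v", "error", "-"]
    return cmd


if __name__ == "__main__":
    for fmt in FORMATS:
        print(shlex.join(benchmark_command(fmt)))
        print(shlex.join(benchmark_command(fmt, disable_simd=False)))
```

Each printed pair could then be run by hand and the two `UNITS` lines compared for that format.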
    • swscale: Remove duplicated code · 81a4719d
      Lauri Kasanen authored
      In this function, the exact same clamping happens both inside the if
      block and unconditionally.
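To illustrate why such a removal is safe, here is a minimal sketch, assuming a simple 8-bit clamp. The commit message does not name the function or the clamping range, so `clip_uint8` and both wrappers are hypothetical stand-ins. Clamping is idempotent, so a clamp inside a branch followed by the same clamp outside it yields the same result as the outer clamp alone.

```python
def clip_uint8(value):
    """Clamp an integer to the 0..255 range (hypothetical stand-in clamp)."""
    return max(0, min(255, value))


def with_duplicate_clamp(value, condition):
    """Shape of the code before the cleanup: clamp in the branch, then again."""
    if condition:
        value = clip_uint8(value)
    return clip_uint8(value)


def without_duplicate_clamp(value, condition):
    """Shape after the cleanup: the unconditional clamp alone is enough.

    `condition` is kept only so both functions share a signature.
    """
    return clip_uint8(value)


if __name__ == "__main__":
    for v in (-5, 0, 128, 255, 300):
        for cond in (False, True):
            assert with_duplicate_clamp(v, cond) == without_duplicate_clamp(v, cond)
    print("both variants agree")
```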
  11. 20 Mar, 2019 2 commits
  12. 05 Feb, 2019 1 commit
    • libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX · 8522d219
      Lauri Kasanen authored
      ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
      -s 1920x1728 -f null -vframes 100 -v error -nostats -
      
      9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
      Fate passes, each format tested with an image to video conversion.
      
      Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
      of the 16-bit function. This includes the vec_mulo/mule functions too,
      not just vmuluwm.
      
      With TIMER_REPORT skips disabled:
      yuv420p9le
        12412 UNITS in planarX,  131072 runs,      0 skips
        73136 UNITS in planarX,  131072 runs,      0 skips
      yuv420p9be
        12481 UNITS in planarX,  131072 runs,      0 skips
        73410 UNITS in planarX,  131072 runs,      0 skips
      yuv420p10le
        12322 UNITS in planarX,  131072 runs,      0 skips
        72546 UNITS in planarX,  131072 runs,      0 skips
      yuv420p10be
        12291 UNITS in planarX,  131072 runs,      0 skips
        72935 UNITS in planarX,  131072 runs,      0 skips
      yuv420p12le
        12316 UNITS in planarX,  131072 runs,      0 skips
        72708 UNITS in planarX,  131072 runs,      0 skips
      yuv420p12be
        12319 UNITS in planarX,  131072 runs,      0 skips
        72577 UNITS in planarX,  131072 runs,      0 skips
      yuv420p14le
        12259 UNITS in planarX,  131072 runs,      0 skips
        72516 UNITS in planarX,  131072 runs,      0 skips
      yuv420p14be
        12440 UNITS in planarX,  131072 runs,      0 skips
        72962 UNITS in planarX,  131072 runs,      0 skips
      yuv420p16le
        10548 UNITS in planarX,  131072 runs,      0 skips
        73429 UNITS in planarX,  131072 runs,      0 skips
      yuv420p16be
        10634 UNITS in planarX,  131072 runs,      0 skips
       150959 UNITS in planarX,  131072 runs,      0 skips
      Signed-off-by: Lauri Kasanen <cand@gmx.com>
  13. 01 Jan, 2019 1 commit
  14. 26 Dec, 2018 1 commit
    • swscale/output: Altivec-optimize float yuv2plane1 · 8dd9df9e
      Lauri Kasanen authored
      This function wouldn't benefit from VSX instructions, so I put it
      under altivec.
      
      ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
      -f null -vframes 100 -v error -nostats -
      
      3743 UNITS in planar1,   65495 runs,     41 skips
      
      -cpuflags 0
      
      23511 UNITS in planar1,   65530 runs,      6 skips
      
      grayf32be
      
      4647 UNITS in planar1,   65449 runs,     87 skips
      
      -cpuflags 0
      
      28608 UNITS in planar1,   65530 runs,      6 skips
      
      The native speedup is 6.28133, and the bswapping one 6.15623.
      Fate passes, each format tested with an image to video conversion.
      Signed-off-by: Lauri Kasanen <cand@gmx.com>
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
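The speedup figures quoted in these messages are the ratio of the baseline `UNITS` count (the `-cpuflags 0` run) to the optimized one. A small sketch of that arithmetic, assuming the timer lines keep the `N UNITS in name, R runs, S skips` shape shown above; the helper names `parse_timer_line` and `speedup` are hypothetical.

```python
import re

# Matches lines such as "3743 UNITS in planar1,   65495 runs,     41 skips".
TIMER_LINE = re.compile(
    r"^\s*(\d+)\s+UNITS in\s+(\w+),\s*(\d+)\s+runs,\s*(\d+)\s+skips\s*$"
)


def parse_timer_line(line):
    """Split one timer line into its numeric fields and the timer name."""
    match = TIMER_LINE.match(line)
    if match is None:
        raise ValueError(f"not a timer line: {line!r}")
    units, name, runs, skips = match.groups()
    return {"units": int(units), "name": name,
            "runs": int(runs), "skips": int(skips)}


def speedup(baseline_line, optimized_line):
    """Return baseline UNITS divided by optimized UNITS for the same timer."""
    base = parse_timer_line(baseline_line)
    opt = parse_timer_line(optimized_line)
    if base["name"] != opt["name"]:
        raise ValueError("lines come from different timers")
    return base["units"] / opt["units"]


if __name__ == "__main__":
    native = speedup(
        "23511 UNITS in planar1,   65530 runs,      6 skips",
        "3743 UNITS in planar1,   65495 runs,     41 skips",
    )
    swapped = speedup(
        "28608 UNITS in planar1,   65530 runs,      6 skips",
        "4647 UNITS in planar1,   65449 runs,     87 skips",
    )
    print(f"native: {native:.2f}x, byte-swapped: {swapped:.2f}x")
```

The ratio ignores the run and skip counts, so it is only meaningful when both runs processed a comparable amount of data.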
  15. 14 Dec, 2018 1 commit
  16. 12 Dec, 2018 1 commit
    • swscale/output: VSX-optimize nbps yuv2plane1 · 1046cba2
      Lauri Kasanen authored
      ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
      -f null -vframes 100 -v error -nostats -
      
      Speedups:
      yuv2plane1_9BE_vsx	11.2042
      yuv2plane1_9LE_vsx	11.156
      yuv2plane1_10BE_vsx	9.89428
      yuv2plane1_10LE_vsx	10.3637
      yuv2plane1_12BE_vsx	9.71923
      yuv2plane1_12LE_vsx	11.0404
      yuv2plane1_14BE_vsx	10.1763
      yuv2plane1_14LE_vsx	11.2728
      
      Fate passes, each format tested with an image to video conversion.
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
  17. 04 Dec, 2018 1 commit
  18. 26 Nov, 2018 1 commit
    • swscale/output: Altivec-optimize yuv2plane1_8 · 46c5693e
      Lauri Kasanen authored
      ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
      -f null -vframes 100 -v error -nostats -
      
      1158 UNITS in planar1,   65528 runs,      8 skips
      
      -cpuflags 0
      
      19082 UNITS in planar1,   65533 runs,      3 skips
      
      16.48x speedup. On x86, SSE2 gets ~7x. Curiously, the POWER C version
      takes as many cycles as the x86 SSE2 version; yikes, it's fast.
      
      Note that this function uses VSX instructions, but is not marked so.
      This is because several existing functions also make that mistake.
      I'll submit a patch moving them once this is reviewed.
      Signed-off-by: Lauri Kasanen <cand@gmx.com>
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
  19. 24 Nov, 2018 1 commit
  20. 06 Nov, 2018 1 commit
  21. 01 Nov, 2018 2 commits
  22. 24 Oct, 2018 3 commits
  23. 18 Oct, 2018 2 commits