1. 15 May, 2020 3 commits
    • swscale: aarch64: Add a NEON implementation of interleaveBytes · e0604d50
      Martin Storsjö authored
      This allows speeding up format conversions from yuv420 to nv12.
      
                                   Cortex A53      A72      A73
      interleave_bytes_c:             86077.5  51433.0  66972.0
      interleave_bytes_neon:          19701.7  23019.2  15859.2
      interleave_bytes_aligned_c:     86603.0  52017.2  67484.2
      interleave_bytes_aligned_neon:   9061.0   7623.0   6309.0
      Signed-off-by: Martin Storsjö <martin@martin.st>
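For context, yuv420 keeps U and V in separate planes while nv12 stores them as interleaved UV pairs, so the conversion boils down to a byte interleave. The following is a minimal scalar sketch of that operation, not the NEON code from the commit; the function name, signature, and stride handling are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference sketch (hypothetical name and signature):
 * dst[2*x] = src1[x], dst[2*x + 1] = src2[x] for every row.
 * All strides are in bytes. */
static void interleave_bytes_ref(const uint8_t *src1, const uint8_t *src2,
                                 uint8_t *dst, int width, int height,
                                 int src1_stride, int src2_stride,
                                 int dst_stride)
{
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            dst[2 * x]     = src1[x]; /* e.g. U sample */
            dst[2 * x + 1] = src2[x]; /* e.g. V sample */
        }
        src1 += src1_stride;
        src2 += src2_stride;
        dst  += dst_stride;
    }
}
```

A vectorized version typically loads a block of bytes from each source and writes them back with an interleaving store, which is where the gain over the byte-by-byte loop would come from.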
    • swscale: arm: fix NEON hscale init · 70b14cc8
      Josh de Kock authored
      The NEON hscale function only supports filter sizes that are a multiple
      of 8 (X8) and should only be selected when such sizes are in use. At the
      moment filterAlign is set to 8, but when NEON assembly for other specific
      sizes is added in the future, those functions will need checks here too.
      
      The immediate use case for this change is making the hscale checkasm
      test simpler, without NEON-specific edge cases (x86 already has these
      guards).
      
      This applies the fix from 718c8f9a to the 32-bit ARM version of the
      function, fixing fate-checkasm-sw_scale there.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • swscale: fix NEON hscale init · 718c8f9a
      Josh de Kock authored
      The NEON hscale function only supports filter sizes that are a multiple
      of 8 (X8) and should only be selected when such sizes are in use. At the
      moment filterAlign is set to 8, but when NEON assembly for other specific
      sizes is added in the future, those functions will need checks here too.
      
      The immediate use case for this change is making the hscale checkasm
      test simpler, without NEON-specific edge cases (x86 already has these
      guards).
      Signed-off-by: Josh de Kock <josh@itanimul.li>
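The guard described in the two hscale commits can be sketched roughly as below. This is only an illustration: the struct, its field names, and the two placeholder functions are hypothetical stand-ins rather than the real swscale context or symbols, and checking both filter sizes together is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scalar and NEON hscale functions. */
typedef int (*hscale_fn)(void);

static int hscale_c(void)    { return 0; } /* generic C path */
static int hscale_neon(void) { return 1; } /* handles multiples of 8 only */

/* Hypothetical, heavily reduced scaler context. */
struct scaler_sketch {
    int lum_filter_size;  /* horizontal luma filter taps */
    int chr_filter_size;  /* horizontal chroma filter taps */
    hscale_fn hy_scale;
    hscale_fn hc_scale;
};

static void init_hscale(struct scaler_sketch *c, int have_neon)
{
    assert(c != NULL);
    /* Default to the C implementation. */
    c->hy_scale = hscale_c;
    c->hc_scale = hscale_c;
    /* Select NEON only when both filter sizes are multiples of 8. */
    if (have_neon &&
        c->lum_filter_size % 8 == 0 &&
        c->chr_filter_size % 8 == 0) {
        c->hy_scale = hscale_neon;
        c->hc_scale = hscale_neon;
    }
}
```

With a guard of this kind, a test harness can request any filter size and still get a function that supports it, rather than relying on filterAlign having padded the size beforehand.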
  2. 11 May, 2020 1 commit
  3. 05 May, 2020 2 commits
  4. 27 Apr, 2020 1 commit
    • swscale/vscale: Increase type strictness · 2fae0009
      Andreas Rheinhardt authored
      libswscale/vscale.c makes extensive use of function pointers and in
      doing so it converts these function pointers to and from a pointer to
      void. Yet this is actually against the C standard:
      C90 only guarantees that one can convert a pointer to any incomplete
      type or object type to void* and back with the result comparing equal
      to the original which makes pointers to void generic pointers to
      incomplete or object type. Yet C90 lacks a generic function pointer
      type.
      C99 additionally guarantees that a pointer to a function of one type may
      be converted to a pointer to a function of another type with the result
      and the original comparing equal when converting back.
      This makes any function pointer type a generic function pointer type.
      Yet even this does not make pointers to void generic function pointers.
      
      Both GCC and Clang emit warnings for this when in pedantic mode.
      
      This commit fixes this by using a union that can hold one member of any
      of the required function pointer types to store the function pointer.
      This works even for C90.
      Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
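The union approach described in this commit might look like the following minimal sketch. The two callback signatures, the struct, and all names are hypothetical and much simpler than the real vscale function pointer types; the point is only that each pointer is stored and read through a member of its own type, so no conversion through void * takes place.

```c
#include <assert.h>

/* Two hypothetical callback signatures standing in for the vscale kernels. */
typedef int (*unary_fn)(int);
typedef int (*binary_fn)(int, int);

/* One member per required function pointer type. */
typedef union {
    unary_fn  unary;
    binary_fn binary;
} kernel_fn;

struct filter_sketch {
    int       is_binary;  /* records which union member is valid */
    kernel_fn fn;
};

static int negate(int a)     { return -a; }
static int add(int a, int b) { return a + b; }

/* Call through the member that was stored, never through void *. */
static int run_filter(const struct filter_sketch *f, int a, int b)
{
    assert(f != 0);
    return f->is_binary ? f->fn.binary(a, b) : f->fn.unary(a);
}
```

The caller has to track which member was written, here with the is_binary flag; reading a different member than the one stored would not be a portable way to recover the pointer.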
  5. 21 Apr, 2020 1 commit
  6. 19 Apr, 2020 1 commit
  7. 12 Apr, 2020 1 commit
  8. 04 Apr, 2020 2 commits
  9. 02 Apr, 2020 1 commit
  10. 11 Mar, 2020 1 commit
  11. 26 Feb, 2020 1 commit
  12. 24 Feb, 2020 1 commit
  13. 10 Feb, 2020 1 commit
    • libswscale/x86/yuv2rgb: add ssse3 version · fc6a5883
      Ting Fu authored
      Tested using this command:
      /ffmpeg -pix_fmt yuv420p -s 1920*1080 -i ArashRawYuv420.yuv \
      -vcodec rawvideo -s 1920*1080 -pix_fmt rgb24 -f null /dev/null
      
      The fps increases from 389 to 640 on an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz.
      Signed-off-by: Ting Fu <ting.fu@intel.com>
  14. 09 Feb, 2020 1 commit
  15. 05 Feb, 2020 1 commit
  16. 22 Jan, 2020 3 commits
  17. 06 Jan, 2020 1 commit
  18. 04 Jan, 2020 1 commit
    • swscale/aarch64: use multiply accumulate and shift-right narrow · c3a17fff
      Sebastian Pop authored
      This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
      horizontal adds by using fused multiply adds. The patch also uses ld1r to load
      one element and replicate it across all lanes of the vector. The patch also
      improves the clipping code by removing the shift right instructions and
      performing the shift with the shift-right narrow instructions.
      
      I see an 8% difference on an m6g instance with Neoverse-N1 CPUs:
      $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
      before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
      after:  t:0.012985 avg:0.013013 max:0.013996 min:0.012818
      
      Tested with `make check` on aarch64-linux.
      Signed-off-by: Sebastian Pop <spop@amazon.com>
      Reviewed-by: Clément Bœsch <u@pkh.me>
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
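As a rough scalar model of what the rewritten inner loop computes, the sketch below accumulates coefficient-times-sample products into one 32-bit accumulator per output pixel and then narrows to 8 bits with a shift and a clamp. It is not the NEON code: the function name and signature are hypothetical, and the shift amounts (12 for the dither term, 19 for the final narrowing) are assumptions about the fixed-point layout rather than values taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int32_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of a vertical filter with 8-bit output. */
static void yuv2planex_8_sketch(const int16_t *filter, int filter_size,
                                const int16_t **src, uint8_t *dst, int dst_w,
                                const uint8_t *dither, int offset)
{
    assert(filter_size > 0);
    for (int i = 0; i < dst_w; i++) {
        /* Start from the dither value, scaled up (assumed 12 bits). */
        int32_t acc = dither[(i + offset) & 7] << 12;
        /* Multiply-accumulate: one product per filter tap. */
        for (int j = 0; j < filter_size; j++)
            acc += src[j][i] * filter[j];
        /* Narrowing step: drop the fraction bits (assumed 19), then clamp.
         * Negative sums clamp to 0 without shifting a negative value. */
        dst[i] = acc < 0 ? 0 : clip_u8(acc >> 19);
    }
}
```

On NEON, the accumulate step corresponds to a multiply-accumulate across vector lanes, and the shift plus clamp corresponds to the saturating shift-right narrow instructions the commit message mentions.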
  19. 31 Dec, 2019 1 commit
  20. 17 Dec, 2019 1 commit
    • swscale/aarch64: use multiply accumulate and increase vector factor to 4 · bd831912
      Sebastian Pop authored
      This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
      and bumps the vectorization factor from 2 to 4.
      The speedup is 25% on Graviton1 A1 instances based on Cortex-A72 CPUs:
      
      $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
      before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
      after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146
      
      The speedup is 39% on Graviton2 m6g instances based on Neoverse-N1 CPUs:
      $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
      before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
      after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971
      
      Tested with `make check` on aarch64-linux.
      Signed-off-by: Sebastian Pop <spop@amazon.com>
      Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
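To illustrate what a vectorization factor of 4 means here, the scalar sketch below produces four output samples per outer iteration, each with its own accumulator, which is the shape a multiply-accumulate NEON loop would follow. The function name and signature are hypothetical, and the 7-bit shift and the clamp to 15 bits are assumptions about the scalar reference behaviour, not details stated in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Shift out the fraction bits and clamp to the 15-bit maximum.
 * Assumes >> on a negative value is an arithmetic shift. */
static int16_t narrow_15(int32_t acc)
{
    int32_t v = acc >> 7;
    return (int16_t)(v > 32767 ? 32767 : v);
}

/* Hypothetical scalar model: 8-bit input, 15-bit output, 4 outputs per step. */
static void hscale_8_to_15_sketch(int16_t *dst, int dst_w, const uint8_t *src,
                                  const int16_t *filter,
                                  const int32_t *filter_pos, int filter_size)
{
    assert(dst_w % 4 == 0); /* the sketch handles whole groups of 4 only */
    for (int i = 0; i < dst_w; i += 4) {
        int32_t acc[4] = { 0, 0, 0, 0 }; /* one accumulator per output */
        for (int j = 0; j < filter_size; j++)
            for (int k = 0; k < 4; k++)
                acc[k] += src[filter_pos[i + k] + j] *
                          filter[filter_size * (i + k) + j];
        for (int k = 0; k < 4; k++)
            dst[i + k] = narrow_15(acc[k]);
    }
}
```

Keeping four independent accumulators avoids a dependency chain on a single sum, which is one plausible reason a wider factor helps on the cores measured above.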
  21. 10 Dec, 2019 1 commit
  22. 06 Dec, 2019 1 commit
  23. 01 Nov, 2019 1 commit
  24. 16 Oct, 2019 3 commits
  25. 04 Oct, 2019 2 commits
    • swscale: Fix AltiVec/VSX build with recent GCC · e6625ca4
      Daniel Kolesa authored
      The argument to vec_splat_u16 must be a literal. By making the
      function always inline and marking the arguments const, gcc can
      turn those into literals, and avoid build errors like:
      
      swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal
      
      Fixes #7861.
      Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
      Signed-off-by: Lauri Kasanen <cand@gmx.com>
    • swscale: Replace illegal vector keyword usage in altivec code · 1bdb47b7
      Daniel Kolesa authored
      While this technically compiles in current ffmpeg, this is only
      because ffmpeg is compiled in strict ISO C mode, which disables
      the builtin 'vector' keyword for AltiVec/VSX. Instead, altivec.h
      replaces it with a macro that defines vector as __vector, which
      accepts arbitrary types.
      
      Normally, the vector keyword should be used only with plain
      scalar non-typedef types, such as unsigned int. But we have the
      vec_(s|u)(8|16|32) macros, which can be used in a portable manner,
      in util_altivec.h in libavutil.
      
      This is also consistent with other AltiVec/VSX code elsewhere in
      the tree.
      
      Fixes #7861.
      Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
      Signed-off-by: Lauri Kasanen <cand@gmx.com>
  26. 28 Sep, 2019 2 commits
  27. 27 Sep, 2019 1 commit
  28. 26 Sep, 2019 1 commit
  29. 09 Sep, 2019 1 commit
  30. 06 Sep, 2019 1 commit