1. 24 Jan, 2017 1 commit
    • Martin Storsjö's avatar
      arm: Add NEON optimizations for 10 and 12 bit vp9 MC · a4d4bad7
      Martin Storsjö authored
      This work is sponsored by, and copyright, Google.
      
      The plain pixel put/copy functions are used from the 8 bit version,
      for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new
      copy128 is added.
      
      Compared with the 8 bit version, the filters can no longer use the
      trick to accumulate in 16 bit with only saturation at the end, but now
      the accumulators need to be 32 bit. This avoids the need to keep track
      of which filter index is the largest though, reducing the size of the
      executable code for these filters.
      
      For the horizontal filters, we only do 4 or 8 pixels wide in parallel
      (while doing two rows at a time), since we don't have enough register
      space to filter 16 pixels wide.
      
      For the vertical filters, we still do 4 and 8 pixels in parallel just
      as in the 8 bit case, but we need to store the output after every 2
      rows instead of after every 4 rows.
      
      Examples of relative speedup compared to the C version, from checkasm:
                                     Cortex    A7     A8     A9    A53
      vp9_avg4_10bpp_neon:                   2.25   2.44   3.05   2.16
      vp9_avg8_10bpp_neon:                   3.66   8.48   3.86   3.50
      vp9_avg16_10bpp_neon:                  3.39   8.26   3.37   2.72
      vp9_avg32_10bpp_neon:                  4.03  10.20   4.07   3.42
      vp9_avg64_10bpp_neon:                  4.15  10.01   4.13   3.70
      vp9_avg_8tap_smooth_4h_10bpp_neon:     3.38   6.22   3.41   4.75
      vp9_avg_8tap_smooth_4hv_10bpp_neon:    3.89   6.39   4.30   5.32
      vp9_avg_8tap_smooth_4v_10bpp_neon:     5.32   9.73   6.34   7.31
      vp9_avg_8tap_smooth_8h_10bpp_neon:     4.45   9.40   4.68   6.87
      vp9_avg_8tap_smooth_8hv_10bpp_neon:    4.64   8.91   5.44   6.47
      vp9_avg_8tap_smooth_8v_10bpp_neon:     6.44  13.42   8.68   8.79
      vp9_avg_8tap_smooth_64h_10bpp_neon:    4.66   9.02   4.84   7.71
      vp9_avg_8tap_smooth_64hv_10bpp_neon:   4.61   9.14   4.92   7.10
      vp9_avg_8tap_smooth_64v_10bpp_neon:    6.90  14.13   9.57  10.41
      vp9_put4_10bpp_neon:                   1.33   1.46   2.09   1.33
      vp9_put8_10bpp_neon:                   1.57   3.42   1.83   1.84
      vp9_put16_10bpp_neon:                  1.55   4.78   2.17   1.89
      vp9_put32_10bpp_neon:                  2.06   5.35   2.14   2.30
      vp9_put64_10bpp_neon:                  3.00   2.41   1.95   1.66
      vp9_put_8tap_smooth_4h_10bpp_neon:     3.19   5.81   3.31   4.63
      vp9_put_8tap_smooth_4hv_10bpp_neon:    3.86   6.22   4.32   5.21
      vp9_put_8tap_smooth_4v_10bpp_neon:     5.40   9.77   6.08   7.21
      vp9_put_8tap_smooth_8h_10bpp_neon:     4.22   8.41   4.46   6.63
      vp9_put_8tap_smooth_8hv_10bpp_neon:    4.56   8.51   5.39   6.25
      vp9_put_8tap_smooth_8v_10bpp_neon:     6.60  12.43   8.17   8.89
      vp9_put_8tap_smooth_64h_10bpp_neon:    4.41   8.59   4.54   7.49
      vp9_put_8tap_smooth_64hv_10bpp_neon:   4.43   8.58   5.34   6.63
      vp9_put_8tap_smooth_64v_10bpp_neon:    7.26  13.92   9.27  10.92
      
      For the larger 8tap filters, the speedup vs C code is around 4-14x.
      Signed-off-by: 's avatarMartin Storsjö <martin@martin.st>
      a4d4bad7
  2. 10 Mar, 2014 1 commit
    • Anton Khirnov's avatar
      Work around broken floating point limits on some systems. · e854b8f9
      Anton Khirnov authored
      The values of {FLT,DBL}_{MAX,MIN} macros on some systems (older musl
      libc, some BSD flavours) are not exactly representable, i.e.
      (double)DBL_MAX == DBL_MAX is false
      This violates (at least some interpretations of) the C99 standard and
      breaks code (e.g. in vf_fps) like
      double f = DBL_MAX;
      [...]
      if (f == DBL_MAX) { // f has not been changed yet
          [....]
      }
      e854b8f9
  3. 07 Jan, 2014 1 commit
  4. 06 Jan, 2014 1 commit
  5. 21 Nov, 2013 1 commit
  6. 04 Aug, 2013 1 commit
  7. 31 Mar, 2011 1 commit
  8. 20 Mar, 2011 1 commit
  9. 19 Mar, 2011 1 commit
  10. 09 Sep, 2010 1 commit
  11. 08 Sep, 2010 1 commit
  12. 22 Dec, 2008 1 commit
  13. 12 Aug, 2008 1 commit
  14. 09 May, 2008 1 commit
  15. 08 May, 2008 1 commit
  16. 16 May, 2007 1 commit
  17. 07 Oct, 2006 1 commit
  18. 18 Aug, 2006 1 commit
  19. 08 Mar, 2006 1 commit