1. 22 Mar, 2014 1 commit
  2. 29 Aug, 2013 1 commit
  3. 28 Aug, 2013 1 commit
  4. 17 Jul, 2013 1 commit
  5. 12 May, 2013 1 commit
  6. 07 May, 2013 1 commit
  7. 12 Mar, 2013 1 commit
  8. 06 Feb, 2013 1 commit
  9. 05 Feb, 2013 1 commit
  10. 13 Nov, 2012 1 commit
  11. 08 Oct, 2012 1 commit
  12. 08 Sep, 2012 1 commit
  13. 30 Aug, 2012 2 commits
  14. 15 Aug, 2012 1 commit
  15. 03 Aug, 2012 1 commit
    • x86: build: replace mmx2 by mmxext · 239fdf1b
      Diego Biurrun authored
      Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
      So switching to a consistent naming scheme beforehand is sensible.
      The name "mmxext" is more official and widespread and also the name
      of the CPU flag, as reported e.g. by the Linux kernel.
  16. 25 Jul, 2012 1 commit
  17. 22 Jun, 2012 1 commit
  18. 10 Jun, 2012 1 commit
  19. 15 May, 2012 1 commit
  20. 10 May, 2012 1 commit
    • rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc
      Christophe Gisquet authored
      Code mostly inspired by vp8's MC, however:
      - its MMX2 horizontal filter is worse because it can't take advantage of
        the coefficient redundancy
      - that same coefficient redundancy allows better code for non-SSSE3 versions
      
      Benchmark (rounded to tens of units):
              V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
      C        445   358    985    1785    1559     3280
      MMX*     219   271    478     714     929     1443
      SSE2     131   158    294     425     515      892
      SSSE3    120   122    248     387     390      763
      
      End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
      all loop filter functions now take around 55% of decoding time, while luma MC
      dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
  21. 10 Apr, 2012 1 commit
  22. 20 Feb, 2012 1 commit
  23. 30 Jan, 2012 3 commits
    • rv40: x86 SIMD for biweight · e5c9de2a
      Christophe Gisquet authored
      Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
      multiples of 512 (which is often the case when the values round up nicely).
      
      *_TIMER report for the 16x16 and 8x8 cases:
      C:
      9015 decicycles in 16, 524257 runs, 31 skips
      2656 decicycles in 8, 524271 runs, 17 skips
      MMX:
      4156 decicycles in 16, 262090 runs, 54 skips
      1206 decicycles in 8, 262131 runs, 13 skips
      MMX on fast-path:
      2760 decicycles in 16, 524222 runs, 66 skips
      995 decicycles in 8, 524252 runs, 36 skips
      SSE2:
      2163 decicycles in 16, 262131 runs, 13 skips
      832 decicycles in 8, 262137 runs, 7 skips
      SSE2 with fast path:
      1783 decicycles in 16, 524276 runs, 12 skips
      711 decicycles in 8, 524283 runs, 5 skips
      SSSE3:
      2117 decicycles in 16, 262136 runs, 8 skips
      814 decicycles in 8, 262143 runs, 1 skips
      SSSE3 with fast path:
      1315 decicycles in 16, 524285 runs, 3 skips
      578 decicycles in 8, 524286 runs, 2 skips
      
      This means around a 4% speedup for some sequences.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
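The fast path described in the commit above can be illustrated with a minimal C sketch. It is not the code from the commit: the function names, the rounding formula, and the convention that the two weights sum to 16384 are assumptions made for illustration. The only point it shows is that when both weights are multiples of 512, the per-product shift by 9 bits is exact, so the weights can be pre-shifted once and the per-pixel work becomes simpler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical general path: each product is scaled down by 9 bits
 * before the two terms are summed, rounded and scaled by 5 more bits. */
static void biweight_general(uint8_t *dst, const uint8_t *src1,
                             const uint8_t *src2, int w1, int w2,
                             int size, ptrdiff_t stride)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = (uint8_t)((((w2 * src1[x]) >> 9) +
                                ((w1 * src2[x]) >> 9) + 0x10) >> 5);
        dst  += stride;
        src1 += stride;
        src2 += stride;
    }
}

/* The fast path applies only if the low 9 bits of both weights are zero. */
static int weights_allow_fast_path(int w1, int w2)
{
    return ((w1 | w2) & 0x1FF) == 0;
}

/* Hypothetical fast path: pre-shift the weights once; for multiples of 512
 * the result is identical to the general path. */
static void biweight_fast(uint8_t *dst, const uint8_t *src1,
                          const uint8_t *src2, int w1, int w2,
                          int size, ptrdiff_t stride)
{
    const int k1 = w1 >> 9;
    const int k2 = w2 >> 9;

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = (uint8_t)((k2 * src1[x] + k1 * src2[x] + 0x10) >> 5);
        dst  += stride;
        src1 += stride;
        src2 += stride;
    }
}
```

A caller would check `weights_allow_fast_path(w1, w2)` once per block and dispatch accordingly. In a SIMD version the pre-shifted weights fit in narrower lanes, which is presumably where the measured gain comes from, although the commit does not spell this out.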
    • 91bafb52
      Diego Biurrun authored
    • x86: Place mm_flags variable declaration below the appropriate #ifdef. · c30b1983
      Diego Biurrun authored
      This fixes some unused variable warnings with YASM disabled.
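The kind of fix described in the commit above can be sketched in C as follows. This is an illustration only: `SKETCH_HAVE_YASM`, `SKETCH_CPU_FLAG_MMX`, `sketch_get_cpu_flags` and `sketch_dsp_init` are hypothetical names, not identifiers from the actual source. The idea is that a variable used only by assembler-backed code is declared inside the same preprocessor guard, so it does not exist (and cannot trigger an unused-variable warning) when that code is compiled out.

```c
/* Hypothetical build switch standing in for a configure-time macro. */
#ifndef SKETCH_HAVE_YASM
#define SKETCH_HAVE_YASM 0
#endif

#define SKETCH_CPU_FLAG_MMX 0x1 /* hypothetical CPU feature bit */

/* Hypothetical stand-in for a CPU feature query. */
int sketch_get_cpu_flags(void)
{
    return SKETCH_CPU_FLAG_MMX;
}

/* Returns 1 if an optimized path was selected, 0 for the plain C fallback. */
int sketch_dsp_init(void)
{
    int selected = 0;
#if SKETCH_HAVE_YASM
    /* Declared below the guard: with the assembler disabled this
     * declaration is compiled out together with its only users. */
    int mm_flags = sketch_get_cpu_flags();

    if (mm_flags & SKETCH_CPU_FLAG_MMX)
        selected = 1;
#endif
    return selected;
}
```

Had `mm_flags` been declared above the guard, a build with `SKETCH_HAVE_YASM` set to 0 would leave it assigned but never read, which is the situation compilers warn about.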
  24. 11 Aug, 2011 1 commit