  1. 06 Jul, 2012 1 commit
  2. 05 Jul, 2012 4 commits
  3. 04 Jul, 2012 2 commits
  4. 30 Jun, 2012 2 commits
  5. 29 Jun, 2012 1 commit
    • x86: vc1: fix and enable optimised loop filter · f2fd1678
      Mans Rullgard authored
      The problem is that the SSSE3 psign instruction does the wrong
      thing here.  Commit ea60dfe2 incorrectly removed a macro emulating
      this instruction for pre-SSSE3 code.  However, the emulation does not
      match psign, and the code relies on the behaviour of the macro.
      Specifically, psign sets destination elements to zero where the
      corresponding source element is zero, whereas the emulation only
      negates destination elements where the source is negative.
      
      Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus,
      which is why the original VC-1 code had an additional right shift
      when using it.  Since the psign instruction cannot be used here,
      skip all the macro hell and use the working instruction sequence
      directly.
      
      None of this was noticed due to a stray return statement in
      ff_vc1dsp_init_mmx(), which meant that only the MMX version of the
      loop filter was ever used (before it was removed in ea60dfe2).
      Signed-off-by: Mans Rullgard <mans@mansr.com>
  6. 27 Jun, 2012 1 commit
  7. 25 Jun, 2012 4 commits
  8. 23 Jun, 2012 4 commits
  9. 22 Jun, 2012 1 commit
  10. 17 Jun, 2012 1 commit
  11. 08 Jun, 2012 1 commit
  12. 29 May, 2012 1 commit
  13. 22 May, 2012 1 commit
  14. 21 May, 2012 1 commit
  15. 15 May, 2012 2 commits
  16. 14 May, 2012 1 commit
  17. 12 May, 2012 1 commit
  18. 10 May, 2012 1 commit
    • rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc
      Christophe Gisquet authored
      Code mostly inspired by VP8's MC, however:
      - its MMX2 horizontal filter is worse because it can't take advantage of
        the coefficient redundancy
      - that same coefficient redundancy allows better code for non-SSSE3 versions
      
      Benchmark (rounded to tens of units):
              V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
      C        445   358    985    1785    1559     3280
      MMX*     219   271    478     714     929     1443
      SSE2     131   158    294     425     515      892
      SSSE3    120   122    248     387     390      763
      
      The end result is an overall speedup of around 15% for the SSSE3 version
      (on 6 sequences); all loop filter functions now take around 55% of decoding
      time, while luma MC DSP functions take around 6%, chroma ones 1.3% and
      biweight around 2.3%.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
  19. 02 May, 2012 1 commit
  20. 28 Apr, 2012 5 commits
  21. 21 Apr, 2012 2 commits
  22. 16 Apr, 2012 1 commit
  23. 13 Apr, 2012 1 commit
    • dsputil: fix optimized emu_edge function on Win64. · b089ca87
      Ronald S. Bultje authored
      Recent register allocation changes (x86inc.asm update) changed the
      register order, and thus the opcodes, for the inner loops. One of them
      became larger than 128 bytes, which breaks other parts of this function
      that jump to fixed-offset positions to extend the edge by fixed amounts.
      A simple register change fixes this.