1. 28 Jul, 2012 1 commit
  2. 25 Jul, 2012 2 commits
    • x86/dsputil: put inline asm under HAVE_INLINE_ASM. · 79195ce5
      Ronald S. Bultje authored
      This allows compiling with compilers that don't support gcc-style
      inline assembly.
      Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
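A minimal sketch of the guarding pattern. It assumes a configure-generated HAVE_INLINE_ASM macro that is 0 or 1 (defaulted to 0 here), and it uses a hypothetical bswap helper and init function in place of the actual dsputil code. The C version is always compiled. The gcc-style asm version and the line that installs it both sit under the guard, so a compiler without gcc-style inline asm never sees them.

```c
#include <stdint.h>

/* Assumption: in a real build HAVE_INLINE_ASM comes from the
 * configure-generated config.h; default it to 0 so this sketch
 * builds with any C compiler. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif

typedef uint32_t (*bswap32_fn)(uint32_t);

/* Portable C fallback: always compiled. */
static uint32_t bswap32_c(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

#if HAVE_INLINE_ASM && (defined(__i386__) || defined(__x86_64__))
/* gcc-style inline asm: only compilers that support it ever see this. */
static uint32_t bswap32_inline_asm(uint32_t x)
{
    __asm__("bswap %0" : "+r"(x));
    return x;
}
#endif

/* Hypothetical init function: start from the C version, then override it
 * with the inline-asm version only when that version was compiled in. */
static bswap32_fn select_bswap32(void)
{
    bswap32_fn fn = bswap32_c;
#if HAVE_INLINE_ASM && (defined(__i386__) || defined(__x86_64__))
    fn = bswap32_inline_asm;
#endif
    return fn;
}
```

`select_bswap32()(0x11223344u)` returns 0x44332211 whichever branch is compiled. Only the availability of the faster path changes, never the result.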
    • dsputil_mmx: fix incorrect assembly code · 845e92fd
      Yang Wang authored
      In ff_put_pixels_clamped_mmx(), there are two inline assembly blocks.
      In the first block (the unrolled loop), instructions such as
      "movq 8%3, %%mm1 \n\t" are incorrect.
      
      From the instruction above, it is clear what the programmer wants: a load
      from p + 8. But this assembly code doesn't guarantee that. It only works if
      the compiler puts p in a register, producing an instruction like
      "movq 8(%edi), %mm1". During optimization, the compiler may be able to
      constant-propagate into p. Suppose p = &x[10000]. Then operand 3 can become
      10000(%edi), where %edi holds &x, and the instruction becomes
      "movq 810000(%edi), %mm1". That is, the displacement is 810000 instead of
      10008.
      
      This will cause a segmentation fault.
      
      This error was fixed in the second block of the assembly code, but not in
      the unrolled loop.
      
      How to reproduce:
          The error is exposed when building with the Intel C++ Compiler with
          IPO+PGO optimization enabled; the build crashed when decoding an MJPEG
          video.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
      Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
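The failure is textual. GCC pastes the text of operand 3 into the template, so the literal 8 is glued in front of whatever addressing expression the compiler chose. The toy expander below is a simplified model written for illustration, not GCC's code. It handles only "%%" and single-digit "%N" with no modifiers, and it reproduces both outcomes. With a memory operand, "movq 8%3, %%mm1" turns into a huge displacement. One way to guarantee a p + 8 load is to pass p through an "r" (register) constraint and write "movq 8(%3), %%mm1", which always expands to base-plus-8.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of asm-template expansion: "%%" becomes "%", and "%N"
 * (single digit, no modifiers) is replaced verbatim by operand N's text.
 * out_size must be at least 1. */
static void expand_template(const char *tmpl, const char *const *operands,
                            int n_operands, char *out, size_t out_size)
{
    size_t len = 0;
    for (const char *s = tmpl; *s && len + 1 < out_size; s++) {
        if (s[0] == '%' && s[1] == '%') {
            out[len++] = '%';
            s++;
        } else if (s[0] == '%' && s[1] >= '0' && s[1] <= '9' &&
                   s[1] - '0' < n_operands) {
            const char *op = operands[s[1] - '0'];
            size_t n = strlen(op);
            if (n > out_size - 1 - len)
                n = out_size - 1 - len;   /* truncate rather than overflow */
            memcpy(out + len, op, n);     /* operand text pasted verbatim */
            len += n;
            s++;
        } else {
            out[len++] = *s;
        }
    }
    out[len] = '\0';
}
```

Expanding "movq 8%3, %%mm1" with operand 3 = "10000(%edi)" yields "movq 810000(%edi), %mm1". Expanding "movq 8(%3), %%mm1" with operand 3 = "%edi" yields "movq 8(%edi), %mm1".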
  3. 19 Jul, 2012 1 commit
  4. 18 Jul, 2012 1 commit
  5. 23 Jun, 2012 2 commits
  6. 08 Jun, 2012 1 commit
  7. 21 May, 2012 1 commit
  8. 10 May, 2012 1 commit
    • rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc
      Christophe Gisquet authored
      The code is mostly inspired by vp8's MC; however:
      - its MMX2 horizontal filter is worse because it can't take advantage of
        the coefficient redundancy
      - that same coefficient redundancy allows better code for non-SSSE3 versions
      
      Benchmark (rounded to tens of units):
              V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
      C        445   358    985    1785    1559     3280
      MMX*     219   271    478     714     929     1443
      SSE2     131   158    294     425     515      892
      SSSE3    120   122    248     387     390      763
      
      The end result is an overall speedup of around 15% for the SSSE3 version
      (on 6 sequences); all loop filter functions now take around 55% of decoding
      time, while the luma MC DSP functions take around 6%, the chroma ones 1.3%,
      and biweight around 2.3%.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
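The commit does not list the filter taps. As a reference for what the SIMD versions compute, here is a scalar sketch of a horizontal 6-tap MC filter for one 8-pixel row. It assumes the outer taps (1, -5, ..., -5, 1) are shared by every subpel position and that only the two center taps and the final shift change. With (c1, c2, shift) = (20, 20, 5) this gives a symmetric half-pel filter. Treat these tap values as an assumption to check against rv40dsp.c, not a transcription. The function and parameter names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference for one row of 8 output pixels.
 * Assumed taps: 1, -5, c1, c2, -5, 1, rounded and shifted by `shift`.
 * src needs 2 readable pixels before and 3 after the 8-pixel span. */
static void h_lowpass8_sketch(uint8_t *dst, const uint8_t *src,
                              int c1, int c2, int shift)
{
    for (int i = 0; i < 8; i++) {
        int sum = src[i - 2] + src[i + 3]
                - 5 * (src[i - 1] + src[i + 2])
                + c1 * src[i] + c2 * src[i + 1]
                + (1 << (shift - 1));              /* rounding term */
        /* Clamp negatives first so we never right-shift a negative int. */
        dst[i] = sum < 0 ? 0 : clip_uint8(sum >> shift);
    }
}
```

On a linear ramp 0, 10, 20, ... (with src pointing two pixels in), the (20, 20, 5) setting returns the midpoints 25, 35, ..., 95. A vertical filter applies the same taps down a column.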
  9. 28 Apr, 2012 1 commit
  10. 21 Apr, 2012 2 commits
  11. 04 Apr, 2012 1 commit
  12. 25 Mar, 2012 4 commits
  13. 05 Mar, 2012 1 commit
  14. 15 Feb, 2012 1 commit
  15. 30 Jan, 2012 1 commit
    • x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf · 6b039003
      Christophe Gisquet authored
      With SSSE3, pshufb can emulate bswap on XMM registers; SSE2 needs more
      shuffling to do the same. Alignment is critical for performance, so
      dedicated code paths are provided for aligned buffers.
      
      For the huffyuv sequence "angels_480-huffyuvcompress.avi":
      C (using bswap instruction): ~ 55k cycles
      SSE2:                        ~ 40k cycles
      SSSE3 using unaligned loads: ~ 35k cycles
      SSSE3 using aligned loads:   ~ 30k cycles
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
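A minimal intrinsics sketch of the two approaches, assuming a bswap_buf-style signature (dst, src, word count). The signature is modeled on the dsputil hook rather than copied from it. Libav picks implementations at runtime from CPU flags; for simplicity this sketch selects at compile time via `__SSSE3__`/`__SSE2__`. With SSSE3, a single pshufb (`_mm_shuffle_epi8`) reverses the bytes of each 32-bit lane. SSE2 has no byte shuffle, so it swaps bytes within 16-bit words and then swaps the words. Aligned loads are used when src is 16-byte aligned.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif
#if defined(__SSSE3__)
#include <tmmintrin.h>
#endif

static uint32_t bswap32_scalar(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Byte-swap w 32-bit words from src into dst (hypothetical name). */
static void bswap_buf_sketch(uint32_t *dst, const uint32_t *src, int w)
{
    int i = 0;
#if defined(__SSE2__)
    /* 4 words = 16 bytes per step, so src alignment is preserved. */
    const int src_aligned = ((uintptr_t)src & 15) == 0;
    for (; i + 4 <= w; i += 4) {
        const __m128i *p = (const __m128i *)(src + i);
        __m128i v = src_aligned ? _mm_load_si128(p) : _mm_loadu_si128(p);
#if defined(__SSSE3__)
        /* One pshufb reverses the bytes of each 32-bit lane. */
        const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                          4, 5, 6, 7, 0, 1, 2, 3);
        v = _mm_shuffle_epi8(v, mask);
#else
        /* SSE2: swap bytes within 16-bit words, then swap the words. */
        v = _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
        v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
        v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
#endif
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    /* Scalar tail, and the whole job when SSE2 is unavailable. */
    for (; i < w; i++)
        dst[i] = bswap32_scalar(src[i]);
}
```

Each vector step loads before it stores, so the sketch also works in place (dst == src).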
  16. 29 Jan, 2012 1 commit
  17. 25 Jan, 2012 1 commit
  18. 14 Dec, 2011 1 commit
  19. 22 Nov, 2011 1 commit
  20. 11 Nov, 2011 1 commit
  21. 07 Nov, 2011 1 commit
  22. 26 Oct, 2011 1 commit
  23. 11 Oct, 2011 1 commit
  24. 15 Aug, 2011 1 commit
  25. 11 Aug, 2011 1 commit
  26. 29 Jul, 2011 1 commit
  27. 21 Jul, 2011 1 commit
  28. 20 Jul, 2011 1 commit
  29. 18 Jul, 2011 1 commit
  30. 10 Jul, 2011 1 commit
  31. 08 Jul, 2011 1 commit
  32. 04 Jul, 2011 2 commits
  33. 03 Jul, 2011 1 commit