1. 13 Aug, 2012 1 commit
    • x86: cabac: allow building with suncc · 8ec0204e
      Mans Rullgard authored
      This fixes two issues preventing suncc from building this code.
      
      The undocumented 'a' operand modifier, which makes gcc omit the $ in
      front of immediate operands (as required in addresses), is not
      supported by suncc.  Luckily, the equally undocumented 'c' modifier
      has the same effect and is supported.
      
      On some asm statements with a large number of operands, suncc for no
      obvious reason fails to correctly substitute some of the operands.
      Fortunately, some of the operands in these statements are plain
      numbers which can be inserted directly into the code block instead
      of passed as operands.
      
      With these changes, the code builds correctly with both gcc and
      suncc.
      Signed-off-by: Mans Rullgard <mans@mansr.com>
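The two workarounds described in the 8ec0204e commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `TABLE_OFFSET`, `STR`, and both function names are hypothetical, and the plain C fallback exists only so the sketch builds on non-x86 targets. The first function prints the constant through the 'c' modifier so that no `$` precedes it inside the address; the second pastes the number into the asm string instead of passing it as an operand.

```c
#include <assert.h>

#define TABLE_OFFSET 3            /* hypothetical constant displacement */
#define STR_(x) #x
#define STR(x)  STR_(x)           /* turns TABLE_OFFSET into the string "3" */

/* Variant 1: pass the offset as an "i" operand and print it with %c,
 * which emits the bare constant (no '$'), as needed in an address. */
static unsigned load_with_c_modifier(const unsigned char *base)
{
    assert(base != 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned value;
    __asm__ ("movzbl %c2(%1), %0"
             : "=r"(value)
             : "r"(base), "i"(TABLE_OFFSET)
             : "memory");
    return value;
#else
    return base[TABLE_OFFSET];    /* portable fallback for other targets */
#endif
}

/* Variant 2: insert the number directly into the code block, so the
 * statement has one operand fewer. */
static unsigned load_with_pasted_literal(const unsigned char *base)
{
    assert(base != 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned value;
    __asm__ ("movzbl " STR(TABLE_OFFSET) "(%1), %0"
             : "=r"(value)
             : "r"(base)
             : "memory");
    return value;
#else
    return base[TABLE_OFFSET];
#endif
}
```

Both variants should assemble to the same load; the second simply avoids relying on operand substitution for the constant.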
  2. 25 Jun, 2012 1 commit
  3. 23 Jun, 2012 1 commit
  4. 28 Apr, 2012 4 commits
    • h264: new assembly version of get_cabac for x86_64 with PIC · 82c71913
      Roland Scheidegger authored
      This adds a hand-optimized assembly version for get_cabac much like the
      existing one, but it works if the table offsets are RIP-relative.
      Compared to the non-RIP-relative version this adds 2 lea instructions
      and it needs one extra register.
      There is a surprisingly large performance improvement over the C version
      (more than the generated assembly seems to suggest) in get_cabac alone: I
      measured it as roughly 40% faster on a K8. Overall the difference is
      smaller: I measured roughly 5% on a test clip on a K8 and a Core2.
      Hopefully it still compiles on 32-bit x86...
      Now that only one table is used, there is some chance that even Darwin's as
      compiles this (apparently the label arithmetic used previously doesn't work
      if it involves symbols defined in a different file; thanks to Ronald S.
      Bultje for helping me with this).
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
    • h264: use one table instead of several for cabac functions · 7f668cd2
      Roland Scheidegger authored
      The reason is that this is easier for PIC code (in particular on darwin...).
      Keep the old names as pointers (static in cabac_functions.h so gcc
      knows these are just immediate offsets) so the C code can stay the same
      (alternatively, the functions needing the tables could use the offsets
      directly). This should produce the same code as before without PIC and
      better code (confirmed) with PIC.
      
      The assembly uses the new table but still won't work in the PIC case.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
    • h264: new assembly version of get_cabac for x86_64 with PIC · 9b9df1cd
      Roland Scheidegger authored
      This adds a hand-optimized assembly version for get_cabac much like the
      existing one, but it works if the table offsets are RIP-relative.
      Compared to the non-RIP-relative version this adds 2 lea instructions
      and it needs one extra register. get_cabac() gets about 40% faster, for
      an overall speedup of about 5%.
      Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
    • h264: use one table instead of several for cabac functions · 14e9ffc1
      Roland Scheidegger authored
      The reason is that this is easier for PIC code (in particular on darwin...).
      Keep the old names as pointers (static in cabac_functions.h so gcc
      knows these are just immediate offsets) so the C code can stay the same
      (alternatively, the functions needing the tables could use the offsets
      directly). This should produce the same code as before without PIC and
      better code (confirmed) with PIC.
      
      The assembly uses the new table but still won't work in the PIC case.
      Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
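The "one table instead of several" layout from the commits above might look something like the sketch below. All names, offsets, and table contents here are made up for illustration; only the general shape (a single array plus static const pointers at fixed offsets, so existing C callers keep indexing by the old names) follows the commit message. In the real code the combined table would presumably be defined in a separate file and declared extern in the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offsets of the former separate tables inside the one table. */
#define NORM_SHIFT_OFFSET 0
#define LPS_RANGE_OFFSET  4
#define MLPS_STATE_OFFSET 8
#define TABLES_SIZE       12

/* One combined table (toy values). With PIC, only this one symbol has to
 * be addressed; everything else is a constant offset from it. */
static const uint8_t cabac_tables[TABLES_SIZE] = {
     0,  1,  2,  3,   /* former norm_shift */
    10, 11, 12, 13,   /* former lps_range  */
    20, 21, 22, 23,   /* former mlps_state */
};

/* The old names survive as static const pointers, so the compiler can
 * see that each is just the base address plus an immediate offset. */
static const uint8_t *const norm_shift = cabac_tables + NORM_SHIFT_OFFSET;
static const uint8_t *const lps_range  = cabac_tables + LPS_RANGE_OFFSET;
static const uint8_t *const mlps_state = cabac_tables + MLPS_STATE_OFFSET;

/* Existing callers keep using the old name unchanged. */
static unsigned lookup_lps_range(unsigned idx)
{
    assert(idx < MLPS_STATE_OFFSET - LPS_RANGE_OFFSET);
    return lps_range[idx];
}
```

Assembly code can then load the address of the combined table once and reach each sub-table with a constant displacement.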
  5. 21 Apr, 2012 1 commit
  6. 20 Apr, 2012 1 commit
    • h264: assembly version of get_cabac for x86_64 with PIC (v4) · a812b599
      Roland Scheidegger authored
      This adds a hand-optimized assembly version for get_cabac much like the
      existing one, but it works if the table offsets are RIP-relative.
      Compared to the non-RIP-relative version this adds 2 lea instructions
      and it needs one extra register.
      There is a surprisingly large performance improvement over the C version
      (more than the generated assembly seems to suggest) in get_cabac alone: I
      measured it as roughly 40% faster on a K8. Overall the difference is
      smaller: I measured roughly 5% on a test clip on a K8 and a Core2.
      Hopefully it still compiles on 32-bit x86...
      v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
      for every table, and got rid of unnecessary @GOTPCREL.
      v3: apply similar fixes to the decode_significance functions, and use
      the same macro arguments for the non-PIC case.
      v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
      the C code to be faster otherwise, since both cmov and sbb suck hard on a
      Prescott; the mask can't even be constructed with a 64-bit shift, as that's
      just as terrible - it's quite difficult to find usable instructions on that
      chip...). This is tested to work, but not on a P4; in theory it _should_ be
      fast there.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
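The v4 note above contrasts conditional moves with building a mask. As a rough C-level picture of what a mask-based select does (not the assembly from the commit, and with hypothetical function names), the second function below picks between two values without a branch or a conditional move by turning the comparison result into an all-ones or all-zero mask.

```c
#include <stdint.h>

/* Reference version: a compiler may turn this into a branch or a cmov. */
static int32_t select_if_less_branch(int32_t a, int32_t b,
                                     int32_t x, int32_t y)
{
    return a < b ? x : y;
}

/* Mask version: (a < b) is 0 or 1, so negating it yields 0 or -1
 * (all bits set). XOR-ing y with (x ^ y) under that mask gives x when
 * the mask is all ones and leaves y untouched when it is zero. */
static int32_t select_if_less_mask(int32_t a, int32_t b,
                                   int32_t x, int32_t y)
{
    int32_t mask = -(int32_t)(a < b);
    return y ^ ((x ^ y) & mask);
}
```

Which form is faster depends on the CPU, which is the point the commit message makes about Prescott.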
  7. 28 Mar, 2012 3 commits
  8. 09 Jan, 2012 1 commit
  9. 06 Jan, 2012 1 commit
  10. 27 Dec, 2011 1 commit
    • x86: Fix constraints for decode_significance*_x86 · 676a9ee1
      Martin Storsjö authored
      Originally, prior to 8742a4ff, the caller code was compiled
      within this condition:
      
      ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)
      
      Since HAVE_7REGS is defined as
      (ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
      the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equivalent
      to HAVE_7REGS (for 32 bit at least). The correct simplification
      of the original condition is thus HAVE_7REGS, not
      HAVE_EBX_AVAILABLE.
      
      This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0
      and HAVE_EBX_AVAILABLE = 1.
      Signed-off-by: Martin Storsjö <martin@martin.st>
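The boolean argument in the 676a9ee1 commit message can be checked with a small truth table. The sketch below models the configure macros as plain booleans (the function names are hypothetical) and leaves out ARCH_X86 and BROKEN_RELOCATIONS, since they appear unchanged in every variant of the condition. It counts the 32-bit configurations in which each candidate simplification disagrees with the original condition.

```c
#include <stdbool.h>

/* HAVE_7REGS as defined in the commit message. */
static bool have_7regs(bool x86_64, bool ebx, bool ebp)
{
    return x86_64 || (ebx && ebp);
}

/* The part of the original condition under discussion:
 * HAVE_7REGS && HAVE_EBX_AVAILABLE. */
static bool original_condition(bool x86_64, bool ebx, bool ebp)
{
    return have_7regs(x86_64, ebx, ebp) && ebx;
}

/* Number of 32-bit configurations where plain HAVE_7REGS differs
 * from the original condition. */
static int mismatches_with_7regs(void)
{
    int n = 0;
    for (int ebx = 0; ebx <= 1; ebx++)
        for (int ebp = 0; ebp <= 1; ebp++)
            n += original_condition(false, ebx, ebp)
                 != have_7regs(false, ebx, ebp);
    return n;
}

/* Number of 32-bit configurations where plain HAVE_EBX_AVAILABLE
 * differs from the original condition. */
static int mismatches_with_ebx(void)
{
    int n = 0;
    for (int ebx = 0; ebx <= 1; ebx++)
        for (int ebp = 0; ebp <= 1; ebp++)
            n += original_condition(false, ebx, ebp) != (bool)ebx;
    return n;
}
```

On 32 bit, HAVE_7REGS never disagrees with the original condition, while HAVE_EBX_AVAILABLE disagrees in exactly the case the commit names: EBX available but EBP not.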
  11. 21 Dec, 2011 1 commit
  12. 11 Dec, 2011 1 commit
  13. 08 Nov, 2011 1 commit
  14. 29 Jul, 2011 1 commit
  15. 28 Jul, 2011 2 commits
  16. 29 Jun, 2011 1 commit
  17. 22 Jun, 2011 1 commit
  18. 20 Jun, 2011 9 commits
  19. 14 Jun, 2011 1 commit
  20. 13 Jun, 2011 2 commits
  21. 19 Mar, 2011 1 commit
  22. 20 Apr, 2010 1 commit
  23. 01 Feb, 2009 1 commit
  24. 13 Jan, 2009 1 commit
  25. 22 Dec, 2008 1 commit