1. 28 Apr, 2012 4 commits
    • h264: new assembly version of get_cabac for x86_64 with PIC · 82c71913
      Roland Scheidegger authored
      This adds a hand-optimized assembly version for get_cabac much like the
      existing one, but it works if the table offsets are RIP-relative.
      Compared to the non-RIP-relative version this adds 2 lea instructions
      and it needs one extra register.
      There is a surprisingly large performance improvement over the C version (more
      than the generated assembly seems to suggest) in get_cabac itself: I measured
      roughly 40% faster for get_cabac on a K8. Overall the difference is smaller,
      though: I measured roughly 5% on a test clip on a K8 and a Core2.
      Hopefully it still compiles on x86 32bit...
      Now that only one table is used, there is some chance that even Darwin's as
      compiles this (apparently the label arithmetic used previously doesn't work if it
      involves symbols defined in a different file; thanks to Ronald S. Bultje for
      helping me with this).
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
    • h264: use one table instead of several for cabac functions · 7f668cd2
      Roland Scheidegger authored
      The reason is that this is easier for PIC code (in particular on Darwin...).
      Keep the old names as pointers (static in cabac_functions.h so gcc
      knows these are just immediate offsets) so the C code can nicely stay the same
      (alternatively, the functions needing the tables could use the offsets
      directly). This should produce the same code as before with non-PIC and better
      code (confirmed) with PIC.

      The assembly uses the new table but still won't work for the PIC case.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
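As a rough illustration of the single-table layout this commit describes, the following C sketch keeps the old table names as constant pointers into one combined array. Every name, size, and value here (`demo_cabac_tables`, the `DEMO_*` constants, the table contents) is invented for illustration and is not taken from the commit. The table is also defined in the same file to keep the sketch self-contained, whereas in the real code it presumably lives in a separate file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes and offsets, chosen only for illustration. */
enum {
    DEMO_NORM_SHIFT_OFFSET = 0,
    DEMO_NORM_SHIFT_SIZE   = 4,
    DEMO_LPS_RANGE_OFFSET  = DEMO_NORM_SHIFT_OFFSET + DEMO_NORM_SHIFT_SIZE,
    DEMO_LPS_RANGE_SIZE    = 4,
    DEMO_TABLE_SIZE        = DEMO_LPS_RANGE_OFFSET + DEMO_LPS_RANGE_SIZE
};

/* One combined table: position-independent code only has to locate this
 * single symbol; every sub-table is a constant offset from it. */
static const uint8_t demo_cabac_tables[DEMO_TABLE_SIZE] = {
    9, 8, 7, 7,          /* made-up "norm_shift" entries */
    128, 176, 208, 240   /* made-up "lps_range" entries  */
};

/* The old names survive as static constant pointers, so code that indexes
 * them does not change and the compiler can see the fixed offsets. */
static const uint8_t *const demo_norm_shift =
    demo_cabac_tables + DEMO_NORM_SHIFT_OFFSET;
static const uint8_t *const demo_lps_range =
    demo_cabac_tables + DEMO_LPS_RANGE_OFFSET;

int demo_lookup_norm_shift(int i)
{
    assert(i >= 0 && i < DEMO_NORM_SHIFT_SIZE);
    return demo_norm_shift[i];
}

int demo_lookup_lps_range(int i)
{
    assert(i >= 0 && i < DEMO_LPS_RANGE_SIZE);
    return demo_lps_range[i];
}
```

Callers keep indexing `demo_norm_shift[i]` and `demo_lps_range[i]` as before; only the definitions change.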
    • h264: new assembly version of get_cabac for x86_64 with PIC · 9b9df1cd
      Roland Scheidegger authored
      This adds a hand-optimized assembly version for get_cabac much like the
      existing one, but it works if the table offsets are RIP-relative.
      Compared to the non-RIP-relative version this adds 2 lea instructions
      and it needs one extra register. get_cabac() gets about 40% faster, for
      an overall speedup of about 5%.
      Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
    • h264: use one table instead of several for cabac functions · 14e9ffc1
      Roland Scheidegger authored
      The reason is that this is easier for PIC code (in particular on Darwin...).
      Keep the old names as pointers (static in cabac_functions.h so gcc
      knows these are just immediate offsets) so the C code can nicely stay the same
      (alternatively, the functions needing the tables could use the offsets
      directly). This should produce the same code as before with non-PIC and better
      code (confirmed) with PIC.

      The assembly uses the new table but still won't work for the PIC case.
      Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
  2. 21 Apr, 2012 1 commit
  3. 20 Apr, 2012 1 commit
    • h264: assembly version of get_cabac for x86_64 with PIC (v4) · a812b599
      Roland Scheidegger authored
      This adds a hand-optimized assembly version for get_cabac much like the
      existing one, but it works if the table offsets are RIP-relative.
      Compared to the non-RIP-relative version this adds 2 lea instructions
      and it needs one extra register.
      There is a surprisingly large performance improvement over the C version (more
      than the generated assembly seems to suggest) in get_cabac itself: I measured
      roughly 40% faster for get_cabac on a K8. Overall the difference is smaller,
      though: I measured roughly 5% on a test clip on a K8 and a Core2.
      Hopefully it still compiles on x86 32bit...
      v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
      for every table, and got rid of unnecessary @GOTPCREL.
      v3: apply similar fixes to the decode_significance functions, and use the
      same macro arguments for the non-PIC case.
      v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
      the C code to be faster otherwise, since both cmov and sbb suck hard on a
      Prescott; the mask can't even be constructed with a 64bit shift, as that's just
      as terrible - it's quite difficult to find usable instructions on that chip...).
      This is tested to work, but not on a P4; in theory it _should_ be fast there.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
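The v4 note mentions cmov, sbb, and building a mask as ways to avoid a branch. The C sketch below shows the general idea of mask-based selection only. It is not the code from this commit, and the function name `demo_select_branchless` is hypothetical.

```c
#include <stdint.h>

/* Return `a` when take_a is nonzero, otherwise `b`, without a branch in the
 * source. The mask is all ones or all zeros; unsigned arithmetic keeps the
 * wrap-around well defined. */
uint32_t demo_select_branchless(uint32_t a, uint32_t b, int take_a)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(take_a != 0);
    return (a & mask) | (b & ~mask);
}
```

Whether a compiler turns this into cmov, sbb, or something else depends on the target and flags, which is the kind of per-CPU trade-off the commit message is concerned with.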
  4. 28 Mar, 2012 3 commits
  5. 09 Jan, 2012 1 commit
  6. 06 Jan, 2012 1 commit
  7. 27 Dec, 2011 1 commit
    • x86: Fix constraints for decode_significance*_x86 · 676a9ee1
      Martin Storsjö authored
      Originally, prior to 8742a4ff, the caller code was compiled
      within this condition:
      
      ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)
      
      Since HAVE_7REGS is defined as
      (ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
      the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal
      to HAVE_7REGS (for 32 bit at least). The correct simplification
      of the original condition thus is HAVE_7REGS, not
      HAVE_EBX_AVAILABLE.
      
      This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0
      and HAVE_EBX_AVAILABLE = 1.
      Signed-off-by: Martin Storsjö <martin@martin.st>
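The simplification argued in this commit can be checked by brute force. The C sketch below models the configure flags as booleans and compares both candidate simplifications against the original subcondition on 32-bit configurations. The function names are hypothetical, and the `ARCH_X86` and `BROKEN_RELOCATIONS` terms are left out on the assumption that they are the same in every variant.

```c
#include <stdbool.h>

/* HAVE_7REGS as quoted in the commit message. */
static bool have_7regs(bool x86_64, bool ebx, bool ebp)
{
    return x86_64 || (ebx && ebp);
}

/* Subcondition of the original guard: HAVE_7REGS && HAVE_EBX_AVAILABLE. */
bool cond_original(bool x86_64, bool ebx, bool ebp)
{
    return have_7regs(x86_64, ebx, ebp) && ebx;
}

/* The simplification the commit calls correct: HAVE_7REGS alone. */
bool cond_7regs(bool x86_64, bool ebx, bool ebp)
{
    return have_7regs(x86_64, ebx, ebp);
}

/* The simplification the commit calls wrong: HAVE_EBX_AVAILABLE alone. */
bool cond_ebx_only(bool x86_64, bool ebx, bool ebp)
{
    (void)x86_64;
    (void)ebp;
    return ebx;
}

typedef bool (*cond_fn)(bool, bool, bool);

/* Count the 32-bit configurations (x86_64 = false) in which `candidate`
 * disagrees with the original subcondition. */
int count_mismatches_32bit(cond_fn candidate)
{
    int mismatches = 0;
    for (int ebx = 0; ebx <= 1; ebx++)
        for (int ebp = 0; ebp <= 1; ebp++)
            if (candidate(false, ebx, ebp) != cond_original(false, ebx, ebp))
                mismatches++;
    return mismatches;
}
```

On 32-bit, `cond_7regs` never disagrees with the original, while `cond_ebx_only` disagrees in exactly one configuration: EBX available and EBP not available, which is the case the commit message names.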
  8. 21 Dec, 2011 1 commit
  9. 11 Dec, 2011 1 commit
  10. 08 Nov, 2011 1 commit
  11. 29 Jul, 2011 1 commit
  12. 28 Jul, 2011 2 commits
  13. 29 Jun, 2011 1 commit
  14. 22 Jun, 2011 1 commit
  15. 20 Jun, 2011 9 commits
  16. 14 Jun, 2011 1 commit
  17. 13 Jun, 2011 2 commits
  18. 19 Mar, 2011 1 commit
  19. 20 Apr, 2010 1 commit
  20. 01 Feb, 2009 1 commit
  21. 13 Jan, 2009 1 commit
  22. 22 Dec, 2008 1 commit
  23. 16 Oct, 2008 1 commit
    • Convert asm keyword into __asm__. · be449fca
      Diego Pettenò authored
      Neither the asm() nor the __asm__() keyword is part of the C99
      standard, but while GCC accepts the former in C89 syntax, it is not
      accepted in C99 unless GNU extensions are turned on (with -fasm). The
      latter form is accepted in any syntax as an extension (without
      requiring further command-line options).
      
      Sun Studio C99 compiler also does not accept asm() while accepting
      __asm__(), albeit reporting warnings that it's not valid C99 syntax.
      
      Originally committed as revision 15627 to svn://svn.ffmpeg.org/ffmpeg/trunk
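A minimal sketch of the `__asm__` spelling this commit switches to, assuming a GCC-compatible compiler. The helper names are hypothetical, and the empty asm statement is used only so that the example does not depend on any particular instruction set.

```c
/* An empty inline-asm statement with a "memory" clobber acts as a
 * compiler-level barrier. Spelled __asm__, it is accepted by GCC even under
 * -std=c99, where the plain asm keyword is not available. */
static inline void demo_compiler_barrier(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Trivial caller, so the barrier is actually compiled and used. */
int demo_store_then_load(int value)
{
    int stored = value;
    demo_compiler_barrier();
    return stored;
}
```

With `asm` in place of `__asm__`, the same file would be expected to fail under `gcc -std=c99` unless GNU extensions are enabled, as the commit message describes.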
  24. 31 Aug, 2008 1 commit
  25. 09 May, 2008 1 commit