• Ganesh Ajjanagadde's avatar
    avcodec/jpeg2000: replace naive pow call with smarter exp2fi · 42868ca5
    Ganesh Ajjanagadde authored
    pow is a very wasteful function for this purpose. A low hanging fruit
    would be simply to replace with exp2f, and that does yield some speedup.
    However, there are 2 drawbacks of this:
    1. It does not exploit the integer nature of the argument.
    2. (minor) Some platforms lack a proper exp2f routine, making benefits available
    only to non broken libm.
    3. exp2f does not solve the same issue that plagues pow, namely terrible
    worst case performance. This is a fundamental issue known as the
    "table-maker's dilemma" recognized by Prof. Kahan himself and
    subsequently elaborated and researched by many others. All this is clear from benchmarks below.
    
    This exploits the IEEE-754 format to get very good performance even in
    the worst case for integer powers of 2. This solves all the issues noted
    above. Function tested with clang usan over [-1000, 1000] (beyond range of
    relevance for this, which is [-255, 255]), patch itself with FATE.
    
    Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
    the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
    Low number of runs also given to prove the point about worst case:
    
    pow:
     216270 decicycles in pow,       1 runs,      0 skips
     110175 decicycles in pow,       2 runs,      0 skips
      56085 decicycles in pow,       4 runs,      0 skips
      29013 decicycles in pow,       8 runs,      0 skips
      15472 decicycles in pow,      16 runs,      0 skips
       8689 decicycles in pow,      32 runs,      0 skips
       5295 decicycles in pow,      64 runs,      0 skips
       3599 decicycles in pow,     128 runs,      0 skips
       2748 decicycles in pow,     256 runs,      0 skips
       2304 decicycles in pow,     511 runs,      1 skips
       2072 decicycles in pow,    1022 runs,      2 skips
       1963 decicycles in pow,    2044 runs,      4 skips
       1894 decicycles in pow,    4091 runs,      5 skips
       1860 decicycles in pow,    8184 runs,      8 skips
    
    exp2f:
     134140 decicycles in pow,       1 runs,      0 skips
      68110 decicycles in pow,       2 runs,      0 skips
      34530 decicycles in pow,       4 runs,      0 skips
      17677 decicycles in pow,       8 runs,      0 skips
       9175 decicycles in pow,      16 runs,      0 skips
       4931 decicycles in pow,      32 runs,      0 skips
       2808 decicycles in pow,      64 runs,      0 skips
       1747 decicycles in pow,     128 runs,      0 skips
       1208 decicycles in pow,     256 runs,      0 skips
        952 decicycles in pow,     512 runs,      0 skips
        822 decicycles in pow,    1024 runs,      0 skips
        765 decicycles in pow,    2047 runs,      1 skips
        722 decicycles in pow,    4094 runs,      2 skips
        693 decicycles in pow,    8190 runs,      2 skips
    
    exp2fi:
       2740 decicycles in pow,       1 runs,      0 skips
       1530 decicycles in pow,       2 runs,      0 skips
        955 decicycles in pow,       4 runs,      0 skips
        622 decicycles in pow,       8 runs,      0 skips
        477 decicycles in pow,      16 runs,      0 skips
        368 decicycles in pow,      32 runs,      0 skips
        317 decicycles in pow,      64 runs,      0 skips
        291 decicycles in pow,     128 runs,      0 skips
        277 decicycles in pow,     256 runs,      0 skips
        268 decicycles in pow,     512 runs,      0 skips
        265 decicycles in pow,    1024 runs,      0 skips
        263 decicycles in pow,    2048 runs,      0 skips
        263 decicycles in pow,    4095 runs,      1 skips
        260 decicycles in pow,    8191 runs,      1 skips
    Reviewed-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
    Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
    42868ca5
jpeg2000.c 23.1 KB