Commit 42868ca5 authored by Ganesh Ajjanagadde's avatar Ganesh Ajjanagadde

avcodec/jpeg2000: replace naive pow call with smarter exp2fi

pow is a very wasteful function for this purpose. A low hanging fruit
would be simply to replace with exp2f, and that does yield some speedup.
However, there are 2 drawbacks of this:
1. It does not exploit the integer nature of the argument.
2. (minor) Some platforms lack a proper exp2f routine, making benefits available
only to non broken libm.
3. exp2f does not solve the same issue that plagues pow, namely terrible
worst case performance. This is a fundamental issue known as the
"table-maker's dilemma" recognized by Prof. Kahan himself and
subsequently elaborated and researched by many others. All this is clear from benchmarks below.

This exploits the IEEE-754 format to get very good performance even in
the worst case for integer powers of 2. This solves all the issues noted
above. Function tested with clang usan over [-1000, 1000] (beyond range of
relevance for this, which is [-255, 255]), patch itself with FATE.

Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
Low number of runs also given to prove the point about worst case:

pow:
 216270 decicycles in pow,       1 runs,      0 skips
 110175 decicycles in pow,       2 runs,      0 skips
  56085 decicycles in pow,       4 runs,      0 skips
  29013 decicycles in pow,       8 runs,      0 skips
  15472 decicycles in pow,      16 runs,      0 skips
   8689 decicycles in pow,      32 runs,      0 skips
   5295 decicycles in pow,      64 runs,      0 skips
   3599 decicycles in pow,     128 runs,      0 skips
   2748 decicycles in pow,     256 runs,      0 skips
   2304 decicycles in pow,     511 runs,      1 skips
   2072 decicycles in pow,    1022 runs,      2 skips
   1963 decicycles in pow,    2044 runs,      4 skips
   1894 decicycles in pow,    4091 runs,      5 skips
   1860 decicycles in pow,    8184 runs,      8 skips

exp2f:
 134140 decicycles in pow,       1 runs,      0 skips
  68110 decicycles in pow,       2 runs,      0 skips
  34530 decicycles in pow,       4 runs,      0 skips
  17677 decicycles in pow,       8 runs,      0 skips
   9175 decicycles in pow,      16 runs,      0 skips
   4931 decicycles in pow,      32 runs,      0 skips
   2808 decicycles in pow,      64 runs,      0 skips
   1747 decicycles in pow,     128 runs,      0 skips
   1208 decicycles in pow,     256 runs,      0 skips
    952 decicycles in pow,     512 runs,      0 skips
    822 decicycles in pow,    1024 runs,      0 skips
    765 decicycles in pow,    2047 runs,      1 skips
    722 decicycles in pow,    4094 runs,      2 skips
    693 decicycles in pow,    8190 runs,      2 skips

exp2fi:
   2740 decicycles in pow,       1 runs,      0 skips
   1530 decicycles in pow,       2 runs,      0 skips
    955 decicycles in pow,       4 runs,      0 skips
    622 decicycles in pow,       8 runs,      0 skips
    477 decicycles in pow,      16 runs,      0 skips
    368 decicycles in pow,      32 runs,      0 skips
    317 decicycles in pow,      64 runs,      0 skips
    291 decicycles in pow,     128 runs,      0 skips
    277 decicycles in pow,     256 runs,      0 skips
    268 decicycles in pow,     512 runs,      0 skips
    265 decicycles in pow,    1024 runs,      0 skips
    263 decicycles in pow,    2048 runs,      0 skips
    263 decicycles in pow,    4095 runs,      1 skips
    260 decicycles in pow,    8191 runs,      1 skips
Reviewed-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
parent 5b0da699
...@@ -192,6 +192,21 @@ void ff_jpeg2000_set_significance(Jpeg2000T1Context *t1, int x, int y, ...@@ -192,6 +192,21 @@ void ff_jpeg2000_set_significance(Jpeg2000T1Context *t1, int x, int y,
// static const uint8_t lut_gain[2][4] = { { 0, 0, 0, 0 }, { 0, 1, 1, 2 } }; (unused) // static const uint8_t lut_gain[2][4] = { { 0, 0, 0, 0 }, { 0, 1, 1, 2 } }; (unused)
static inline float exp2fi(int x) {
/* Normal range */
if (-126 <= x && x <= 128)
return av_int2float(x+127 << 23);
/* Too large */
else if (x > 128)
return INFINITY;
/* Subnormal numbers */
else if (x > -150)
return av_int2float(1 << (x+149));
/* Negligibly small */
else
return 0;
}
static void init_band_stepsize(AVCodecContext *avctx, static void init_band_stepsize(AVCodecContext *avctx,
Jpeg2000Band *band, Jpeg2000Band *band,
Jpeg2000CodingStyle *codsty, Jpeg2000CodingStyle *codsty,
...@@ -221,7 +236,7 @@ static void init_band_stepsize(AVCodecContext *avctx, ...@@ -221,7 +236,7 @@ static void init_band_stepsize(AVCodecContext *avctx,
* R_b = R_I + log2 (gain_b ) * R_b = R_I + log2 (gain_b )
* see ISO/IEC 15444-1:2002 E.1.1 eqn. E-3 and E-4 */ * see ISO/IEC 15444-1:2002 E.1.1 eqn. E-3 and E-4 */
gain = cbps; gain = cbps;
band->f_stepsize = pow(2.0, gain - qntsty->expn[gbandno]); band->f_stepsize = exp2fi(gain - qntsty->expn[gbandno]);
band->f_stepsize *= qntsty->mant[gbandno] / 2048.0 + 1.0; band->f_stepsize *= qntsty->mant[gbandno] / 2048.0 + 1.0;
break; break;
default: default:
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment