1. 09 Dec, 2015 4 commits
    • Ganesh Ajjanagadde's avatar
      avcodec/aacsbr_tablegen: always initialize tables at runtime · cb93df0d
      Ganesh Ajjanagadde authored
      This gets rid of virtually useless hardcoded tables hackery. The reason
      it is useless is that a 320 element lut is anyway placed regardless of
      --enable-hardcoded-tables, from which all necessary tables are trivially
      derived at runtime at very low cost:
      
      sample benchmark (x86-64, Haswell, GNU/Linux, single run is really
      what is relevant here since looping drastically changes the bench). Fluctuations
      are on the order of 10% for the single run test:
      39400 decicycles in aacsbr_tableinit,       1 runs,      0 skips
      25325 decicycles in aacsbr_tableinit,       2 runs,      0 skips
      18475 decicycles in aacsbr_tableinit,       4 runs,      0 skips
      15008 decicycles in aacsbr_tableinit,       8 runs,      0 skips
      13016 decicycles in aacsbr_tableinit,      16 runs,      0 skips
      12005 decicycles in aacsbr_tableinit,      32 runs,      0 skips
      11546 decicycles in aacsbr_tableinit,      64 runs,      0 skips
      11506 decicycles in aacsbr_tableinit,     128 runs,      0 skips
      11500 decicycles in aacsbr_tableinit,     256 runs,      0 skips
      11183 decicycles in aacsbr_tableinit,     509 runs,      3 skips
      
      Tested with FATE with/without --enable-hardcoded-tables.
      Reviewed-by: 's avatarRostislav Pehlivanov <atomnuker@gmail.com>
      Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
      cb93df0d
    • Jean Delvare's avatar
      avfilter/vf_delogo: round to the closest value · e74f1a12
      Jean Delvare authored
      When the interpolated value is divided by the sum of weights, no
      rounding is done, which means the value is truncated. This results in
      a slight bias towards dark green in the interpolated area. Rounding
      properly removes the bias.
      
      I measured this change to reduce the interpolation error by 1 to 2 %
      on average on a number of sample input and logo area combinations.
      Signed-off-by: 's avatarJean Delvare <jdelvare@suse.de>
      Signed-off-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
      e74f1a12
    • Ganesh Ajjanagadde's avatar
      swscale/utils: replace pow by exp2 · c8905e0d
      Ganesh Ajjanagadde authored
      exp2 is a faster function.
      Reviewed-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
      Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
      c8905e0d
    • Ganesh Ajjanagadde's avatar
      avcodec/jpeg2000: replace naive pow call with smarter exp2fi · 42868ca5
      Ganesh Ajjanagadde authored
      pow is a very wasteful function for this purpose. A low hanging fruit
      would be simply to replace with exp2f, and that does yield some speedup.
      However, there are 2 drawbacks of this:
      1. It does not exploit the integer nature of the argument.
      2. (minor) Some platforms lack a proper exp2f routine, making benefits available
      only to non broken libm.
      3. exp2f does not solve the same issue that plagues pow, namely terrible
      worst case performance. This is a fundamental issue known as the
      "table-maker's dilemma" recognized by Prof. Kahan himself and
      subsequently elaborated and researched by many others. All this is clear from benchmarks below.
      
      This exploits the IEEE-754 format to get very good performance even in
      the worst case for integer powers of 2. This solves all the issues noted
      above. Function tested with clang usan over [-1000, 1000] (beyond range of
      relevance for this, which is [-255, 255]), patch itself with FATE.
      
      Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
      the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
      Low number of runs also given to prove the point about worst case:
      
      pow:
       216270 decicycles in pow,       1 runs,      0 skips
       110175 decicycles in pow,       2 runs,      0 skips
        56085 decicycles in pow,       4 runs,      0 skips
        29013 decicycles in pow,       8 runs,      0 skips
        15472 decicycles in pow,      16 runs,      0 skips
         8689 decicycles in pow,      32 runs,      0 skips
         5295 decicycles in pow,      64 runs,      0 skips
         3599 decicycles in pow,     128 runs,      0 skips
         2748 decicycles in pow,     256 runs,      0 skips
         2304 decicycles in pow,     511 runs,      1 skips
         2072 decicycles in pow,    1022 runs,      2 skips
         1963 decicycles in pow,    2044 runs,      4 skips
         1894 decicycles in pow,    4091 runs,      5 skips
         1860 decicycles in pow,    8184 runs,      8 skips
      
      exp2f:
       134140 decicycles in pow,       1 runs,      0 skips
        68110 decicycles in pow,       2 runs,      0 skips
        34530 decicycles in pow,       4 runs,      0 skips
        17677 decicycles in pow,       8 runs,      0 skips
         9175 decicycles in pow,      16 runs,      0 skips
         4931 decicycles in pow,      32 runs,      0 skips
         2808 decicycles in pow,      64 runs,      0 skips
         1747 decicycles in pow,     128 runs,      0 skips
         1208 decicycles in pow,     256 runs,      0 skips
          952 decicycles in pow,     512 runs,      0 skips
          822 decicycles in pow,    1024 runs,      0 skips
          765 decicycles in pow,    2047 runs,      1 skips
          722 decicycles in pow,    4094 runs,      2 skips
          693 decicycles in pow,    8190 runs,      2 skips
      
      exp2fi:
         2740 decicycles in pow,       1 runs,      0 skips
         1530 decicycles in pow,       2 runs,      0 skips
          955 decicycles in pow,       4 runs,      0 skips
          622 decicycles in pow,       8 runs,      0 skips
          477 decicycles in pow,      16 runs,      0 skips
          368 decicycles in pow,      32 runs,      0 skips
          317 decicycles in pow,      64 runs,      0 skips
          291 decicycles in pow,     128 runs,      0 skips
          277 decicycles in pow,     256 runs,      0 skips
          268 decicycles in pow,     512 runs,      0 skips
          265 decicycles in pow,    1024 runs,      0 skips
          263 decicycles in pow,    2048 runs,      0 skips
          263 decicycles in pow,    4095 runs,      1 skips
          260 decicycles in pow,    8191 runs,      1 skips
      Reviewed-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
      Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
      42868ca5
  2. 08 Dec, 2015 22 commits
  3. 07 Dec, 2015 14 commits