1. 09 Dec, 2015 5 commits
    • Ganesh Ajjanagadde's avatar
      lavfi/show_palette: fix memory leak · 00c32201
      Ganesh Ajjanagadde authored
      Recent commits 6aaac24d and
      3835554b made progress towards cleaning
      up usage of the formats API, and in particular fixed possible NULL pointer
      dereferences.
      
      This commit addresses the issue of possible resource leaks when some intermediate
      call fails. Unfortunately, even leaving aside this subtle intermediate
      failure aspect, commit 80876320 was only
      partially successful in addressing memleaks. Hopefully, this commit
      fixes the issue completely.
      
      Tested with valgrind --leak-check=full --show-leak-kinds=all, and manual simulation
      of malloc/realloc failures.
      
      Fixes: CID 1270818.
      Reviewed-by: 's avatarClément Bœsch <u@pkh.me>
      Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
      00c32201
    • Ganesh Ajjanagadde's avatar
      avcodec/aacsbr_tablegen: always initialize tables at runtime · cb93df0d
      Ganesh Ajjanagadde authored
      This gets rid of virtually useless hardcoded tables hackery. The reason
      it is useless is that a 320 element lut is anyway placed regardless of
      --enable-hardcoded-tables, from which all necessary tables are trivially
      derived at runtime at very low cost:
      
      sample benchmark (x86-64, Haswell, GNU/Linux, single run is really
      what is relevant here since looping drastically changes the bench). Fluctuations
      are on the order of 10% for the single run test:
      39400 decicycles in aacsbr_tableinit,       1 runs,      0 skips
      25325 decicycles in aacsbr_tableinit,       2 runs,      0 skips
      18475 decicycles in aacsbr_tableinit,       4 runs,      0 skips
      15008 decicycles in aacsbr_tableinit,       8 runs,      0 skips
      13016 decicycles in aacsbr_tableinit,      16 runs,      0 skips
      12005 decicycles in aacsbr_tableinit,      32 runs,      0 skips
      11546 decicycles in aacsbr_tableinit,      64 runs,      0 skips
      11506 decicycles in aacsbr_tableinit,     128 runs,      0 skips
      11500 decicycles in aacsbr_tableinit,     256 runs,      0 skips
      11183 decicycles in aacsbr_tableinit,     509 runs,      3 skips
      
      Tested with FATE with/without --enable-hardcoded-tables.
      Reviewed-by: 's avatarRostislav Pehlivanov <atomnuker@gmail.com>
      Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
      cb93df0d
    • Jean Delvare's avatar
      avfilter/vf_delogo: round to the closest value · e74f1a12
      Jean Delvare authored
      When the interpolated value is divided by the sum of weights, no
      rounding is done, which means the value is truncated. This results in
      a slight bias towards dark green in the interpolated area. Rounding
      properly removes the bias.
      
      I measured this change to reduce the interpolation error by 1 to 2 %
      on average on a number of sample input and logo area combinations.
      Signed-off-by: 's avatarJean Delvare <jdelvare@suse.de>
      Signed-off-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
      e74f1a12
    • Ganesh Ajjanagadde's avatar
      swscale/utils: replace pow by exp2 · c8905e0d
      Ganesh Ajjanagadde authored
      exp2 is a faster function.
      Reviewed-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
      Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
      c8905e0d
    • Ganesh Ajjanagadde's avatar
      avcodec/jpeg2000: replace naive pow call with smarter exp2fi · 42868ca5
      Ganesh Ajjanagadde authored
      pow is a very wasteful function for this purpose. A low hanging fruit
      would be simply to replace with exp2f, and that does yield some speedup.
      However, there are 2 drawbacks of this:
      1. It does not exploit the integer nature of the argument.
      2. (minor) Some platforms lack a proper exp2f routine, making benefits available
      only to non broken libm.
      3. exp2f does not solve the same issue that plagues pow, namely terrible
      worst case performance. This is a fundamental issue known as the
      "table-maker's dilemma" recognized by Prof. Kahan himself and
      subsequently elaborated and researched by many others. All this is clear from benchmarks below.
      
      This exploits the IEEE-754 format to get very good performance even in
      the worst case for integer powers of 2. This solves all the issues noted
      above. Function tested with clang usan over [-1000, 1000] (beyond range of
      relevance for this, which is [-255, 255]), patch itself with FATE.
      
      Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
      the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
      Low number of runs also given to prove the point about worst case:
      
      pow:
       216270 decicycles in pow,       1 runs,      0 skips
       110175 decicycles in pow,       2 runs,      0 skips
        56085 decicycles in pow,       4 runs,      0 skips
        29013 decicycles in pow,       8 runs,      0 skips
        15472 decicycles in pow,      16 runs,      0 skips
         8689 decicycles in pow,      32 runs,      0 skips
         5295 decicycles in pow,      64 runs,      0 skips
         3599 decicycles in pow,     128 runs,      0 skips
         2748 decicycles in pow,     256 runs,      0 skips
         2304 decicycles in pow,     511 runs,      1 skips
         2072 decicycles in pow,    1022 runs,      2 skips
         1963 decicycles in pow,    2044 runs,      4 skips
         1894 decicycles in pow,    4091 runs,      5 skips
         1860 decicycles in pow,    8184 runs,      8 skips
      
      exp2f:
       134140 decicycles in pow,       1 runs,      0 skips
        68110 decicycles in pow,       2 runs,      0 skips
        34530 decicycles in pow,       4 runs,      0 skips
        17677 decicycles in pow,       8 runs,      0 skips
         9175 decicycles in pow,      16 runs,      0 skips
         4931 decicycles in pow,      32 runs,      0 skips
         2808 decicycles in pow,      64 runs,      0 skips
         1747 decicycles in pow,     128 runs,      0 skips
         1208 decicycles in pow,     256 runs,      0 skips
          952 decicycles in pow,     512 runs,      0 skips
          822 decicycles in pow,    1024 runs,      0 skips
          765 decicycles in pow,    2047 runs,      1 skips
          722 decicycles in pow,    4094 runs,      2 skips
          693 decicycles in pow,    8190 runs,      2 skips
      
      exp2fi:
         2740 decicycles in pow,       1 runs,      0 skips
         1530 decicycles in pow,       2 runs,      0 skips
          955 decicycles in pow,       4 runs,      0 skips
          622 decicycles in pow,       8 runs,      0 skips
          477 decicycles in pow,      16 runs,      0 skips
          368 decicycles in pow,      32 runs,      0 skips
          317 decicycles in pow,      64 runs,      0 skips
          291 decicycles in pow,     128 runs,      0 skips
          277 decicycles in pow,     256 runs,      0 skips
          268 decicycles in pow,     512 runs,      0 skips
          265 decicycles in pow,    1024 runs,      0 skips
          263 decicycles in pow,    2048 runs,      0 skips
          263 decicycles in pow,    4095 runs,      1 skips
          260 decicycles in pow,    8191 runs,      1 skips
      Reviewed-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
      Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
      42868ca5
  2. 08 Dec, 2015 22 commits
  3. 07 Dec, 2015 13 commits