1. 07 Nov, 2015 17 commits
  2. 06 Nov, 2015 9 commits
  3. 05 Nov, 2015 10 commits
  4. 04 Nov, 2015 4 commits
    • swresample/resample: speed up build_filter by 50% · 9bec6d71
      Ganesh Ajjanagadde authored
      This speeds up build_filter by ~ 50%. This gain should be pretty
      consistent across all architectures and platforms.
      
      Essentially, this relies on the observation that the filters have some
      even/odd symmetry that may be exploited during the construction of the
      polyphase filter bank. In particular, phases (scaled to [0, 1]) in [0.5, 1] are
      easily derived from those in [0, 0.5], so expensive reevaluation of function
      points is unnecessary. This requires some rather annoying even/odd
      bookkeeping, as can be seen from the patch.
      
      I vaguely recall from signal processing theory that more general symmetries
      allow even greater optimization of the construction. At a high level, "even
      functions" correspond to a two-fold symmetry, and one can imagine higher-order
      variations. Nevertheless, for the sake of some generality and because of the
      existing filters, this is all that is being exploited.
      
      Currently, this patch relies on phase_count being even or (trivially) 1,
      though this is not an inherent limitation to the approach. This
      assumption is safe as phase_count is 1 << phase_bits, and is hence a
      power of two. There is no way for the user API to set it to a nontrivial odd
      number. This assumption has been placed as an assert in the code.
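As a rough illustration of that invariant, the helper below derives the phase count from `phase_bits` and asserts the property the commit relies on. The function name and the bound on `phase_bits` are assumptions made for this sketch and are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical helper: phase_count is always a power of two. */
static int phase_count_from_bits(int phase_bits)
{
    /* Bound chosen for the sketch so the shift stays within a 32-bit int. */
    assert(phase_bits >= 0 && phase_bits < 31);

    int phase_count = 1 << phase_bits;

    /* Either the trivial single phase or an even number of phases. */
    assert(phase_count == 1 || phase_count % 2 == 0);
    return phase_count;
}
```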
      
      To repeat, this assumes even symmetry of the filters, which is the most common
      way to get generalized linear phase anyway and is true of all currently
      supported filters.
      
      As a side note, accuracy should be identical or perhaps slightly better,
      because this "forces" the filter symmetries and so leads to a better phase
      characteristic. As before, I can't test this claim easily, though it may
      be of interest.
      
      Patch tested with FATE.
      
      Sample benchmark (x86-64, Haswell, GNU/Linux):
      
      test: swr-resample-dblp-44100-2626
      
      new:
      527376779 decicycles in build_filter(loop 1000),     256 runs,      0 skips
      524361765 decicycles in build_filter(loop 1000),     512 runs,      0 skips
      516552574 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
      
      old:
      974178658 decicycles in build_filter(loop 1000),     256 runs,      0 skips
      972794408 decicycles in build_filter(loop 1000),     512 runs,      0 skips
      954350046 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
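For reference, the ratio implied by the 1024-run figures above works out as follows (the other two pairs of rows give nearly the same value):

```latex
\frac{t_{\text{new}}}{t_{\text{old}}}
  = \frac{516552574}{954350046} \approx 0.541,
\qquad
1 - 0.541 \approx 0.46,
\qquad
\frac{t_{\text{old}}}{t_{\text{new}}} \approx 1.85
```

That is roughly 46% fewer decicycles in build_filter, in line with the "~ 50%" figure quoted at the top of the message.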
      
      Note that lower-level optimizations are entirely possible; I focused on
      getting the high-level semantics correct. In any case, this should
      provide a good foundation.
      Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
      Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
    • avcodec/mjpegdec: Reinitialize IDCT on BPP changes · cc35f6f4
      Michael Niedermayer authored
      Fixes misaligned access
      Fixes: dc9262a469f6f315f74c087a7b3a7f35/signal_sigsegv_2e95bcd_9_9c0f9f4a9ba82aa9b3ab2b91ce4d5277.jpg
      
      Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
    • avcodec/mjpegdec: Check index in ljpeg_decode_yuv_scan() before using it · d24888ef
      Michael Niedermayer authored
      Fixes: 04715144ba237443010554be0d05343f/asan_heap-oob_1eafc76_1737_c685b48041a563461839e4e7ab97abb8.jpg
      Fixes out of array access
      
      Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
      Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
    • avcodec/aacsbr_template: replace qsort with AV_QSORT · fd0bf457
      Ganesh Ajjanagadde authored
      When sbr->reset is set in encode_frame, a bunch of qsort calls might get made.
      Thus, there is the potential of calling qsort whenever the spectral
      contents change.
      
      AV_QSORT is substantially faster due to the inlining of the comparison callback.
      Thus, the increase in performance should be worth the increase in binary size.
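The trade-off described above can be sketched without libavutil. In the example below, `qsort` calls its comparison through a function pointer, while a macro-based sort expands the comparison at each use site, which costs code size at every instantiation. `SKETCH_INSERTION_SORT` and `CMP_INT_INLINE` are hypothetical stand-ins invented for this illustration; they are not AV_QSORT and do not reflect how it is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Callback form: qsort invokes this through a function pointer. */
static int cmp_int_cb(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Macro form: the comparison is expanded in place at each use. */
#define CMP_INT_INLINE(a, b) ((*(a) > *(b)) - (*(a) < *(b)))

/* Hypothetical macro sort (plain insertion sort, for illustration only). */
#define SKETCH_INSERTION_SORT(p, num, type, cmp) do {          \
    type *base_ = (p);                                         \
    for (size_t i_ = 1; i_ < (size_t)(num); i_++) {            \
        type key_ = base_[i_];                                 \
        size_t j_ = i_;                                        \
        while (j_ > 0 && cmp(&base_[j_ - 1], &key_) > 0) {     \
            base_[j_] = base_[j_ - 1];                         \
            j_--;                                              \
        }                                                      \
        base_[j_] = key_;                                      \
    }                                                          \
} while (0)

static void sort_with_callback(int *v, size_t n)
{
    qsort(v, n, sizeof(*v), cmp_int_cb);
}

static void sort_with_macro(int *v, size_t n)
{
    SKETCH_INSERTION_SORT(v, n, int, CMP_INT_INLINE);
}
```

Both functions produce the same ordering; the difference is only in how the comparison reaches the inner loop.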
      
      Tested with FATE.
      Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
      Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>