• Ganesh Ajjanagadde's avatar
    avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm · 971d12b7
    Ganesh Ajjanagadde authored
    This uses Stein's binary GCD algorithm:
    https://en.wikipedia.org/wiki/Binary_GCD_algorithm
    to get a roughly 4x speedup over Euclidean GCD on standard architectures
    with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
    At the moment, the compiler intrinsic is used on GCC and Clang due to
    its easy availability.
    
    Quick note regarding overflow: yes, subtractions on int64_t can, but the
    llabs takes care of that. The llabs is also guaranteed to be safe, with
    no annoying INT64_MIN business since INT64_MIN being a power of 2, is
    shifted down before being sent to llabs.
    
    The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On
    GCC, this is provided by a built-in. On Microsoft, there is a
    BitScanForward64 analog of BitScanForward that should work; but I can't confirm.
    Apparently it is not available on 32 bit builds; so this may or may not
    work correctly. On Intel, per the documentation there is only an
    intrinsic for _bit_scan_forward and people have posted on forums
    regarding _bit_scan_forward64, but often their documentation is
    woeful. Again, I don't have it, so I can't test.
    
    As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest
    use a compiled version based on the De-Bruijn method of Leiserson et al:
    http://supertech.csail.mit.edu/papers/debruijn.pdf.
    
    Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell)
    with a START_TIMER and STOP_TIMER in libavutil/rationsl.c, followed by a
    make fate.
    
    aac-am00_88.err:
    builtin:
    714 decicycles in av_gcd,    4095 runs,      1 skips
    
    de-bruijn:
    1440 decicycles in av_gcd,    4096 runs,      0 skips
    
    previous:
    2889 decicycles in av_gcd,    4096 runs,      0 skips
    Signed-off-by: 's avatarGanesh Ajjanagadde <gajjanagadde@gmail.com>
    Signed-off-by: 's avatarMichael Niedermayer <michael@niedermayer.cc>
    971d12b7
Name
Last commit
Last update
compat Loading commit data...
doc Loading commit data...
libavcodec Loading commit data...
libavdevice Loading commit data...
libavfilter Loading commit data...
libavformat Loading commit data...
libavresample Loading commit data...
libavutil Loading commit data...
libpostproc Loading commit data...
libswresample Loading commit data...
libswscale Loading commit data...
presets Loading commit data...
tests Loading commit data...
tools Loading commit data...
.gitattributes Loading commit data...
.gitignore Loading commit data...
.travis.yml Loading commit data...
COPYING.GPLv2 Loading commit data...
COPYING.GPLv3 Loading commit data...
COPYING.LGPLv2.1 Loading commit data...
COPYING.LGPLv3 Loading commit data...
CREDITS Loading commit data...
Changelog Loading commit data...
INSTALL.md Loading commit data...
LICENSE.md Loading commit data...
MAINTAINERS Loading commit data...
Makefile Loading commit data...
README.md Loading commit data...
RELEASE Loading commit data...
arch.mak Loading commit data...
cmdutils.c Loading commit data...
cmdutils.h Loading commit data...
cmdutils_common_opts.h Loading commit data...
cmdutils_opencl.c Loading commit data...
common.mak Loading commit data...
configure Loading commit data...
ffmpeg.c Loading commit data...
ffmpeg.h Loading commit data...
ffmpeg_dxva2.c Loading commit data...
ffmpeg_filter.c Loading commit data...
ffmpeg_opt.c Loading commit data...
ffmpeg_vdpau.c Loading commit data...
ffmpeg_videotoolbox.c Loading commit data...
ffplay.c Loading commit data...
ffprobe.c Loading commit data...
ffserver.c Loading commit data...
ffserver_config.c Loading commit data...
ffserver_config.h Loading commit data...
library.mak Loading commit data...
version.sh Loading commit data...