    vp9: add 16x16 idct avx2 (8-bit). · f0a2b624
    Ronald S. Bultje authored
    checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
    that it's about 1.65x as fast as the AVX version for the full IDCT, with
    speedups of roughly 1.5x to 2.1x for the sub-IDCTs:
    
    nop: 24.6
    vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
    vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
    vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
    vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
    vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
    vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
    vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
    vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
    vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
    vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
    vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
    vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
    vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
    vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
    vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
    vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
    vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
    vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
    vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
    vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
    vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
    vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
    vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
    vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
    vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
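
The AVX-to-AVX2 ratios quoted in the message can be rechecked from the timings listed above. The following is a minimal sketch, not part of the commit: the dictionary and function names are illustrative, the numbers are copied from the benchmark output as-is (lower is faster), and the `nop` baseline is deliberately not subtracted.

```python
# Timings copied from the checkasm output above, keyed by sub_idct size.
# Lower is faster. The names here are illustrative, not from FFmpeg.
AVX = {1: 661.2, 2: 521.9, 4: 446.1, 8: 440.9, 16: 469.9}
AVX2 = {1: 311.5, 2: 304.3, 4: 297.0, 8: 290.2, 16: 286.4}


def speedup(sub_idct):
    """Return how many times faster the AVX2 version is than the AVX one."""
    return AVX[sub_idct] / AVX2[sub_idct]


for size in sorted(AVX):
    # sub_idct=16 is the full 16x16 IDCT; smaller values are the sub-IDCTs.
    print(f"sub_idct={size:2d}: {speedup(size):.2f}x")
```

Under these assumptions the full IDCT (`sub_idct=16`) comes out at about 1.64x, in line with the "about 1.65x" figure in the message.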
Name
compat
doc
libavcodec
libavdevice
libavfilter
libavformat
libavresample
libavutil
libpostproc
libswresample
libswscale
presets
tests
tools
.gitattributes
.gitignore
.travis.yml
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
Changelog
INSTALL.md
LICENSE.md
MAINTAINERS
Makefile
README.md
RELEASE
arch.mak
cmdutils.c
cmdutils.h
cmdutils_common_opts.h
cmdutils_opencl.c
common.mak
configure
ffmpeg.c
ffmpeg.h
ffmpeg_cuvid.c
ffmpeg_dxva2.c
ffmpeg_filter.c
ffmpeg_opt.c
ffmpeg_qsv.c
ffmpeg_vaapi.c
ffmpeg_vdpau.c
ffmpeg_videotoolbox.c
ffplay.c
ffprobe.c
ffserver.c
ffserver_config.c
ffserver_config.h
library.mak
version.sh