    aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 · 8b11a89c
    Martin Storsjö authored
    This work is sponsored by, and copyright, Google.
    
    Previously all subpartitions except the eob=1 (DC) case ran with
    the same runtime:
    
    vp9_inv_dct_dct_16x16_sub16_add_neon:   1373.2
    vp9_inv_dct_dct_32x32_sub32_add_neon:   8089.0
    
    By skipping individual 8x16 or 8x32 pixel slices in the first pass,
    we reduce the runtime of these functions like this:
    
    vp9_inv_dct_dct_16x16_sub1_add_neon:     235.3
    vp9_inv_dct_dct_16x16_sub2_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub8_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub12_add_neon:   1372.1
    vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.1
    vp9_inv_dct_dct_32x32_sub1_add_neon:     555.1
    vp9_inv_dct_dct_32x32_sub2_add_neon:    5190.2
    vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
    vp9_inv_dct_dct_32x32_sub8_add_neon:    5183.1
    vp9_inv_dct_dct_32x32_sub12_add_neon:   6161.5
    vp9_inv_dct_dct_32x32_sub16_add_neon:   6155.5
    vp9_inv_dct_dct_32x32_sub20_add_neon:   7136.3
    vp9_inv_dct_dct_32x32_sub24_add_neon:   7128.4
    vp9_inv_dct_dct_32x32_sub28_add_neon:   8098.9
    vp9_inv_dct_dct_32x32_sub32_add_neon:   8098.8
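    The plateaus in the table can be read as a rough per-slice cost. This
    reading rests on one assumption: subN means only the top-left NxN
    coefficients are nonzero, so ceil(N/8) slices go through the first pass.
    That fits the data, since sub12 and sub16 (and sub20 and sub24) have
    matching runtimes. The differences between consecutive plateaus, taken
    from the sub8/sub16/sub24/sub32 rows, then estimate the cost of one 8-wide
    first-pass slice. The old/new ratio gives the speedup for the smallest
    non-DC plateau. This is a back-of-the-envelope sketch, not a measurement.

```python
# Plateau runtimes copied from the table above, keyed by the number of
# first-pass slices run (assumption: subN runs ceil(N / 8) slices).
t16 = {1: 1036.7, 2: 1372.1}
t32 = {1: 5183.1, 2: 6155.5, 3: 7128.4, 4: 8098.8}


def increments(t):
    """Runtime added by each extra first-pass slice."""
    return [round(t[k + 1] - t[k], 1) for k in range(1, len(t))]


print(increments(t16))  # [335.4]
print(increments(t32))  # [972.4, 972.9, 970.4]
# Old (unskipped) runtime divided by the new one-slice plateau.
print(round(1373.2 / t16[1], 2), round(8089.0 / t32[1], 2))  # 1.32 1.56
```

    Under that assumption, each additional 8-wide slice costs roughly 335
    units for 16x16 and roughly 970 units for 32x32. The remaining time is
    spent on the slices that must run and on the second pass, which is not
    skipped.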
    
    In general, this means a very minor overhead in the full-subpartition
    case, due to the additional cmps, but a significant speedup in the cases
    where only a small part of the actual input data needs to be processed.
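    The skipping idea can be illustrated with a minimal Python sketch. It is
    not the NEON code from this commit. The 8-column slice layout, the
    diagonal scan order (`diag_scan`), the placeholder transform
    (`transform_col`) and the other helper names are all hypothetical. For
    each slice, one precomputes the first scan position that lands in it. If
    `eob` is at or below that position, every coefficient in the slice is
    zero. A single compare can then replace the slice's first-pass
    transforms, and its output is simply left as zero.

```python
SLICE = 8  # assumed first-pass slice width, in coefficient columns


def diag_scan(n):
    """Hypothetical diagonal scan (NOT VP9's real scan tables).
    Returns row-major indices (row * n + col) in scan order."""
    return [r * n + (d - r)
            for d in range(2 * n - 1)
            for r in range(n)
            if 0 <= d - r < n]


def min_eob_per_slice(n, scan):
    """First scan position falling in each slice. If eob <= min_eob[s],
    all coefficients of slice s are guaranteed to be zero."""
    min_eob = [n * n] * (n // SLICE)
    for pos, idx in enumerate(scan):
        s = (idx % n) // SLICE
        min_eob[s] = min(min_eob[s], pos)
    return min_eob


def transform_col(col):
    """Placeholder linear 1-D transform (NOT a real IDCT). The only
    property relied on is that all-zero input gives all-zero output."""
    n = len(col)
    return [sum(col[j] * ((j * 7 + k * 3) % 5 - 2) for j in range(n))
            for k in range(n)]


def first_pass(coef, n, eob, min_eob, skip=True):
    """Column transforms over an n x n row-major block, SLICE columns at a
    time. Returns (intermediate buffer, number of skipped slices)."""
    tmp = [0] * (n * n)  # zero-filled, so skipped slices stay zero
    skipped = 0
    for s in range(n // SLICE):
        if skip and eob <= min_eob[s]:
            skipped += 1  # one compare replaces SLICE column transforms
            continue
        for c in range(s * SLICE, (s + 1) * SLICE):
            out = transform_col([coef[r * n + c] for r in range(n)])
            for r in range(n):
                tmp[r * n + c] = out[r]
    return tmp, skipped


n = 16
scan = diag_scan(n)
min_eob = min_eob_per_slice(n, scan)
eob = 20  # only the first 20 scan positions may be nonzero
coef = [0] * (n * n)
for pos in range(eob):
    coef[scan[pos]] = pos + 1
fast, skipped = first_pass(coef, n, eob, min_eob)
full, _ = first_pass(coef, n, eob, min_eob, skip=False)
print(min_eob, skipped, fast == full)  # [0, 36] 1 True
```

    With this assumed scan, a 16x16 block with `eob = 20` skips the second
    slice, and the result matches the unskipped pass. The per-slice compare
    presumably corresponds to the small overhead seen in the full sub16 and
    sub32 cases.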
    
    This is cherry-picked from libav commits cad42fad and a0c443a3.
    Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>