libavcodec/x86 · 110d0cdc9d1ec414a658f841a3fbefbf6f796d61 · Linshizhi / ffmpeg.wasm-core

rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc

Christophe Gisquet authored Apr 19, 2012

Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions
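
The redundancy mentioned above can be illustrated with a minimal scalar sketch. It assumes, as in my understanding of the C reference code, that the RV40 luma filters are 6-tap with taps (1, -5, c1, c2, -5, 1), where only c1, c2 and the shift depend on the subpel position (e.g. 52, 20, 6 for one quarter and 20, 20, 5 for one half). The commit message does not state these values. The names `qpel_h_row` and `clip_u8` are hypothetical and do not come from the source tree.

```c
#include <stdint.h>

/* Clamp an intermediate sum to the 0..255 pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Horizontal 6-tap filter for one row of `width` pixels (illustrative only).
 * Assumed taps: (1, -5, c1, c2, -5, 1). The four outer taps are the same for
 * every subpel position, so the outer and inner pixel pairs are summed first
 * and only two per-position multiplies (c1, c2) remain.
 * src must have 2 readable pixels to its left and 3 past the last output.
 * Note: this relies on >> being an arithmetic shift for negative sums, which
 * holds on common compilers.
 */
static void qpel_h_row(uint8_t *dst, const uint8_t *src, int width,
                       int c1, int c2, int shift)
{
    const int round = 1 << (shift - 1);

    for (int x = 0; x < width; x++) {
        int outer = src[x - 2] + src[x + 3];   /* taps  1 and  1 */
        int inner = src[x - 1] + src[x + 2];   /* taps -5 and -5 */
        int sum   = outer - 5 * inner + c1 * src[x] + c2 * src[x + 1];

        dst[x] = clip_u8((sum + round) >> shift);
    }
}
```

A SIMD version can presumably keep the shared part of the computation identical for all positions and load only c1 and c2 per call, which is the kind of saving a filter with fully independent taps does not offer.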

Benchmark (rounded to tens of units):
        V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C        445   358    985    1785    1559     3280
MMX*     219   271    478     714     929     1443
SSE2     131   158    294     425     515      892
SSSE3    120   122    248     387     390      763

The end result is an overall speedup of around 15% for the SSSE3 version
(measured on 6 sequences). The loop filter functions together now take around
55% of decoding time, while the luma MC dsp functions take around 6%, the
chroma ones 1.3% and biweight around 2.3%.

Signed-off-by: Diego Biurrun <diego@biurrun.de>


Name
Makefile
ac3dsp.asm
ac3dsp_mmx.c
cabac.h
cavsdsp_mmx.c
dct32_sse.asm
deinterlace.asm
dnxhd_mmx.c
dsputil_mmx.c
dsputil_mmx.h
dsputil_mmx_avg_template.c
dsputil_mmx_qns_template.c
dsputil_mmx_rnd_template.c
dsputil_yasm.asm
dsputilenc_mmx.c
dsputilenc_yasm.asm
fdct_mmx.c
fft.c
fft.h
fft_3dn.c
fft_3dn2.c
fft_mmx.asm
fft_sse.c
fmtconvert.asm
fmtconvert_mmx.c
h264_chromamc.asm
h264_chromamc_10bit.asm
h264_deblock.asm
h264_deblock_10bit.asm
h264_i386.h
h264_idct.asm
h264_idct_10bit.asm
h264_intrapred.asm
h264_intrapred_10bit.asm
h264_intrapred_init.c
h264_qpel_10bit.asm
h264_qpel_mmx.c
h264_weight.asm
h264_weight_10bit.asm
h264dsp_mmx.c
idct_mmx.c
idct_mmx_xvid.c
idct_sse2_xvid.c
idct_xvid.h
imdct36_sse.asm
lpc_mmx.c
mathops.h
mlpdsp.c
motion_est_mmx.c
mpegaudiodec_mmx.c
mpegvideo_mmx.c
mpegvideo_mmx_template.c
pngdsp-init.c
pngdsp.asm
proresdsp-init.c
proresdsp.asm
rv34dsp.asm
rv34dsp_init.c
rv40dsp.asm
rv40dsp_init.c
sbrdsp.asm
sbrdsp_init.c
simple_idct_mmx.c
snowdsp_mmx.c
vc1dsp_mmx.c
vc1dsp_yasm.asm
vp3dsp.asm
vp56_arith.h
vp56dsp.asm
vp56dsp_init.c
vp8dsp-init.c
vp8dsp.asm
w64xmmtest.c