    rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc
    Christophe Gisquet authored
    Code mostly inspired by vp8's MC, however:
    - its MMX2 horizontal filter is worse because it can't take advantage of
      the coefficient redundancy
    - that same coefficient redundancy allows better code for non-SSSE3 versions
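The coefficient redundancy mentioned above can be sketched in C. This is a minimal illustration, not the code from this commit: it assumes the 6-tap form (1, -5, C1, C2, -5, 1) with rounding and a right shift, where only the two centre taps and the shift vary between sub-pixel positions (for example 52, 20 with shift 6, or 20, 20 with shift 5). The names `rv40_tap6` and `rv40_h_lowpass_row` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* One output pixel of the assumed 6-tap filter (1, -5, c1, c2, -5, 1).
 * Reads src[-2] .. src[3]. The outer taps are symmetric, so each pair
 * of pixels is summed once and shares a single multiplication. */
static uint8_t rv40_tap6(const uint8_t *src, int c1, int c2, int shift)
{
    assert(shift > 0);
    int outer = src[-2] + src[3];   /* both weighted by  1 */
    int inner = src[-1] + src[2];   /* both weighted by -5 */
    int sum   = outer - 5 * inner + c1 * src[0] + c2 * src[1]
              + (1 << (shift - 1)); /* rounding term */
    return sum < 0 ? 0 : clip_u8(sum >> shift);
}

/* Filter one row horizontally; src must have 2 readable pixels before
 * the row start and 3 after the last output position. */
static void rv40_h_lowpass_row(uint8_t *dst, const uint8_t *src, int width,
                               int c1, int c2, int shift)
{
    for (int x = 0; x < width; x++)
        dst[x] = rv40_tap6(src + x, c1, c2, shift);
}
```

Because the outer four taps never change, a SIMD version only has to vary the two centre multipliers per position, which is one plausible reading of why the non-SSSE3 paths benefit.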
    
    Benchmark (rounded to tens of unit):
              V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
    C          445   358    985    1785    1559     3280
    MMX*       219   271    478     714     929     1443
    SSE2       131   158    294     425     515      892
    SSSE3      120   122    248     387     390      763
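The table is easier to compare as speedup ratios. The sketch below divides the C timing by the SSSE3 timing for each function; the array and function names are hypothetical, and the unit of the figures is whatever the benchmark above used, since only the ratios matter here.

```c
#include <assert.h>
#include <stddef.h>

/* Figures copied from the benchmark table, in column order:
 * V8x8, H8x8, 2D8x8, V16x16, H16x16, 2D16x16. */
static const int bench_c[6]     = { 445, 358, 985, 1785, 1559, 3280 };
static const int bench_ssse3[6] = { 120, 122, 248,  387,  390,  763 };

/* Ratio of baseline time to optimized time (greater than 1 is faster). */
static double speedup(int baseline, int optimized)
{
    assert(optimized > 0);
    return (double)baseline / optimized;
}

/* Arithmetic mean of the per-function speedups. */
static double mean_speedup(const int *base, const int *opt, size_t n)
{
    assert(n > 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += speedup(base[i], opt[i]);
    return total / (double)n;
}
```

For the figures above, the per-function SSSE3 speedups over C range from roughly 2.9x (H8x8) to 4.6x (V16x16). These are speedups of the individual functions, not of overall decoding.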
    
    The end result is an overall speedup of around 15% for the SSSE3 version (on 6
    sequences); all loop filter functions now take around 55% of decoding time, while
    luma MC dsp functions take around 6%, chroma ones 1.3% and biweight around 2.3%.
    Signed-off-by: Diego Biurrun <diego@biurrun.de>
Name
Makefile
ac3dsp.asm
ac3dsp_mmx.c
cabac.h
cavsdsp_mmx.c
dct32_sse.asm
deinterlace.asm
dnxhd_mmx.c
dsputil_mmx.c
dsputil_mmx.h
dsputil_mmx_avg_template.c
dsputil_mmx_qns_template.c
dsputil_mmx_rnd_template.c
dsputil_yasm.asm
dsputilenc_mmx.c
dsputilenc_yasm.asm
fdct_mmx.c
fft.c
fft.h
fft_3dn.c
fft_3dn2.c
fft_mmx.asm
fft_sse.c
fmtconvert.asm
fmtconvert_mmx.c
h264_chromamc.asm
h264_chromamc_10bit.asm
h264_deblock.asm
h264_deblock_10bit.asm
h264_i386.h
h264_idct.asm
h264_idct_10bit.asm
h264_intrapred.asm
h264_intrapred_10bit.asm
h264_intrapred_init.c
h264_qpel_10bit.asm
h264_qpel_mmx.c
h264_weight.asm
h264_weight_10bit.asm
h264dsp_mmx.c
idct_mmx.c
idct_mmx_xvid.c
idct_sse2_xvid.c
idct_xvid.h
imdct36_sse.asm
lpc_mmx.c
mathops.h
mlpdsp.c
motion_est_mmx.c
mpegaudiodec_mmx.c
mpegvideo_mmx.c
mpegvideo_mmx_template.c
pngdsp-init.c
pngdsp.asm
proresdsp-init.c
proresdsp.asm
rv34dsp.asm
rv34dsp_init.c
rv40dsp.asm
rv40dsp_init.c
sbrdsp.asm
sbrdsp_init.c
simple_idct_mmx.c
snowdsp_mmx.c
vc1dsp_mmx.c
vc1dsp_yasm.asm
vp3dsp.asm
vp56_arith.h
vp56dsp.asm
vp56dsp_init.c
vp8dsp-init.c
vp8dsp.asm
w64xmmtest.c