    yadif: x86 assembly for 9 to 14-bit samples · 0a5814c9
    James Darnley authored
    Samples of this size do not need to be unpacked to double words, which
    lets the code process more pixels per iteration (still 2 in MMX, but 6
    in SSE2).  It also avoids emulating the double-word instructions that
    are missing from older instruction sets.
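
    A rough way to see why 14 bits can be the cut-off is sketched below in C.
    It assumes the widest intermediate value is a sum of three absolute
    differences between samples and that this sum is held in an unsigned
    16-bit word; neither assumption is stated in this message, and the
    helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the largest possible value of a sum of three
 * absolute differences between samples of the given bit depth.
 * ASSUMPTION: this three-term sum is the widest intermediate value. */
static uint32_t max_three_absdiff_sum(unsigned bits)
{
    uint32_t max_sample = (1u << bits) - 1u; /* e.g. 16383 for 14 bits */
    return 3u * max_sample;
}

/* Hypothetical helper: non-zero when that worst-case sum still fits in
 * an unsigned 16-bit word, so no unpacking to double words is needed. */
static int fits_in_word(unsigned bits)
{
    return max_three_absdiff_sum(bits) <= UINT16_MAX;
}
```

    Under these assumptions, fits_in_word() holds for every depth from 9 to
    14 bits and fails for 15 and 16 bits, where the worst-case sum exceeds
    65535.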
    
    As with the previous code for 16-bit samples, this has been tested on
    an Athlon64 and a Core2Quad.
    
    Athlon64:
    1809275 decicycles in C,    32718 runs, 50 skips
     911675 decicycles in mmx,  32727 runs, 41 skips, 2.0x faster
     495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster
    
    Core2Quad:
     921363 decicycles in C,     32756 runs, 12 skips
     486537 decicycles in mmx,   32764 runs,  4 skips, 1.9x faster
     293296 decicycles in sse2,  32759 runs,  9 skips, 3.1x faster
     284910 decicycles in ssse3, 32759 runs,  9 skips, 3.2x faster
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>