    x86: simple_idct(_put): 10-bit versions · 4369b9dc
    Christophe Gisquet authored
    Modeled on the prores version. Clips output to [0;1023] and is bitexact.
    Bitexactness requires adding offsets in different places than the
    prores or C versions do, which makes the function approximately 2% slower.
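A minimal C sketch of the clipping the put variant performs, assuming 8x8 blocks of int16_t IDCT results written to a uint16_t destination plane whose stride is counted in pixels. The names clip_10bit and put_block_10bit are hypothetical, and the sketch does not reproduce the bitexact rounding offsets of the actual SIMD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp one IDCT output sample to the 10-bit range [0, 1023]. */
static inline uint16_t clip_10bit(int v)
{
    if (v < 0)
        return 0;
    if (v > 1023)
        return 1023;
    return (uint16_t)v;
}

/* Hypothetical "put" step: store an 8x8 block of IDCT results into a 10-bit
 * plane, clipping every sample. Assumes stride is in uint16_t units. */
static void put_block_10bit(uint16_t *dst, ptrdiff_t stride, const int16_t block[64])
{
    assert(stride >= 8); /* each row must hold at least 8 pixels */
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = clip_10bit(block[y * 8 + x]);
}
```

Negative results saturate to 0 and anything above 1023 saturates to 1023, which is what "clips to [0;1023]" means for 10-bit output.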
    
    For 16 frames of a DNxHD 4:2:2 10-bit test sequence:
    
    C:    60861 decicycles in idct, 1048205 runs,    371 skips
    sse2: 27567 decicycles in idct, 1048216 runs,    360 skips
    avx:  26272 decicycles in idct, 1048171 runs,    405 skips
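The relative speedups follow directly from the decicycle counts above. This small helper (hypothetical, for illustration only) divides a reference count by an optimized count.

```c
#include <assert.h>

/* Speedup of an optimized version relative to a reference, computed from
 * average decicycle counts such as those reported above. */
static double speedup(unsigned ref_decicycles, unsigned opt_decicycles)
{
    assert(opt_decicycles > 0); /* avoid division by zero */
    return (double)ref_decicycles / (double)opt_decicycles;
}
```

With these numbers, speedup(60861, 27567) is about 2.21 for sse2 and speedup(60861, 26272) is about 2.32 for avx, so avx gains only about 5% over sse2.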
    
    The add version is not implemented, so the corresponding dsp function
    pointer is set to NULL, making any code that tries to call it fail clearly.
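One way to picture the NULL entry, sketched with a hypothetical, simplified DSP context rather than FFmpeg's actual structure: the 10-bit init fills in only the put pointer, and a defensive caller checks the add pointer before using it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a DSP context (not FFmpeg's struct). */
typedef struct IdctDspSketch {
    void (*idct_put)(uint16_t *dst, ptrdiff_t stride, int16_t *block);
    void (*idct_add)(uint16_t *dst, ptrdiff_t stride, int16_t *block);
} IdctDspSketch;

/* Placeholder for the 10-bit put implementation. */
static void idct_put_10_stub(uint16_t *dst, ptrdiff_t stride, int16_t *block)
{
    (void)dst; (void)stride; (void)block;
}

/* Hypothetical 10-bit init: only put exists, so add is left NULL and a
 * stray call crashes immediately instead of silently running wrong code. */
static void idct_dsp_init_10bit_sketch(IdctDspSketch *c)
{
    assert(c != NULL);
    c->idct_put = idct_put_10_stub;
    c->idct_add = NULL;
}

/* Defensive caller: returns -1 when no add version is available. */
static int idct_add_checked(const IdctDspSketch *c, uint16_t *dst,
                            ptrdiff_t stride, int16_t *block)
{
    if (!c->idct_add)
        return -1;
    c->idct_add(dst, stride, block);
    return 0;
}
```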
    Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>