    avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions · 13d71c28
    James Darnley authored
    Yorkfield:
     - sse2:
       - complex: 4.13x faster (1514 vs. 367 cycles)
       - simple:  4.38x faster (1836 vs. 419 cycles)
    
    Skylake:
     - sse2:
       - complex: 3.61x faster ( 936 vs. 260 cycles)
       - simple:  3.97x faster (1126 vs. 284 cycles)
     - avx (versus sse2):
       - complex: 1.07x faster (260 vs. 244 cycles)
       - simple:  1.03x faster (284 vs. 274 cycles)
h264_idct_10bit.asm 15.6 KB