    vp9/x86: idct_32x32_add_ssse3. · e84d14df
    Ronald S. Bultje authored
    Sub-IDCTs will follow later. ped1080.webm goes from 9.295s to 8.191s
    (13.5% faster). The IDCT itself goes from 4372 (intra) or 4337 (inter)
    to 403 (intra) or 329 (inter) cycles for the DC-only form, and from
    23755 (intra) or 23723 (inter) to 3497 (intra) or 3607 (inter) cycles
    for the no-DC form. Averaged over all 32x32s together, that is 23393
    (intra) or 16612 (inter) down to 3449 (intra) or 2392 (inter) cycles,
    i.e. about 7x faster (all tests done on ped1080p.webm).
vp9dsp_init.c 8.11 KB