1. 08 Jan, 2014 2 commits
    • vp9/x86: idct_32x32_add_ssse3 sub-16x16-idct. · 37b001d1
      Ronald S. Bultje authored
      Runtime of all IDCTs together goes from 3327 to 2473 cycles (intra, i.e.
      ~35% faster) or from 2312 to 1448 cycles (inter, i.e. ~60% faster). Total
      decode time of ped1080p.webm goes from 8.086sec to 7.974sec (1.4% faster).
    • vp9/x86: idct_32x32_add_ssse3. · e84d14df
      Ronald S. Bultje authored
      Sub-IDCTs will follow later. ped1080p.webm goes from 9.295s to 8.191s
      (13.5% faster). The IDCT itself goes from 4372 (intra) or 4337 (inter)
      to 403 (intra) or 329 (inter) cycles for the DC-only form, 23755 (intra)
      or 23723 (inter) to 3497 (intra) or 3607 (inter) cycles for the no-DC
      form, which averages from 23393 (intra) or 16612 (inter) to 3449 (intra)
      or 2392 (inter) for all 32x32s together, i.e. about 7x faster (all
      tests done on ped1080p.webm).
  2. 07 Jan, 2014 2 commits
  3. 06 Jan, 2014 36 commits
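
The speedup figures quoted in the 08 Jan commit messages can be rechecked from the raw numbers. The sketch below is not part of the commits. It assumes that "N% faster" means the before/after ratio minus one, which is the reading that reproduces the quoted ~35%, ~60%, 1.4% and 13.5%. The helper names `speedup` and `percent_faster` are hypothetical.

```python
def speedup(before, after):
    """Ratio of old cost to new cost (e.g. 2.0 means twice as fast)."""
    return before / after


def percent_faster(before, after):
    """Speedup expressed as a percentage gain (assumed meaning of '% faster')."""
    return (speedup(before, after) - 1.0) * 100.0


# (before, after) pairs copied from the two commit messages.
measurements = {
    "37b001d1 all IDCTs, intra (cycles)": (3327, 2473),
    "37b001d1 all IDCTs, inter (cycles)": (2312, 1448),
    "37b001d1 total decode time (s)": (8.086, 7.974),
    "e84d14df total decode time (s)": (9.295, 8.191),
    "e84d14df all 32x32s, intra (cycles)": (23393, 3449),
    "e84d14df all 32x32s, inter (cycles)": (16612, 2392),
}

for label, (before, after) in measurements.items():
    print(f"{label}: {speedup(before, after):.2f}x, "
          f"{percent_faster(before, after):.1f}% faster")
```

Under the other common definition (fraction of time saved, `1 - after / before`), the intra figure for 37b001d1 would come out near 26% rather than ~35%, so the ratio-based reading appears to be the one used in the messages.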