1. 02 Feb, 2017 2 commits
  2. 01 Feb, 2017 1 commit
  3. 31 Jan, 2017 1 commit
  4. 13 Jan, 2017 5 commits
  5. 02 Jan, 2017 2 commits
  6. 20 Dec, 2016 1 commit
  7. 07 Dec, 2016 1 commit
  8. 06 Dec, 2016 4 commits
  9. 30 Nov, 2016 3 commits
  10. 18 Nov, 2016 1 commit
  11. 15 Nov, 2016 1 commit
      vp9: add avx2 iadst16 implementations. · 83a139e3
      Ronald S. Bultje authored
      Also a small cosmetic change to the avx2 idct16 version to make it
      explicit that one of the arguments to the write-out macros is unused
      for >=avx2 (it uses pmovzxbw instead of punpcklbw).
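The remark about pmovzxbw versus punpcklbw can be illustrated with a small emulation of the 128-bit forms of both instructions. This is a sketch of the instruction semantics only, not the FFmpeg assembly; the helper names `punpcklbw`, `pmovzxbw` and `words` are hypothetical Python stand-ins, and the reading that the unused macro argument is the all-zero register is an assumption rather than something the commit message states.

```python
import struct


def punpcklbw(dst, src):
    """Emulate 128-bit punpcklbw: interleave the low 8 bytes of dst and src."""
    out = []
    for d, s in zip(dst[:8], src[:8]):
        out += [d, s]
    return bytes(out)


def pmovzxbw(src):
    """Emulate 128-bit pmovzxbw: zero-extend the low 8 bytes to 16-bit words."""
    return b"".join(b.to_bytes(2, "little") for b in src[:8])


def words(reg):
    """View a 16-byte register image as eight little-endian 16-bit words."""
    return list(struct.unpack("<8H", reg))


pixels = bytes(range(240, 256))  # 16 sample pixel bytes
zero = bytes(16)                 # the extra all-zero register punpcklbw needs

# Both routes widen the low 8 pixels to words, but only punpcklbw
# needs the zero register as a second operand.
via_unpack = punpcklbw(pixels, zero)
via_movzx = pmovzxbw(pixels)
```

Under this reading, a write-out macro that widens pixels with pmovzxbw has no use for a zero-register argument, which would explain why it is marked as unused for >=avx2.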
  12. 21 Oct, 2016 1 commit
  13. 18 Oct, 2016 1 commit
  14. 02 Oct, 2016 1 commit
  15. 01 Oct, 2016 1 commit
  16. 23 Sep, 2016 2 commits
  17. 06 Aug, 2016 1 commit
  18. 02 Aug, 2016 1 commit
  19. 26 Jul, 2016 3 commits
      vp9: add mmxext versions of the single-block (w=8,npx=8) h/v loopfilters. · a4edaa02
      Ronald S. Bultje authored
      Each takes about 0.1% of runtime in my profiles, and they had no SIMD
      so far (we only had SIMD for the npx=16 double-block versions).
      vp9: add mmxext versions of the single-block (w=4,npx=8) h/v loopfilters. · 7ca422bb
      Ronald S. Bultje authored
      Each takes about 0.5% of runtime in my profiles, and they had no SIMD
      so far (we only had SIMD for the npx=16 double-block versions).
      vp9: add 32x32 idct AVX2 implementation. · 726501a3
      Ronald S. Bultje authored
      About 1.8x speedup compared to AVX version for full IDCT. Other
      sub-IDCT scenarios also see speedups. Full --bench output for
      idct_32x32_add_${bpp}_${subidct}_${opt} (50k cycles):
      
      nop: 16.5
      vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
      vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
      vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
      vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
      vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
      vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
      vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
      vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
      vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
      vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
      vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
      vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
      vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
      vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
      vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
      vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
      vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
      vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
      vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
      vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
      vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
      vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
      vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
      vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
      vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
      vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
      vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
      vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
      vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
      vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
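The "about 1.8x" figure can be checked against the --bench numbers quoted in the commit message. The sketch below copies only the avx and avx2 rows and divides them; the `bench` dictionary and the `speedups` helper are hypothetical names, and the raw values are used as printed, without subtracting the `nop` baseline.

```python
# Cycle counts copied from the --bench output in the commit message,
# keyed by sub-IDCT size (the ${subidct} part of the function name).
bench = {
    1:  {"avx": 137.1,  "avx2": 73.2},
    2:  {"avx": 958.5,  "avx2": 704.2},
    4:  {"avx": 1000.7, "avx2": 717.1},
    8:  {"avx": 983.0,  "avx2": 729.4},
    16: {"avx": 1276.7, "avx2": 719.5},
    32: {"avx": 2539.6, "avx2": 1395.0},
}


def speedups(table):
    """Return {subidct: avx_cycles / avx2_cycles} for each row."""
    return {size: row["avx"] / row["avx2"] for size, row in table.items()}


if __name__ == "__main__":
    for size, ratio in speedups(bench).items():
        print(f"subidct={size:2d}: {ratio:.2f}x")
```

The full IDCT is the subidct=32 row, where the ratio 2539.6 / 1395.0 comes to roughly 1.82. The smaller sub-IDCT rows give lower but still greater-than-one ratios, in line with the message's note that other scenarios also see speedups.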
  20. 20 Jul, 2016 7 commits