    x86/vp9lpf: simplify 2nd transpose in 44/48/88/84. · 669d4f90
    Clément Bœsch authored
    For non-avx optims, this saves 8 movs.
    
    before:
      1785 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524129 runs, 159 skips
      3327 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262116 runs, 28 skips
      2712 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193729 runs, 575 skips
      3237 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524061 runs, 227 skips
    
    after:
      1768 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524062 runs, 226 skips
      3310 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262107 runs, 37 skips
      2719 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193954 runs, 350 skips
      3184 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524236 runs, 52 skips
vp9lpf.asm 30.9 KB