x86/vp9lpf: simplify 2nd transpose in 44/48/88/84.
For non-avx optims, this saves 8 movs.

before:
1785 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524129 runs, 159 skips
3327 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262116 runs, 28 skips
2712 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193729 runs, 575 skips
3237 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524061 runs, 227 skips

after:
1768 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524062 runs, 226 skips
3310 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262107 runs, 37 skips
2719 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193954 runs, 350 skips
3184 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524236 runs, 52 skips