lavfi/nlmeans: make compute_safe_ssd_integral_image_c faster
before: ssd_integral_image_c: 49204.6
after:  ssd_integral_image_c: 44272.8

Unrolling by 4 made the biggest difference on odroid-c2 (aarch64); unrolling by 2 or 8 both came in at about 46k cycles, vs 44k for 4.

Additionally, this is a much better reference when writing SIMD (SIMD vectorization will just target 16 instead of 4).
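
For readers unfamiliar with the technique, the following is a minimal sketch of what unrolling an SSD integral image loop by 4 can look like. It is not the code from this commit: the function names, the padded buffer layout (one zeroed row and column in front of the data), and the requirement that w be a multiple of 4 are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: ii holds (h + 1) rows of ii_stride >= (w + 1) entries.
 * Row 0 and column 0 must be zeroed by the caller and act as padding, so
 * dst[-1] and top[-1] below always stay inside the buffer. */

/* Plain scalar version, one pixel per iteration. */
static void ssd_integral_scalar(uint32_t *ii, ptrdiff_t ii_stride,
                                const uint8_t *s1, ptrdiff_t stride1,
                                const uint8_t *s2, ptrdiff_t stride2,
                                int w, int h)
{
    for (int y = 0; y < h; y++) {
        const uint32_t *top = ii + y * ii_stride + 1;
        uint32_t *dst = ii + (y + 1) * ii_stride + 1;

        for (int x = 0; x < w; x++) {
            const int d = s1[x] - s2[x];
            dst[x] = dst[x - 1] + top[x] - top[x - 1] + (uint32_t)(d * d);
        }
        s1 += stride1;
        s2 += stride2;
    }
}

/* Unrolled by 4; assumes w is a multiple of 4. */
static void ssd_integral_unroll4(uint32_t *ii, ptrdiff_t ii_stride,
                                 const uint8_t *s1, ptrdiff_t stride1,
                                 const uint8_t *s2, ptrdiff_t stride2,
                                 int w, int h)
{
    for (int y = 0; y < h; y++) {
        const uint32_t *top = ii + y * ii_stride + 1;
        uint32_t *dst = ii + (y + 1) * ii_stride + 1;

        for (int x = 0; x < w; x += 4) {
            const int d0 = s1[x    ] - s2[x    ];
            const int d1 = s1[x + 1] - s2[x + 1];
            const int d2 = s1[x + 2] - s2[x + 2];
            const int d3 = s1[x + 3] - s2[x + 3];

            /* These four stores do not depend on each other. */
            dst[x    ] = top[x    ] - top[x - 1] + (uint32_t)(d0 * d0);
            dst[x + 1] = top[x + 1] - top[x    ] + (uint32_t)(d1 * d1);
            dst[x + 2] = top[x + 2] - top[x + 1] + (uint32_t)(d2 * d2);
            dst[x + 3] = top[x + 3] - top[x + 2] + (uint32_t)(d3 * d3);

            /* The running sum along the row is the only serial part. */
            dst[x    ] += dst[x - 1];
            dst[x + 1] += dst[x    ];
            dst[x + 2] += dst[x + 1];
            dst[x + 3] += dst[x + 2];
        }
        s1 += stride1;
        s2 += stride2;
    }
}
```

Splitting each group into an independent part and a short serial prefix sum is what makes this shape a convenient starting point for a SIMD port, where the same structure would presumably be widened to 16 pixels per iteration.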