    arm: add ff_int32_to_float_fmul_array8_neon · 90b1b935
    Janne Grunau authored
    Quite a bit faster than int32_to_float_fmul_array8_c calling
    ff_int32_to_float_fmul_scalar_neon through FmtConvertContext.
    Number of cycles per int32_to_float_fmul_array8 call while decoding
    padded.dts on exynos5422:
    
                   before  after   change
    cortex-a7:     1270     951    -25%
    cortex-a15:     434     285    -34%
    
    checkasm --bench cycle counts:     cortex-a15   cortex-a7
    int32_to_float_fmul_array8_c:      1730.4       4384.5
    int32_to_float_fmul_array8_neon_c:  571.5       1694.3
    int32_to_float_fmul_array8_neon:    374.0       1448.8
    
    The differences between int32_to_float_fmul_array8_neon_c and
    int32_to_float_fmul_array8_neon are interesting.
    The former is the current behaviour of calling
    ff_int32_to_float_fmul_scalar_neon repeatedly from the C function.
    The raw numbers differ since checkasm uses different lengths than the
    dca decoder.
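
For illustration, the "C function calling the scalar routine repeatedly" pattern described above can be sketched roughly as follows. This is a minimal sketch, not the code from the commit: the `_sketch` names are hypothetical, the scalar helper is a plain C stand-in for the NEON routine, and it assumes one scale factor per block of 8 samples with `len` a multiple of 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C stand-in for the scalar kernel:
 * converts len int32 samples to float and multiplies each by mul. */
static void int32_to_float_fmul_scalar_sketch(float *dst, const int32_t *src,
                                              float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (float)src[i] * mul;
}

/* Hypothetical array8 wrapper: walks the input in blocks of 8 samples and
 * calls the scalar kernel once per block with the next scale factor.
 * Assumption: len is a multiple of 8 and scale holds len / 8 entries. */
static void int32_to_float_fmul_array8_sketch(float *dst, const int32_t *src,
                                              const float *scale, int len)
{
    assert(len % 8 == 0);
    for (int i = 0; i < len; i += 8)
        int32_to_float_fmul_scalar_sketch(&dst[i], &src[i], *scale++, 8);
}
```

In this shape every block of 8 samples costs one function call, which is the per-call overhead a dedicated array8 routine can avoid by handling the whole loop itself.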
fmtconvert_neon.S 2.98 KB