wma lossless: reuse scalarproduct_and_madd_int16
This is done by padding the coefficient buffer with zeros, because the order
may be only a multiple of 4, while the DSP function requires batches of 8.
However, no sample with such an order was found, so request one if a file
uses that kind of order.
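The padding described above can be sketched in plain C as follows. This is
only an illustration, not the code from this commit: pad_order(),
padded_dot_and_update() and scalarproduct_and_madd_int16_ref() are
hypothetical names, and the reference loop is an assumed scalar model of what
the DSP function computes (a dot product of v1 and v2, combined with the
in-place update v1 += mul * v3). All buffers are assumed to be allocated with
room for the padded length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed scalar model of the DSP routine (hypothetical name):
 * returns sum(v1[i] * v2[i]) and updates v1[i] += mul * v3[i]. */
static int32_t scalarproduct_and_madd_int16_ref(int16_t *v1, const int16_t *v2,
                                                const int16_t *v3,
                                                int order, int mul)
{
    int32_t res = 0;

    assert(order % 8 == 0); /* SIMD versions work in batches of 8 */
    for (int i = 0; i < order; i++) {
        res   += v1[i] * v2[i];
        v1[i] += mul * v3[i];
    }
    return res;
}

/* Round an order up to the next multiple of 8 (hypothetical helper). */
static int pad_order(int order)
{
    return (order + 7) & ~7;
}

/* Zero the tail of the coefficient buffer so the padded entries add
 * nothing to the dot product, then run the batched routine.
 * coefs, samples and updates must each hold pad_order(order) elements. */
static int32_t padded_dot_and_update(int16_t *coefs, const int16_t *samples,
                                     const int16_t *updates,
                                     int order, int mul)
{
    int padded = pad_order(order);

    memset(coefs + order, 0, (padded - order) * sizeof(*coefs));
    return scalarproduct_and_madd_int16_ref(coefs, samples, updates,
                                            padded, mul);
}
```

In this sketch the tail is zeroed again on every call, because the update
step would otherwise write mul * updates[i] into the padded coefficients. A
real decoder could instead keep the tail of the update buffer at zero.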
Approximate relative speedup depending on instruction set:
plain C: -6% (a slowdown)
mmxext: 51%
sse2: 54%
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>