Commit d0a34aee authored by Giorgio Vazzana, committed by Michael Niedermayer

md5: optimize second round by using 4-operation form of G()

4-operation form is preferred over 3-operation because it breaks a long
dependency chain, thus allowing a superscalar processor to execute more
operations in parallel.
The idea was taken from: http://www.zorinaq.com/papers/md5-amd64.html
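
A minimal sketch of why the two forms are interchangeable, using the expressions from the hunk in this commit. The function names `g3`, `g4` and `g_forms_agree` are hypothetical and do not appear in md5.c. Both forms are purely bitwise, so checking every combination of all-zero and all-one words for b, c and d covers every possible per-bit input. On the dependency chain: in MD5, b holds the value produced by the previous step, so in the 3-operation form b passes through three dependent operations (xor, and, xor), whereas in the 4-operation form `~d & c` does not involve b at all and b passes through only two (and, or).

```c
#include <stdint.h>

/* 3-operation form of G(), as on the line removed by this commit. */
static uint32_t g3(uint32_t b, uint32_t c, uint32_t d)
{
    return c ^ (d & (c ^ b));
}

/* 4-operation form of G(), as on the line added by this commit.
 * (d & b) and (~d & c) are independent of each other. */
static uint32_t g4(uint32_t b, uint32_t c, uint32_t d)
{
    return (d & b) | (~d & c);
}

/* Returns 1 if both forms agree for all 8 per-bit input combinations,
 * 0 otherwise. Each input word is either all zeros or all ones, which
 * is sufficient because the functions operate on each bit independently. */
static int g_forms_agree(void)
{
    for (unsigned bits = 0; bits < 8; bits++) {
        uint32_t b = (bits & 1) ? UINT32_MAX : 0;
        uint32_t c = (bits & 2) ? UINT32_MAX : 0;
        uint32_t d = (bits & 4) ? UINT32_MAX : 0;
        if (g3(b, c, d) != g4(b, c, d))
            return 0;
    }
    return 1;
}
```

Both functions select bits of b where d is set and bits of c where d is clear, so the change affects only instruction scheduling, not the digest.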

AMD Athlon(tm) II X3 450 Processor, x86_64

Before:
$ for i in $(seq 1 4); do ./avutil_md5_test2; done
size: 1048576  runs: 1024  time:    5.821 +- 0.019
size: 1048576  runs: 1024  time:    5.822 +- 0.019
size: 1048576  runs: 1024  time:    5.841 +- 0.018
size: 1048576  runs: 1024  time:    5.821 +- 0.018

After:
$ for i in $(seq 1 4); do ./avutil_md5_test2; done
size: 1048576  runs: 1024  time:    5.646 +- 0.019
size: 1048576  runs: 1024  time:    5.646 +- 0.018
size: 1048576  runs: 1024  time:    5.642 +- 0.019
size: 1048576  runs: 1024  time:    5.641 +- 0.019

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
parent b7be8ea9
@@ -84,7 +84,7 @@ static const uint32_t T[64] = { // T[i]= fabs(sin(i+1)<<32)
 \
     if (i < 32) { \
         if (i < 16) a += (d ^ (b & (c ^ d))) + X[ i & 15]; \
-        else a += (c ^ (d & (c ^ b))) + X[(1 + 5*i) & 15]; \
+        else a += ((d & b) | (~d & c))+ X[(1 + 5*i) & 15]; \
     } else { \
         if (i < 48) a += (b ^ c ^ d) + X[(5 + 3*i) & 15]; \
         else a += (c ^ (b | ~d)) + X[( 7*i) & 15]; \