The messagedigest algorithm MD5
1 Introduction
(1) What is a messagedigest algorithm(MDA)?
A messagedigest algorithm is also called a hash function or a cryptographic hash function. It accepts a message as input and generates a fixedlength output, which is generally less than the length of the input message. The output is called a hash value, a fingerprint or a message digest.
(2) Properties of a messagedigest algorithm
When people plan to design a messagedigest algorithm, they try to make the algorithm satisfy the following properties:
. It should be oneway. Given the message digest, it is hard to get the original message.
. Given both input and output, it is difficult to find another input message which generates same output.
. It should be collisionresistant. It is computationally infeasible to find two messages, which generates same message digest. This property is not same as the second property. It is easier to make attack on this property than on the second property.
. The message digest should satisfy pseudorandomness.
When all of the above properties are satisfied, we call the algorithm a collisionresistant messagedigest algorithm. It is unknown whether collisionresistant messagedigest algorithm can exist at all.
(3) Possible applications
Messagedigest algorithms are mainly used in implementing digital signature. In this case, all of the above properties are required. However, the requirement is quite different when different applications use these algorithms. An application may rely upon some or all of the properties of the MDA. For example, some applications use the oneway property of a MDA. Because of its property of pseudorandomness, MDA is also used to be part of the mechanism for random number generation.
(4) Why is MD5 fast?
There are three kinds of operations in MD5:
. Bitwise Boolean Operation
. Modular Addition
. Cyclic Shift Operation
All these three operations are very fast on a 32bit machine. So MD5 is quite fast.
2 The mechanism of MD5
MD5, as well as MD2 and MD4, follows a design principle proposed by Merkle and Damagard. Its basic idea is to do hash in a blockwise mode. In a word, MD5 consists of two phases: padding phase and compression phase. In the padding phase, some extra bits (1 to 512bits) are appended to the input message. The result bits is congruent to 448 mod 512. Then the length of the initial message is transformed to a 64bit binarystring(if the length is greater than 2^{64}, the lower 64bit is used) and this 64 bits is added to the tail of the message too. So the padding phase ends with a bit stream that consists of one or more 512bit blocks. In the compression phase, a compression function is used on each 512bit block and generates a 128bit output. The output is always involved in the calculation of next round.
For convenience, we describe the algorithm through the following five steps:

Add padding bits behind the input message
This step is to elongate the initial message and make its length be congruent to 448 mod 512. First, a single bit “1” is appended to the message. Then, a series of “0” bits are appended so that
Length(the padded message) 448 mod 512
For example, suppose the initial message has 1000 bits. Then this step will add 1 bit “1” and 471 bits “0”. As another example, consider a message with just 448 bits. Since the algorithm doesn’t check whether the initial length is congruent to 448 mod 512, one bit “1” and 511 bits “0” will be appended to the message. Therefore, the padding bits’ length is at least one and at most 512.

Add a 64bit binarystring which is the representation of the message’s length
Here, please pay attention to the meaning of the 64bit binarystring. You shouldn’t regard it as first 64 bits of the initial message. It is indeed the binary representation of the length of the initial message. For example, suppose the message is 1000bits long. Its 64bit binary representation would be 0x00000000000003E8. If the message is very long, greater than 2^{64}, only the lower 64 bits of its binary representation are used.

Initialize four 32bit values
These four 32bit variables would be used to compute the message digest. We denote them by A, B, C, D. Their initial values are:
A = 0x67452301
B = 0xEFCDAB89
C = 0x98BADCFE
D = 0x10325376

Compress every 512bit block

Generate the 128bit output
There are four rounds and sixteen steps in each round.
3 Compared to MD2 and MD4
It uses more rounds.
4 Is MD5 secure?
Rivest designed MD2, MD4 as well as MD5.
5 Cryptoanalysis of MD5 