CSE P 590 TU: Practical Aspects of Modern Cryptography Winter 2006 Project
Bing Wu (9735592)
“Chinese” Attacks on Hashes

Background
A hash function is a function that takes a variablesize input and returns a fixedsize string, which is called the hash value. The hash value is relatively easy to compute for any given input. A hash function is oneway and collisionfree, which are key properties for our topic.
MD4 is a hash function developed in 1990. It is based on the basic arithmetic and logical operations. Since its publication, several other hash functions have been designed using MD4 as the basis, including MD5, HAVAL, RIPEMD, SHA0, SHA1, SHA256, etc. These hash functions all follow the same design principal as well as have similar structures as MD4. These hash functions play very important roles in the digital signatures, data integrity, and many other cryptographic protocols. They not only ensure the information security, but also improve the efficiency. Among them, MD5 and SHA1 are most widely used today.
Since the appearance of all these hash functions, people have been trying to develop techniques to perform collision search attacks on them. Developing new hash functions and breaking them are like two sides of a coin. They both help people develop better hash functions to better serve their purposes in the cryptographic world.
There have been significant developments in the area in the past several years. However, the real breakthroughs came in 2004 and 2005. Dr. Xiaoyun Wang and her Chinese crew published a series of papers to break MD4, MD5, HAVAL, RIPEMD, and SHA1 [27]. Although A. Joux [8] broke SHA0 with the time complexity for finding a collision in about 2 SHA0 operations, their result was significantly better.
Below are the best results by “Chinese” attacks on those hash functions:

The time complexity for finding a collision for MD4 is about 2MD4 operations [5].

The time complexity for finding a collision for MD5 is to find the first blocks with about 2MD5 operations, and the second blocks with about 2MD5 operations [6].

The time complexity for finding a collision for HAVAL128 is about 2HAVAL128 operations [7].

The time complexity for finding a collision for RIPEMD is about 2RIPEMD operations [5].

The time complexity for finding a collision for SHA0 is about 2SHA0 operations [1].

The time complexity for finding a collision for SHA1 is about 2SHA1 operations announced in [2] and later on, the result was improved to about 2SHA1 operations [9].

“Chinese” Collision attacks
The “Chinese” collision attacks on hash functions are precise differential attacks in which the differential path is more restrictive since it depends on the message difference as well as the specific values of the message bits involved. They do not use the exclusiveor as a measure of difference, but instead use modular integer subtraction as the measure. It’s called a modular differential by the Chinese crew.
The attacks have some principles applicable to all MD4 family hash functions. Such attacks first pick a favorable message differential between two messages M and M’ such that these two messages have a higher probability of having the same hash value. Depending on the message differential, some number of probabilistic conditions must be met. It uses a technique called “message modification” to eliminate some of these conditions which appear in the early stages of the hash compression function. This reduction leads to a better overall complexity for the collision attack.
Basically, the attacks include three steps:

Find a collision differential for which M and M’ probably produce a collision.
Taking MD4 as an example,
M = M’ −M = (m,m, ...... ,m)
m = 2, m = 2 − 2, m = −2
mi = 0, 0 ≤ i ≤ 15, i 1, 2, 12.
The reason why the collision differential is selected is not clearly specified in the article [5], but it’s stated in [7] that based on significant amount of analysis, it’s believed to be fairly easy for M and M’ to produce a collision with the collision differential chosen.

Derive a set of sufficient conditions which ensure the collision differential to hold.
For example, the MD4 compression function has three rounds. Each round uses a different nonlinear Boolean function defined as follows:
F(X, Y, Z) = (X Y) (X Z)
G(X, Y, Z) = (X Y) (X Z) (Y Z)
H(X, Y, Z) = X Y Z
Some properties of these three nonlinear Boolean functions are very helpful for determining sufficient conditions for the differential paths that are used in the collision search attack on MD4. The sufficient conditions that ensure all the characteristics to hold can be verified by these properties. Refer to [5] for details.

For any random message M, make some modification to M such that almost all the sufficient conditions hold. This is done by two types of message modification techniques, which are termed as “singlestep modification” and “multistep modification”.
Taking MD4 as an example, it is believed that the first two rounds of MD4 is not oneway, and for the singlestep modification, it is to modify M such that all the conditions in round 1 hold. For the multistep modification, it is to modify M so that some bits of M get changed in round 2 to fulfill more conditions while all the conditions in round 1 remain hold. This greatly improves the probability that M and M’ may produce a collision. Refer to [5] for details.

Results for MD4 and MD5 attacks
Practically, it's computationally feasible to apply “Chinese” attacks on MD4 and MD5 hash functions by employing the computational power of a personal computer. I have C programs implementing the MD4 algorithm in [5] and MD5 algorithm in [6]. They are running under the Unix/Linux environment. I used cygwin to mimic the Linux environment on my WinXP system on a Pentium4 3.40G machine. I used gcc to compile the programs and used Os and appropriate mcpu and mtune flags for my machine to achieve the optimal performance.
Below are two runs for each MD4 and MD5 program respectively. As it shows, it takes about 5 seconds to find a collision for MD4 attacks and about 1 hour for MD5 attacks. These results clearly demonstrate that both MD4 and MD5 are severely broken and thus should not be used any longer.
$ time ./md4.exe
unsigned int m0[16] = {
0x45051308, 0x81730a12, 0xd56ad03c, 0x71628a68,
0xa54f00e7, 0xb32a6311, 0x0e13c786, 0xb48eae4b,
0x4656581e, 0x18a6deab, 0x9b50d7b2, 0x0cfc6be7,
0xb42bdf1e, 0x814dcfbb, 0xb776931d, 0xb27bcba6
};
unsigned int m1[16] = {
0x45051308, 0x01730a12, 0x456ad03c, 0x71628a68,
0xa54f00e7, 0xb32a6311, 0x0e13c786, 0xb48eae4b,
0x4656581e, 0x18a6deab, 0x9b50d7b2, 0x0cfc6be7,
0xb42adf1e, 0x814dcfbb, 0xb776931d, 0xb27bcba6
};
real 0m4.506s
user 0m4.496s
sys 0m0.020s
$ time ./md4.exe
unsigned int m0[16] = {
0xcbe53b38, 0x5032eef7, 0xb019844f, 0xebd1e372,
0x98d286ff, 0x76430bc9, 0xd2a8b026, 0xc2c5c353,
0x6d4d2c65, 0xa011e2ac, 0x61eec40e, 0x434b154e,
0xebafa851, 0xa9601efa, 0xc48a9a59, 0x578bbb57
};
unsigned int m1[16] = {
0xcbe53b38, 0xd032eef7, 0x2019844f, 0xebd1e372,
0x98d286ff, 0x76430bc9, 0xd2a8b026, 0xc2c5c353,
0x6d4d2c65, 0xa011e2ac, 0x61eec40e, 0x434b154e,
0xebaea851, 0xa9601efa, 0xc48a9a59, 0x578bbb57
};
real 0m5.467s
user 0m5.477s
sys 0m0.010s
$ time ./md5.exe
block #1 done
block #2 done
unsigned int m0[32] = {
0x1929aa4b, 0xd8844d17, 0x8e5e1527, 0x34b458d0,
0xacc2f035, 0xdbe6e5f1, 0x234533f1, 0x6c716baf,
0x05352d45, 0x61f48bfd, 0x769ae8c3, 0x8fda4316,
0x754098e2, 0x8c4005d8, 0xc26ca7b4, 0x22f12708,
0x06dea041, 0xe664ec4e, 0x1d72b3a0, 0x03bdc431,
0x47d0fc1c, 0x4c7bdc4e, 0x76648928, 0xbea20bd6,
0x5079c739, 0x4dae799f, 0xbbd34dfa, 0xdc019c7f,
0x9d18d4e3, 0x253ba683, 0xed5a754e, 0x5a8cd41a,
};
unsigned int m1[32] = {
0x1929aa4b, 0xd8844d17, 0x8e5e1527, 0x34b458d0,
0x2cc2f035, 0xdbe6e5f1, 0x234533f1, 0x6c716baf,
0x05352d45, 0x61f48bfd, 0x769ae8c3, 0x8fdac316,
0x754098e2, 0x8c4005d8, 0x426ca7b4, 0x22f12708,
0x06dea041, 0xe664ec4e, 0x1d72b3a0, 0x03bdc431,
0xc7d0fc1c, 0x4c7bdc4e, 0x76648928, 0xbea20bd6,
0x5079c739, 0x4dae799f, 0xbbd34dfa, 0xdc011c7f,
0x9d18d4e3, 0x253ba683, 0x6d5a754e, 0x5a8cd41a,
};
real 68m56.807s
user 68m42.968s
sys 0m0.070s
$ time ./md5.exe
block #1 done
block #2 done
unsigned int m0[32] = {
0x5c625d50, 0x7c27ef85, 0x24835757, 0x9b4b0d82,
0xff777f20, 0x0a0777b0, 0xe2ff34b4, 0xd3302c20,
0x0534ad31, 0x2033e5f7, 0x7fbadc53, 0x12c25f81,
0x5edab51a, 0x746c590d, 0xad958bb0, 0xfbe53434,
0x70fd8d2f, 0x2e073782, 0x22c1af9c, 0xe4b8fb96,
0xc53a1137, 0x0e0f9f6a, 0x66dd690e, 0x8f422950,
0x9e36a841, 0x05f2ddb9, 0x6aa211be, 0x9c429c6b,
0x5a3c7cea, 0x661fa395, 0x1668028a, 0x7c61522d,
};
unsigned int m1[32] = {
0x5c625d50, 0x7c27ef85, 0x24835757, 0x9b4b0d82,
0x7f777f20, 0x0a0777b0, 0xe2ff34b4, 0xd3302c20,
0x0534ad31, 0x2033e5f7, 0x7fbadc53, 0x12c2df81,
0x5edab51a, 0x746c590d, 0x2d958bb0, 0xfbe53434,
0x70fd8d2f, 0x2e073782, 0x22c1af9c, 0xe4b8fb96,
0x453a1137, 0x0e0f9f6a, 0x66dd690e, 0x8f422950,
0x9e36a841, 0x05f2ddb9, 0x6aa211be, 0x9c421c6b,
0x5a3c7cea, 0x661fa395, 0x9668028a, 0x7c61522d,
};
real 56m53.298s
user 56m52.757s
sys 0m0.030s

What does it mean and what to do about it?
Well, it means that hash functions such as MD5 are no longer useful as digital signature hashes. It is no longer the case that you can believe that a person’s signed document is identical to your version of that document, even if the checksum matches.
SHA1 is the best we’ve got at the moment and has the time complexity of the order 2, which appears to be best standing against collision attacks. While this seems like a huge number, distributed searches using many computers across the Internet have solved problems that were twice as complex. In other words, there exist large collections of computers distributed over the Internet that are capable of finding SHA1 collisions.
To exploit a collision attack, an adversary would typically begin by constructing two messages with the same hash where one message appears legitimate. The attacker could then try to get you to digitally sign the legitimate message. He would then claim that you actually signed the malicious message, and prove this claim by showing that your signature matches the malicious message. There is a similar concern for systems involved with the signed code and certificates that an adversary might be able to construct a valid signature or certificate request that had a corresponding hash collision with a malicious signature or certificate request.
In terms of practical security, the major concern about these new attacks is that it might lead to more efficient attacks and thus a migration to stronger hashes is believed to be mandatory. These attacks broke the strong collision resistance, not preimage (oneway) resistance though. That means that it is still infeasible for an attacker to generate a particular input to a hash function that is guaranteed to produce a particular output. Because of this, many of the applications that use cryptographic hashes, such as HMACrelated protocols, password storage or document signing, are only minimally affected by the collision attacks. In the case of document signing, for example, an attacker could not simply fake a signature from an existing document ─ the attacker would have to fool the private key holder into signing a preselected document. Reversing password encryption is not made possible by the attacks either. Constructing a password that works for a given account requires a preimage attack.
Practically, these collision attacks suggest the acceleration of upgrading systems that use hash functions. Three viable approaches for improving the application security are:

Replace the hash function with a stronger one. The most commonly suggested approach is to simply employ SHA2 hash functions, possibly truncating the output to 160 bits for backward compatibility with SHA1 in which case extra special care must be taken to prevent a “man in the middle” attack from downgrading a SHA2 session into a more vulnerable SHA1 session.
Moreover, all new applications and protocols must be designed to have better collision resistance. These applications and protocols need to be able to accommodate new hash standards as they are developed.

Alter the protocol so that it no longer requires that the hash function be collision resistant. A recent proposal suggests adding randomness to hash functions [11]. To implement this, the application must have a good source of randomness and must alter the protocol.

Implement simple message preprocessing to convert plaintext messages into a form that makes all existing collision attacks inapplicable. This approach can be accomplished with minimal code change [12]. This practical alternative is appealing for applications which want to extend the secure life of SHA1.
The bottom line:

Don’t use MD4.

Don’t use MD5.

Don’t use HAVAL.

Don’t use RIPEMD.

Don’t use SHA0.

While SHA1 is showing signs of significant problems in some areas, SHA1 remains stronger than others. However, avoid using SHA1 if possible because it is next up to be cracked.

Use SHA2 hash functions for now and wait for more collisionresistant hash functions. The SHA2 standard is currently resisting known SHA1 attacks. Theoretical attacks against SHA2 hash functions may take a few years to turn into practical attacks. However, they are all potentially vulnerable as well because of their same design principal as other MD4 family hash functions.

VSH is about the best generally published hash function [10], but it needs to have more peer review before it can be seriously considered.

Use alternative approaches described in [1112].

Conclusion
The “Chinese” attacks on hashes are remarkable in the cryptographic area. It makes people eagerly upgrade their systems to employ better hash functions as well as develop new and more collisionresistant hash functions to better serve their cryptographic purposes. This will greatly help us achieve a more secure digital world.
References

Xiaoyun Wang, Hongbo Yu, Yiqun Lisa Yin, Efficient Collision Search Attacks on SHA0, Crypto'05.

Xiaoyun Wang, Yiqun Lisa Yin, Hongbo Yu, Finding Collisions in the Full SHA1, Crypto'05.

Xiaoyun Wang, Yiqun Lisa Yin, Hongbo Yu, Collision Search Attacks on SHA1, 2005.

Xiaoyun Wang, Dengguo Feng, Xuejia Lai, Hongbo Yu, Collisions for Hash Functions MD4, MD5, HAVAL128, RIPEMD, Crypto'04.

Xiaoyun Wang, Xuejia Lai, Dengguo Feng, Hui Chen, Xiuyuan Yu, Cryptanalysis of the Hash Functions MD4 and RIPEMD, Eurocrypto’05.

Xiaoyun Wang, Hongbo Yu, How to Break MD5 and Other Hash Functions, Eurocrypto’05.

Xiaoyun Wang, Dengguo Feng, Xiuyuan Yu, An Attack on Hash Function HAVAL128, Science in China Series E.

A. Joux, Collisions for SHA0, Rump session of Crypto’04, August 2004.

Xiaoyun Wang, Andrew Yao, Frances Yao, New Collision Search for SHA1, Rump session of Crypto’05, August 2005.

Scott Contini, Arjen K. Lenstra, Ron Steinfeld, VSH, an efficient and provable collision resistant hash function, Rump session of Crypto’05, August 2005.

S. Halevi, and H. Krawczyk, Strengthening Digital Signatures via Randomized Hashing, Internet Draft, 2005.

Szydlo M. and Yin Y., CollisionResistant usage of MD5 and SHA1 via Message Preprocessing, IACR Eprint archive 2005 #248.
