Google Announces Full SHA-1 Collision: What It Means

Graham Steel
February 23, 2017

Today Google announced the first public full SHA-1 collision, i.e. the first pair of distinct values that when hashed with the SHA-1 function produce the same digest. This should not come as a surprise - it follows the free-start collisions announced at the end of 2015, and many cryptographers had been expecting a full SHA-1 collision to appear soon.
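As a minimal sketch of what "collision" means in practice, the following Python snippet checks whether two byte strings differ while sharing a SHA-1 digest. The helper names `sha1_hex` and `is_sha1_collision` are illustrative assumptions, not part of any announced tooling.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def is_sha1_collision(a: bytes, b: bytes) -> bool:
    """True only if a and b are different inputs with the same SHA-1 digest."""
    return a != b and hashlib.sha1(a).digest() == hashlib.sha1(b).digest()
```

Reading two colliding files from disk as bytes and passing them to `is_sha1_collision` should return True, while any ordinary pair of distinct files is expected to return False.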

To understand what this means, it helps to look at what happened after collisions were found in the MD5 hash function. The first MD5 collisions were announced in 2004. During 2005, various researchers showed examples of pairs of documents that have the same MD5 hash, or pairs of executable files that have the same MD5 hash.

It took until 2008 for researchers to find a pair of certificates that have the same MD5 digest and were well-formed enough to have one signed by a trusted CA and the other used as an intermediate certificate. This is a very powerful attack, as the intermediate certificate can then be used to sign more website certificates to mount man-in-the-middle attacks on any site. To get to this point, they first had to find a procedure to produce chosen-prefix collisions, i.e. a way to take two files M and M' and append two suffixes N and N' such that the MD5 hash of M followed by N is the same as that of M' followed by N'. However, when the FLAME malware was detected in May 2012, forensics showed it was using a pair of certificates with a colliding digest, and that the MD5 collision method was different from the one that had become public, suggesting that government agencies already had techniques for producing chosen-prefix MD5 collisions well before academic researchers.

For SHA-1, the collision revealed by CWI and Google is an identical-prefix collision, which is generally a weaker result than finding a procedure for a chosen-prefix collision (however, note the researchers worked on a specific collision that takes advantage of certain particularities of the PDF format to allow the same collision to be used to create any number of colliding pairs of PDFs containing two different embedded JPGs - visual explanation here and site for generating colliding PDFs here).
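The difference between the two kinds of collision can be written out as follows. This is only an informal sketch: H stands for the hash function, the double bar for concatenation, and P for a shared prefix (P is a symbol introduced here for illustration; M, M', N and N' are used as above).

```latex
% Identical-prefix collision: one shared prefix P, two different suffixes.
H(P \,\|\, N) = H(P \,\|\, N'), \qquad N \neq N'

% Chosen-prefix collision: two different prefixes M and M' fixed in advance,
% with suffixes N and N' computed so that the digests still match.
H(M \,\|\, N) = H(M' \,\|\, N'), \qquad M \neq M'
```

In the first case the two colliding inputs must start identically, whereas in the second the attacker can make them start with different, meaningful content, which is what the certificate attack required.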

All this suggests that nation-state level adversaries may well already be able to produce SHA-1 collisions. They may even be able to produce chosen-prefix collisions, and find viable certificates with colliding SHA-1 digests (though since the MD5 attacks, certificates are supposed to contain carefully-placed random data to make this harder). Other less well-resourced actors won't be far behind.

In conclusion, it's long past time to dump SHA-1 as a digest function for certificates, documents, binaries and elsewhere.

If you use SHA-1 inside an HMAC, the problem is much less serious. HMAC is still secure even if the underlying hash function is not collision-resistant; it is only necessary that the hash function behave as a pseudo-random function. The remaining security consideration is that SHA-1's output length is only 160 bits. Some agencies such as ENISA already consider this too short for future use.
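To make the output-length point concrete, here is a small sketch using Python's standard `hmac` module. The function names `make_tag` and `verify_tag`, and the default of SHA-256, are assumptions chosen for illustration rather than a recommendation from any standard.

```python
import hmac


def make_tag(key: bytes, message: bytes, digestmod: str = "sha256") -> bytes:
    """Compute an HMAC tag over message with the named hash function."""
    return hmac.new(key, message, digestmod).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes,
               digestmod: str = "sha256") -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = make_tag(key, message, digestmod)
    return hmac.compare_digest(expected, tag)
```

Calling `make_tag(key, msg, "sha1")` yields a 20-byte (160-bit) tag, while the SHA-256 default yields 32 bytes, so moving off HMAC-SHA-1 can be as small a change as the `digestmod` argument, provided both sides of the protocol agree.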

If you use SHA-1 inside PBKDF2 for storing passwords, you’re not in danger from collisions, but you should probably reconsider your choice anyway. As we explained in a previous post, SHA-1 is easier and cheaper to implement in hardware than SHA-256 or SHA-512, and hence leaves password files more vulnerable to brute-force and dictionary attacks.
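A minimal sketch of PBKDF2 with SHA-512 in place of SHA-1, using Python's standard `hashlib.pbkdf2_hmac`, might look like this. The iteration count, salt length and function names are assumptions for illustration; pick values appropriate to your own hardware and policy.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor, tune for your environment


def hash_password(password: str, salt: bytes = None,
                  iterations: int = ITERATIONS):
    """Derive a 64-byte key from password with PBKDF2-HMAC-SHA512."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    derived = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, iterations
    )
    return salt, derived


def check_password(password: str, salt: bytes, expected: bytes,
                   iterations: int = ITERATIONS) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt, iterations)
    return hmac.compare_digest(derived, expected)
```

The salt and iteration count need to be stored alongside the derived key so that `check_password` can repeat the derivation later.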

Want to know what your applications and dependencies are using SHA-1 for? We have a tool for that :)