Recent news about the discovery of free-start collisions for the SHA-1 hash function has attracted plenty of attention. But what does this mean for the security of SHA-1, and what should you do if you're using it?
Hash functions take as input a string of (more or less) any length and produce a relatively short "digest" as an output. If the hash function is a good one, any change in the input will produce a change in the digest which is hard (in the cryptographic sense) to predict.
This is handy for many applications. A typical one is digitally signing documents: rather than computing the signature of a large file, which would be slow, I first compute the digest and then sign this. For verification, it is enough to recompute the digest of the document and check the signature matches.
Since a hash function takes inputs from a huge range of values and produces outputs in a much smaller range of values, it is clear that there will exist pairs of inputs that hash to give the same digest. These are called hash collisions. If the hash function is an ideal one, there should be no better way to find these than by trial and error, i.e. by just generating lots of strings and applying the hash function to compute their digests. If the hash function produces a 160-bit digest, it will take on average 2^80 strings before we find a collision (by the so-called birthday paradox).
In the past, hash functions such as MD5 have been shown to be "broken" in the sense that one can craft collisions with far less computational effort. This can lead to catastrophic loss of security: for example, two different documents that give the same digest, so that the signature for one is also valid for the other, or two different SSL certificates, leading to a trusted "rogue CA" allowing man-in-the-middle attacks on all browsers.
While the recent results on SHA-1 don't give a collision, they show that a collision could be calculated with the kind of computing power that is within reach of a well-funded organisation. So how serious would a collision be and what should you do?
Bad SHA-1 and less bad SHA-1
If you are using SHA-1 to make a digest of a certificate before signing it, or accepting such certificates, collisions are a big problem. After collisions were found in MD5, these were used the Flame malware to trick systems into accepting code to execute as if it had been signed by Microsoft (in a slightly roundabout way - see part IV of this presentation). Attackers are certainly no less sophisticated today than in 2010, so this is a realistic attack vector. This is one reason why web browsers are proposing to phase out SHA-1 digest certificates for TLS origins.
If you are using SHA-1 to digest documents for signing, this is also dangerous, for the same reason: with collisions, a signature for one document could be presented as a valid signature of another.If you use SHA-1 inside an HMAC, the problem is much less serious. First, HMAC is still secure even if the underlying hash function is not collision-resistant. It is only necessary that the hash function be a PRF, that is, (roughly) indistinguishable from random by a reasonably-resourced attacker. The only consideration is that SHA-1's output length is only 160 bits. Some agencies such as ENISA already consider this too short for future use.
If you use SHA-1 inside PBKDF2 for storing passwords, you're not in immediate danger from collisions, but you should probably reconsider your choice anyway. As we explained in a recent post, SHA-1 is much easier to implement in hardware and hence leaves password files more vulnerable to brute-force dictionary attackers.
Where is my SHA-1?
You can find SHA-1 in applications and libraries using our Analyzer software.