The security of RSA relies on the fact that factoring the public modulus of a key is computationally hard: for example, if the modulus is of the form p.q, where p and q are large primes, there is no known efficient algorithm for obtaining the factorisation. But suppose that we have a large number of keys available: we can try to compute the pairwise GCDs of these moduli. If two of them happen to share a common factor (that is, one is p.q and the other one is p.q’), their GCD will be p and it will be possible to factor these two keys.
An algorithm known as Batch-GCD makes it possible to compute the GCDs of a large set of numbers in a tractable way. It has become a standard technique in the cryptanalyst’s toolbox for investigating security of RSA keys. The birthday paradox makes it possible to quantify the frequency of these common factors, and they should be rare enough that this does not create a security risk. However, this assumes a good distribution of the generated prime numbers. That is why a bias in the key generation algorithm is fatal for the security of the key. Since 2012, several papers applied the method to keys found on the Internet and elsewhere resulting in tens of thousands of broken keys.
Here at Cryptosense we have a fast OCaml Batch-GCD implementation. To test it out, we downloaded a large number of public SSH keys from GitHub, which can be obtained from their API. We’ve informed GitHub of the results and will continue to update them. In recent days, results have emerged about the insecurity of some Github SSH keys for other reasons, so we felt the time was right to publish our findings.
TL;DR: Results
We found lots of small factors, then applied ECM factoring to the residues, resulting in a dozen fully factored 2048-bit keys and more than a hundred which are partially factored.
More details
Looking at the fully factored private keys, none of them were factored by a simple large shared prime. Instead, we typically found one or more small factors. After the first results, we started running trial division by the first 20 million primes. This resulted in a number of new factors. We then took the residues of the hundred or so moduli with one or more small factors, and applied elliptic curve factorisation à la Lenstra (we used the Zimmerman et al implementation). This has the specificity of finding the small factors first. So far 12 keys have succumbed to this method (that is, the remaining residue passes the Miller-Rabin primality test, so is almost certainly prime).
While there are valid reasons to use an RSA modulus consisting of the product of several large primes, in this case we found several small prime factors and one large one. Taken in context, this seems to indicate either a flawed key generator that is being used by a tiny number of people, or a corner-case flaw in a more widely used key generator that only exhibits itself very very rarely. It could also be something more banal like a corrupted single character somewhere in the base-64 encoding of the key. However, it’s impossible to identify how SSH keys were generated from the resulting public key, so for the moment we are in the dark. We have asked GitHub to help us contact the users to investigate this further. We will continue to update this post with further information.