If you know a lot about passwords, you probably know that it’s good to choose a password from a distribution with a lot of “entropy”. The (Shannon) entropy of a distribution $X$ is defined as:

$$H(X) = -\sum_x p(x)\,\log_2 p(x)$$
This definition is an answer to the question “how much information is encoded in a chosen password, on average?” This is also the approximate number of bits we need to specify which password we picked. An adversary would need to try at least $2^{H(X)}/4$ passwords in order to guess yours, on average.[1]
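For concreteness, here is how that calculation looks in code (a quick sketch of my own, not taken from anything this post references; the 4096-password uniform distribution is just an illustrative example):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform choice among 4096 equally likely passwords has 12 bits of entropy,
# so the attacker needs at least 2**12 / 4 + 1 guesses on average (Massey's bound).
uniform = [1 / 4096] * 4096
H = shannon_entropy(uniform)
print(H)                # 12.0
print(2 ** H / 4 + 1)   # 1025.0, a lower bound on the expected number of guesses
```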
This is typically how people think about choosing strong passwords, and it works well in the case where we’re choosing among equally likely passwords. But it breaks horribly when we allow our distribution to be non-uniform. Because choosing good passwords is about memorableness as well as sheer strength, we might sometimes want to use complicated password selection schemes that don’t necessarily give uniformly distributed selections, and Shannon entropy can really screw up here.
A comically bad password selection scheme
Consider the following password selection algorithm (there is a code sketch just after the list):
- Roll a 20-sided die.
- If the die comes up 20, choose a uniformly random password from a very large set of possibilities.
- Otherwise, choose the password `password`.
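In code, the scheme looks something like this (a minimal Python sketch of my own; the 1000-character lowercase choice on a roll of 20 matches the example discussed below):

```python
import secrets
import string

def d20_password(length: int = 1000) -> str:
    """Roll a d20; on a 20, return a uniformly random lowercase password of the
    given length, otherwise return the literal string 'password'."""
    roll = secrets.randbelow(20) + 1   # uniform roll of a 20-sided die
    if roll == 20:
        return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
    return "password"
```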
Of course, this is a terrible password selection strategy. 19 times out of 20, your password will be the very first one the attacker tries, if they know your strategy. But the entropy here doesn’t reflect that - for example, if we pick a random 1000-character lowercase password when we get a 20, this has an overall entropy of about 235 bits. Entropy tells us that we have an extremely strong password, since on average it will take a long time to crack the password. But of course in the typical case, it will take no time at all.
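Here is the arithmetic behind that 235-bit figure, sketched in Python (summing the 26^1000 identical tiny terms in closed form, rather than enumerating them, is my own shortcut):

```python
import math

# Shannon entropy of the d20 scheme: "password" has probability 19/20, and each of
# the 26**1000 random 1000-character lowercase passwords has probability (1/20) * 26**(-1000).
p = 19 / 20
H = -p * math.log2(p) + (1 / 20) * (math.log2(20) + 1000 * math.log2(26))
print(f"{H:.1f} bits")   # about 235.3 bits
```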
Clearly something is wrong here. How can we fix it?
Min-entropy
The first thing to notice is that, as we’ve already implied, we’re engaged in a game of strategy with any would-be password crackers. We pick some strategy for choosing passwords, and then our opponent picks a strategy for guessing them. Our opponent’s best response is going to be to try the most likely passwords first.
So how should we really measure our password selection strategies? One way would be to use the information content of the most likely password, instead of the average information content:

$$H_{\min}(X) = -\log_2 \max_x p(x)$$
This is called the min-entropy, and it’s nice for several reasons, not least because when $X$ is uniform, $H_{\min}(X) = H(X)$.
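Applied to the d20 scheme, the most likely password is `password` with probability 19/20, so the min-entropy comes out to about 0.074 bits, no matter how long the rare random passwords are. A quick check (again my own illustrative snippet):

```python
import math

# Min-entropy of the d20 scheme: determined entirely by the most likely outcome.
H_min = -math.log2(19 / 20)
print(f"{H_min:.3f} bits")   # about 0.074 bits, versus ~235 bits of Shannon entropy
```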
An issue with this method, though, is that it doesn’t let us distinguish the d20 scheme above with 1000-character passwords on a high roll, from a similar scheme with only 3-character passwords on a high roll. Even though the scheme is bad in both cases, it’s still better in the former case than the latter, if only by a little.
But I think the most important thing about the min-entropy is that it gives us a conservative estimate that we can use to compare password selection schemes, while Shannon entropy gives something more like a central estimate. If the min-entropy is $k$ bits, we can feel really confident that it will take at least about $2^k$ guesses to crack our typical passwords, even if the distribution isn’t uniform. If one scheme’s min-entropy is better than another’s Shannon entropy, we can feel secure that the first scheme is better.
Thanks to Adam Scherlis for pointing out the issue with Shannon entropy, Katja Grace for discussions about the problem, and Thomas Sepulchre for pointing out a major flaw in the original post.
[1] See Massey's (very short) "Guessing and Entropy" for a discussion of this lower bound.
Attackers aren't given infinite attempts, and even if they are, God doesn't give them infinite time. So what you really want is to minimize the probability that the attacker guesses your password before giving up.
Suppose the attacking bot can make 200,000 attempts. By the first scheme, the probability the attacker guesses the password is .95 (plus an infinitesimal). By the first scheme but with a three-character password on a high roll, the probability is 1.00 (with 50 different characters, there are only 125K three-character words, so success in the remaining 199,999 attempts is certain).
By this measure, both passwords are weak, but the second is weaker than the first.
My BlackBerry locks attackers out after 10 tries. So I would choose n=10 rather than n=200000. By that measure, the first scheme gives roughly p=.950000, and the second roughly p=.9500036.
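These numbers are easy to check in code (a quick sketch of my own, keeping the 50-character alphabet and the 19-in-20 chance of the literal password assumed above):

```python
def p_success(n_guesses: int, n_random_passwords: int) -> float:
    """Chance that an attacker who knows the scheme wins within n_guesses attempts:
    try 'password' first (right 19 times in 20), then spend the remaining guesses
    on the equally likely random passwords."""
    covered = min(n_guesses - 1, n_random_passwords) / n_random_passwords
    return 19 / 20 + (1 / 20) * covered

print(p_success(200_000, 26 ** 1000))   # ~0.95       (1000-character lowercase variant)
print(p_success(200_000, 50 ** 3))      # 1.0         (3-character, 50-character alphabet)
print(p_success(10, 26 ** 1000))        # ~0.95       (10-try lockout)
print(p_success(10, 50 ** 3))           # ~0.9500036  (10-try lockout, 3-character variant)
```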
Taking it a bit further, these conditional probabilities depend upon some prior for various types of attacks. The numbers you have are P(guess password | password scheme AND attacker guesses via login attempts). They are both still really bad, but the difference is slightly greater if the password can be attacked by somehow exfiltrating a password hash database and running a billion guesses against that.