Breaking encrypted messages is a unique challenge, at least in its pure form. In most cryptography puzzles the method of encryption is known, so the challenge is just finding the key. The most common cipher used for these puzzles is a simple substitution cipher. Depending on the puzzle this can be tough, but it isn't something that will really strain your intellect to its limit.

True cryptography occurs when you just get a message, and no other information. The codebreaker has to find a way to determine the type of code, and only then can they look for the key. This type of challenge is good for a rationalist: you have to make sense of something confusing by running experimental tests, usually by analyzing the text. I've always found codebreaking in this sense to be enjoyable and useful for training your mind. The main obstacle is the lack of any system designed for it. There is no website (that I know of) that provides this sort of cryptography challenge; typically, if you want to do this, you have to find other people who share the interest.

It occurred to me that people on Less Wrong might be interested in doing something of this nature. Obviously we probably won't be trying to break RSA, but there are plenty of encryption methods developed before the era of computer cryptography that could be used, without requiring participants to know how to program or anything like that.

Is there any interest in something like this? I personally don't care how much you already know about cryptography; if you don't know anything, I'd be happy to give you some book recommendations.

 

PS: There is a difference between "cipher" and "code", but in practice the two are often used interchangeably. For instance, the distinction between "codebreaker" and "cipherbreaker" rarely matters, so I used the more common term. As long as the correct message gets across, either term is fine; just use the precise vocabulary when you're saying something specific to one particular type of encryption.

34 comments

Having been exposed to encryption methods like AES and Threefish, I'm not really interested in code breaking. From this point on, it's just a game; the cryptography people use when they're not just making a puzzle is as close to unbreakable as makes no difference. Ask a computer security guy to get his hands on some information, and he's not going to try attacking the crypto, as such. He's going to look for some other way to pry open the secrets.

For example, one of the more insidious ways of breaking encryption is to watch the encryption machine doing its work. If you have a smart card containing a SUPER SECRET PASSWORD that it will only send out encrypted, you may be able to discover this password by watching how much electrical current the smart card's processor is using while it encrypts the password. If encrypting a 'b' generates one pattern on an oscilloscope, and encrypting a 'Q' generates another, then someone with an oscilloscope can trick the smart card into revealing the SUPER SECRETS contained within. There are so many ways that encryption hardware and software can leak information that such methods are usually much easier than attacking the cipher directly.
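
To make the idea concrete, here is a toy simulation (entirely made up, not a description of any real smart card): pretend each character's bits leak into a slightly noisy "power trace", record reference traces for known characters, and then identify an unknown character by finding the closest reference.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_trace(char, noise=0.2):
    """Pretend 'power trace': one noisy sample per bit of the character.
    A crude stand-in for what an oscilloscope might record."""
    bits = np.array([(ord(char) >> i) & 1 for i in range(8)], dtype=float)
    return bits + rng.normal(0.0, noise, size=8)

alphabet = [chr(c) for c in range(32, 127)]

# Step 1: build reference ("template") traces by averaging many observations.
templates = {c: np.mean([toy_trace(c) for _ in range(50)], axis=0) for c in alphabet}

# Step 2: observe the device handling a secret character we don't know.
observed = toy_trace('Q')

# Step 3: guess the character whose template is closest to the observation.
guess = min(alphabet, key=lambda c: np.linalg.norm(templates[c] - observed))
print(guess)  # recovers 'Q' without ever attacking the cipher itself
```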

Similarly, think of how you might steal someone's Less Wrong account. You could try to guess their password, but this is far too time consuming unless they set their password to "password" or "hunter2" or something similarly obvious. If you happen to be on the same unsecured wireless network, though, you may be able to snoop on their packets and steal their session cookie. This bit of information is passed to the Less Wrong server every time you request a page; it's called "reddit_session" and it tells the server that you're logged in. If you steal someone else's session cookie, you can be logged in as them, too. It's not even hard.
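
The mechanics are mundane: a session cookie is just a string the client attaches to each request, and whoever presents it is treated as that logged-in user. A minimal sketch with the requests library (the URL and cookie value are placeholders I made up; only the cookie name comes from the paragraph above):

```python
import requests

stolen_cookie = "EXAMPLE-SESSION-VALUE"  # placeholder, not a real session value

resp = requests.get(
    "https://example.com/",                     # placeholder URL
    cookies={"reddit_session": stolen_cookie},  # whoever sends this is "logged in"
)
print(resp.status_code)
```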

And then, of course, there are things like SQL injection attacks, now also known as Bobby Tables attacks. Those are just fun.
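
For anyone who hasn't seen the xkcd strip: the bug is building SQL by string concatenation, so attacker-controlled input becomes part of the query. A minimal sketch using Python's built-in sqlite3 module (table and data made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

user_input = "' OR '1'='1"  # attacker-controlled string

# Vulnerable: the input is spliced into the SQL, so the OR clause matches every row.
rows = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'"
).fetchall()
print(rows)  # leaks alice's row despite the "wrong" name

# Safe: a parameterized query treats the input as data, not SQL.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # empty: no user is literally named "' OR '1'='1"
```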

Real security vulnerabilities are, to me, a lot more interesting than breaking deliberately simple ciphers. Studying them teaches a remarkably clever, downright nitpicky way of thinking about how to extract the most use from each piece of information. It's potentially useful, too, which is a nice bonus.

unless they set their password to "password" or "*******" or something similarly obvious.

I don't get it -- why is ******* an obvious password?

It's a character that appears on the keyboard, and empirically people are waaaay more likely to use those. It's also just one character, repeated, so it has really, really low complexity. Given that dictionary attacks and rainbow tables cover any three-word combination, even with initial capitalisation and common letter-number substitutions, this is an incredibly weak password. Actually, any 7-character password is weak, and so is any password that is just one character repeated; this one suffers from both problems.
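
To put rough numbers on this (my own back-of-the-envelope figures, not from the thread), compare the number of candidates an attacker has to try for each class of password:

```python
import math

printable = 95  # printable ASCII characters

# Any 7-character password over printable ASCII
any_7_chars = printable ** 7

# One printable character repeated seven times: only 95 candidates total
repeated_char = printable

# A random three-word passphrase from a 10,000-word dictionary,
# times a small fudge factor for capitalisation / letter-number substitutions
three_words = 10_000 ** 3 * 100

for name, n in [("any 7 chars", any_7_chars),
                ("one char repeated", repeated_char),
                ("3 dictionary words (+ variants)", three_words)]:
    print(f"{name:35s} ~2^{math.log2(n):.1f} guesses")
```

The seven-character space is already small by modern offline-cracking standards, and the single repeated character is trivial.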

True cryptography occurs when you just get a message, and no other information.

I don't think "true cryptography" is a good name for this situation. It gives people exactly the wrong intuitions for security. Also, in some usage "cryptography" is inventing ciphers, "cryptanalysis" is breaking them, and "cryptology" refers to the entire field.

I also think you greatly understate the fun to be had with the other kind of cryptanalysis, where you read a description of a cipher and look for a way to attack it!

I've been working on an idea for a community puzzle:

There is a prisoner who is sending encoded messages to people via postal mail. It is believed that he is continuing to run a criminal organization via these letters. The letters are written in an unknown language, and contain a section that is obviously coded.

We assume that the cipher he is using is relatively simple because, as a prisoner, he only has pencil and paper to work with.

By law, the letters can be read, but can't be stopped unless it can be proven that they are being used for criminal purposes. If we can show this, we can also add 30 years to the prisoner's sentence.

The prisoner has a cellmate who knows or has guessed how the letters are being written, but is only willing to give clues in return for time off his sentence. This cellmate is just as bad as the prisoner in question, so we don't want to reduce his prison time any more than we have to.

To give us the whole secret outright, the cellmate is demanding 30 years off his time, which would set him free. He is also willing to give us smaller clues, for less time off, to help us read each part of the letters.

Our task is to decrypt and translate the letters to English, while buying the fewest clues possible from the cellmate.

Ideas on running the game:

A few example letters will be provided to players.

Purchased hints will be published to everybody.

Players are encouraged to collaborate.

Players can post a few questions to the prisoner every day. His lawyer will limit the number and type of the questions. He will be honest about anything that can be verified through public records or by asking his family. He may not answer or may lie about anything he believes can't be verified. It may not be obvious if he is telling the truth or lying.

If progress isn't being made on the problem, the prosecutor may become impatient and buy the next clue offered by the cellmate.

This seems like a good idea, but (just to be clear) is the goal to minimize the cell-mate's time off when the ciphertext is cracked, or is it just to crack the code by using questions wisely?

The goal is to crack the code, minimizing help from the cell-mate. Asking the right questions could help. A perfect score is 30 -- reduced by each year taken off of the cell-mate's sentence.

Which kind of cryptography puzzles would appeal to a child who is doing fifth grade math level work?

Have them read Edgar Allan Poe's The Gold-Bug, which is basically the story of someone going through the process of breaking a simple substitution cypher. I found it enjoyable when I read it about that age. (I actually read a "Children's version" of the story, but it kept all the crypto, just lowered the reading level.)

Then give them similar problems to the one in the book and have them break it by looking for character frequencies to guess the cypher.
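
A first exercise that works by hand or with a few lines of Python: count letter frequencies in the ciphertext and match the most common ones against typical English frequencies (E, T, A, O, I, N, ...). The ciphertext below is my own toy example, a short English phrase under a made-up substitution alphabet:

```python
from collections import Counter

# Toy ciphertext (an "attack at dawn..." phrase under a made-up substitution)
ciphertext = "ZKKZXF ZK WZTN ZNW KZFS KQS SZMK KVTSP"

counts = Counter(c for c in ciphertext if c.isalpha())
total = sum(counts.values())

# The most common ciphertext letters likely stand for common English letters.
for letter, n in counts.most_common(5):
    print(f"{letter}: {n:2d}  ({100 * n / total:.0f}%)")
```

Here the two most frequent ciphertext letters stand in for A and T, which is already enough to start guessing short words.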

There's a Sherlock Holmes story, The Adventure of the Dancing Men, that has the same thing.

Try the Vigenère cipher. It's easy if you know Babbage's method (AKA Kasiski examination), but otherwise should be pretty challenging. Another advantage is that it has a lot of factors that could be manipulated to change the difficulty of the task, e.g. the key size or single letters of the plaintext.
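
In case anyone wants to generate practice ciphertexts, here is a minimal Vigenère implementation (the standard textbook algorithm, nothing specific to this thread): each letter is shifted by the corresponding letter of a repeating key. Kasiski examination then looks for repeated substrings in the ciphertext; the distances between repeats tend to be multiples of the key length, which reveals the key size.

```python
def vigenere(text, key, decrypt=False):
    """Encrypt (or decrypt) A-Z text with a repeating-key Caesar shift."""
    out = []
    key_shifts = [ord(k) - ord('A') for k in key.upper()]
    i = 0
    for ch in text.upper():
        if not ch.isalpha():
            out.append(ch)          # leave spaces/punctuation alone
            continue
        shift = key_shifts[i % len(key_shifts)]
        if decrypt:
            shift = -shift
        out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
        i += 1
    return ''.join(out)

ct = vigenere("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG", "LEMON")
print(ct)
print(vigenere(ct, "LEMON", decrypt=True))  # round-trips back to the plaintext
```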

I think there is some interest on this site for a post on basic Cryptography. It's a subject often mentioned on this site, but other than a few side explanations to supplement certain articles, there isn't much here on the subject.

The Contact Project is an incredibly fun puzzle along these lines.

After reading some of Schneier's Applied Cryptography, I think I know why there isn't much interest in this: this kind of cryptography is a case of "security by obscurity" [1], and it imposes significant costs on the users of the system, which generally outweigh the costs imposed on attackers. Basically, if your cryptosystem's security relies on outsiders not knowing how it works, then you can't add and remove users (everyone must be trusted never to squeal about the method), and you can't get the help of others pointing out potential holes -- the only ones looking will be attackers, not defenders.

I admit, though, that I've wondered myself why you can't improve a cryptosystem by hiding the cypher used, as long as you don't depend on it remaining hidden. It seems to me that if you use a feasibly-unbreakable cypher, you get the advantage of good protocol review and resilience, but announcing which cypher you used also significantly narrows down the attacker's problem space: instead of "this could be anything", it's "let the computer work at it". That makes the attack possible to automate, eliminating the roadblock of a human cryptanalyst having to think about it.

That's part of why I'm learning about cryptography -- to see why that reasoning works or doesn't work, and to what extent.

For example, I learn that RSA's public key [2] involves a combination of two integers, each with a different use. Then I look at a signed message using the RSA protocol and it's just a long hex string, with no clear delimiters showing which part means what. YES, I can look this up in a spec -- but as an example of what MinibearRex is proposing, what if the public key I associate with a message could be any one of several (two-integer-output) operations on the "public key" string? It's easy for the recipient to guess a few operations and see if they work with some private key bank.

But for attackers, they have to go through a lot of extra work just to get to the "extract private key from public key" stage -- and the former problem can't just be passed off to a beowulf cluster.
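
For reference, the two integers in an RSA public key are the modulus n and the public exponent e. A textbook-RSA sketch with toy numbers (wildly insecure, no padding, purely to show how the two integers get used):

```python
# Textbook RSA with toy primes -- illustration only, insecure and unpadded.
p, q = 61, 53
n = p * q                 # 3233: the modulus, first half of the public key
e = 17                    # public exponent, second half of the public key
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)       # private exponent (Python 3.8+ modular inverse)

m = 42                    # a "message" (just a number < n here)
c = pow(m, e, n)          # encrypt with the public key (n, e)
assert pow(c, d, n) == m  # decrypt with the private key (n, d)

s = pow(m, d, n)          # "sign" with the private key
assert pow(s, e, n) == m  # anyone holding (n, e) can verify the signature
```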

(Btw, to have a more informative topic title, you should change it to "Breaking unknown ciphers" or something like that.)

[1] In violation of the maxim "The enemy knows the system", aka Kerckhoffs's principle.

[2] and public keys should really be called "locks", but I'm not even going to go there.

The output from a cryptographic cipher just looks like random bits. Most protocols like PGP and SSL add some plaintext that tells what cipher was used because it helps interoperability and forward-compatibility, but if you want to use the unadorned ciphertext no one's stopping you.

Trying to keep the protocol secret makes your life more complicated, and complexity is the enemy of security.

So one-time pads are insecure?

Edit: Someone thinks I'm being obtuse (or is just downvoting out of anger), so let me clarify. If I send a message encrypted with a one-time pad, then, unlike public key cryptography, the message doesn't announce, "Hey! Here's the cipher we used! Here's what you need to do to break it!" No, it just looks like gibberish, with no hint as to how it's done (unless of course the note says, "use our one-time pad, dude" in plaintext).

They have to do considerable work to even reduce the problem to that of subverting a one-time pad ... and yet it has not been made thereby insecure, even with this extra complexity.

Sometimes one-time pads are insecure, yes. There was a case where a bunch of messages the Soviets had encrypted with one-time pads were cracked by American cryptanalysts, because they had reused some of the pads, because of the difficulties involved with sending fresh pads by guaranteed secure courier. (If that weren't a difficult problem, after all, you could just use the guaranteed secure couriers for the actual messages. That's why people don't normally use one-time pads in practice.)
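
The failure mode is easy to demonstrate: if the same pad encrypts two messages, XORing the two ciphertexts cancels the pad and leaves the XOR of the two plaintexts, which is nothing like random. A toy example of my own:

```python
import os

p1 = b"MEET AT DAWN ON THE BRIDGE"
p2 = b"SEND MORE MONEY TO MINSK.."

pad = os.urandom(len(p1))                         # one-time pad, reused (the mistake)
c1 = bytes(a ^ b for a, b in zip(p1, pad))
c2 = bytes(a ^ b for a, b in zip(p2, pad))

# The attacker never sees the pad, but c1 XOR c2 == p1 XOR p2:
diff = bytes(a ^ b for a, b in zip(c1, c2))
assert diff == bytes(a ^ b for a, b in zip(p1, p2))

# Wherever the attacker can guess part of one plaintext ("crib dragging"),
# the corresponding part of the other plaintext falls out immediately.
guess = b"MEET AT"
print(bytes(a ^ b for a, b in zip(diff, guess)))  # reveals b"SEND MO"
```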

Now you could say any scheme is insecure if used improperly, and that's true as far as it goes. But the corollary is that part of the practical security of a scheme is that it be easy to use properly.

Here's another example of a measure that actually reduces security: password inputs replacing the letters with asterisks as you type. Yes I know it's designed to improve security in an environment where untrusted third parties may look over your shoulder, and if you are in that sort of environment, then it's necessary. But if you are not, then it compromises security by harshly penalizing the use of long passwords. If people would actually understand that usability is part of security, maybe they would understand the need for a setting to disable that feature.

One-time pads are very simple: both parties have n random bytes of secret data. To encrypt or decrypt an n-byte message, just XOR it together with the random bytes. Don't use the same random bytes twice. This is the entire algorithm. How simple is that?
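
That really is the whole algorithm; a sketch (using os.urandom as the source of random bytes, and assuming the pad was shared in advance over a secure channel):

```python
import os

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with a pad at least as long; the same function encrypts
    and decrypts. Never reuse any part of the pad."""
    assert len(pad) >= len(data)
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"attack at dawn"
pad = os.urandom(len(message))          # shared in advance over a secure channel

ciphertext = otp_xor(message, pad)
assert otp_xor(ciphertext, pad) == message
```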

What ciphergoth was getting at is that secure crypto methods should be simple enough that you can analyze them easily looking for vulnerabilities, and implement them correctly without horrible security-breaking bugs. To this end, it's typical to have just one thing that's secret: the key. Everything else about the algorithm is public, and as simple as possible.

Yes, but in this context, the proposal is that the ciphertext not tell Eves what the protocol is. Maybe the public key's hidden somewhere in it, maybe it's a one-time pad, etc. Added complexity, but not in a way that (AFAICT) subverts the security, and I think ciphergoth was being a bit hasty in applying this reasoning -- it warrants a deeper explanation.

If you have a secure encryption algorithm, then whether or not you tell Eve the algorithm isn't important. Yes, it makes the code-breaking harder for her, but that difficulty is a drop in the bucket, negligible compared to the difficulty of guessing the key.

Proper crypto must be secure whether or not the algorithm is known to attackers. Go ahead and keep the algorithm secret if you really want to, but you needn't bother.

So it adds no significant difficulty when the plaintext is in a foreign language with few translators you have access to? It was pointless for the US military to use Navajo code-talkers? The shortage of Arabic translators imposes no notable cost on the CIA's eavesdroppers?

Those things are difficult, sure, and I never said otherwise. But I'm not sure you appreciate just how staggeringly hard it is to break modern crypto. Navajo code-talkers are using a human language, with patterns that can be figured out by a properly determined adversary. There are quite a lot of people who can translate Arabic. Those are nowhere near the difficulty of, say, eavesdropping on a message encrypted with AES-128 when you don't know the key. Or finding a collision with a given SHA-256 hash. Those things are hard.
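
For a sense of scale, here is a rough back-of-the-envelope calculation of my own, granting the attacker an absurdly generous amount of hardware:

```python
keys = 2 ** 128                      # AES-128 keyspace
guesses_per_second = 10 ** 12        # a trillion keys per second per machine
machines = 10 ** 9                   # a billion such machines

seconds = keys / (guesses_per_second * machines)
years = seconds / (60 * 60 * 24 * 365)
print(f"expected ~{years / 2:.2e} years to find the key on average")
```

Even under those assumptions, the expected time to brute-force a single AES-128 key is on the order of billions of years.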


Generally, when a security system is broken, it's not because of the "core" algorithm (RSA/AES etc) being broken, it's because of other flaws in the system. If you're keeping the system secret, you're making things a bit harder for the bad guys (who have to play some guessing game, or get hold of a copy of your program and reverse-engineer it), but you're also stopping it from getting the examination it needs from good-guy experts (who have better things to do with their lives than try to understand your disassembled source code).

But the key aspects of the code have been reviewed -- it's just that it's no longer in a format that can algorithmically be passed to a breaker, and requires intelligent thought to get it to that stage, which would seem to put a bottleneck on attacks.


It's been reviewed by you. Unless you're a three-letter agency, that's extremely unlikely to be thorough enough to say with any confidence that it's secure.

Hm, actually, it depends on what you're trying to be secure against. If, say, you're running a website with a standard installation of something, it can be worth changing it a little bit so that automated scanning tools won't be able to exploit flaws in it. There won't be a huge benefit against people deliberately targeting you, though.

Yes, one-time-pads are insecure; they have no mechanism for message integrity. However, that's a side issue.
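
The integrity point is concrete: since encryption is just XOR, an attacker who knows or guesses part of the plaintext can change it to anything of the same length without knowing the pad. A toy example of my own:

```python
import os

plaintext = b"PAY ALICE 0100 DOLLARS"
pad = os.urandom(len(plaintext))
ciphertext = bytes(p ^ k for p, k in zip(plaintext, pad))

# Attacker knows the message format and wants "0100" to read "9900".
old, new = b"0100", b"9900"
start = plaintext.index(old)          # position known from the message format
tampered = bytearray(ciphertext)
for i, (o, n) in enumerate(zip(old, new)):
    tampered[start + i] ^= o ^ n      # flip exactly the bits that differ

# The recipient decrypts "successfully" and sees the altered amount:
print(bytes(c ^ k for c, k in zip(tampered, pad)))  # b"PAY ALICE 9900 DOLLARS"
```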

There's a reason our files tend to have things like magic bytes at the beginning that tell us what sort of file they are; our lives would be more complicated if these things are missing. Direct cryptanalysis is generally the least of our security worries. Measures like those you propose make things stronger where they are already strong enough at the cost of making them weaker where they are already weak.

Key management is hard. While the algorithm is simple and easy to implement, keeping the one-time pads secret may add the complexity that ciphergoth refers to.

Some professionally designed crypto systems have turned out to have serious flaws. In general, if your system is secret you don't get the advantage of having the entire crypto community look it over and decide if it is reliable.

Moreover, for many practical applications of cryptography you need to interact with a large number of people who need to have some idea how the protocol works. For example, if I want people to be able to email me in a secure fashion even if I've never met them, I need something like an RSA public key. There's no way around that. Similarly, if I'm a large organization, like say a bank or an online business that needs to send a lot of secure data to a variety of systems, each of those systems needs to know about it. Security by obscurity isn't just a bad idea in that sort of context, it is essentially impossible.

For example, I learn that RSA's public key [2] involves a combination of two integers, each with a different use. Then I look at a signed message using the RSA protocol and it's just a long hex string, with no clear delimiters showing which part means what. YES, I can look this up in a spec -- but as an example of what MinibearRex is proposing, what if the public key I associate with a message could be any one of several (two-integer-output) operations on the "public key" string? It's easy for the recipient to guess a few operations and see if they work with some private key bank.

I'm not sure I follow this line of logic. The way RSA works the sender looks up the recipient's public key. How do you intend for the sender to decide what to do?


Guess a few (predisclosed, known) permutations of the key. A nightmare for attackers (who have to guess what the public key is), but easy for the recipient to recover.

I'm not sure I follow. If the public key is public, then the attackers don't have to guess. If the public key isn't known, how is a sender supposed to encrypt with it?

The public key is known; its association with a particular user is not. Through a separate channel [1], members of the club were given (the very short message of) transformations they can apply to a real public key. The sender uses the real public key but labels the message with the transformed one. When the ciphertext decrypts to garbage under the labeled key, attackers have to figure out what the real key is, requiring analysis that can't be automated.

The recipient need only try a few reverse transformations to get back the true public key.

[1] which makes it unusable for the context of e.g. banks that you discussed, but the topic was theoretical situations where this can increase security

I don't see what your transformations are buying you. You don't have to label an encrypted message with its key at all. The intended recipient knows their own key. So your proposal is equivalent to telling your "public" key only to people who are supposed to send you messages.