You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Omid comments on Stupid Questions March 2015 - Less Wrong Discussion

5 Post author: Gondolinian 03 March 2015 11:37PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (199)

You are viewing a single comment's thread. Show more comments above.

Comment author: sixes_and_sevens 05 March 2015 09:53:34AM *  7 points [-]

Let's make sure we're talking about cryptographic hashing functions.

Let's say I make a prediction that Barack Obama is a werewolf. I don't want anyone to know I've made this prediction, because otherwise the Secret Werewolf Police will come and eat me, but I do want to lord it over everyone after the fact.

I take the string "Barack Obama is a werewolf" and I put it through a hashing function, (for purposes of this example, the hashing function MD5). This produces the output 37ecb0a3164e6422bedc0f8db82e45ec. The original string about Barack Obama is not recoverable from this output, because MD5 is a lossy function, but anyone else putting that string through the MD5 hashing function will get the same output. The hash output is like a fingerprint for the original string.

So if I put 37ecb0a3164e6422bedc0f8db82e45ec in a public place a year before Barack Obama transforms into a werewolf on live television, after the fact I can give people the original string and they can verify it for themselves against the hash output.

I use a Chrome plugin for most tasks that involve basic encoding of text values. (If I'm honest, I mostly use it for rot13, and more exotic uses aren't that common). There are also online hash generators for different hashing functions. Some popular hashing functions for this purpose are MD5 and SHA1.

Does this answer your question?

Comment author: Omid 05 March 2015 04:10:32PM 1 point [-]

You understood my question better than I did. Thank you.

Is the following paragraph correct?:

If I had unlimited computing power, I could search for all the inputs that return 37ecb0a3164e6422bedc0f8db82e45ec from the MD5 function. Then I could search those inputs and see which ones are meaningful sentences in English, and then make an educated guess at what the message is. But in reality, it would take too much computing power to find even a single string that returns 37ecb0a3164e6422bedc0f8db82e45ec.

Comment author: sixes_and_sevens 05 March 2015 06:57:29PM 3 points [-]

It's not quite correct, but you've broadly got the thrust of it.

When two different inputs produce the same output from a hashing function, this is called a hash collision. Finding collisions in the SHA256 hashing function is part of how bitcoin mining works. It is very computationally expensive (which is kind of the point, re: bitcoin mining), but it's certainly tractable to find some input that generates the same output as another one.

If you were to try and search the space of all possible inputs for MD5, you'd quickly(ish) find an input that collided with the Obama Werewolf input, but it'd be garbage. If you had some system for quickly and reliably generating comprehensible English sentences, and just searched that space, you'd probably find a comprehensible English sentence that collided with it, but it would almost certainly not be the Obama Werewolf sentence. It's worth noting that MD5 will take an input of any length, and the space of possible comprehensible English sentences of unbounded length is basically infinite.

For short snippets of text, it's hard to find two comprehensible English sentences that collide under MD5, but in the link gjm provides above, there's a method for forcing the MD5 hash of a PDF by exploiting how MD5 works. For this reason and others, if you're doing any cryptographic heavy lifting, you probably want to use something more robust than MD5.

Comment author: sketerpot 08 March 2015 05:45:34AM 0 points [-]

If you were to try and search the space of all possible inputs for MD5, you'd quickly(ish) find an input that collided with the Obama Werewolf input, but it'd be garbage.

Really? Last I checked, the best known preimage attack against MD5 was too slow to be practical. Finding collisions is drastically easier, though I don't know any method for doing it with arbitrary plain-text English sentences.