timtyler comments on Tips and Tricks for Answering Hard Questions - Less Wrong
Transform it. It's sometimes possible to transform a hard problem in one domain into an easier problem in another domain. This is similar to, but different from, going meta. Going meta is exactly what it says on the box: an increase in abstraction with no change in domain, so the problem looks the same but with more generic features. Transforming a problem doesn't necessarily change its abstraction level, but it does change the domain, so the problem looks completely different.
My favorite example comes from the field of lossless data compression. Such algorithms try to find patterns in input data, and represent them with as few bits of output data as possible. (Lossy data compression algorithms, like MP3 and JPEG, discard data that humans aren't expected to notice. Discarding is the most effective compression method of all, but lossless algorithms can't get away with that.)
One way to design a lossless data compression algorithm is to attack the problem directly, building up a list of patterns which you can refer back to as you notice them occurring again. This is called dictionary coding, and you're already familiar with it from the many implementations (like ZIP and gzip) of the Lempel-Ziv algorithm and its derivatives. LZ is fast, but it's old (dating from 1977-1978) and doesn't compress very well.
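Here's a minimal sketch of the dictionary-coding idea (in Python; toy-sized, the function names are mine, and real LZ implementations use smarter match-finding and entropy-code the output): each step emits a back-reference into the recently seen window plus the next literal character.

```python
def lz77_compress(data, window=255):
    # Toy LZ77: at each position, find the longest match in the previous
    # `window` characters and emit an (offset, length, next_char) triple.
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    out = []
    for off, length, nxt in triples:
        for _ in range(length):
            out.append(out[-off])   # copy from `off` characters back
        out.append(nxt)
    return "".join(out)

text = "abracadabra abracadabra"
triples = lz77_compress(text)
assert lz77_decompress(triples) == text
```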
There are other approaches. As of several years ago, the most powerful lossless data compression algorithm was Prediction by Partial Matching, developed in 1984 according to Wikipedia. PPM uses Fancy Math to predict future symbols (i.e. bytes) based on past symbols. It compresses data really well, but is rather slow (this is why it hasn't taken over the world). The algorithm is different, but the approach is the same: directly attacking the problem of pattern matching.
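Just to show the flavour of the prediction step, here's a toy sketch (Python, names mine): count which symbol tends to follow each short context. Real PPM blends several context orders, handles never-seen symbols with escape codes, and feeds the probabilities into an arithmetic coder; none of that is shown here.

```python
from collections import defaultdict, Counter

def build_context_model(data, order=2):
    # For every length-`order` context, count which symbol followed it.
    model = defaultdict(Counter)
    for i in range(order, len(data)):
        model[data[i - order:i]][data[i]] += 1
    return model

def predict(model, context):
    # Most likely next symbol after this context, if we've seen it before.
    counts = model.get(context)
    return counts.most_common(1)[0][0] if counts else None

model = build_context_model("the theory of the thing", order=2)
print(predict(model, "th"))   # 'e' -- a confident prediction costs few bits
```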
In 1994, a new algorithm was developed. Michael Burrows and David Wheeler discovered something very interesting: if you take a string of symbols (i.e. bytes), put every rotation of it in a matrix, and sort the rows of that matrix lexicographically (all of this might sound hard, but it's easy for a competent programmer to write, although doing so efficiently takes some work), then this matrix has a very special property. Every column is a permutation of the input string (trivial to see), the first column is the symbols of the input string in sorted order (also trivial to see), and the last column, although jumbled up, can be untangled into the original input string with only 4 bytes (or 8 bytes) of extra data. The fact that this transformation can be undone is what makes it useful (there are lots of ways to irreversibly jumble strings). (Going back to the overall point, transforming a problem isn't useful until you can transform the solution back.)
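Here's a minimal sketch of both directions (Python, names mine; it uses the naive build-all-rotations construction rather than an efficient suffix-sorting one). The single integer index is the small amount of extra data mentioned above.

```python
def bwt(s):
    # Build every rotation of s, sort them, and keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last_column = "".join(row[-1] for row in rotations)
    index = rotations.index(s)          # which sorted row is the original string
    return last_column, index

def inverse_bwt(last_column, index):
    # Naive inversion: repeatedly prepend the last column and re-sort.
    # After n rounds the table holds every rotation in sorted order,
    # and row `index` is the original string.
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    return table[index]

text = "bananas bandanas"
encoded, idx = bwt(text)
print(encoded, idx)                     # jumbled but clustered characters + one integer
assert inverse_bwt(encoded, idx) == text
```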
After undergoing this (reversible) Burrows-Wheeler Transformation (BWT), your data looks totally different. It's easier to see than to explain. Here's some text that you might have seen before:
And here it is after undergoing the BWT:
If you follow how the BWT is performed, what it does is bring symbols with similar "contexts" together. While sorting that big matrix, all of the rows beginning with "ings" will be sorted together, and most of these rows will end with "R" (each row is a rotation of the input text, so it "wraps around"). The BWT consumes text that is full of rich patterns (i.e. context) in the original domain (here, English) and emits text that has repetitive characters, but with no other structure.
It's hard to identify patterns (i.e. compress) in input text! It's easier to deal with repetitive characters. Solving the problem of compression is easier in the Burrows-Wheeler domain.
If you throw another transformation at the data (the Move-To-Front transformation, which is also reversible), then these repetitive but different characters (eeeoeeeoIIiiaaaiiuoo, etc.) become numbers that are mostly 0s, some 1s, a few 2s, and a scattering of higher numbers. (Everything is bytes, but I'm trying to keep it simple for non-programmers.) It's really, really easy to compress data that looks like this. So easy, in fact, that one of the oldest algorithms ever developed is up to the job. (Not Euclid's, though.) The Huffman algorithm, developed in 1952, efficiently compresses this kind of data. Arithmetic coding, which is newer, is optimal for this kind of data. Anyway, after applying the BWT, everything downstream is much easier.
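Move-To-Front is small enough to sketch in full (Python, names mine; the input string below is just a made-up stand-in for BWT output): emit each symbol's current position in a list, then move that symbol to the front, so recently-seen symbols get small numbers.

```python
def mtf_encode(data):
    # Runs of the same character become runs of 0s.
    alphabet = sorted(set(data))
    symbols = list(alphabet)
    out = []
    for ch in data:
        i = symbols.index(ch)
        out.append(i)
        symbols.insert(0, symbols.pop(i))   # move to front
    return out, alphabet

def mtf_decode(codes, alphabet):
    symbols = list(alphabet)
    out = []
    for i in codes:
        ch = symbols[i]
        out.append(ch)
        symbols.insert(0, symbols.pop(i))
    return "".join(out)

bwt_like = "nnnbaaaaasssaaaii"              # stand-in for BWT output
codes, alphabet = mtf_encode(bwt_like)
print(codes)                                # mostly 0s with occasional non-zero values
assert mtf_decode(codes, alphabet) == bwt_like
```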
(I've glossed over Zero-Length Encoding, a transformation that can be applied after MTF but before Huffman or arithmetic coding to further increase compression quality. It's yet another example of how you can transform yourself out of an inconvenient domain, though.)
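For the curious, here's a simplified sketch of that idea (Python, names mine; a plain zero-run scheme for illustration, not necessarily the exact ZLE variant meant above): collapse the long runs of zeros that MTF produces before handing the data to the entropy coder.

```python
def rle0_encode(codes):
    # Replace each run of zeros with a (0, run_length) pair;
    # pass every other value through unchanged.
    out, i = [], 0
    while i < len(codes):
        if codes[i] == 0:
            run = 0
            while i < len(codes) and codes[i] == 0:
                run += 1
                i += 1
            out.append((0, run))
        else:
            out.append(codes[i])
            i += 1
    return out

print(rle0_encode([0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 3]))
# [(0, 4), 1, (0, 2), 2, (0, 3), 3]
```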
(Note: I haven't kept up with recent developments in lossless data compression. 7-Zip is powered by LZMA, which as the acronym suggests is Lempel-Ziv based but with special sauce (which resembles PPM as far as I can tell). However, as it was developed in 1998, I believe that the history above is a reasonably applicable example.)
(Edit: Replaced underscores with twiddles in the BWT example because apparently underscores are as good as stars in Less Wrong formatting. Second edit: Further unmangling. Less Wrong really needs pre-tags. Third edit: Trying four spaces. Random thought: My comments need changelogs.)