I've collected some tips and tricks for answering hard questions, some of which may be original, and others I may have read somewhere and forgotten the source of. Please feel free to contribute more tips and tricks, or additional links to the sources or fuller explanations.
Don't stop at the first good answer. We know that human curiosity can be prematurely satiated. Sometimes we can quickly recognize a flaw in an answer that initially seemed good, but sometimes we can't, so we should keep looking for flaws and/or better answers.
Explore multiple approaches simultaneously. A hard question probably has multiple approaches that are roughly equally promising, otherwise it wouldn't be a hard question (well, unless it has no promising approaches). If there are several people attempting to answer it, they should explore different approaches. If you're trying to answer it alone, it makes sense to switch approaches (and look for new approaches) once in a while.
Trust your intuitions, but don't waste too much time arguing for them. If several people are attempting to answer the same question and they have different intuitions about how best to approach it, it seems efficient for each to rely on his or her intuition to choose the approach to explore. It only makes sense to spend a lot of time arguing for your own intuition if you have some reason to believe that other people's intuitions are much worse than yours.
Go meta. Instead of attacking the question directly, ask "How should I answer a question like this?" It seems that when people are faced with a question, even one that has stumped great minds for ages, many just jump in and try to attack it with whatever intellectual tools they have at hand. For really hard questions, we may need to look for, or build, new tools.
Dissolve the question. Sometimes, the question is meaningless and asking it is just a cognitive error. If you can detect and correct the error then the question may just go away.
Sleep on it. I find that I tend to have a greater than average number of insights in the period just after I wake up and before I get out of bed. The brain seems to continue working while we're asleep, and it may help to prime it by reviewing the problem before going to sleep. (I think Eliezer wrote a post or comment to this effect, but I can't find it now.)
Be ready to recognize a good answer when you see it. The history of science shows that human knowledge does make progress, but sometimes only by an older generation dying off or retiring. It seems that we often can't recognize a good answer even when it's staring us in the face. I wish I knew more about what factors affect this ability, but one thing that might help is to avoid acquiring a high social status, or the mental state of having high social status. (See also, How To Actually Change Your Mind.)
Transform it. It's sometimes possible to transform a hard problem in one domain to an easier problem in another domain. This is similar to, but different from, going meta. Going meta is exactly what it says on the box, involving an increase in abstraction, but no change in domain, so that the problem looks the same, but with more generic features. Transforming a problem doesn't necessarily change its abstraction level, but does change the domain so that the problem looks completely different.
My favorite example comes from the field of lossless data compression. Such algorithms try to find patterns in input data, and represent them with as few bits of output data as possible. (Lossy data compression algorithms, like MP3 and JPEG, discard data that humans aren't expected to notice. Discarding is the most effective compression method of all, but lossless algorithms can't get away with that.)
One way to design a lossless data compression algorithm is to directly attack the problem, building up a list of patterns, which you can refer back to as you notice them occurring again. This is called dictionary coding, and you're already familiar with it from the many implementations (like ZIP and gzip) of the Lempel-Ziv algorithm and its many derivatives. LZ is fast, but it's old (dating from 1977-1978) and doesn't compress very well.
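To make "dictionary coding" concrete, here's a minimal sketch of LZW, one of the many Lempel-Ziv derivatives (the function name is mine, and I'm using LZW rather than the original 1977-1978 algorithms purely because it fits in a few lines):

```python
def lzw_compress(data: bytes) -> list[int]:
    """Minimal LZW: grow a dictionary of patterns seen so far and
    emit the dictionary code for the longest known pattern each time."""
    dictionary = {bytes([i]): i for i in range(256)}  # start with every single byte
    next_code = 256
    current = b""
    output = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                 # keep extending a known pattern
        else:
            output.append(dictionary[current])  # emit the longest known pattern
            dictionary[candidate] = next_code   # remember the new, longer pattern
            next_code += 1
            current = bytes([byte])
    if current:
        output.append(dictionary[current])
    return output
```

Repeated substrings get replaced by single codes, which is the whole game.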
There are other approaches. As of several years ago, the most powerful lossless data compression algorithm was Prediction by Partial Matching, developed in 1984 according to Wikipedia. PPM uses Fancy Math to predict future symbols (i.e. bytes) based on past symbols. It compresses data really well, but is rather slow (this is why it hasn't taken over the world). The algorithm is different, but the approach is the same, directly attacking the problem of pattern matching.
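The flavor of the idea can be shown with a toy context model (this is only a sketch, not real PPM, which blends several context orders, handles never-seen symbols with escape probabilities, and feeds its predictions into an arithmetic coder):

```python
from collections import Counter, defaultdict

def build_toy_predictor(history: bytes):
    """Toy context modelling: predict the next byte from the previous byte,
    falling back to overall byte frequencies when that context is unknown."""
    order0 = Counter(history)          # context-free byte counts
    order1 = defaultdict(Counter)      # counts of next byte given previous byte
    for prev, nxt in zip(history, history[1:]):
        order1[prev][nxt] += 1

    def predict(prev_byte: int) -> int:
        if order1[prev_byte]:
            return order1[prev_byte].most_common(1)[0][0]
        if order0:
            return order0.most_common(1)[0][0]
        return 0                       # nothing seen yet; arbitrary guess
    return predict
```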
In 1994, a new algorithm was developed. Michael Burrows and David Wheeler discovered something very interesting: if you take a string of symbols (i.e. bytes), put every rotation of it in a matrix, and sort the rows of that matrix lexicographically (all of this might sound hard, but it's easy for a competent programmer to write, although doing so efficiently takes some work), then this matrix has a very special property. Every column is a permutation of the input string (trivial to see), the first column is the symbols of the input string in sorted order (also trivial to see), and the last column, although jumbled up, can be untangled into the original input string with only 4 bytes (or 8 bytes) of extra data. The fact that this transformation can be undone is what makes it useful (there are lots of ways to irreversibly jumble strings). (Going back to the overall point, transforming a problem isn't useful until you can transform the solution back.)
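A naive version of the transform (and its inverse) is short enough to write down; real implementations use suffix-sorting tricks to make it fast, but this shows the idea:

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler Transform: sort every rotation of the input
    and return the last column, plus the row index of the original string
    (the few extra bytes that make the transform reversible)."""
    rotations = sorted(data[i:] + data[:i] for i in range(len(data)))
    last_column = bytes(row[-1] for row in rotations)
    return last_column, rotations.index(data)

def inverse_bwt(last_column: bytes, index: int) -> bytes:
    """Undo the BWT by repeatedly prepending the last column and re-sorting.
    Horribly inefficient, but it demonstrates that nothing was lost."""
    table = [b""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(bytes([c]) + row for c, row in zip(last_column, table))
    return table[index]
```

For example, bwt(b"banana") gives (b"nnbaaa", 3): the repeated characters have already started to clump together, and inverse_bwt(b"nnbaaa", 3) gives back b"banana".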
After undergoing this (reversible) Burrows-Wheeler Transformation (BWT), your data looks totally different. It's easier to see than to explain. Here's some text that you might have seen before:
And here it is after undergoing the BWT:
If you follow how the BWT is performed, what it does is bring symbols with similar "contexts" together. While sorting that big matrix, all of the rows beginning with "ings" will be sorted together, and most of these rows will end with "R" (each row is a rotation of the input text, so it "wraps around"). The BWT consumes text that is full of rich patterns (i.e. context) in the original domain (here, English) and emits text that has repetitive characters, but with no other structure.
It's hard to identify patterns (i.e. compress) in input text! It's easier to deal with repetitive characters. Solving the problem of compression is easier in the Burrows-Wheeler domain.
If you throw another transformation at the data (the Move-To-Front transformation, which is also reversible), then these repetitive but different characters (eeeoeeeoIIiiaaaiiuoo, etc.) become numbers that are mostly 0s, some 1s, a few 2s, and a scattering of higher numbers. (Everything is bytes, but I'm trying to keep it simple for non-programmers.) It's really, really easy to compress data that looks like this. So easy, in fact, that one of the oldest algorithms ever developed is up to the job. (Not Euclid's, though.) The Huffman algorithm, developed in 1952, compresses this kind of data efficiently; arithmetic coding, which is newer, does it optimally. Anyway, after applying the BWT, everything that follows is much easier.
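Move-To-Front itself is only a few lines (again a sketch; the decoder just runs the same bookkeeping in reverse):

```python
def move_to_front(data: bytes) -> list[int]:
    """Move-To-Front: output each byte's position in a running
    'recently seen' list, then move that byte to the front, so runs
    of the same byte turn into runs of zeros."""
    alphabet = list(range(256))
    output = []
    for byte in data:
        index = alphabet.index(byte)
        output.append(index)
        alphabet.pop(index)
        alphabet.insert(0, byte)
    return output
```

For instance, move_to_front(b"aaabbb") gives [97, 0, 0, 98, 0, 0]; feed the BWT output through this and most of it collapses toward zero.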
(I've glossed over Zero-Length Encoding, which is a transformation that can be applied after MTF but before Huffman-or-arithmetic that further increases compression quality. It's yet another example of how you can transform yourself out of an inconvenient domain, though.)
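Zero-Length Encoding deals with the long runs of zeros that MTF leaves behind. Here's a deliberately crude sketch of the idea (real implementations encode the run lengths more compactly):

```python
def zero_run_encode(values: list[int]) -> list[int]:
    """Collapse each run of zeros into a 0 marker followed by the run
    length; non-zero values pass through unchanged."""
    output, i = [], 0
    while i < len(values):
        if values[i] == 0:
            run = 0
            while i < len(values) and values[i] == 0:
                run += 1
                i += 1
            output += [0, run]
        else:
            output.append(values[i])
            i += 1
    return output
```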
(Note: I haven't kept up with recent developments in lossless data compression. 7-Zip is powered by LZMA, which as the acronym suggests is Lempel-Ziv based but with special sauce (which resembles PPM as far as I can tell). However, as it was developed in 1998, I believe that the history above is a reasonably applicable example.)