Transform it. It's sometimes possible to transform a hard problem in one domain to an easier problem in another domain. This is similar to, but different from, going meta. Going meta is exactly what it says on the box, involving an increase in abstraction, but no change in domain, so that the problem looks the same, but with more generic features. Transforming a problem doesn't necessarily change its abstraction level, but does change the domain so that the problem looks completely different.
My favorite example comes from the field of lossless data compression. Such algorithms try to find patterns in input data, and represent them with as few bits of output data as possible. (Lossy data compression algorithms, like MP3 and JPEG, discard data that humans aren't expected to notice. Discarding is the most effective compression method of all, but lossless algorithms can't get away with that.)
One way to design a lossless data compression algorithm is to directly attack the problem, building up a list of patterns, which you can refer back to as you notice them occurring again. This is called dictionary coding, and you're already familiar with it from the many implementations (like ZIP and gzip) of the Lempel-Ziv algorithm and its many derivatives. LZ is fast, but it's old (dating from 1977-1978) and doesn't compress very well.
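To make the back-reference idea concrete, here's a toy sketch in C++ (my own illustration of the dictionary-coding idea, deliberately slow, and not the actual ZIP/gzip format): whenever the text already seen contains a copy of what comes next, emit a (distance back, match length, next character) token instead of the raw bytes.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Toy dictionary-coding sketch (not the real ZIP/gzip format): emit
// (distance back, match length, next character) tokens that refer into the
// text we've already seen.
std::vector<std::tuple<std::size_t, std::size_t, char>>
lz_tokens(const std::string& in) {
    std::vector<std::tuple<std::size_t, std::size_t, char>> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t best_dist = 0, best_len = 0;
        // Look for the longest earlier occurrence of what starts at pos.
        for (std::size_t start = 0; start < pos; ++start) {
            std::size_t len = 0;
            while (start + len < pos && pos + len < in.size() &&
                   in[start + len] == in[pos + len])
                ++len;
            if (len > best_len) { best_len = len; best_dist = pos - start; }
        }
        // The token also carries the first character that didn't match.
        char next = (pos + best_len < in.size()) ? in[pos + best_len] : '\0';
        out.emplace_back(best_dist, best_len, next);
        pos += best_len + 1;
    }
    return out;
}
```

Real LZ77 also allows matches that overlap the current position and bounds the backward search with a window; this sketch skips both for clarity.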
There are other approaches. As of several years ago, the most powerful lossless data compression algorithm was Prediction by Partial Matching, developed in 1984 according to Wikipedia. PPM uses Fancy Math to predict future symbols (i.e. bytes) based on past symbols. It compresses data really well, but is rather slow (this is why it hasn't taken over the world). The algorithm is different, but the approach is the same, directly attacking the problem of pattern matching.
In 1994, a new algorithm was developed. Michael Burrows and David Wheeler discovered something very interesting: if you take a string of symbols (i.e. bytes), put every rotation of it in a matrix, and sort the rows of that matrix lexicographically (all of this might sound hard, but it's easy for a competent programmer to write, although doing so efficiently takes some work), then this matrix has a very special property. Every column is a permutation of the input string (trivial to see), the first column is the symbols of the input string in sorted order (also trivial to see), and the last column, although jumbled up, can be untangled into the original input string with only 4 bytes (or 8 bytes) of extra data. The fact that this transformation can be undone is what makes it useful (there are lots of ways to irreversibly jumble strings). (Going back to the overall point, transforming a problem isn't useful until you can transform the solution back.)
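For the programmers, here's a deliberately naive C++ sketch of the forward transform (my own illustration; real implementations use suffix arrays rather than materializing and sorting every rotation). The "primary" index it returns is the handful of extra bytes mentioned above that make the transform reversible.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Naive forward BWT: sort all rotations of the input (by index, without
// copying them) and read off the last column. 'primary' records which row
// holds the original string; that's the small amount of extra data needed
// to undo the transform.
std::pair<std::string, std::size_t> bwt(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<std::size_t> rot(n);
    for (std::size_t i = 0; i < n; ++i) rot[i] = i;    // rotation i starts at s[i]

    std::sort(rot.begin(), rot.end(), [&](std::size_t a, std::size_t b) {
        for (std::size_t k = 0; k < n; ++k) {           // compare rotations lexicographically
            const char ca = s[(a + k) % n], cb = s[(b + k) % n];
            if (ca != cb) return ca < cb;
        }
        return false;
    });

    std::string last(n, '\0');
    std::size_t primary = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (rot[i] == 0) primary = i;                   // row containing the original string
        last[i] = s[(rot[i] + n - 1) % n];              // last character of that rotation
    }
    return {last, primary};
}
```

The inverse (the part that actually needs those few extra bytes) is the standard first-column trick, which I won't reproduce here.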
After undergoing this (reversible) Burrows-Wheeler Transformation (BWT), your data looks totally different. It's easier to see than to explain. Here's some text that you might have seen before:
Three~Rings~for~the~Elven-kings~under~the~sky,#Seven~for~the~Dwarf-lords~in~the
ir~halls~of~stone,#Nine~for~Mortal~Men~doomed~to~die,#One~for~the~Dark~Lord~on~
his~dark~throne#In~the~Land~of~Mordor~where~the~Shadows~lie.#One~Ring~to~rule~t
hem~all,~One~Ring~to~find~them,#One~Ring~to~bring~them~all~and~in~the~darkness~
bind~them#In~the~Land~of~Mordor~where~the~Shadows~lie.#$
And here it is after undergoing the BWT:
.me,,.,,#emeylnfee~~~##~~~~~~~##~##~~~~#~~$eeeeeklffr,eeeeeemmlsoseonoeensrndss
sddsdoefrrrnneenrnddegkgdggsrrhht~~h~LLwDdd~~nnnrnne~~n~~rraarnniiihhhhhnnnehhh
nnhrrlmrhhhhhvMvdhhnSrooo~~~~~nnnnnnSS~ttttttttttttttww~Ttdll~~bfNrRRRRkehrr-rs
lalu~~aaa-lEeeeeoeeeoIIiiaaaiiuooOOOiOkiiiiiitttt~~~o~rtdffffddLMMlMddoioooeooo
oooeehabaaaho~sigdwwlg~e~r~~~~~~~~~~~~~~~~~~~sr~leD~~ook
If you follow how the BWT is performed, what it does is bring symbols with similar "contexts" together. While sorting that big matrix, all of the rows beginning with "ings" will be sorted together, and most of these rows will end with "R" (each row is a rotation of the input text, so it "wraps around"). The BWT consumes text that is full of rich patterns (i.e. context) in the original domain (here, English) and emits text that has repetitive characters, but with no other structure.
It's hard to identify patterns (i.e. compress) in input text! It's easier to deal with repetitive characters. Solving the problem of compression is easier in the Burrows-Wheeler domain.
If you throw another transformation at the data (the Move-To-Front transformation, which is also reversible), then these repetitive but different characters (eeeoeeeoIIiiaaaiiuoo, etc.) become numbers that are mostly 0, some 1, few 2, and a scattering of higher numbers. (Everything is bytes, but I'm trying to keep it simple for non-programmers.) It's really really easy to compress data that looks like this. So easy, in fact, that one of the oldest algorithms ever developed is up to the job. (Not Euclid's, though.) The Huffman algorithm, developed in 1952, efficiently compresses this kind of data. Arithmetic coding, which is newer, is optimal for this kind of data. Anyway, after the BWT, everything downstream is much easier.
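The Move-To-Front step is simple enough to sketch directly (again, an illustration rather than production code): each byte is replaced by its current position in a table of all 256 byte values, and that byte is then moved to the front of the table, so runs of identical symbols turn into runs of zeros.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Move-To-Front: replace each byte by its position in a table of all 256
// byte values, then move that byte to the front. Repeated symbols become 0s.
std::vector<std::uint8_t> move_to_front(const std::string& in) {
    std::list<unsigned char> table;
    for (int c = 0; c < 256; ++c) table.push_back(static_cast<unsigned char>(c));

    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        auto it = std::find(table.begin(), table.end(), c);
        out.push_back(static_cast<std::uint8_t>(std::distance(table.begin(), it)));
        table.erase(it);        // move this symbol to the front of the table
        table.push_front(c);
    }
    return out;
}
```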
(I've glossed over Zero-Length Encoding, a transformation that can be applied after MTF but before the Huffman or arithmetic coding stage to further increase compression quality. It's yet another example of how you can transform yourself out of an inconvenient domain, though.)
(Note: I haven't kept up with recent developments in lossless data compression. 7-Zip is powered by LZMA, which as the acronym suggests is Lempel-Ziv based but with special sauce (which resembles PPM as far as I can tell). However, as it was developed in 1998, I believe that the history above is a reasonably applicable example.)
(Edit: Replaced underscores with twiddles in the BWT example because apparently underscores are as good as stars in Less Wrong formatting. Second edit: Further unmangling. Less Wrong really needs pre-tags. Third edit: Trying four spaces. Random thought: My comments need changelogs.)
Four spaces worked! Thank you.
Effective transformations are uncommon, but when one is possible and you find it, you win. Transformation can't really deal with inherently hard problems, but it can show you that a problem you thought was hard is actually easy if you look at it in an alien way.
I can think of another example, but unfortunately it's even more technical. (Not all hard problems are as easy to state as the Four Color Theorem! But like the BWT, this one directly affects the real world.) C++ programmers have been dealing with a hard problem for decades - the language allows high abstraction to coexist with high performance, but it's a little too fond of copying memory around. This takes time (i.e. decreases performance) for no useful benefit. Recently, scary wizards discovered that the problem of identifying which copies are necessary and which are not can be transformed into the problem of distinguishing lvalues from rvalues (insert animal noises here if you like - this is the technical part). Before this discovery, the language didn't allow programmers to cleanly distinguish between meows and woofs (so knowing this wouldn't have helped ordinary programmers). But compilers have always been able to distinguish between them. It's like Compilers 101, and even C compilers know the difference. Exposing this bit of information to user-programmers in a certain way allows all of those nasty unnecessary copies to be avoided. Someone directly attacking the problem of unnecessary copies would never have invented this in two to the twentieth years.
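Roughly, here's the kind of thing this enables (a minimal sketch with a made-up Buffer class, not anyone's actual library code): once the language can express "this argument is a temporary", a class can provide a move constructor that steals the temporary's storage instead of copying it.

```cpp
#include <cstddef>
#include <cstring>

// A made-up Buffer class showing the payoff: the move constructor binds only
// to rvalues (temporaries, or things handed over with std::move), so it can
// take over the buffer instead of copying it. Assignment operators are
// omitted to keep the sketch short.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new char[n]) {}

    // Copy constructor: the expensive path, still needed for genuine lvalues.
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new char[other.size_]) {
        std::memcpy(data_, other.data_, size_);
    }

    // Move constructor: the compiler picks this overload for rvalues, so the
    // temporary's memory is taken over rather than duplicated.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    ~Buffer() { delete[] data_; }

private:
    std::size_t size_;
    char* data_;
};
```

With something like this in place, returning a Buffer from a function or shuffling one around inside a container no longer has to duplicate the underlying memory.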
As a bonus, this solved many hard problems at once - it turns out that once user-programmers can tell meows and woofs apart, they can also solve the "forwarding problem" with "perfect forwarding", making Boost's developers (and me) cry with joy.
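For the curious, perfect forwarding looks roughly like this (again a minimal sketch, using the made-up Buffer class from above): a single template wrapper that passes its argument along while preserving whether it was a meow (lvalue) or a woof (rvalue).

```cpp
#include <utility>

// Sketch of perfect forwarding: one template that hands its argument to T's
// constructor while preserving lvalue/rvalue-ness, so lvalues get copied and
// rvalues get moved, without writing two overloads.
template <typename T, typename Arg>
T make_wrapped(Arg&& arg) {
    return T(std::forward<Arg>(arg));
}

// Usage (hypothetical): the temporary is moved, not copied.
//   Buffer b = make_wrapped<Buffer>(Buffer(1024));
```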
To restate, finding an appropriate transformation is a trick that turns a hard problem that you can't directly attack into an easy problem that you already know how to solve, or can figure out how to solve without very much work. It doesn't solve the problem by itself, but it feels like it does. I would expect most if not all examples to be in mathematics and the hard sciences, where abstractions can be cleanly and rigorously transformed, but I would be pleasantly surprised by an example from biology, etc.
(Edit: Avoided unnecessary sentence.)
As a former C++ programmer who doesn't keep track of current events, I'd appreciate a specific link/keyword to the mechanism you were describing.
This C++0x feature is rvalue references. Wikipedia has an article section about it, although it contains inaccuracies ("The function returning a std::vector temporary need only return a std::vector&&." is wrongity wrong.)
Pleased to make the acquaintance of your secret identity. :-) It's a shame you are not on Facebook.
It's a shame you are not on Facebook.
Why? (I'm genuinely curious. I haven't yet figured out how being on Facebook could be a productive use of my free time, which is unfortunately very limited.)
I'm genuinely curious. I haven't yet figured out how being on Facebook could be a productive use of my free time, which is unfortunately very limited.
The basic functionality is simply to keep track of the list of interesting people, ideally with contact info automagically updating. Following the status updates is an extra that supports the sense of being in touch without explicit effort on your part, which can be made efficient if you block enough of the more prolific/uninteresting updates, and/or restrict the connections to people you know reasonably well.
A perhaps similar example, sometimes I have solved geometry problems (on tests) by using analytical geometry. Transform the problem into algebra by letting point 1 be (x1,y1), point 2 be (x2,y2), etc, get equations for the lines between the points, calculate their points of intersection, and so on. Sometimes this gives the answer with just mechanical application of algebra, no real insight or pattern recognition needed.
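A made-up illustration of how mechanical it can get (my own toy example, not from any actual test): find where the line through (0,0) and (2,2) meets the line through (0,2) and (2,0).

```latex
\begin{align*}
\text{Line through } (0,0) \text{ and } (2,2):\quad & y = x \\
\text{Line through } (0,2) \text{ and } (2,0):\quad & y = 2 - x \\
\text{Set equal: } x = 2 - x \;\Rightarrow\; & x = 1,\; y = 1
\end{align*}
```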
When I was a kid this was how I solved all nontrivial geometry problems, because I was much better at algebra than geometry!
many important arbitrarily complicated functions can be represented as sums of much simpler functions
Check out Integral transforms; there's a huge class of transforms that solve a bunch of math problems handily, and it comes with that nice explanation of why they seem to work so often.
The trick consists in discovering a twist you can apply to your complicated object which makes it easily separable into pieces that are susceptible to the function you wanted to apply to your object, such that you have some way of also untwisting your function'd twisted object to get your answer.
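The standard textbook instance of this twist-solve-untwist pattern (assuming the Laplace transform as the "twist"): a differential equation such as y' + ay = 0 with y(0) = 1 becomes plain algebra in the transformed domain, and the inverse transform carries the answer back.

```latex
\begin{align*}
\mathcal{L}\{y' + a\,y\}(s) &= \bigl(sY(s) - y(0)\bigr) + aY(s) = 0
  && \text{differentiation becomes multiplication by } s \\
Y(s) &= \frac{1}{s + a}
  && \text{the transformed problem is plain algebra} \\
y(t) &= e^{-at}
  && \text{untwist with the inverse transform}
\end{align*}
```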
Here's an example of Tim Gowers trying to apply the trick to help solve the Erdos Discrepancy Problem, subject of Polymath5 (collaborative mathematics over the internet!). That's less than 24 hours ago. Yeah, it's that useful in mathematics. :D (Edit: looking at it again, I'd say it's a combination of transform and meta, which is a common combo)
Let's try applying that to a recent hard problem: The Preference Utilitarian's Time Inconsistency. (...time passes...) Didn't see a way. How about a reference class that says "cryonics, singularity, superhuman AI etc. are highly probable"?
Great topic.
Ask someone who has been in a similar situation or solved a similar question or tried to solve a similar question. This may seem obvious, but is often ignored.
Be ready to recognize a bad answer when you see it. Sometimes, what looks like a good answer at the outset develops fatal problems. Don't vest too deeply in your answer. But don't be afraid to keep your answer just because others view it as different or scary.
Asking other people who have solved a similar problem to evaluate your answer is a very powerful and simple strategy to follow.
Also, most evidence I have seen is that you can only learn how to do a small number of things well. So if you are solving something outside of your area of expertise (which probably includes most problems you'll encounter during your life) then there is probably somebody out there who can give a much better answer than you (although the cost to find such a person may be too great).
Post Note: The fact that you can only learn a few things really well seems to be true in mathematics, as in here. More generally, mastering a topic seems to take ten years or so [PDF] (see Edit below).
Edit: The software does not seem to allow for links that have parentheses, so you would need to copy the whole link--including the ".pdf" at the end--in order to actually pull up the document.
Edit Jan 18: Hex-escaped the parentheses so it should work better.
Your Gian-Carlo Rota link talks about tricks used in proofs, which is different from sub-fields of mathematics. It's certainly true that modern mathematicians and scientists are hyperspecialized, but that's not the same thing.
There is a trick for your parenthesized link: hex-escape the parentheses. Here's the link itself: http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/DeliberatePractice%28PsychologicalReview%29.pdf Here are cute fluffy kittens.
Thanks for the link assistance.
I agree that my mathematics example is insufficient to prove the general claim: "One will master only a small number of skills". I suppose a proper argument would require an in-depth study of people who solve hard problems.
I think the essential point of my claim is that there is high variance with respect to the subset of the population that can solve a given difficult problem. This seems to be true in most of the sciences and engineering to the best of my knowledge (though I know mathematics best). The theory I believe that explains why this variation occurs is that the subset of people which can solve a given problem use unconscious heuristics borne out of the hard work they put into previous problems over many years.
Admittedly, the problems I am thinking about are kind of like NP problems: it seems difficult to find a solution, but once a solution is found we can know it when we see it. There tends to be a large number of such problems that can be solved by only a small number of people. And the group of people that can solve them varies a lot from problem to problem.
There are also many hard problems for which it is hard to say what a good solution is (e.g. it seems difficult to evaluate different economic policies), or the "goodness" of a solution varies a lot with different value systems (e.g. abortion policy). It does seem that in these instances politicians claim they can give good answers to all the problems, as do management consulting companies. Public intellectuals and pundits also seem to think they can give good answers to lots of questions as well. I suppose that if they are right then my claim is wrong. I argue that such individuals and organizations claim to be able to solve many problems, but since it's hard to verify the quality of the solutions we should take the claim with a grain of salt. We know that individuals who can solve lots of problems would have a lot of status, so there is a clear incentive to claim to be able to solve problems that one cannot actually solve if verifying the solution is sufficiently costly.
I also think there is a good reason to think that even for those problems whose solutions are difficult to evaluate we should expect only a small number of people to actually give a good solution. The reason relates to a point made by Robin Hanson (and myself in another comment) which is that in solving a problem you should try to solve many at once. A good solution to a problem should give insight into many problems. Conversely, to understand and recognize a good solution to a given hard problem one should understand what it says about many other problems. The space of problems is so vast that any human being can know but a small portion of it, so I expect that people who are able to solve a given problem will be those aware of many related problems, and that most people will not be aware of the related problems. Given that in our civilization different people are exposed to different problems (no matter in which field they are employed) we should expect high variance of who can solve which hard problems.
For Wei Dai and everyone else here offering their own tips and tricks: what hard questions have you answered using them, and which tricks helped the most?
The hardest question I've answered is "How are probabilities supposed to work in a multiverse where everything that can happen does happen somewhere?" It's hard to say which tricks helped the most because when I started I didn't have a list of tips and tricks, so I don't know how much it would have helped to try to apply them consciously. But here's what worked in retrospect, in rough order of importance:
Go meta. In this case the meta question was much easier than the object-level question, because I could get the answer from history. Probability theory was created by gamblers, and later formally justified using decision theory, so I knew I should take a decision theory approach to the question.
Don't stop at the first good answer. Here is the first good answer that I might have stopped at. (The website was created by Hal Finney some years ago.)
marks's Solve many hard problems at once. Yep, I was also trying to answer "Does quantum immortality/suicide make sense?" and "How are probabilities supposed to work when mind copying is possible?"
Be ready to recognize a good answer when you see it. Apparently several lesswrongers have discovered the same answer independently, but I was the only one who thought it was a big deal and wrote it up. Others shrank from its counter-intuitiveness, or just didn't realize its significance. I also discussed the idea on my own mailing list, where it failed to make much of a splash.
Explore multiple approaches simultaneously. and Trust your intuitions, but don't waste too much time arguing for them. The main approaches were "first-person" and "third-person", and my approach is mostly third-person, but I also spent a lot of time thinking about the first-person approach. (The first-person approach is more concerned about expectations of subjective experiences.) I think there were too many arguments about which is the right approach, when the time could have been better spent actually exploring them.
Sleep on it. Pretty hard to say how much this helped, but I did often go to sleep thinking about the problem.
What's wrong with UDASSA? If you assume that all possible worlds exist, and that there is a natural measure on them, you can get objective probabilities.
What's wrong with UDASSA?
I answered that at against UD+ASSA, part 1 and against UD+ASSA, part 2. See also the additional argument in indexical uncertainty and the Axiom of Independence.
I think your problem with UD (argument 1, in your second link) arises entirely from the way you choose to think about possible worlds. You built on a bad foundation, discovered the foundation was shaky, and so abandoned the original plan. But the problem was just the foundation, not the plan.
Both common sense and physics talk about the world as consisting of things-with-states. This remains true for possible worlds. Possible worlds defined using everyday concepts (e.g. worlds where "McCain defeated Obama in 2008") or using some exact physical theory (e.g. a billiard-ball world) still have this attribute. If you were to talk about all the possible billiard-ball worlds, there's no problem telling them apart, and it's easy to ask whether there's a natural measure on the set of such worlds.
But at your second link you write
There is an infinite number of universal Turing machines, so there is an infinite number of UD. If we want to use one UD as an objective measure, there has to be a universal Turing machine that is somehow uniquely suitable for this purpose. Why that UTM and not some other? We don't even know what that justification might look like.
So you've adopted a concept of possible world which is something like "possible program for a universal Turing machine". But the problem here is arising entirely from your idiosyncratic concept of possible world.
What does a universal Turing machine look like, from the things-with-states perspective? Consider the primordial example of a UTM, Turing's example of a tape moving back and forth through a read-write head. There are two things with states: the head and the tape. They undergo causal interaction and change states as a result.
Originally Turing was thinking of physical machines like the ones around him, made of metal and electronics and so forth. But suppose we try to take the UTM he described to be a universe in itself. How far can we go in that direction? Again, we can do it, thinking in terms of things-with-states and their interactions. We can think in terms of fundamental entities which have states and which can also be joined to each other in some sense. The tape is a one-dimensional string of entities joined side by side. The head is another entity which interacts with the entities making up the "tape", and whose join relations are also dynamical - it moves up and down the tape.
This all describes a type of possible world, just as the "billiard-ball world" of elastically colliding impenetrable spheres in n-dimensional space is also a meaningful type of possible world. The dynamical rules for the Turing tape are the "laws of physics" for this world, each set of initial conditions gives rise to a possible history, and so on.
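To make the things-with-states reading concrete, here's a toy sketch (my own illustration, not any particular machine): the tape cells and the head are the entities, their fields are the states, and the transition table plus the step function play the role of the laws of physics.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Toy Turing-tape world as things-with-states: cells and the head are the
// entities, their fields are the states, and 'rules' plus step() are the
// laws of physics. Different rule tables give different possible worlds.
struct TapeWorld {
    std::vector<int> cells;      // state of each tape entity
    std::size_t head_pos = 0;    // which cell the head is currently joined to
    int head_state = 0;          // internal state of the head entity

    // (head state, symbol under head) -> (new head state, symbol to write,
    // move: -1 for left, +1 for right)
    std::map<std::pair<int, int>, std::tuple<int, int, int>> rules;

    // One tick of the laws of physics: each initial configuration, run under
    // this rule table, gives rise to one possible history.
    void step() {
        const auto [next_state, write, move] =
            rules.at({head_state, cells[head_pos]});
        cells[head_pos] = write;
        head_state = next_state;
        head_pos = static_cast<std::size_t>(
            static_cast<long long>(head_pos) + move);
    }
};
```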
Now suppose you consider a different set of laws for the Turing-tape world. It still has the same structure, but the states and how they change are different. Is this mysterious? No, you've just defined a different class or subclass of possible worlds. Both classes of world are "computationally universal", but that doesn't mean that the world from one class which performs a particular computation is the same world as the world from the other class which performs that computation.
Yet this is what you're assuming, more or less, when you talk about having to pick a UTM as the UTM, in terms of which possible worlds will be defined. You're treating a possible world as a second-order abstraction (equivalence class of computations) and trying to do without a thing-with-states foundation. If you insist on having such a foundation, this problem goes away. You still have the very formidable problem of trying to enumerate all possible forms of interactions among things-with-states. There is still the even larger problem of identifying and justifying the broadest notion of possible world you are willing to consider. What about worlds where there's no time? What about worlds where there's no "physical law" - changes happen, but for no reason? But your particular problem is an artefact of computational idealism, where reality is supposed to consist of computational or mathematical "entities" which exist independently of anything like "things" or "substances".
Another sample problem domain is crossword puzzles:
Don't stop at the first good answer - You can't write in the first word that seems to fit, you need to see if it is going to let you build the other words.
Explore multiple approaches simultaneously - Same idea, you often can think of a few different possible words that could work in a particular area of the puzzle, and you need to keep them all in mind as you work to solve the other words.
Trust your intuitions, but don't waste too much time arguing for them - This one doesn't apply much because usually people don't fight over crossword puzzles.
Go meta - This is a big one, because usually crossword puzzles have a theme, often quite subtle, and if you look carefully you can see how your answers are building as part of a whole. This then gives you another direction to get ideas for possible answers, as things that would go with the theme, rather than just taking the clues literally.
Dissolve the question - Well, I don't know about this, but I suppose if you get frustrated enough you could throw the puzzle into the trash.
Sleep on it - This works well for this kind of puzzle, I find. Coming back to it in the morning you will often make more progress.
Be ready to recognize a good answer when you see it - Once you have enough crossing words in mind you can have good confidence that you are on the right track and go ahead and write those in, even if you don't have good ideas for some of the linked words. You need to recognize that when enough parts come together and your solution makes them fit, that is a strong clue that you are making progress, even if there are still unanswered aspects.
Most of my "hard" questions are programming designs that need to work around a weakness in the system. They probably don't compare well to the big nasties. "Sleep on it" is probably the most valuable of the ones I have seen listed here. "Transform it" also applies to design well.
Also, what kinds of questions qualify as hard, for the purpose of this thread?
(I do have a couple of tricks, but my questions are nowhere near as hard as some of the philosophy / AI theory problems discussed here on LW.)
Also, what kinds of questions qualify as hard, for the purpose of this thread?
Let's say anything that takes more than a few hours to answer.
That's a good question. I can't actually think of any particularly hard questions I've had to attack strategically to answer.
Expanding on the go meta point:
Solve many hard problems at once
Whatever solution you give to a hard problem should give insight or be consistent with answers given to other hard problems. This is similar in spirit to: "http://lesswrong.com/lw/1kn/two_truths_and_a_lie/" and a point made by Robin Hanson (Youtube link: the point is at 3:31) "...the first thing to do with puzzles is [to] try to resist the temptation to explain them one at a time. I think the right, disciplined way to deal [with] puzzles is to collect a bunch of them: lay them all out on the table and find a small number of hypotheses that can explain a large number of puzzles at once."
His point, as I understand it, was that people often narrowly focus on a limited number of health-related puzzles and that we could produce better policy if we attempted to attack many puzzles at once (consider things such as fear of death, the need to show we care, status regulation, and human social dynamics, particularly signaling loyalty).
Edit: I had originally meant to point out that solving several problems is a meta-thought about solutions to problems: i.e. they should relate to solutions to other problems
Did you know that you can add #t=211 to the end of a YouTube URL to make it start 211 seconds into the vid? Your link would become "point made by Robin Hanson".
One that I sometimes forget, usually by encountering a potential path to an answer and quickly switching into short-term investigation mode:
Estimate the value of obtaining an answer and consider whether that would be worth the time/energy investment. The hard question may sound interesting in an attention-grabbing way, but one's level of fascination moments after hearing it may be a poor indicator of a solution's actual value.
I like the last bit about status, and would add the following...
Kobayashi's Paradox #1: The more you know about one thing, the more you will be expected to know about everything. However, the more you know about one thing, the less you probably actually know about everything else.
Kobayashi's Paradox #2: Status (or your perception of your own status) is inversely proportional to the amount of productive, creative work you will actually get done. (This suggests that perceptions of status do not update as quickly as they should, or are not based upon a current assessment of the person's worth/productivity.) If you need/want to get work done, shun the distractions of 'status'.
It is paradoxical indeed to offer a suggestion to shun the distractions of status... and name it after yourself.
Kobayashi's Paradox #1: The more you know about one thing, the more you will be expected to know about everything. However, the more you know about one thing, the less you probably actually know about everything else.
Only if you control for intelligence and application, or learning-ability.
"Trust your intuitions, but don't waste too much time arguing for them"
This is an excellent point. Intuition plays an absolutely crucial role in human thought, but there's no point in debating an opinion that (by definition, even) you're incapable of verbalizing your reasons for. Let me suggest another maxim:
Intuitions tell you where to look, not what you'll find.
Nice points. I'd also add:
Spend time thinking about it. It's something that seems obvious, but I know I pass over it more than I should. Since answers seem to come unconsciously, it's tempting to just wait for a solution to arise and to go think about other things. But unless you keep the problem in your head during down-time (before going to bed, in the shower, taking a walk, etc.), you won't be processing for an answer. It's tricky to coax subconscious thoughts to answer the questions you want, but continual conscious thought on the topic is the most straightforward approach in my experience. If you're thinking about other things, you won't get an answer.
Agreed. I tend to have moments of insight not immediately after starting to think about something, but a few hours later as I continue to mull over it throughout my day to day affairs. You never know when something is just going to click, but you do know when it's not going to click - when you're not thinking about it.
I would also stress the importance of actually listening to what other people have to say and considering it with an open mind. I think people tend to get stuck on a certain train of thought - many mind games and puzzles take advantage of this by presenting a problem that seems impossible given the assumptions that most people make to begin with. Solving the problem requires finding the false assumption, but this is often hard to do. If you happen to go down a dead end when answering a hard question, genuinely considering other people's arguments might help you to identify your error and put you on the right track.
Some further suggestions for handling hard questions, gleaned from work done in mathematics:
Hard questions can often be decomposed into a number of smaller not quite as hard (or perhaps even easy) questions whose answers can be strung together to answer the original question. So often a good first step is trying to decompose the original question in various ways.
Try and find a connection between the hard question and ones that people already know how to answer. Then, see if you can figure out what it would take to bridge the gap between the hard question and what has been answered. For example, if the hard question you are trying to answer relates to human consciousness, perhaps a (not entirely ridiculous) approach would be to first examine questions that researchers have already made headway with, like the neural correlates to consciousness, and then focus on solving the problem by thinking about how one could go from a theory of correlates to a theory of consciousness (maybe this is impossible, but then again maybe it is not). This sort of approach can be a lot faster than solving a problem from scratch, both because it can avoid requiring you to reinvent the wheel, and because sometimes linking a problem to ones that are already solved is a lot easier than solving those problems to begin with.
Don't become attached to your first ideas. If you've had some great ideas that have gotten you close to solving a hard problem, but after a lot of work you still aren't where you want to be, don't get stuck forever in what could be a dead end. From time to time, try to refresh your perspective by starting over from scratch. Often people find it painful starting over again, or are so excited by their first promising ideas that they don't want to let them go, but when a problem is truly hard you may well need to restart the problem again and again before hitting on an approach that really will work. This is a bit like reseeding a random number generator.
Discuss the problem with other very smart people (even if they are not experts in precisely what you are doing) and listen closely to what they have to say. You never know when someone will say something that will trigger a great idea, and the process of explaining what you are working on can cause you to gain a new understanding of the subject or, at least, force you to clarify your thinking.
Keep wrong answers. Failure is knowledge. Repeating the same failures is learning the same thing over and over. Also, reworking old attempts may reveal the missing piece to your current attempt.
Start over. Save your work and start again. This is similar to using another approach, but even redoing the current approach from the very beginning can reveal alternate paths, bad assumptions, or whatever else slipped past during that all-nighter you pulled after ignoring the Sleep On It hint.
Hoard knowledge for its own sake
Related to "explore multiple approaches" and cognitive diversity; don't let all your learning efforts be focused on a single question, for in all likelihood your hard question is going to require insights from surprising directions. Cultivate the childish pleasures of pure curiosity.
I don't work on cosmic problems in my day-to-day work, but I encounter puzzles fairly frequently (what broke? why did performance at node x decrease at time y? why does z work well sometimes but not others?).
Advice I would give:
Get some data. If you have too much data, arbitrarily pick a subset you can handle.
Look for anomalies. Make histograms. Make other graphs. Last weekend I diagnosed a problem with a Windows service using a graph of the Fourier transform of a time series picked out of log files.
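A minimal sketch of the histogram habit (hypothetical data, e.g. per-request latencies in milliseconds scraped from a log): bucket the values, print a crude bar per bucket, and eyeball it for the bucket that shouldn't be there.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Quick-and-dirty histogram: bucket the values and print one '*' per count,
// so outliers and odd modes jump out visually.
void print_histogram(const std::vector<double>& values, double bucket_width) {
    if (values.empty()) return;
    const double lo = *std::min_element(values.begin(), values.end());
    const double hi = *std::max_element(values.begin(), values.end());
    const std::size_t buckets =
        static_cast<std::size_t>((hi - lo) / bucket_width) + 1;

    std::vector<std::size_t> counts(buckets, 0);
    for (double v : values)
        ++counts[static_cast<std::size_t>((v - lo) / bucket_width)];

    for (std::size_t i = 0; i < buckets; ++i)
        std::cout << lo + i * bucket_width << "\t"
                  << std::string(counts[i], '*') << "\n";
}
```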
what broke? why did performance at node x decrease at time y?
What, broke? Why did spending at node x increase at time y?
It seems that my brain, upon correcting for one grammatical error in a sentence (a missing capitalisation), is more likely to try to correct for other potential grammatical omissions.
The Tricki is a wiki for mathematical problem-solving techniques. The general problem-solving techniques page has tips that seem more broadly applicable (including some already mentioned here).
Once you have what you think is your final solution, don't implement it immediately. The goal here is to take some time to distance yourself a bit from your work so that you can look at it more objectively and holistically.
Investing a lot of energy into an answer can affect your judgement and make you lose perspective (possibly because of an emotional investment in something you've worked hard on, or because you focused on details for so long that you lost track of the big picture).
I've heard more than once about writers doing this: putting a manuscript in a drawer and re-reading it X weeks/months later, with "fresh eyes". I think it can apply to other fields as well.
Don't stop at the first good answer. In fact, having a solution in hand can itself suggest other approaches; also, having a workable solution relieves pressure, which can help you think more adventurously about the problem.
Explore multiple approaches simultaneously. A good idea for multiple people, but if you are working alone, mentally switching between approaches is expensive and time-wasting. Working alone, it's better to bring one idea at a time to completion before going to another.
I've collected some tips and tricks for answering hard questions, some of which may be original, and others I may have read somewhere and forgotten the source of. Please feel free to contribute more tips and tricks, or additional links to the sources or fuller explanations.
Don't stop at the first good answer. We know that human curiosity can be prematurely satiated. Sometimes we can quickly recognize a flaw in an answer that initially seemed good, but sometimes we can't, so we should keep looking for flaws and/or better answers.
Explore multiple approaches simultaneously. A hard question probably has multiple approaches that are roughly equally promising, otherwise it wouldn't be a hard question (well, unless it has no promising approaches). If there are several people attempting to answer it, they should explore different approaches. If you're trying to answer it alone, it makes sense to switch approaches (and look for new approaches) once in a while.
Trust your intuitions, but don't waste too much time arguing for them. If several people are attempting to answer the same question and they have different intuitions about how best to approach it, it seems efficient for each to rely on his or her intuition to choose the approach to explore. It only makes sense to spend a lot of time arguing for your own intuition if you have some reason to believe that other people's intuitions are much worse than yours.
Go meta. Instead of attacking the question directly, ask "How should I answer a question like this?" It seems that when people are faced with a question, even one that has stumped great minds for ages, many just jump in and try to attack it with whatever intellectual tools they have at hand. For really hard questions, we may need to look for, or build, new tools.
Dissolve the question. Sometimes, the question is meaningless and asking it is just a cognitive error. If you can detect and correct the error then the question may just go away.
Sleep on it. I find that I tend to have a greater than average number of insights in the period of time just after I wake up and before I get out of bed. Our brains seem to continue working while we're asleep, and it may help to prime them by reviewing the problem before going to sleep. (I think Eliezer wrote a post or comment to this effect, but I can't find it now.)
Be ready to recognize a good answer when you see it. The history of science shows that human knowledge does make progress, but sometimes only by an older generation dying off or retiring. It seems that we often can't recognize a good answer even when it's staring us in the face. I wish I knew more about what factors affect this ability, but one thing that might help is to avoid acquiring a high social status, or the mental state of having high social status. (See also, How To Actually Change Your Mind.)