Update: Alex Altair has now finished this tutorial!
A while ago I began writing a Solomonoff Induction tutorial, but now it's one of those Less Wrong articles I probably will never have time to write.
But, maybe a few of you can take what I've started, team up, and finish the thing. I'm basically just following along with the Solomonoff induction tutorials from Shane Legg (2008, 2004), but making them simpler and readable by a wider audience. I think this would be a valuable thing for Less Wrong to have, and one that would draw even more traffic to our community, but I don't have time to write it myself anymore!
Who wants to be a hero and finish this thing? The original markdown source is available here.
This is a tutorial for beginners. Those familiar with Solomonoff induction may prefer to read Rathmanner & Hutter (2011).
People disagree about things. Some say that vaccines cause autism; others say they don't. Some say everything is physical; others believe in a god. Some say that complicated financial derivatives are essential to a modern competitive economy; others think a nation's economy will do better without them. It's hard to know what is true.
And it's hard to know how to figure out what is true. Some think you should assume the things you are most certain about and then deduce all other beliefs from your original beliefs. Others think you should accept at face value the most intuitive explanations of your personal experience. Others think you should generally agree with the scientific consensus until it is disproven.
As it turns out, there is an exact recipe for finding truth. It was discovered in the 1960s.
The problem is that you don't have time to follow the recipe. To find the truth to even a simple question using this recipe would require you to follow one step after another until long after the heat death of the universe, and you can't do that.
But we can find shortcuts. Suppose you know the exact recipe for baking a cake requires you to count out one molecule of H2O at a time until you have exactly 0.5 cups of water. If you did that, you might not finish before the heat death of the universe. But you could approximate that part of the recipe by measuring out something very close to 0.5 cups of water, and you'd probably still end up with a pretty good cake.
Similarly, once we know the exact recipe for finding truth, we can try to approximate it in a way that allows us to finish all the steps sometime before the heat death of the universe.
This tutorial explains the exact recipe for finding truth, Solomonoff induction, and also explains some attempts to approximate it in ways that allow us to figure out now what is (probably) true. Fear not; I shall not assume you know anything beyond grade-school math.
Like my Crash Course in the Neuroscience of Human Motivation, this tutorial is long. You may not have time for it; that's fine. But if you do read it, I recommend you read it in sections.
Contents:
Algorithms
At an early age you learned a set of precisely-defined steps — a 'recipe' or, more formally, an algorithm — that you could use to find the largest number in an unsorted list of numbers like this:
The algorithm you learned probably looked something like this:
Other algorithms could be used to solve the same problem. For example, you could work your way from right to left instead of from left to right. But the point is that if you follow this algorithm exactly, and you have enough time to complete the task, you can't fail to solve the problem. You can't get confused about what one of the steps means or what the next step is. Every instruction tells you exactly what to do next, all the way through to the answer.
But not just any set of instructions is a precisely-defined algorithm. Sometimes, instructions are unclear or incomplete. Consider the following instructions based on an article about the scientific method:
This is not an algorithm.
First, many of the terms are not clearly defined. What counts as an observation? What counts as a hypothesis? What would a hypothesis need to be like in order to ‘explain’ the observation? What counts as an experiment that will ‘test’ the hypothesis? What does it mean for experimental results to ‘confirm’ or ‘disconfirm’ a hypothesis?
Second, the instructions may be incomplete. What do we do if we reach step 4 and the experimental results neither ‘confirm’ nor ‘disconfirm’ the hypothesis under consideration, but instead are in some sense ‘neutral’ toward the hypothesis? These instructions don’t tell us what to do in that case.
An algorithm is a well-defined procedure that takes some value or values as input and, after a finite series of steps, generates some value or values as output.
For example, the ‘find the largest number’ algorithm above could take the input {21, 18, 4, 19, 55, 12, 30} and would, after 13 steps, produce the following output: {55}. Or it could take the input {34} and, after 1 step, produce the output: {34}.
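To make this concrete, here is one way to write that algorithm in Python. The function name and the step-counting convention are mine: I count noting the first item as one step, and reading and comparing each later item as one step each, which is one way to arrive at 13 steps for the 7-item list above.

```python
def find_largest(numbers):
    """Scan an unsorted list left to right, tracking the largest value seen.

    Returns the largest number and a step count: one step to note the
    first item, then one step to read and one step to compare each
    remaining item.
    """
    largest = numbers[0]   # step 1: note the first item as the largest so far
    steps = 1
    for n in numbers[1:]:
        steps += 1         # read the next item
        steps += 1         # compare it to the largest seen so far
        if n > largest:
            largest = n
    return largest, steps

print(find_largest([21, 18, 4, 19, 55, 12, 30]))  # (55, 13)
print(find_largest([34]))                          # (34, 1)
```

Scanning right to left instead would just mean reversing the list first; the answer, of course, would be the same.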
What we’re looking for is a precise algorithm that will produce truth as its output.
Induction
The problem of induction is this: We have a collection of observations (data), and we have a collection of hypotheses about the underlying causes of those observations. We’d like to know which hypothesis is correct, so we can use that knowledge to predict future events.
Suppose your data concern a large set of stock market price changes and other events in the world. You’d like to know the processes responsible for the stock market price changes, because then you can predict what the stock market will do in the future, and make some money.
All these hypotheses are possible, but intuitively it seems like some hypotheses are more likely than others. If you’ve seen your daughter access the cookies this way before but have never been burgled, then the ‘daughter’ hypothesis seems more plausible. If some expensive things from your bedroom and living room are missing and there is hateful graffiti on your door at the eye level of a very short person, then the ‘short thief’ hypothesis seems more plausible than before. If you suddenly remember that your friend ate a few cookies and broke the fridge door last night, the ‘broken fridge door’ hypothesis gains credibility. If you’ve never been burgled and your daughter is out of town and you have a habit of moving and eating things while sleepwalking, the ‘sleepwalking’ hypothesis is less bizarre.
But the weight you give to each hypothesis depends greatly on your prior knowledge. What if you had just been hit on the head and lost all past memories, and for some reason the most urgent thing you wanted to do was to solve the mystery of the chair in front of the fridge door? How, then, would you weigh the likelihood of the available hypotheses?
Occam’s Razor
Consider a different inductive problem. A computer program outputs the following sequence of numbers:

1, 3, 5, 7
Which number comes next? If you guess correctly, you’ll win $500.
In order to predict the next number in the sequence, you make a hypothesis about the process the computer is using to generate these numbers. One obvious hypothesis is that it is simply listing all the odd numbers in ascending order from 1. If that’s true, you should guess 9 to win the $500.
But perhaps the computer is using a different algorithm to generate the numbers. Suppose that n is the step in the sequence, so that n=1 when it generated ‘1’, n=2 when it generated ‘3’, and so on. Maybe the computer used this equation to calculate each number in the sequence:
If so, the next number in the sequence will be 33. (Go ahead, check the calculations.) But doesn’t the first hypothesis seem more likely?
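If you're curious what such an equation could look like, here is a candidate reconstruction (one of infinitely many formulas that fit): a quartic whose product term vanishes for n = 1 through 4, so it agrees with the odd numbers there, but kicks in at n = 5.

```python
def f(n):
    # Agrees with 1, 3, 5, 7 for n = 1..4 because the product term is
    # zero there; at n = 5 the product term adds 4*3*2*1 = 24, giving 33.
    return 2 * n - 1 + (n - 1) * (n - 2) * (n - 3) * (n - 4)

print([f(n) for n in range(1, 6)])  # [1, 3, 5, 7, 33]
```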
The principle behind this intuition, which goes back to William of Occam, could be stated:
The principle is called Occam’s razor because it ‘shaves away’ unnecessary assumptions.
For example, think about the case of the missing cookies again. In most cases, the ‘daughter’ hypothesis seems to make fewer unnecessary assumptions than the ‘short thief’ hypothesis does. You already know you have a daughter that likes cookies and knows how to move chairs to reach cookies. But in order for the short thief hypothesis to be plausible, you have to assume that (1) a thief found a way to break into your home, that (2) the thief wanted cookies more than anything else from your home, that (3) the thief was, unusually, too short to reach the top of the fridge without the help of a chair, and many other unnecessary assumptions.
Occam’s razor sounds right, but can it be made more precise, and can it be justified? We will return to those questions later. For now, let us consider another important part of induction, that of updating our beliefs in response to new evidence.
Updating Beliefs
You peek your head out of the trench, trying to get a better look.
Bam! A bullet glances off your helmet and you duck down again.
“Okay,” you think. “I know snipers are rare, but that guy just hit me with a bullet from 400 yards away. I suppose it might still be a regular army troop, but there’s a seriously good chance it’s a sniper, since he hit me from that far away.”
After another minute, you dare to take another look, and peek your head out of the trench again.
Bam! Another bullet glances off your helmet! You duck down again.
“Damn,” you think. “It’s definitely a sniper. No matter how rare snipers are, there’s no way that guy just hit me twice in a row from that distance if he’s a regular army troop. He’s gotta be a sniper. I’d better call for support.”
This is an example of updating beliefs in response to evidence, and we do it all the time.
You start with some prior beliefs, and all of them are uncertain. You are 99.99% certain the Earth revolves around the sun, 90% confident your best friend will attend your birthday party, and 40% sure that the song on the radio you’re listening to was played by The Turtles.
Then, you encounter new evidence — new observations — and update your beliefs in response.
Suppose you start out 85% confident the one remaining enemy soldier is not a sniper, leaving only 15% credence to the hypothesis that he is a sniper. But then, a bullet glances off your helmet — an event far more likely if the enemy soldier is a sniper than if he is not. So now you’re only 40% confident he’s not a sniper, and 60% confident he is. Another bullet glances off your helmet, and you update again. Now you’re only 2% confident he’s not a sniper, and 98% confident he is a sniper.
Now, I’m about to show you some math, but don’t run away. The math isn’t supposed to make sense if you haven’t studied it. Don’t worry; I’m going to explain it all.
There’s a theorem in probability theory that tells you how likely one observation is given some other observations. It’s called Bayes’ Theorem.
At this point you may want to take a break and read either tutorial #1, tutorial #2, tutorial #3, or tutorial #4 on Bayes’ Theorem. I’ll only say a little bit more about Bayes’ Theorem in this tutorial.
The short form of Bayes’ Theorem looks like this:

p(H|E) = p(E|H) × p(H) / p(E)
Let’s unpack this. The H refers to some hypothesis, and the E refers to some evidence. p(x) refers to the probability of x. The pipe symbol | means ‘given’ or ‘assuming’. So, Bayes’ Theorem reads: the probability of the hypothesis given the evidence equals the probability of the evidence given the hypothesis, times the prior probability of the hypothesis, divided by the probability of the evidence.
You can see how this tells us what we’d like to know. We want to know the probability of some hypothesis — some model of the world that, if correct, will allow us to successfully predict the future — given the evidence that we have. And that’s what Bayes’ Theorem tells us. Now we just plug the numbers in, and solve for p(H|E).
Of course, it’s not easy to “just plug the numbers in.” You aren’t an all-knowing god. You don’t know exactly how likely it is that the enemy soldier would hit your helmet if he’s a sniper, compared to how likely that is if he’s not a sniper. But you can do your best. With enough evidence, it will become overwhelmingly clear which hypothesis is correct.
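Here is what "doing your best" might look like in code for the sniper example. The prior of 15% comes from the story above; the two likelihoods are invented for illustration (I assume a sniper hits your helmet from that range with probability 0.5, and a regular troop with probability 0.06).

```python
def bayes_update(prior_h, p_e_given_h, p_e_given_not_h):
    """Return p(H|E) via Bayes' Theorem: p(H|E) = p(E|H) * p(H) / p(E)."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Assumed likelihoods (made up for illustration):
p_hit_sniper = 0.5    # chance a sniper hits your helmet from 400 yards
p_hit_regular = 0.06  # chance a regular troop does

p_sniper = 0.15  # prior: 15% confident he's a sniper
p_sniper = bayes_update(p_sniper, p_hit_sniper, p_hit_regular)
print(round(p_sniper, 2))  # ~0.6 after the first hit
p_sniper = bayes_update(p_sniper, p_hit_sniper, p_hit_regular)
print(round(p_sniper, 2))  # ~0.92 after the second hit
```

With these made-up likelihoods the posterior lands near the story's numbers; different assumed likelihoods would shift it, but the direction of the update is the same.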
At this point, I'll let the tutorials on Bayes' Theorem above do the heavy lifting on how to update beliefs. Let me get back to the part of induction those tutorials don't explain: the choice of priors.
The Problem of Priors
In the example above where you're a soldier in combat, I gave you your starting probabilities: 85% confidence that the enemy soldier was not a sniper, and 15% confidence that he was. But what if you don't know your "priors"? What then?
When using Bayes' Theorem to calculate your probabilities, your choice of prior can influence your results greatly. But how can we determine the probability of a hypothesis before seeing any data? What does that even mean?
Legg (2008) explains:
But this doesn't solve our problem. It merely points out that other statistical techniques don't fare any better.
What we'd like is to reduce the potential for abusing one's opportunity to select priors, and to use agreed-upon priors when possible. Thus, many standard "prior distributions" have been developed. Generally, they distribute probability equally across hypotheses.
But to solve the problem of priors once and for all, we'd like to have an acceptable, universal prior distribution, so that there's no fuzziness about the process of induction. We need a recipe, an algorithm, for finding out the truth. And there can't be any fuzziness in an algorithm.
Binary Sequence Prediction
[intro to binary sequence prediction]
Solomonoff and Kolmogorov
[summarize the contributions of Solomonoff and Kolmogorov]
Solomonoff Lightsaber
[explain the result: Solomonoff Induction]
Approximations
[survey a few of the approximations of Solomonoff Induction in use today]
Philosophical Implications
[survey of some philosophical implications of universal induction]
Notes
a. This quote and some of the examples to follow are from Legg (2008).
References
Legg (2008). Machine Superintelligence. PhD thesis, Department of Informatics, University of Lugano.
Rathmanner & Hutter (2011). A Philosophical Treatise of Universal Induction. Entropy, 13(6).