The most common formalizations of Occam's Razor, Solomonoff induction and Minimum Description Length, measure the program size of a computation used in a hypothesis, but don't measure the running time or space requirements of the computation. What if this makes a mind vulnerable to finite forms of Pascal's Wager? A compactly specified wager can grow in size much faster than it grows in complexity. The utility of a Turing machine can grow much faster than its prior probability shrinks.
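Roughly formalized (the busy-beaver comparison here is one illustrative way to cash this out, not the only one): a hypothesis h gets prior probability on the order of

$$P(h) \;\propto\; 2^{-K(h)},$$

where K(h) is the length in bits of the shortest program computing h. That prior shrinks at most exponentially in program length, but the largest number a halting program of length $\ell$ can output grows like a busy-beaver function $\mathrm{BB}(\ell)$, which outgrows every computable function; in particular

$$2^{-\ell} \cdot \mathrm{BB}(\ell) \;\longrightarrow\; \infty,$$

so a utility that scales with the numbers a short program can name is not tamed by the program-length penalty alone.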
Consider Knuth's up-arrow notation:
- 3^3 = 3*3*3 = 27
- 3^^3 = (3^(3^3)) = 3^27 (twenty-seven threes multiplied together) = 7625597484987
- 3^^^3 = (3^^(3^^3)) = 3^^7625597484987 = 3^(3^(3^(... 7625597484987 times ...)))
In other words: 3^^^3 describes an exponential tower of threes 7625597484987 layers tall. Since this number can be computed by a simple Turing machine, it contains very little information and requires a very short message to describe. This, even though writing out 3^^^3 in base 10 would require enormously more writing material than there are atoms in the known universe (a paltry 10^80).
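To make the "very short message" point concrete, here is a minimal Python sketch of up-arrow notation (the function name and formulation are my own; evaluating the three-arrow case would of course never terminate on any physical computer):

```python
def up(a, n, b):
    """Knuth's up-arrow: up(a, 1, b) = a**b; up(a, n, b) uses n arrows."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up(a, n - 1, up(a, n, b - 1))

print(up(3, 1, 3))  # 3^3  = 27
print(up(3, 2, 3))  # 3^^3 = 7625597484987
# up(3, 3, 3) is 3^^^3: defined by the same few lines, but no physical
# computer could finish evaluating it.
```

The whole definition of 3^^^3 fits in a handful of lines; that is the sense in which it carries very little information.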
Now suppose someone comes to me and says, "Give me five dollars, or I'll use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people."
Call this Pascal's Mugging.
"Magic powers from outside the Matrix" are easier said than done - we have to suppose that our world is a computing simulation run from within an environment that can afford simulation of arbitrarily large finite Turing machines, and that the would-be wizard has been spliced into our own Turing tape and is in continuing communication with an outside operator, etc.
Thus the Kolmogorov complexity of "magic powers from outside the Matrix" is larger than the mere English words would indicate. Therefore the Solomonoff-inducted probability, two to the negative Kolmogorov complexity, is exponentially tinier than one might naively think.
But, small as this probability is, it isn't anywhere near as small as 3^^^^3 is large. If you take a decimal point, followed by a number of zeros equal to the length of the Bible, followed by a 1, and multiply this unimaginably tiny fraction by 3^^^^3, the result is pretty much 3^^^^3.
Most people, I think, envision an "infinite" God that is nowhere near as large as 3^^^^3. "Infinity" is reassuringly featureless and blank. "Eternal life in Heaven" is nowhere near as intimidating as the thought of spending 3^^^^3 years on one of those fluffy clouds. The notion that the diversity of life on Earth springs from God's infinite creativity sounds more plausible than the notion that life on Earth was created by a superintelligence 3^^^^3 bits large. Similarly for envisioning an "infinite" God interested in whether women wear men's clothing, versus a superintelligence of 3^^^^3 bits, etc.
The original version of Pascal's Wager is easily dealt with by the gigantic multiplicity of possible gods, an Allah for every Christ and a Zeus for every Allah, including the "Professor God" who places only atheists in Heaven. And since all the expected utilities here are allegedly "infinite", it's easy enough to argue that they cancel out. Infinities, being featureless and blank, are all the same size.
But suppose I built an AI which worked by some bounded analogue of Solomonoff induction - an AI sufficiently Bayesian to insist on calculating complexities and assessing probabilities, rather than just waving them off as "large" or "small".
If the probabilities of various scenarios considered did not exactly cancel out, the AI's action in the case of Pascal's Mugging would be overwhelmingly dominated by whatever tiny differentials existed in the various tiny probabilities under which 3^^^^3 units of expected utility were actually at stake.
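A toy version of that calculation (every number here is a hypothetical stand-in; 3^^^^3 itself cannot even be represented, so I use a placeholder that is unimaginably smaller):

```python
from fractions import Fraction

# Hypothetical stand-ins: p is the already-heavily-penalized probability
# that the mugger's threat is real; LIVES is vastly smaller than 3^^^^3
# but big enough to make the point.
p = Fraction(1, 10**1000)           # "Bible's worth of zeros" tiny
LIVES = 10**10000                   # placeholder for 3^^^^3
five_dollars = Fraction(1, 10**6)   # assumed trivial disutility of paying

eu_pay = -five_dollars              # pay up; the threat (if real) is averted
eu_refuse = -p * LIVES              # tiny probability of a vast loss

print(eu_pay > eu_refuse)           # True: paying dominates even at p = 10**-1000
```

The exact numbers don't matter; any probability that shrinks only exponentially in description length is overwhelmed by a stake that grows like 3^^^^3.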
You or I would probably wave off the whole matter with a laugh, planning according to the dominant mainline probability: Pascal's Mugger is just a philosopher out for a fast buck.
But a silicon chip does not look over the code fed to it, assess it for reasonableness, and correct it if not. An AI is not given its code like a human servant given instructions. An AI is its code. What if a philosopher tries Pascal's Mugging on the AI for a joke, and the tiny probabilities of 3^^^^3 lives being at stake, override everything else in the AI's calculations? What is the mere Earth at stake, compared to a tiny probability of 3^^^^3 lives?
How do I know to be worried by this line of reasoning? How do I know to rationalize reasons a Bayesian shouldn't work that way? A mind that worked strictly by Solomonoff induction would not know to rationalize reasons that Pascal's Mugging mattered less than Earth's existence. It would simply go by whatever answer Solomonoff induction obtained.
It would seem, then, that I've implicitly declared my existence as a mind that does not work by the logic of Solomonoff, at least not the way I've described it. What am I comparing Solomonoff's answer to, to determine whether Solomonoff induction got it "right" or "wrong"?
Why do I think it's unreasonable to focus my entire attention on the magic-bearing possible worlds, faced with a Pascal's Mugging? Do I have an instinct to resist exploitation by arguments "anyone could make"? Am I unsatisfied by any visualization in which the dominant mainline probability leads to a loss? Do I drop sufficiently small probabilities from consideration entirely? Would an AI that lacks these instincts be exploitable by Pascal's Mugging?
Is it me who's wrong? Should I worry more about the possibility of some Unseen Magical Prankster of very tiny probability taking this post literally, than about the fate of the human species in the "mainline" probabilities?
It doesn't feel to me like 3^^^^3 lives are really at stake, even at very tiny probability. I'd sooner question my grasp of "rationality" than give five dollars to a Pascal's Mugger because I thought it was "rational".
Should we penalize computations with large space and time requirements? This is a hack that solves the problem, but is it true? Are computationally costly explanations less likely? Should I think the universe is probably a coarse-grained simulation of my mind rather than real quantum physics, because a coarse-grained human mind is exponentially cheaper than real quantum physics? Should I think the galaxies are tiny lights on a painted backdrop, because that Turing machine would require less space to compute?
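For what it's worth, one standard way to make "penalize computationally costly explanations" precise (Levin's Kt complexity, the idea behind speed-prior-style proposals; I'm not claiming it is the right fix) is to charge for runtime on top of program length:

$$Kt(x) \;=\; \min_{p\,:\,U(p)=x} \bigl(\,|p| + \log_2 t(p)\,\bigr),$$

where $|p|$ is the program's length in bits and $t(p)$ its running time on the universal machine $U$. The induced prior $2^{-Kt(x)}$ halves for every doubling of runtime, so a hypothesis that must simulate 3^^^^3 people pays at least log_2(3^^^^3) bits for the runtime alone. Whether that penalty tracks truth, rather than convenience, is exactly the question above.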
Given that, in general, a Turing machine can increase in utility vastly faster than it increases in complexity, how should an Occam-abiding mind avoid being dominated by tiny probabilities of vast utilities?
If I could formalize whichever internal criterion was telling me I didn't want this to happen, I might have an answer.
I talked over a variant of this problem with Nick Hay, Peter de Blanc, and Marcello Herreshoff in summer of 2006. I don't feel I have a satisfactory resolution as yet, so I'm throwing it open to any analytic philosophers who might happen to read Overcoming Bias.
In retrospect, I think Eliezer should not have focused on that as much as he did. Let's cut to the core of the issue: how should an AI handle choices that, maybe, just maybe, could have a huge effect?
I think Eliezer overlooked the complexity inherent in a mind... the complexity of the situation isn't in the number; it's in what the things being numbered are. To create 3^^^^3 distinct, complex things that would be valued by a posthuman would be an incredibly difficult, time-consuming task. Of course, at this moment, the AI doesn't care about doing that; it cares whether or not the universe is already running 3^^^^3 of these things. I do think a program to run these computations might be more complex than a program to simulate our physics, but stepping back, it would not have to be anywhere near log_2(3^^^^3) bits more complex. Really, really bad case of scope insensitivity on my part.
My first comment was wrong. That argument should have been the primary argument, and the other shouldn't have been in there at all... but let's step back from the exact situation Eliezer gives. This is a general problem which applies to, as far as I can see, pretty much any action an AI could take (see Tom_McCabe2's "QWERTYUIOP" remark).
Let's say the AI wants to save a drowning child. However, the universe happens to care about this single moment in time, and iff the AI saves the child, 3^^^^3 people will die instantly, and then the AI will be given information verifying, with high probability, that this has occurred. One of the simplest ways for the universe-program to implement this is:
If (AI saves child), then reset all bits in that constantly evolving 3^^^^3-entry long data structure over there to zero, send proof to AI. Else, proceed normally.
Note that this is magic. Magic is that which cannot be understood, that which correlates with no data other than itself. The code could just as easily be this:
If (AI saves child), then proceed normally. Else, reset all bits in that constantly evolving 3^^^^3-entry long data structure over there to zero, send proof to AI.
Those two code segments are equally complicated. The AI shouldn't weight either higher than the other. For every small increment in complexity you add to the "malevolent" code from there to have it carry out the same function, I contend you can make a corresponding increment to the "benevolent" code to do the same thing.
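A crude illustration of that symmetry (a toy in which source length stands in for Kolmogorov complexity; the identifiers are made up):

```python
# Toy "universe programs": the two versions differ only in which branch
# follows the condition, so their lengths, a crude stand-in for
# Kolmogorov complexity, come out identical.
malevolent = ("if ai_saves_child: zero_out(huge_structure); send_proof()\n"
              "else: proceed_normally()")
benevolent = ("if ai_saves_child: proceed_normally()\n"
              "else: zero_out(huge_structure); send_proof()")

print(len(malevolent), len(benevolent))  # equal either way
```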
If our universe was optimized to give us hope, and then thwart our values, there's nothing even an AI can do about that. An AI can only optimize that which it both understands, and is permitted to optimize by the universe's code. The universe's code could be such that it gives the AI false beliefs about pretty much everything, and then the AI would be unable to optimize anything.
If the "malevolent" code runs, then the AI would make a HUGE update after that, possibly choosing not to save any drowning children anymore (though that update would be wrong if the code were as above...overfitting). But it can't update on the possibility that it might update - that would violate conservation of expected evidence. All disease might magically immediately be cured if the AI saves the drowing child. I don't see how this is any more complex.
So, this is what I contest. If one were really that much more likely, the AI would have already known about it (cf. what Eliezer says in "Technical Explanation": "How would I explain the event of my left arm being replaced by a blue tentacle? The answer is that I wouldn't. It isn't going to happen....If I was worried I might someday need a clever excuse for waking up with a tentacle, the reason I was nervous about the possibility would be my explanation."). An AI is designed to accomplish its task as well as possible. I noticed my confusion when I recalled this paper on AIXI, which I'd previously taken a short look at. The AI won on Partially Observable Pacman; it did much better than I could ever hope to do (if I were given the data in the form of pure numerical reward signals, written down on paper). It didn't get stuck wondering whether it would lose 2,000,000 points when the most it had ever lost before was less than 100.
I know almost nothing about AI. I don't know the right way we should approximate AIXI, and modify it so that it knows it is a part of its environment. I do know enough about rationality from reading Less Wrong to know that we shouldn't shut it off just because it does something counterintuitive, if we did program it right. (And I hope to one day make both of the first two sentences in this paragraph false.)