Followup to: Pascal's Mugging: Tiny Probabilities of Vast Utilities, The Pascal's Wager Fallacy Fallacy, Being Half-Rational About Pascal's Wager Is Even Worse
Short form: Pascal's Muggle
tl;dr: If you assign superexponentially infinitesimal probability to claims of large impacts, then apparently you should ignore the possibility of a large impact even after seeing huge amounts of evidence. If a poorly-dressed street person offers to save 10^(10^100) lives (a googolplex of lives) for $5 using their Matrix Lord powers, and you claim to assign this scenario less than 10^-(10^100) probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky. For the same reason, any evidence you encounter showing that the human species could create a sufficiently large number of descendants - no matter how normal the corresponding laws of physics appear to be, or how well-designed the experiments which told you about them - must be rejected out of hand. There is a possible reply to this objection using Robin Hanson's anthropic adjustment against the probability of large impacts, and in this case you will treat a Pascal's Mugger as having decision-theoretic importance exactly proportional to the Bayesian strength of evidence they present you, without quantitative dependence on the number of lives they claim to save. This however corresponds to an odd mental state which some, such as myself, would find unsatisfactory. In the end, however, I cannot see any better candidate for a prior than having a leverage penalty plus a complexity penalty on the prior probability of scenarios.
In late 2007 I coined the term "Pascal's Mugging" to describe a problem which seemed to me to arise when combining conventional decision theory and conventional epistemology in the obvious way. On conventional epistemology, the prior probability of hypotheses diminishes exponentially with their complexity; if it would take 20 bits to specify a hypothesis, then its prior probability receives a 2^-20 penalty factor and it will require evidence with a likelihood ratio of 1,048,576:1 - evidence which we are 1,048,576 times more likely to see if the theory is true, than if it is false - to make us assign it around 50-50 credibility. (This isn't as hard as it sounds. Flip a coin 20 times and note down the exact sequence of heads and tails. You now believe in a state of affairs you would have assigned a million-to-one probability beforehand - namely, that the coin would produce the exact sequence HTHHHHTHTTH... or whatever - after experiencing sensory data which are more than a million times more probable if that fact is true than if it is false.) The problem is that although this kind of prior probability penalty may seem very strict at first, it's easy to construct physical scenarios that grow in size vastly faster than they grow in complexity.
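As a quick sanity check on the 20-bit example, here is a minimal sketch in Python; the numbers are just the ones from the paragraph above, not anything new:

```python
# A 20-bit hypothesis starts at prior 2^-20; evidence with a likelihood ratio
# of 2^20 : 1 (about 1,048,576 : 1) brings it back to roughly even odds.
prior = 2.0 ** -20
likelihood_ratio = 2.0 ** 20
posterior_odds = (prior / (1.0 - prior)) * likelihood_ratio
posterior = posterior_odds / (1.0 + posterior_odds)
print(posterior)   # ≈ 0.5 - around 50-50 credibility, as claimed
```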
I originally illustrated this using Pascal's Mugger: A poorly dressed street person says "I'm actually a Matrix Lord running this world as a computer simulation, along with many others - the universe above this one has laws of physics which allow me easy access to vast amounts of computing power. Just for fun, I'll make you an offer - you give me five dollars, and I'll use my Matrix Lord powers to save 3↑↑↑↑3 people inside my simulations from dying and let them live long and happy lives" where ↑ is Knuth's up-arrow notation. This was originally posted in 2007, when I was a bit more naive about what kind of mathematical notation you can throw into a random blog post without creating a stumbling block. (E.g.: On several occasions now, I've seen someone on the Internet approximate the number of dust specks from this scenario as being a "billion", since any incomprehensibly large number equals a billion.) Let's try an easier (and way smaller) number instead, and suppose that Pascal's Mugger offers to save a googolplex lives, where a googol is 10^100 (a 1 followed by a hundred zeroes) and a googolplex is 10 to the googol power, so 10^(10^100) or 10^10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 lives saved if you pay Pascal's Mugger five dollars, if the offer is honest.
If Pascal's Mugger had only offered to save a mere googol lives (10^100), we could perhaps reply that although the notion of a Matrix Lord may sound simple to say in English, if we actually try to imagine all the machinery involved, it works out to a substantial amount of computational complexity. (Similarly, Thor is a worse explanation for lightning bolts than the laws of physics because, among other points, an anthropomorphic deity is more complex than calculus in formal terms - it would take a larger computer program to simulate Thor as a complete mind, than to simulate Maxwell's Equations - even though in mere human words Thor sounds much easier to explain.) To imagine this scenario in formal detail, we might have to write out the laws of the higher universe the Mugger supposedly comes from, the Matrix Lord's state of mind leading them to make that offer, and so on. And so (we reply) when mere verbal English has been translated into a formal hypothesis, the Kolmogorov complexity of this hypothesis is more than 332 bits - it would take more than 332 ones and zeroes to specify - where 2^-332 ~ 10^-100. Therefore (we conclude) the net expected value of the Mugger's offer is still tiny, once its prior improbability is taken into account.
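The 332-bit figure is just a change of logarithm base; a one-line check of my own, nothing more:

```python
import math

# 10^-100 corresponds to about 100 * log2(10) ≈ 332.2 bits, so a hypothesis
# needing more than 332 bits to specify has prior probability below 10^-100.
print(100 * math.log2(10))   # ≈ 332.19
```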
But once Pascal's Mugger offers to save a googolplex lives - offers us a scenario whose value is constructed by twice-repeated exponentiation - we seem to run into some difficulty using this answer. Can we really claim that the complexity of this scenario is on the order of a googol bits - that to formally write out the hypothesis would take one hundred billion billion times more bits than there are atoms in the observable universe?
And a tiny, paltry number like a googolplex is only the beginning of computationally simple numbers that are unimaginably huge. Exponentiation is defined as repeated multiplication: If you see a number like 3^5, it tells you to multiply five 3s together: 3×3×3×3×3 = 243. Suppose we write 3^5 as 3↑5, so that a single arrow ↑ stands for exponentiation, and let the double arrow ↑↑ stand for repeated exponentiation, or tetration. Thus 3↑↑3 would stand for 3↑(3↑3) or 3^(3^3) = 3^27 = 7,625,597,484,987. Tetration is also written with a superscript prefix: ³3 = 3↑↑3. Thus ⁴2 = 2^(2^(2^2)) = 2^(2^4) = 2^16 = 65,536. Then pentation, or repeated tetration, would be written with 3↑↑↑3 = 3↑↑(3↑↑3) = 3↑↑7,625,597,484,987 = 3^3^...^3, where the ... summarizes an exponential tower of 3s some 7,625,597,484,987 (roughly seven and a half trillion) layers high.
But 3↑↑↑3 is still quite simple computationally - we could describe a small Turing machine which computes it - so a hypothesis involving 3↑↑↑3 should not therefore get a large complexity penalty, if we're penalizing hypotheses by algorithmic complexity.
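To make the "small Turing machine" point concrete, here is a short sketch of Knuth's up-arrow notation as a program of my own devising - purely illustrative, since actually evaluating 3↑↑↑3 would never terminate in practice:

```python
def up_arrow(a, n, b):
    """Compute a ↑^n b in Knuth's up-arrow notation (n arrows)."""
    if n == 1:
        return a ** b          # one arrow is plain exponentiation
    result = a
    for _ in range(b - 1):     # n arrows = (b-1)-fold iteration of (n-1) arrows
        result = up_arrow(a, n - 1, result)
    return result

assert up_arrow(3, 1, 5) == 243              # 3↑5 = 3^5
assert up_arrow(2, 2, 4) == 65536            # 2↑↑4 = 2^(2^(2^2))
assert up_arrow(3, 2, 3) == 7625597484987    # 3↑↑3 = 3^(3^3)
# up_arrow(3, 3, 3) would be 3↑↑↑3: the describing program stays tiny even
# though the number it denotes is unimaginably large.
```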
I had originally intended the scenario of Pascal's Mugging to point up what seemed like a basic problem with combining conventional epistemology with conventional decision theory: Conventional epistemology says to penalize hypotheses by an exponential factor of computational complexity. This seems pretty strict in everyday life: "What? for a mere 20 bits I am to be called a million times less probable?" But for stranger hypotheses about things like Matrix Lords, the size of the hypothetical universe can blow up enormously faster than the exponential of its complexity. This would mean that all our decisions were dominated by tiny-seeming probabilities (on the order of 2^-100 and less) of scenarios where our lightest action affected 3↑↑4 people... which would in turn be dominated by even more remote probabilities of affecting 3↑↑5 people...
This problem is worse than just giving five dollars to Pascal's Mugger - our expected utilities don't converge at all! Conventional epistemology tells us to sum over the predictions of all hypotheses weighted by their computational complexity and evidential fit. This works fine with epistemic probabilities and sensory predictions because no hypothesis can predict more than probability 1 or less than probability 0 for a sensory experience. As hypotheses get more and more complex, their contributed predictions have tinier and tinier weights, and the sum converges quickly. But decision theory tells us to calculate expected utility by summing the utility of each possible outcome, times the probability of that outcome conditional on our action. If hypothetical utilities can grow faster than hypothetical probability diminishes, the contribution of an average term in the series will keep increasing, and this sum will never converge - not if we try to do it the same way we got our epistemic predictions, by summing over complexity-weighted possibilities. (See also this similar-but-different paper by Peter de Blanc.)
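Here is a toy version of the divergence, under assumptions of my own choosing (hypothesis k gets a complexity prior of 2^-k and promises a payoff of 2^(2^k)):

```python
# Epistemic predictions are bounded by 1, so sum_k 2^-k * 1 converges to 1.
# But the k-th expected-utility term is 2^-k * 2^(2^k) = 2^(2^k - k), whose
# exponent grows without bound, so the expected-utility series cannot converge.
for k in range(1, 11):
    print(k, 2 ** k - k)   # log2 of the k-th expected-utility term: 1, 2, 5, 12, 27, ...
```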
Unfortunately I failed to make it clear in my original writeup that this was where the problem came from, and that it was general to situations beyond the Mugger. Nick Bostrom's writeup of Pascal's Mugging for a philosophy journal used a Mugger offering a quintillion days of happiness, where a quintillion is merely 1,000,000,000,000,000,000 = 10^18. It takes at least two exponentiations to outrun a singly-exponential complexity penalty. I would be willing to assign a probability of less than 1 in 10^18 to a random person being a Matrix Lord. You may not have to invoke 3↑↑↑3 to cause problems, but you've got to use something like 10^(10^100) - double exponentiation or better. Manipulating ordinary hypotheses about the ordinary physical universe taken at face value, which just contains 10^80 atoms within range of our telescopes, should not lead us into such difficulties.
(And then the phrase "Pascal's Mugging" got completely bastardized to refer to an emotional feeling of being mugged that some people apparently get when a high-stakes charitable proposition is presented to them, regardless of whether it's supposed to have a low probability. This is enough to make me regret having ever invented the term "Pascal's Mugging" in the first place; and for further thoughts on this see The Pascal's Wager Fallacy Fallacy (just because the stakes are high does not mean the probabilities are low, and Pascal's Wager is fallacious because of the low probability, not the high stakes!) and Being Half-Rational About Pascal's Wager Is Even Worse. Again, when dealing with issues the mere size of the apparent universe, on the order of 10^80 - for small large numbers - we do not run into the sort of decision-theoretic problems I originally meant to single out by the concept of "Pascal's Mugging". My rough intuitive stance on x-risk charity is that if you are one of the tiny fraction of all sentient beings who happened to be born here on Earth before the intelligence explosion, when the existence of the whole vast intergalactic future depends on what we do now, you should expect to find yourself surrounded by a smorgasbord of opportunities to affect small large numbers of sentient beings. There is then no reason to worry about tiny probabilities of having a large impact when we can expect to find medium-sized opportunities of having a large impact, so long as we restrict ourselves to impacts no larger than the size of the known universe.)
One proposal which has been floated for dealing with Pascal's Mugger in the decision-theoretic sense is to penalize hypotheses that let you affect a large number of people, in proportion to the number of people affected - what we could call perhaps a "leverage penalty" instead of a "complexity penalty".
Unfortunately this potentially leads us into a different problem, that of Pascal's Muggle.
Suppose a poorly-dressed street person asks you for five dollars in exchange for doing a googolplex's worth of good using his Matrix Lord powers.
"Well," you reply, "I think it very improbable that I would be able to affect so many people through my own, personal actions - who am I to have such a great impact upon events? Indeed, I think the probability is somewhere around one over googolplex, maybe a bit less. So no, I won't pay five dollars - it is unthinkably improbable that I could do so much good!"
"I see," says the Mugger.
A wind begins to blow about the alley, whipping the Mugger's loose clothes about him as they shift from ill-fitting shirt and jeans into robes of infinite blackness, within whose depths tiny galaxies and stranger things seem to twinkle. In the sky above, a gap edged by blue fire opens with a horrendous tearing sound - you can hear people on the nearby street yelling in sudden shock and terror, implying that they can see it too - and displays the image of the Mugger himself, wearing the same robes that now adorn his body, seated before a keyboard and a monitor.
"That's not actually me," the Mugger says, "just a conceptual representation, but I don't want to drive you insane. Now give me those five dollars, and I'll save a googolplex lives, just as promised. It's easy enough for me, given the computing power my home universe offers. As for why I'm doing this, there's an ancient debate in philosophy among my people - something about how we ought to sum our expected utilities - and I mean to use the video of this event to make a point at the next decision theory conference I attend. Now will you give me the five dollars, or not?"
"Mm... no," you reply.
"No?" says the Mugger. "I understood earlier when you didn't want to give a random street person five dollars based on a wild story with no evidence behind it. But now I've offered you evidence."
"Unfortunately, you haven't offered me enough evidence," you explain.
"Really?" says the Mugger. "I've opened up a fiery portal in the sky, and that's not enough to persuade you? What do I have to do, then? Rearrange the planets in your solar system, and wait for the observatories to confirm the fact? I suppose I could also explain the true laws of physics in the higher universe in more detail, and let you play around a bit with the computer program that encodes all the universes containing the googolplex people I would save if you gave me the five dollars -"
"Sorry," you say, shaking your head firmly, "there's just no way you can convince me that I'm in a position to affect a googolplex people, because the prior probability of that is one over googolplex. If you wanted to convince me of some fact of merely 2-100 prior probability, a mere decillion to one - like that a coin would come up heads and tails in some particular pattern of a hundred coinflips - then you could just show me 100 bits of evidence, which is within easy reach of my brain's sensory bandwidth. I mean, you could just flip the coin a hundred times, and my eyes, which send my brain a hundred megabits a second or so - though that gets processed down to one megabit or so by the time it goes through the lateral geniculate nucleus - would easily give me enough data to conclude that this decillion-to-one possibility was true. But to conclude something whose prior probability is on the order of one over googolplex, I need on the order of a googol bits of evidence, and you can't present me with a sensory experience containing a googol bits. Indeed, you can't ever present a mortal like me with evidence that has a likelihood ratio of a googolplex to one - evidence I'm a googolplex times more likely to encounter if the hypothesis is true, than if it's false - because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex. You know the old saying about how once you assign something probability one, or probability zero, you can never change your mind regardless of what evidence you see? Well, odds of a googolplex to one, or one to a googolplex, work pretty much the same way."
"So no matter what evidence I show you," the Mugger says - as the blue fire goes on crackling in the torn sky above, and screams and desperate prayers continue from the street beyond - "you can't ever notice that you're in a position to help a googolplex people."
"Right!" you say. "I can believe that you're a Matrix Lord. I mean, I'm not a total Muggle, I'm psychologically capable of responding in some fashion to that giant hole in the sky. But it's just completely forbidden for me to assign any significant probability whatsoever that you will actually save a googolplex people after I give you five dollars. You're lying, and I am absolutely, absolutely, absolutely confident of that."
"So you weren't just invoking the leverage penalty as a plausible-sounding way of getting out of paying me the five dollars earlier," the Mugger says thoughtfully. "I mean, I'd understand if that was just a rationalization of your discomfort at forking over five dollars for what seemed like a tiny probability, when I hadn't done my duty to present you with a corresponding amount of evidence before demanding payment. But you... you're acting like an AI would if it was actually programmed with a leverage penalty on hypotheses!"
"Exactly," you say. "I'm forbidden a priori to believe I can ever do that much good."
"Why?" the Mugger says curiously. "I mean, all I have to do is press this button here and a googolplex lives will be saved." The figure within the blazing portal above points to a green button on the console before it.
"Like I said," you explain again, "the prior probability is just too infinitesimal for the massive evidence you're showing me to overcome it -"
The Mugger shrugs, and vanishes in a puff of purple mist.
The portal in the sky above closes, taking with it the console and the green button.
(The screams go on from the street outside.)
A few days later, you're sitting in your office at the physics institute where you work, when one of your colleagues bursts in through your door, seeming highly excited. "I've got it!" she cries. "I've figured out that whole dark energy thing! Look, these simple equations retrodict it exactly, there's no way that could be a coincidence!"
At first you're also excited, but as you pore over the equations, your face configures itself into a frown. "No..." you say slowly. "These equations may look extremely simple so far as computational complexity goes - and they do exactly fit the petabytes of evidence our telescopes have gathered so far - but I'm afraid they're far too improbable to ever believe."
"What?" she says. "Why?"
"Well," you say reasonably, "if these equations are actually true, then our descendants will be able to exploit dark energy to do computations, and according to my back-of-the-envelope calculations here, we'd be able to create around a googolplex people that way. But that would mean that we, here on Earth, are in a position to affect a googolplex people - since, if we blow ourselves up via a nanotechnological war or (cough) make certain other errors, those googolplex people will never come into existence. The prior probability of us being in a position to impact a googolplex people is on the order of one over googolplex, so your equations must be wrong."
"Hmm..." she says. "I hadn't thought of that. But what if these equations are right, and yet somehow, everything I do is exactly balanced, down to the googolth decimal point or so, with respect to how it impacts the chance of modern-day Earth participating in a chain of events that leads to creating an intergalactic civilization?"
"How would that work?" you say. "There's only seven billion people on today's Earth - there's probably been only a hundred billion people who ever existed total, or will exist before we go through the intelligence explosion or whatever - so even before analyzing your exact position, it seems like your leverage on future affairs couldn't reasonably be less than a one in ten trillion part of the future or so."
"But then given this physical theory which seems obviously true, my acts might imply expected utility differentials on the order of 1010100-13," she explains, "and I'm not allowed to believe that no matter how much evidence you show me."
This problem may not be as bad as it looks; with some further reasoning, the leverage penalty may lead to more sensible behavior than depicted above.
Robin Hanson has suggested that the logic of a leverage penalty should stem from the general improbability of individuals being in a unique position to affect many others (which is why I called it a leverage penalty). At most 10 out of 3↑↑↑3 people can ever be in a position to be "solely responsible" for the fate of 3↑↑↑3 people if "solely responsible" is taken to imply a causal chain that goes through no more than 10 people's decisions; i.e. at most 10 people can ever be solely_10 responsible for any given event. Or if "fate" is taken to be a sufficiently ultimate fate that there's at most 10 other decisions of similar magnitude that could cumulate to determine someone's outcome utility to within ±50%, then any given person could have their fate_10 determined on at most 10 occasions. We would surely agree, while assigning priors at the dawn of reasoning, that an agent randomly selected from the pool of all agents in Reality has at most a 100/X chance of being able to be solely_10 responsible for the fate_10 of X people. Any reasoning we do about universes, their complexity, sensory experiences, and so on, should maintain this net balance. You can even strip out the part about agents and carry out the reasoning on pure causal nodes; the chance of a randomly selected causal node being in a unique_100 position on a causal graph with respect to 3↑↑↑3 other nodes ought to be at most 100/3↑↑↑3 for finite causal graphs. (As for infinite causal graphs, well, if problems arise only when introducing infinity, maybe it's infinity that has the problem.)
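As a sketch, the leverage penalty described above amounts to something like the following cap on priors; the function name and the handling of the slack factor of 100 are my own choices, not anything Hanson or I have formalized:

```python
def leverage_prior_cap(num_people_affected, slack=100):
    """Upper bound on the prior that a randomly selected agent is solely_10
    responsible for the fate_10 of this many people."""
    return min(1.0, slack / num_people_affected)

print(leverage_prior_cap(10 ** 80))   # ≈ 1e-78 for universe-scale stakes
```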
Suppose we apply the Hansonian leverage penalty to the face-value scenario of our own universe, in which there are apparently no aliens and the galaxies we can reach in the future contain on the order of 10^80 atoms; which, if the intelligence explosion goes well, might be transformed into on the very loose order of... let's ignore a lot of intermediate calculations and just call it the equivalent of 10^80 centuries of life. (The neurons in your brain perform lots of operations; you don't get only one computing operation per element, because you're powered by the Sun over time. The universe contains a lot more negentropy than just 10^80 bits due to things like the gravitational potential energy that can be extracted from mass. Plus we should take into account reversible computing. But of course it also takes more than one computing operation to implement a century of life. So I'm just going to xerox the number 10^80 for use in these calculations, since it's not supposed to be the main focus.)
Wouldn't it be terribly odd to find ourselves - where by 'ourselves' I mean the hundred billion humans who have ever lived on Earth, for no more than a century or so apiece - solely_100,000,000,000 responsible for the fate_10 of around 10^80 units of life? Isn't the prior probability of this somewhere around 10^-68?
Yes, according to the leverage penalty. But a prior probability of 10^-68 is not an insurmountable epistemological barrier. If you're taking things at face value, 10^-68 is just 226 bits of evidence or thereabouts, and your eyes are sending you a megabit per second. Becoming convinced that you, yes you are an Earthling is epistemically doable; you just need to see a stream of sensory experiences which is 10^68 times more probable if you are an Earthling than if you are someone else. If we take everything at face value, then there could be around 10^80 centuries of life over the history of the universe, and only 10^11 of those centuries will be lived by creatures who discover themselves occupying organic bodies. Taking everything at face value, the sensory experiences of your life are unique to Earthlings and should immediately convince you that you're an Earthling - just looking around the room you occupy will provide you with sensory experiences that plausibly belong to only 10^11 out of 10^80 life-centuries.
If we don't take everything at face value, then there might be such things as ancestor simulations, and it might be that your experience of looking around the room is something that happens in 10^20 ancestor simulations for every time that it happens in 'base level' reality. In this case your probable leverage on the future is diluted (though it may be large even post-dilution). But this is not something that the Hansonian leverage penalty forces you to believe - not when the putative stakes are still as small as 10^80. Conceptually, the Hansonian leverage penalty doesn't interact much with the Simulation Hypothesis (SH) at all. If you don't believe SH, then you think that the experiences of creatures like yours are rare in the universe and hence present strong, convincing evidence for you occupying the leverage-privileged position of an Earthling - much stronger evidence than its prior improbability. (There's some separate anthropic issues here about whether or not this is itself evidence for SH, but I don't think that question is intrinsic to leverage penalties per se.)
A key point here is that even if you accept a Hanson-style leverage penalty, it doesn't have to manifest as an inescapable commandment of modesty. You need not refuse to believe (in your deep and irrevocable humility) that you could be someone as special as an Ancient Earthling. Even if Earthlings matter in the universe - even if we occupy a unique position to affect the future of galaxies - it is still possible to encounter pretty convincing evidence that you're an Earthling. Universes the size of 10^80 do not pose problems to conventional decision-theoretic reasoning, or to conventional epistemology.
Things play out similarly if - still taking everything at face value - you're wondering about the chance that you could be special even for an Earthling, because you might be one of say 10^4 people in the history of the universe who contribute a major amount to an x-risk reduction project which ends up actually saving the galaxies. The vast majority of the improbability here is just in being an Earthling in the first place! Thus most of the clever arguments for not taking this high-impact possibility at face value would also tell you not to take being an Earthling at face value, since Earthlings as a whole are much more unique within the total temporal history of the universe than you are supposing yourself to be unique among Earthlings. But given ¬SH, the prior improbability of being an Earthling can be overcome by a few megabits of sensory experience from looking around the room and querying your memories - it's not like 10^80 is enough future beings that the number of agents randomly hallucinating similar experiences outweighs the number of real Earthlings. Similarly, if you don't think lots of Earthlings are hallucinating the experience of going to a donation page and clicking on the Paypal button for an x-risk charity, that sensory experience can easily serve to distinguish you as one of 10^4 people donating to an x-risk philanthropy.
Yes, there are various clever-sounding lines of argument which involve not taking things at face value - "Ah, but maybe you should consider yourself as an indistinguishable part of this here large reference class of deluded people who think they're important." Which I consider to be a bad idea because it renders you a permanent Muggle by putting you into an inescapable reference class of self-deluded people and then dismissing all your further thoughts as insufficient evidence because you could just be deluding yourself further about whether these are good arguments. Nor do I believe the world can only be saved by good people who are incapable of distinguishing themselves from a large class of crackpots, all of whom have no choice but to continue based on the tiny probability that they are not crackpots. (For more on this see Being Half-Rational About Pascal's Wager Is Even Worse.) In this case you are a Pascal's Muggle not because you've explicitly assigned a probability like one over googolplex, but because you took an improbability like 10^-6 at unquestioning face value and then cleverly questioned all the evidence which could've overcome that prior improbability, and so, in practice, you can never climb out of the epistemological sinkhole. By the same token, you should conclude that you are just self-deluded about being an Earthling since real Earthlings are so rare and privileged in their leverage.
In general, leverage penalties don't translate into advice about modesty or that you're just deluding yourself - they just say that to be rationally coherent, your picture of the universe has to imply that your sensory experiences are at least as rare as the corresponding magnitude of your leverage.
Which brings us back to Pascal's Mugger, in the original alleyway version. The Hansonian leverage penalty seems to imply that to be coherent, either you believe that your sensory experiences are really actually 1 in a googolplex - that only 1 in a googolplex beings experiences what you're experiencing - or else you really can't take the situation at face value.
Suppose the Mugger is telling the truth, and a googolplex other people are being simulated. Then there are at least a googolplex people in the universe. Perhaps some of them are hallucinating a situation similar to this one by sheer chance? Rather than telling you flatly that you can't have a large impact, the Hansonian leverage penalty implies a coherence requirement on how uniquely you think your sensory experiences identify the position you believe yourself to occupy. When it comes to believing you're one of 10^11 Earthlings who can impact 10^80 other life-centuries, you need to think your sensory experiences are unique to Earthlings - identify Earthlings with a likelihood ratio on the order of 10^69. This is quite achievable, if we take the evidence at face value. But when it comes to improbability on the order of 1/3↑↑↑3, the prior improbability is inescapable - your sensory experiences can't possibly be that unique - which is assumed to be appropriate because almost-everyone who ever believes they'll be in a position to help 3↑↑↑3 people will in fact be hallucinating. Boltzmann brains should be much more common than people in a unique position to affect 3↑↑↑3 others, at least if the causal graphs are finite.
Furthermore - although I didn't realize this part until recently - applying Bayesian updates from that starting point may partially avert the Pascal's Muggle effect:
Mugger: "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."
You: "Nope."
Mugger: "Why not? It's a really large impact."
You: "Yes, and I assign a probability on the order of 1 in 3↑↑↑3 that I would be in a unique position to affect 3↑↑↑3 people."
Mugger: "Oh, is that really the probability that you assign? Behold!"
(A gap opens in the sky, edged with blue fire.)
Mugger: "Now what do you think, eh?"
You: "Well... I can't actually say this observation has a likelihood ratio of 3↑↑↑3 to 1. No stream of evidence that can enter a human brain over the course of a century is ever going to have a likelihood ratio larger than, say, 101026 to 1 at the absurdly most, assuming one megabit per second of sensory data, for a century, each bit of which has at least a 1-in-a-trillion error probability. I'd probably start to be dominated by Boltzmann brains or other exotic minds well before then."
Mugger: "So you're not convinced."
You: "Indeed not. The probability that you're telling the truth is so tiny that God couldn't find it with an electron microscope. Here's the five dollars."
Mugger: "Done! You've saved 3↑↑↑3 lives! Congratulations, you're never going to top that, your peak life accomplishment will now always lie in your past. But why'd you give me the five dollars if you think I'm lying?"
You: "Well, because the evidence you did present me with had a likelihood ratio of at least a billion to one - I would've assigned less than 10-9 prior probability of seeing this when I woke up this morning - so in accordance with Bayes's Theorem I promoted the probability from 1/3↑↑↑3 to at least 109/3↑↑↑3, which when multiplied by an impact of 3↑↑↑3, yields an expected value of at least a billion lives saved for giving you five dollars."
I confess that I find this line of reasoning a bit suspicious - it seems overly clever. But on the level of intuitive virtues of rationality, it does seem less stupid than the original Pascal's Muggle; this muggee is at least behaviorally reacting to the evidence. In fact, they're reacting in a way exactly proportional to the evidence - they would've assigned the same net importance to handing over the five dollars if the Mugger had offered 3↑↑↑4 lives, so long as the strength of the evidence seemed the same.
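The proportionality can be seen in a two-line sketch of my own, assuming a pure 1/N leverage prior and treating the posterior as simply prior times likelihood ratio:

```python
def expected_lives_saved(likelihood_ratio, claimed_lives):
    prior = 1.0 / claimed_lives            # pure leverage penalty on the claim
    posterior = prior * likelihood_ratio   # normalization is negligible here
    return posterior * claimed_lives       # = likelihood_ratio, independent of claim size

# The same evidence buys the same expected payoff whether the Mugger offers
# 10^100 lives or 10^300 lives - the decision tracks only the evidence.
print(expected_lives_saved(1e9, 1e100), expected_lives_saved(1e9, 1e300))
```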
(Anyone who tries to apply the lessons here to actual x-risk reduction charities (which I think is probably a bad idea), keep in mind that the vast majority of the improbable-position-of-leverage in any x-risk reduction effort comes from being an Earthling in a position to affect the future of a hundred billion galaxies, and that sensory evidence for being an Earthling is what gives you most of your belief that your actions can have an outsized impact.)
So why not just run with this - why not just declare the decision-theoretic problem resolved, if we have a rule that seems to give reasonable behavioral answers in practice? Why not just go ahead and program that rule into an AI?
Well... I still feel a bit nervous about the idea that Pascal's Muggee, after the sky splits open, is handing over five dollars while claiming to assign probability on the order of 10^9/3↑↑↑3 that it's doing any good.
I think that my own reaction in a similar situation would be along these lines instead:
Mugger: "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."
Me: "Nope."
Mugger: "So then, you think the probability I'm telling the truth is on the order of 1/3↑↑↑3?"
Me: "Yeah... that probably has to follow. I don't see any way around that revealed belief, given that I'm not actually giving you the five dollars. I've heard some people try to claim silly things like, the probability that you're telling the truth is counterbalanced by the probability that you'll kill 3↑↑↑3 people instead, or something else with a conveniently equal and opposite utility. But there's no way that things would balance out exactly in practice, if there was no a priori mathematical requirement that they balance. Even if the prior probability of your saving 3↑↑↑3 people and killing 3↑↑↑3 people, conditional on my giving you five dollars, exactly balanced down to the log(3↑↑↑3) decimal place, the likelihood ratio for your telling me that you would "save" 3↑↑↑3 people would not be exactly 1:1 for the two hypotheses down to the log(3↑↑↑3) decimal place. So if I assigned probabilities much greater than 1/3↑↑↑3 to your doing something that affected 3↑↑↑3 people, my actions would be overwhelmingly dominated by even a tiny difference in likelihood ratio elevating the probability that you saved 3↑↑↑3 people over the probability that you did something bad to them. The only way this hypothesis can't dominate my actions - really, the only way my expected utility sums can converge at all - is if I assign probability on the order of 1/3↑↑↑3 or less. I don't see any way of escaping that part."
Mugger: "But can you, in your mortal uncertainty, truly assign a probability as low as 1 in 3↑↑↑3 to any proposition whatever? Can you truly believe, with your error-prone neural brain, that you could make 3↑↑↑3 statements of any kind one after another, and be wrong, on average, about once?"
Me: "Nope."
Mugger: "So give me five dollars!"
Me: "Nope."
Mugger: "Why not?"
Me: "Because even though I, in my mortal uncertainty, will eventually be wrong about all sorts of things if I make enough statements one after another, this fact can't be used to increase the probability of arbitrary statements beyond what my prior says they should be, because then my prior would sum to more than 1. There must be some kind of required condition for taking a hypothesis seriously enough to worry that I might be overconfident about it -"
Mugger: "Then behold!"
(A gap opens in the sky, edged with blue fire.)
Mugger: "Now what do you think, eh?"
Me (staring up at the sky): "...whoa." (Pause.) "You turned into a cat."
Mugger: "What?"
Me: "Private joke. Okay, I think I'm going to have to rethink a lot of things. But if you want to tell me about how I was wrong to assign a prior probability on the order of 1/3↑↑↑3 to your scenario, I will shut up and listen very carefully to what you have to say about it. Oh, and here's the five dollars, can I pay an extra twenty and make some other requests?"
(The thought bubble pops, and we return to two people standing in an alley, the sky above perfectly normal.)
Mugger: "Now, in this scenario we've just imagined, you were taking my case seriously, right? But the evidence there couldn't have had a likelihood ratio of more than 101026 to 1, and probably much less. So by the method of imaginary updates, you must assign probability at least 10-1026 to my scenario, which when multiplied by a benefit on the order of 3↑↑↑3, yields an unimaginable bonanza in exchange for just five dollars -"
Me: "Nope."
Mugger: "How can you possibly say that? You're not being logically coherent!"
Me: "I agree that I'm not being logically coherent, but I think that's acceptable in this case."
Mugger: "This ought to be good. Since when are rationalists allowed to deliberately be logically incoherent?"
Me: "Since we don't have infinite computing power -"
Mugger: "That sounds like a fully general excuse if I ever heard one."
Me: "No, this is a specific consequence of bounded computing power. Let me start with a simpler example. Suppose I believe in a set of mathematical axioms. Since I don't have infinite computing power, I won't be able to know all the deductive consequences of those axioms. And that means I will necessarily fall prey to the conjunction fallacy, in the sense that you'll present me with a theorem X that is a deductive consequence of my axioms, but which I don't know to be a deductive consequence of my axioms, and you'll ask me to assign a probability to X, and I'll assign it 50% probability or something. Then you present me with a brilliant lemma Y, which clearly seems like a likely consequence of my mathematical axioms, and which also seems to imply X - once I see Y, the connection from my axioms to X, via Y, becomes obvious. So I assign P(X&Y) = 90%, or something like that. Well, that's the conjunction fallacy - I assigned P(X&Y) > P(X). The thing is, if you then ask me P(X), after I've seen Y, I'll reply that P(X) is 91% or at any rate something higher than P(X&Y). I'll have changed my mind about what my prior beliefs logically imply, because I'm not logically omniscient, even if that looks like assigning probabilities over time which are incoherent in the Bayesian sense."
Mugger: "And how does this work out to my not getting five dollars?"
Me: "In the scenario you're asking me to imagine, you present me with evidence which I currently think Just Plain Shouldn't Happen. And if that actually does happen, the sensible way for me to react is by questioning my prior assumptions and the reasoning which led me assign such low probability. One way that I handle my lack of logical omniscience - my finite, error-prone reasoning capabilities - is by being willing to assign infinitesimal probabilities to non-privileged hypotheses so that my prior over all possibilities can sum to 1. But if I actually see strong evidence for something I previously thought was super-improbable, I don't just do a Bayesian update, I should also question whether I was right to assign such a tiny probability in the first place - whether it was really as complex, or unnatural, as I thought. In real life, you are not ever supposed to have a prior improbability of 10-100 for some fact distinguished enough to be written down in advance, and yet encounter strong evidence, say 1010 to 1, that the thing has actually happened. If something like that happens, you don't do a Bayesian update to a posterior of 10-90. Instead you question both whether the evidence might be weaker than it seems, and whether your estimate of prior improbability might have been poorly calibrated, because rational agents who actually have well-calibrated priors should not encounter situations like that until they are ten billion days old. Now, this may mean that I end up doing some non-Bayesian updates: I say some hypothesis has a prior probability of a quadrillion to one, you show me evidence with a likelihood ratio of a billion to one, and I say 'Guess I was wrong about that quadrillion to one thing' rather than being a Muggle about it. And then I shut up and listen to what you have to say about how to estimate probabilities, because on my worldview, I wasn't expecting to see you turn into a cat. But for me to make a super-update like that - reflecting a posterior belief that I was logically incorrect about the prior probability - you have to really actually show me the evidence, you can't just ask me to imagine it. This is something that only logically incoherent agents ever say, but that's all right because I'm not logically omniscient."
At some point, we're going to have to build some sort of actual prior into, you know, some sort of actual self-improving AI.
(Scary thought, right?)
So far as I can presently see, the logic requiring some sort of leverage penalty - not just so that we don't pay $5 to Pascal's Mugger, but also so that our expected utility sums converge at all - seems clear enough that I can't yet see a good alternative to it (feel welcome to suggest one), and Robin Hanson's rationale is by far the best I've heard.
In fact, what we actually need is more like a combined leverage-and-complexity penalty, to avoid scenarios like this:
Mugger: "Give me $5 and I'll save 3↑↑↑3 people."
You: "I assign probability exactly 1/3↑↑↑3 to that."
Mugger: "So that's one life saved for $5, on average. That's a pretty good bargain, right?"
You: "Not by comparison with x-risk reduction charities. But I also like to do good on a smaller scale now and then. How about a penny? Would you be willing to save 3↑↑↑3/500 lives for a penny?"
Mugger: "Eh, fine."
You: "Well, the probability of that is 500/3↑↑↑3, so here's a penny!" (Goes on way, whistling cheerfully.)
Adding a complexity penalty and a leverage penalty is necessary, not just to avert this exact scenario, but so that we don't get an infinite expected utility sum over a 1/3↑↑↑3 probability of saving 3↑↑↑3 lives, 1/(3↑↑↑3 + 1) probability of saving 3↑↑↑3 + 1 lives, and so on. If we combine the standard complexity penalty with a leverage penalty, the whole thing should converge.
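A minimal sketch of why the combination converges where a pure leverage penalty does not; the indexing of hypotheses by k and the 2^-k complexity weight are my own simplification:

```python
# With a pure leverage prior of 1/N_k on saving N_k lives, every hypothesis
# contributes N_k * (1/N_k) = 1 expected life, so the sum over hypotheses
# diverges.  Multiplying in a complexity penalty of 2^-k makes the k-th
# term 2^-k, and the series converges.
leverage_only = sum(1.0 for k in range(1, 51))                 # grows without bound
with_complexity = sum(2.0 ** -k * 1.0 for k in range(1, 51))   # → 1
print(leverage_only, with_complexity)
```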
Probability penalties are epistemic features - they affect what we believe, not just what we do. Maps, ideally, correspond to territories. Is there any territory that this complexity+leverage penalty can correspond to - any state of a single reality which would make these the true frequencies? Or is it only interpretable as pure uncertainty over realities, with there being no single reality that could correspond to it? To put it another way, the complexity penalty and the leverage penalty seem unrelated, so perhaps they're mutually inconsistent; can we show that the union of these two theories has a model?
As near as I can figure, the corresponding state of affairs to a complexity+leverage prior improbability would be a Tegmark Level IV multiverse in which each reality got an amount of magical-reality-fluid corresponding to the complexity of its program (1/2 to the power of its Kolmogorov complexity) and then this magical-reality-fluid had to be divided among all the causal elements within that universe - if you contain 3↑↑↑3 causal nodes, then each node can only get 1/3↑↑↑3 of the total realness of that universe. (As always, the term "magical reality fluid" reflects an attempt to demarcate a philosophical area where I feel quite confused, and try to use correspondingly blatantly wrong terminology so that I do not mistake my reasoning about my confusion for a solution.) This setup is not entirely implausible because the Born probabilities in our own universe look like they might behave like this sort of magical-reality-fluid - quantum amplitude flowing between configurations in a way that preserves the total amount of realness while dividing it between worlds - and perhaps every other part of the multiverse must necessarily work the same way for some reason. It seems worth noting that part of what's motivating this version of the 'territory' is that our sum over all real things, weighted by reality-fluid, can then converge. In other words, the reason why complexity+leverage works in decision theory is that the union of the two theories has a model in which the total multiverse contains an amount of reality-fluid that can sum to 1 rather than being infinite. (Though we need to suppose that either (a) only programs with a finite number of causal nodes exist, or (b) programs can divide finite reality-fluid among an infinite number of nodes via some measure that gives every experience-moment a well-defined relative amount of reality-fluid. Again see caveats about basic philosophical confusion - perhaps our map needs this property over its uncertainty but the territory doesn't have to work the same way, etcetera.)
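In code form, the combined prior sketched above might look like this; a rough sketch only, with the function and its arguments being my labels for the quantities in the paragraph rather than a worked-out theory:

```python
def node_reality_fluid(kolmogorov_complexity_bits, num_causal_nodes):
    """Share of reality-fluid for one causal node: the universe-program gets
    2^-K(U), which is then divided evenly among its N(U) causal nodes."""
    return 2.0 ** -kolmogorov_complexity_bits / num_causal_nodes

# A simple universe crowded with a googol inhabitants can give each node less
# measure than a more complex but much smaller universe gives its nodes.
print(node_reality_fluid(20, 1e100) < node_reality_fluid(40, 1e3))   # True
```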
If an AI's overall architecture is also such as to enable it to carry out the "You turned into a cat" effect - where if the AI actually ends up with strong evidence for a scenario it assigned super-exponential improbability, the AI reconsiders its priors and the apparent strength of evidence rather than executing a blind Bayesian update, though this part is formally a tad underspecified - then at the moment I can't think of anything else to add in.
In other words: This is my best current idea for how a prior, e.g. as used in an AI, could yield decision-theoretic convergence over explosively large possible worlds.
However, I would still call this a semi-open FAI problem (edit: wide-open) because it seems quite plausible that somebody is going to kick holes in the overall view I've just presented, or come up with a better solution, possibly within an hour of my posting this - the proposal is both recent and weak even by my standards. I'm also worried about whether it turns out to imply anything crazy on anthropic problems. Over to you, readers.
One scheme with the properties you want is Wei Dai's UDASSA, e.g. see here. I think UDASSA is by far the best formal theory we have to date, although I'm under no delusions about how well it captures all of our intuitions (I'm also under no delusions about how consistent our intuitions are, so I'm resigned to accepting a scheme that doesn't capture them).
I think it would be more fair to call this allocation of measure part of my preferences, instead of "magical reality fluid." Thinking that your preferences are objective facts about the world seems like one of the oldest errors in the book, which is only possibly justified in this case because we are still confused about the hard problem of consciousness.
As other commenters have observed, it seems clear that you should never actually believe that the mugger can influence the lives of 3^^^^3 other folks and will do so at your suggestion, whether or not you've made any special "leverage adjustment." Nevertheless, even though you never believe that you have such influence, you would still need to pass to some bounded utility function if you want to use the normal framework of expected utility maximization, since you need to compare the goodness of whole worlds. Either that, or you would need to make quite significant modifications to your decision theory.
Who might have the time, desire, and ability to write up UDASSA clearly, if MIRI provides them with resources?