Shortened version of:  Pascal's Muggle:  Infinitesimal Priors and Strong Evidence

One proposal which has been floated for dealing with Pascal's Mugger is to penalize hypotheses that let you affect a large number of people, in proportion to the number of people affected - what we could perhaps call a "leverage penalty" instead of a "complexity penalty".  This isn't just for Pascal's Mugger in particular; it seems required in order for expected utilities in general to converge when the 'size' of scenarios can grow much faster than their algorithmic complexity.

Unfortunately this potentially leads us into a different problem, that of Pascal's Muggle.

Suppose a poorly-dressed street person asks you for five dollars in exchange for doing a googolplex's worth of good using his Matrix Lord powers - say, saving the lives of a googolplex other people inside computer simulations they're running.

"Well," you reply, "I think that it would be very improbable that I would be able to affect so many people through my own, personal actions - who am I to have such a great impact upon events?  Indeed, I think the probability is somewhere around one over googolplex, maybe a bit less.  So no, I won't pay five dollars - it is unthinkably improbable that I could do so much good!"

"I see," says the Mugger.

A wind begins to blow about the alley, whipping the Mugger's loose clothes about him as they shift from ill-fitting shirt and jeans into robes of infinite blackness, within whose depths tiny galaxies and stranger things seem to twinkle.  In the sky above, a gap edged by blue fire opens with a horrendous tearing sound - you can hear people on the nearby street yelling in sudden shock and terror, implying that they can see it too - and displays the image of the Mugger himself, wearing the same robes that now adorn his body, seated before a keyboard and a monitor.

"That's not actually me," the Mugger says, "just a conceptual representation, but I don't want to drive you insane.  Now give me those five dollars, and I'll save a googolplex lives, just as promised.  It's easy enough for me, given the computing power my home universe offers.  As for why I'm doing this, there's an ancient debate in philosophy among my people - something about how we ought to sum our expected utilities - and I mean to use the video of this event to make a point at the next decision theory conference I attend.   Now will you give me the five dollars, or not?"

"Mm... no," you reply.

"No?" says the Mugger.  "I understood earlier when you didn't want to give a random street person five dollars based on a wild story with no evidence behind it whatsoever.  But surely I've offered you evidence now."

"Unfortunately, you haven't offered me enough evidence," you explain.

"Seriously?" says the Mugger.  "I've opened up a fiery portal in the sky, and that's not enough to persuade you?  What do I have to do, then?  Rearrange the planets in your solar system, and wait for the observatories to confirm the fact?  I suppose I could also explain the true laws of physics in the higher universe in more detail, and let you play around a bit with the computer program that encodes all the universes containing the googolplex people I would save if you just gave me the damn five dollars -"

"Sorry," you say, shaking your head firmly, "there's just no way you can convince me that I'm in a position to affect a googolplex people, because the prior probability of that is one over googolplex.  If you wanted to convince me of some fact of merely 2-100 prior probability, a mere decillion to one - like that a coin would come up heads and tails in some particular pattern of a hundred coinflips - then you could just show me 100 bits of evidence, which is within easy reach of my brain's sensory bandwidth.  I mean, you could just flip the coin a hundred times, and my eyes, which send my brain a hundred megabits a second or so - though that gets processed down to one megabit or so by the time it goes through the lateral geniculate nucleus - would easily give me enough data to conclude that this decillion-to-one possibility was true.  But to conclude something whose prior probability is on the order of one over googolplex, I need on the order of a googol bits of evidence, and you can't present me with a sensory experience containing a googol bits.  Indeed, you can't ever present a mortal like me with evidence that has a likelihood ratio of a googolplex to one - evidence I'm a googolplex times more likely to encounter if the hypothesis is true, than if it's false - because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex.  You know the old saying about how once you assign something probability one, or probability zero, you can't update that probability regardless of what evidence you see?  Well, odds of a googolplex to one, or one to a googolplex, work pretty much the same way."

"So no matter what evidence I show you," the Mugger says - as the blue fire goes on crackling in the torn sky above, and screams and desperate prayers continue from the street beyond - "you can't ever notice that you're in a position to help a googolplex people."

"Right!" you say.  "I can believe that you're a Matrix Lord.  I mean, I'm not a total Muggle, I'm psychologically capable of responding in some fashion to that giant hole in the sky.  But it's just completely forbidden for me to assign any significant probability whatsoever that you will actually save a googolplex people after I give you five dollars.  You're lying, and I am absolutely, absolutely, absolutely confident of that."

"So you weren't just invoking the leverage penalty as a plausible-sounding way of getting out of paying me the five dollars earlier," the Mugger says thoughtfully.  "I mean, I'd understand if that was just a rationalization of your discomfort at forking over five dollars for what seemed like a tiny probability, when I hadn't done my duty to present you with a corresponding amount of evidence before demanding payment.  But you... you're acting like an AI would if it was actually programmed with a leverage penalty on hypotheses!"

"Exactly," you say.  "I'm forbidden a priori to believe I can ever do that much good."

"Why?" the Mugger says curiously.  "I mean, all I have to do is press this button here and a googolplex lives will be saved."  The figure within the blazing portal above points to a green button on the console before it.

"Like I said," you explain again, "the prior probability is just too infinitesimal for the massive evidence you're showing me to overcome it -"

The Mugger shrugs, and vanishes in a puff of purple mist.

The portal in the sky above closes, taking with it the console and the green button.

(The screams go on from the street outside.)

A few days later, you're sitting in your office at the physics institute where you work, when one of your colleagues bursts in through your door, seeming highly excited.  "I've got it!" she cries.  "I've figured out that whole dark energy thing!  Look, these simple equations retrodict it exactly, there's no way that could be a coincidence!"

At first you're also excited, but as you pore over the equations, your face configures itself into a frown.  "No..." you say slowly.  "These equations may look extremely simple so far as computational complexity goes - and they do exactly fit the petabytes of evidence our telescopes have gathered so far - but I'm afraid they're far too improbable to ever believe."

"What?" she says.  "Why?"

"Well," you say reasonably, "if these equations are actually true, then our descendants will be able to exploit dark energy to do computations, and according to my back-of-the-envelope calculations here, we'd be able to create around a googolplex people that way.  But that would mean that we, here on Earth, are in a position to affect a googolplex people - since, if we blow ourselves up via a nanotechnological war or (cough) make certain other errors, those googolplex people will never come into existence.  The prior probability of us being in a position to impact a googolplex people is on the order of one over googolplex, so your equations must be wrong."

"Hmm..." she says.  "I hadn't thought of that.  But what if these equations are right, and yet somehow, everything I do is exactly balanced, down to the googolth decimal point or so, with respect to how it impacts the chance of modern-day Earth participating in a chain of events that leads to creating an intergalactic civilization?"

"How would that work?" you say.  "There's only seven billion people on today's Earth - there's probably been only a hundred billion people who ever existed total, or will exist before we go through the intelligence explosion or whatever - so even before analyzing your exact position, it seems like your leverage on future affairs couldn't reasonably be less than one in a ten trillion part of the future or so."

"But then given this physical theory which seems obviously true, my acts might imply expected utility differentials on the order of 1010100-13," she explains, "and I'm not allowed to believe that no matter how much evidence you show me."


This problem may not be as bad as it looks; a leverage penalty may lead to more reasonable behavior than depicted above, after taking into account Bayesian updating:


Mugger:  "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."

You:  "Nope."

Mugger:  "Why not?  It's a really large impact."

You:  "Yes, and I assign a probability on the order of 1 in 3↑↑↑3 that I would be in a unique position to affect 3↑↑↑3 people."

Mugger:  "Oh, is that really the probability that you assign?  Behold!"

(A gap opens in the sky, edged with blue fire.)

Mugger:  "Now what do you think, eh?"

You:  "Well... I can't actually say this has a likelihood ratio of 3↑↑↑3 to 1.  No stream of evidence that can enter a human brain over the course of a century is ever going to have a likelihood ratio larger than, say, 101026 to 1 at the absurdly most, assuming one megabit per second of sensory data, for a century, each bit of which has at least a 1-in-a-trillion error probability.  You'd probably start to be dominated by Boltzmann brains or other exotic minds well before then."

Mugger:  "So you're not convinced."

You:  "Indeed not.  The probability that you're telling the truth is so tiny that God couldn't find it with an electron microscope.  Here's the five dollars."

Mugger:  "Done!  You've saved 3↑↑↑3 lives!  Congratulations, you're never going to top that, your peak life accomplishment will now always lie in your past.  But why'd you give me the five dollars if you think I'm lying?"

You:  "Well, because the evidence you did present me with had a likelihood ratio of at least a billion to one - I would've assigned less than 10-9 prior probability of seeing this when I woke up this morning - so in accordance with Bayes's Theorem I promoted the probability from 1/3↑↑↑3 to at least 109/3↑↑↑3, which when multiplied by an impact of 3↑↑↑3, yields an expected value of at least a billion lives saved for giving you five dollars."


I confess that I find this line of reasoning a bit suspicious - it seems overly clever - but at least on the level of intuitive-virtues-of-rationality it doesn't seem completely stupid in the same way as Pascal's Muggle.  This muggee is at least behaviorally reacting to the evidence.  In fact, they're reacting in a way exactly proportional to the evidence - they would've assigned the same net importance to handing over the five dollars if the Mugger had offered 3↑↑↑4 lives, so long as the strength of the evidence seemed the same.

But I still feel a bit nervous about the idea that Pascal's Muggee, after the sky splits open, is handing over five dollars while claiming to assign probability on the order of 10^9/3↑↑↑3 that it's doing any good.  My own reaction would probably be more like this:


Mugger:  "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."

Me:  "Nope."

Mugger:  "So then, you think the probability I'm telling the truth is on the order of 1/3↑↑↑3?"

Me:  "Yeah... that probably has to follow.  I don't see any way around that revealed belief, given that I'm not actually giving you the five dollars.  I've heard some people try to claim silly things like, the probability that you're telling the truth is counterbalanced by the probability that you'll kill 3↑↑↑3 people instead, or something else with a conveniently exactly equal and opposite utility.  But there's no way that things would balance out that neatly in practice, if there was no a priori mathematical requirement that they balance.  Even if the prior probability of your saving 3↑↑↑3 people and killing 3↑↑↑3 people, conditional on my giving you five dollars, exactly balanced down to the log(3↑↑↑3) decimal place, the likelihood ratio for your telling me that you would "save" 3↑↑↑3 people would not be exactly 1:1 for the two hypotheses down to the log(3↑↑↑3) decimal place.  So if I assigned probabilities much greater than 1/3↑↑↑3 to your doing something that affected 3↑↑↑3 people, my actions would be overwhelmingly dominated by even a tiny difference in likelihood ratio elevating the probability that you saved 3↑↑↑3 people over the probability that you did something equally and oppositely bad to them.  The only way this hypothesis can't dominate my actions - really, the only way my expected utility sums can converge at all - is if I assign probability on the order of 1/3↑↑↑3 or less.  I don't see any way of escaping that part."

Mugger:  "But can you, in your mortal uncertainty, truly assign a probability as low as 1 in 3↑↑↑3 to any proposition whatever?  Can you truly believe, with your error-prone neural brain, that you could make 3↑↑↑3 statements of any kind one after another, and be wrong, on average, about once?"

Me:  "Nope."

Mugger:  "So give me five dollars!"

Me:  "Nope."

Mugger:  "Why not?"

Me:  "Because even though I, in my mortal uncertainty, will eventually be wrong about all sorts of things if I make enough statements one after another, this fact can't be used to increase the probability of arbitrary statements beyond what my prior says they should be, because then my prior would sum to more than 1.  There must be some kind of required condition for taking a hypothesis seriously enough to worry that I might be overconfident about it -"

Mugger:  "Then behold!"

(A gap opens in the sky, edged with blue fire.)

Mugger:  "Now what do you think, eh?"

Me (staring up at the sky):  "...whoa."  (Pause.)  "You turned into a cat."

Mugger:  "What?"

Me:  "Private joke.  Okay, I think I'm going to have to rethink a lot of things.  But if you want to tell me about how I was wrong to assign a prior probability on the order of 1/3↑↑↑3 to your scenario, I will shut up and listen very carefully to what you have to say about it.  Oh, and here's the five dollars, can I pay an extra twenty and make some other requests?"

(The thought bubble pops, and we return to two people standing in an alley, the sky above perfectly normal.)

Mugger:  "Now, in this scenario we've just imagined, you were taking my case seriously, right?  But the evidence there couldn't have had a likelihood ratio of more than 101026 to 1, and probably much less.  So by the method of imaginary updates, you must assign probability at least 10-1026 to my scenario, which when multiplied by a benefit on the order of 3↑↑↑3, yields an unimaginable bonanza in exchange for just five dollars -"

Me:  "Nope."

Mugger:  "How can you possibly say that?  You're not being logically coherent!"

Me:  "I agree that I'm being incoherent in a sense, but I think that's acceptable in this case, since I don't have infinite computing power.  In the scenario you're asking me to imagine, you're presenting me with evidence which I currently think Can't Happen.  And if that actually does happen, the sensible way for me to react is by questioning my prior assumptions and reasoning which led me to believe I shouldn't see it happen.  One way that I handle my lack of logical omniscience - my finite, error-prone reasoning capabilities - is by being willing to assign infinitesimal probabilities to non-privileged hypotheses so that my prior over all possibilities can sum to 1.  But if I actually see strong evidence for something I previously thought was super-improbable, I don't just do a Bayesian update, I should also question whether I was right to assign such a tiny probability in the first place - whether the scenario was really as complex, or unnatural, as I thought.  In real life, you are not ever supposed to have a prior improbability of 10-100 for some fact distinguished enough to be written down, and yet encounter strong evidence, say 1010 to 1, that the thing has actually happened.  If something like that happens, you don't do a Bayesian update to a posterior of 10-90.  Instead you question both whether the evidence might be weaker than it seems, and whether your estimate of prior improbability might have been poorly calibrated, because rational agents who actually have well-calibrated priors should not encounter situations like that until they are ten billion days old.  Now, this may mean that I end up doing some non-Bayesian updates:  I say some hypothesis has a prior probability of a quadrillion to one, you show me evidence with a likelihood ratio of a billion to one, and I say 'Guess I was wrong about that quadrillion to one thing' rather than being a Muggle about it.  
And then I shut up and listen to what you have to say about how to estimate probabilities, because on my worldview, I wasn't expecting to see you turn into a cat.  But for me to make a super-update like that - reflecting a posterior belief that I was logically incorrect about the prior probability - you have to really actually show me the evidence, you can't just ask me to imagine it.  This is something that only logically incoherent agents ever say, but that's all right because I'm not logically omniscient."
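The "ten billion days old" remark can be unpacked with a short calculation. If a hypothesis is false, evidence favoring it by a likelihood ratio of R can show up with probability at most 1/R, so a well-calibrated agent with a tiny prior should almost never see such evidence. The sketch below assumes one opportunity per day to run into it, which is an assumption made here for illustration; the function names are hypothetical.

```python
import math

def max_chance_of_evidence(prior: float, likelihood_ratio: float) -> float:
    """Upper bound on the probability of seeing evidence that favors H
    by at least `likelihood_ratio`, for a well-calibrated agent.
    If H is true the evidence is assumed to appear with certainty;
    if H is false it can appear with probability at most 1/likelihood_ratio."""
    return prior + (1 - prior) / likelihood_ratio

def naive_posterior(prior: float, likelihood_ratio: float) -> float:
    """Plain Bayesian update, with no second-guessing of the prior."""
    return prior * likelihood_ratio / (prior * likelihood_ratio + (1 - prior))

PRIOR = 1e-100
LIKELIHOOD_RATIO = 1e10

chance_per_day = max_chance_of_evidence(PRIOR, LIKELIHOOD_RATIO)
expected_days_until_seen = 1 / chance_per_day       # assumes one opportunity per day
posterior = naive_posterior(PRIOR, LIKELIHOOD_RATIO)

print(chance_per_day, expected_days_until_seen, posterior)
```

The bound works out to roughly one chance in ten billion per day, so actually seeing the evidence is itself a sign that the 10^-100 prior was miscalibrated, well before the naive posterior of about 10^-90 is worth taking at face value.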


When I add up a complexity penalty, a leverage penalty, and the "You turned into a cat!" logical non-omniscience clause, I get the best candidate I have so far for the correct decision-theoretic way to handle these sorts of possibilities while still having expected utilities converge.

As mentioned in the longer version, this has very little in the way of relevance for optimal philanthropy, because we don't really need to consider these sorts of rules for handling small large numbers on the order of a universe containing 10^80 atoms, and because most of the improbable leverage associated with x-risk charities is associated with discovering yourself to be an Ancient Earthling from before the intelligence explosion, which improbability (for universes the size of 10^80 atoms) is easily overcome by the sensory experiences which tell you you're an Earthling.  For more on this see the original long-form post.  The main FAI issue at stake is what sort of prior to program into an AI.

53 comments

My probable reaction before reading this, now hopefully replaced by the OP if I can remember it:

Mugger: "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."

Me: "Here you-- Wait a minute, I could give these to the MIRI, but I can't do that if I give them to you. There's, like, 1/1 000 000 000 probability you're speaking the truth, and the probability of them making the difference between a negative and positive singularity, wich'd make a difference for way more people like that all over the multiverse, maybe."

Mugger: "... you are terrible at pulling numbers out of your ass. Are you even trying? Unimaginable amounts of lives are on the tine here!"

Me: "I can't deny any of that. Bye!"

(Me continues never getting around to actually donating and spends the money on something frivolous.)

Suppose a poorly-dressed street person offers you five dollars in exchange for doing a googolplex's worth of good using his Matrix Lord powers - say, saving the lives of a googolplex other people inside computer simulations they're running.

Shouldn't it be "asks you for five dollars"?

I'd be even more wary of Pascal's gifting than his mugging.

Hmm, I wonder if there is a version of Pascal gifting worth pondering. "The anti-mugger says 'I will pay you $5 if..'" what?

Since a human mind really can't naturally conceive of the difference between huge numbers like these, wouldn't it follow that our utility functions are bounded by a horizontal asymptote? And shouldn't that solve this problem?

I mean, if the amount of utility gained from saving x amount of people is no longer allowed to increase boundlessly, you don't need such improbable leverage penalties. You'd still of course have the property that it's better to save more people, just not linearly better.

I find that unsatisfactory for the following reasons - first, I am a great believer in life and love without bound; second, I suspect that the number of people in the multiverse is already great enough to max out that sort of asymptote and yet I still care; third, if this number is not already maxed out, I find it counterintuitive that someone another universe over could cause me to experience preference reversals in this universe by manipulating the number of people who already exist inside a box.

Ok, I might have formulated myself badly. My argument is that any agent of bounded computational power is forced to use two utility functions. The one they wish they had (i.e. the unbounded linear version) and the one they are forced to use in their calculations because of their limitations (i.e. an asymptotically bounded approximation).

For those agents capable of self-modification, just add a clause to increase their computational power (and thereby increasing the bound of their approximation) whenever the utilities of the "scales they're working on" differ by more than some small specified number.

So, my answer to this person would be "stick around until I can safely modify myself into dealing with your request", or alternatively, if he wants an answer right now after seeing his evidence, "here's 5 dollars".

Why can't you increase your asymptote with new evidence? If, for instance, your utility was bounded at 2^160 utilons before the mugger opened the sky then just increase your bound according to that evidence and then shut up and multiply to decide whether to pay $5. You can't update to a bound of 3^^^3 in one step since you can't receive enough evidence at once, which is a handy feature for avoiding muggings, but your utility at a distant point in the future is essentially unbounded given enough evidential updates over time.

Useful utility bounds should be derivable from our knowledge of the universe. If we can theoretically create 10^80 unique, just-worth-living lives with the estimated matter and energy in the universe then that provides a minimum bound, although it's probably desirable to choose the bound large enough that the 10^80th life is worth nearly as much as the 1st or 10^11th life. When we have evidence for a change in our estimate of the available matter and energy or a change in the efficiency of turning matter and energy into utility we scale the bound appropriately.

I think having a leverage penalty, to the extent that such a thing is reasonable, is more about correcting for biases in how humans generate hypotheses than anything else. For example, many mental patients claim to be important historical figures like Jesus who had a large amount of leverage. It's not completely unreasonable to guess that this comes from some psychological mechanism that might still be in place in non-mental patients, and so it's not completely unreasonable to discard claims of large amounts of leverage in the name of combating this particular way in which humans might privilege hypotheses.

I wouldn't go so far as to claim that the size of the penalty should have any obvious relationship to the size of the leverage claimed, though.

Then this is an entirely different factor which doesn't have to do with having our EU sums converge.

The utility function is up for grabs.

Utility functions perfectly represent preferences that follow the VNM axioms. We work with utilities more often than preferences and can forget that preferences are the fundamental concept. Agents don't have preferences because they have utility functions -- agents have utility functions because they have preferences.

We use shortcuts that allow us to quickly manage small (conceivable) utilities and we understand the consequences intuitively. But shortcuts don't have to remain valid with superexponentially large utilities. Running superexponentially large utilities backwards through the VNM axioms results in preferences that border on insanity.

Assigning a utility of $2^googol to an outcome means preferring any possibility of its fruition over a certainty of $1,000,000 even if every bit of information in the observable universe unanimously tells you that you are wrong. This preference in the face of overwhelming counter-evidence is not a perverse artifact; it is the intrinsic meaning of $2^googol. If that meaning truly represents your preferences, fine. If not, your utility function is wrong.

Tiny probabilities can be sliced into conditional sequences of large probabilities, superexponential changes can be sliced into sequences of exponential changes. See e.g. the Lifespan Dilemma. If you reject the final conclusion, you must reject some particular step along the way, on pain of circular preference. This is the larger significance of VNM.

Also: You're not complaining about VNM per se, you're complaining that you think your preferences correspond to a bounded utility function.

I read Lifespan Dilemma (which I had not before, thank you for pointing it out [and writing it]). I torture on specks vs. torture without so much as a second thought, but don't give the Omega a penny for life extension for this reason:

Following a super-exponential utility function requires super-polynomial amounts of memory. I just don't know what it would mean to have more experiences than can be counted using all the matter in the universe. Existing outside of PSPACE is so...alien.

If I wanted to keep count of how many experiences I've had, it would be morally impermissible to simply increment natural numbers. Some natural numbers encode universes of unspeakable torment, and I'd hate for my IP register to accidentally create a hell dimension.

Is it really so disheartening to believe that the marginal utility I'd place on the Nth experience* would eventually decrease to 0?

*Where N is a number that can encode a universe-timeline unfathomably larger than ours.

ETA:

You're not complaining about VNM per se, you're complaining that you think your preferences correspond to a bounded utility function.

Yes, I agree. I have no grief with the VNM axioms. I think that agents should use only a polynomial amount of memory, and therefore desire that they follow utility functions that grow at most exponentially against the complexity of outcomes.

"A polynomial amount of memory ought to be enough for anyone." -- solipsist

I think both this and the longer version are excellent, but I wonder whether it would be better to have the long version in Discussion and the short version in Main rather than the other way around.

That was my first thought, but then I realized that I mainly want discussion on the longer version and that it's the longer version that has the FAI consequences, etc.

Yes, but which version do you want the higher-quality discussion on, and which one do you want more casual readers to see?

Which one is which, and how can you tell?


Context: I read Main much more than Discussion, due to time constraints and my perception that Discussion is less interesting in bulk. I tend to comment rarely, and thus I pay attention to those comments. Which means that something in Main is more likely to have comments from users like me than something in Discussion.

I’m not sure exactly what you meant, but I thought you meant Main gets more “casual” users and Discussion gets better, well, discussions because the very active (non-casual) users are more predominant there.

My impression differs with regard to the latter, as I'd expect the very active posters to be mostly everywhere and the high-quality lurkers (who are "knowledgeable insiders" but comment rarely, and provide diversity to a discussion) to be mostly in Main. It's probably true that we get more “newbies” in Main, but maybe the voting works better at hiding them for me.

You: "Well, because the evidence you did present me with had a likelihood ratio of at least a billion to one - I would've assigned less than 10-9 prior probability of seeing this when I woke up this morning - so in accordance with Bayes's Theorem I promoted the probability from 1/3↑↑↑3 to at least 109/3↑↑↑3, which when multiplied by an impact of 3↑↑↑3, yields an expected value of at least a billion lives saved for giving you five dollars."

Five bucks is easy to give. How much would you be willing to give, maximum, given this evidence? $500? $5000? All you have or can borrow? Everything your real or potential descendants will earn during their lifetimes? In how many generations?

When the evidence is a huge hole opening in the sky? I think common sense allows you to take fairly large actions after seeing a huge hole opening in the sky. I mean, in practice, I would tend to try and take certain actions intended to do something about the rather high posterior probability that I was hallucinating and be particularly wary of actions that sound like the sort of thing psychotic patients hallucinate, but this is an artifact of the odd construction of the scenario and wouldn't apply to the more realistic and likely-to-be-actually-encountered case of the physics theory which implied we could use dark energy for computation or whatever.

Just wondering how you would go about estimating the "optimal" amount to give in such a situation. After all, it's a fairly common occurrence, someone claiming a scientific breakthrough providing free energy and showing an apparent working perpetual motion machine, or a new herbal medicine which cures cancer and presenting dozens of convincing testimonies.

If you really, honestly can't tell the difference between that and an xrisk reduction charity, you're probably not reading LW in the first place. Or if you mean something else by this question, could you ask with a different example instead?

Sorry, I'm just an amateur asking what is probably a stupid question. Given the leverage N and the prior probability 1/n, where N>>n>>1, what is the optimal investment amount, in whichever units you prefer? Say, in percent of the resources available to you.

I can't offhand see how to translate the given numbers into a Kelly betting criterion. My own heuristic is something more along the lines of "Find the best thing that looks like it might actually work and do it." Things that won't actually work are not done anyway even if the current state of the search calls them "best", but I try to avoid Enrico-Fermi-style "ten percent" underestimates about what might actually have an impact. No holes have actually appeared in the sky, and I'm not presently working with any N larger than 10^80, and there's no reason to worry about small probabilities of affecting that when it's easy to find several different medium-sized candidates. I'd only want to complicate my reasoning any further if I ended up in a more complicated situation than that.

(I also don't think that perpetual motion machines have N>>n.)

I try to avoid Enrico-Fermi-style "ten percent" underestimates about what might actually have an impact.

But as gwern and other commenters showed, your characterization of accessible chain reactions as an easily drawn implication of then-known physics was wrong.

The best case for bias on Fermi's part in that post is the partially (but not very) independent claim that:

Fermi is also the one who said that nuclear energy was fifty years off in the unlikely event it could be done at all, two years (IIRC) before Fermi himself oversaw the construction of the first nuclear pile.

But you didn't give a citation or quote for that.

And you suggest that

Szilard and Rabi saw the logic in advance of the fact, not just afterward - though not in those exact terms; they just saw the physical logic, and then didn't adjust it downward for 'absurdity' or with more complicated rationalizations

But you only present evidence that they cared about a 10% probability (and Szilard was selected from history in part because of his right-tail estimate).

I'm still suspicious that Fermi could truly not have done better than "ten percent", and wonder if people are trying a little too hard not to give in to hindsight bias and overfitting, at the cost of failing to learn heuristics that could indeed generalize. Agreed that if +chain reaction implied a new fact of physics in the sense that it tells you about a previously unknown heavy element which emits free neutrons and is splittable, the standard heuristic "does the failure of this prediction tell us a new fact of physics" does not work in the vanilla sense. This doesn't mean that a fair estimate of the probability of at least one not-yet-examined element having the desired properties would have been ten percent. Chain reactions were not just barely possible for a large barely-critical fission plant 50 years later, rather they were soon achieved at a prompt supercritical grade adequate for nuclear explosions by two distinct pathways of U-235 refinement and Pu-239 breeding, both of which admittedly required effort, but was the putting-in of that effort unpredictable? But this should be continued in the other post rather than here.

rather they were soon achieved at a prompt supercritical grade adequate for nuclear explosions by two distinct pathways of U-235 refinement and Pu-239 breeding

They could realistically only breed enough Pu239 by starting with a U235-fueled reactor. Everything that you can do in 1945 goes through U235, which we have only because it has an unusually long half-life (350x the next stablest fissile isotope, Np-237). On top of that, they didn't even know that fission released prompt secondary neutrons at all - those could have simply remained in the fission products and converted to protons via beta decay.

I know they got a critical reaction with a big heap of unrefined uranium. This makes no mention of uranium needing to be isotopically refined for plutonium production on the Manhattan Project. As you are generally a great big troll, I am afraid I cannot trust anything you say about isotopic refinement having been used or required without further references, but I will not actually downvote yet in case you're not lying. Got a cite?

Federation of American Scientists:

The only proven and practical source for the large quantities of neutrons needed to make plutonium at a reasonable speed is a nuclear reactor in which a controlled but self-sustaining 235 U fission chain reaction takes place.

Discussion of the particle accelerator route which would enable the bootstrapping of a non U-235 route eventually (producing a critical mass of 10+ kg of plutonium using superconductors and huge amounts of energy and accelerator time), but only with much increased difficulty:

Wilson died in 2000 but a paper he wrote on this topic in 1976 has now found its way onto the arXiv and it highlights some thought-provoking ideas.

At the time, Wilson was director of Fermilab where he was building an accelerator called the Energy Doubler/Saver, which employed superconducting magnets to steer a beam of high energy protons in a giant circle. These protons were to have energies of up to 1000 GeV.

The Energy Doubler was special because it was the first time superconductivity had been used on a large scale, something that had significant implications for the amount of juice required to make the thing work. “One consequence of the application of superconductivity to accelerator construction is that the power consumption of accelerators will become much smaller,” said Wilson. And that raised an interesting prospect.

Imagine the protons in this accelerator are sent into a block of uranium. Each proton might then be expected to generate a shower of some 60,000 neutrons in the material and most of these would go on to be absorbed by the nuclei to form 60,000 plutonium atoms. When burned in a nuclear reactor, each plutonium atom produces 0.2 GeV of fission energy. So 60,000 of them would produce 12,000 GeV.

Using this back-of-an-envelope calculation, Wilson worked out that a single 1000 GeV proton could lead to the release of 12,000 GeV of fission energy. Of course, this neglects all the messy fine details in which large amounts of energy can be lost. For example, it takes some 20MW of power to produce an 0.2MW beam in the Energy Doubler.

But even with those kinds of losses, it certainly seems worthwhile to study the process in more detail to see if overall energy production is possible.
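The arithmetic in the quoted passage can be laid out as a short calculation. This is only a sketch using the figures quoted above; combining the 12x energy multiplication with the 20MW-to-0.2MW beam efficiency into a single "net gain" number is an assumption made here, not something the article does, and it ignores every other loss (including converting fission heat into electricity).

```python
# Figures taken from the quoted article.
NEUTRONS_PER_PROTON = 60_000   # neutrons per 1000 GeV proton, each assumed to breed one Pu atom
FISSION_ENERGY_GEV = 0.2       # energy released per plutonium atom burned
PROTON_ENERGY_GEV = 1_000      # beam energy per proton
WALL_POWER_MW = 20             # power drawn to run the accelerator
BEAM_POWER_MW = 0.2            # power actually delivered in the beam

# Fission energy eventually released per proton, in GeV.
fission_energy_gev = NEUTRONS_PER_PROTON * FISSION_ENERGY_GEV

# Energy multiplication if the beam itself were free.
ideal_gain = fission_energy_gev / PROTON_ENERGY_GEV

# Fraction of wall-plug power that ends up in the beam.
beam_efficiency = BEAM_POWER_MW / WALL_POWER_MW

# Assumption: net gain is just the product of the two factors above.
net_gain = ideal_gain * beam_efficiency

print(fission_energy_gev, ideal_gain, beam_efficiency, net_gain)
```

On these numbers the ideal gain is about 12 and the beam efficiency about 1%, so the crude net gain is roughly 0.12, i.e. below break-even unless the efficiency improves a great deal. That is consistent with the article's framing that the idea was worth studying rather than obviously workable.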

The original giant heap of uranium bricks with k=1.0006 (CP-1 the first pile) - was that chain reaction all due to U235? Maybe the spontaneous fissions are mostly U235, but are the further fissions mostly neutrons hitting U235? This doesn't correspond with my mental model of a pile like that - surely the 2-3 neutrons per fission would mostly hit U238 rather than U235. I also know there were graphite bricks in the pile and graphite bricks are for having slow neutrons being captured by U238.

Let's suppose U235 didn't exist any more. We couldn't build a huge heap of pure U238 uranium bricks, and throw in a small number of neutrons from somewhere else (radium?) to get things started?

EDIT: Okay, I just read something else about slow neutrons being less likely to be absorbed by U238, so maybe the whole pile is just the tiny fraction of natural U235 with the U238 accomplishing nothing? This would indeed surprise me, but I guess then the case can be made for all access to chain reactions bottlenecking through U235. Still seems a bit suspicious and I would like to ask some physicist who isn't frantically trying to avoid hindsight bias how things look in retrospect.

EDIT2: Just read a third thing about slow neutrons being more easily captured by U238 again.

Let's suppose U235 didn't exist any more. We couldn't build a huge heap of pure U238 uranium bricks, and throw in a small number of neutrons from somewhere else (radium?) to get things started?

U238 is essentially only fissioned by fast neutrons (only fast neutrons are not dramatically more likely to be simply captured than to fission it), and overall tends to capture neutrons without being fissioned (that's how you get Pu-239: U-238 absorbs a neutron, becomes U-239, then after one beta decay becomes Neptunium-239, then after another beta-decay, Plutonium-239).

So, while it does fission and does release neutrons when it fissions, it doesn't sustain chain reaction. Fission doesn't imply chain reaction even with secondary neutrons.

Fortunately U238 has a small enough capture cross section that you can make a reactor work with natural uranium, but only if you use graphite to slow the neutrons down. For why you need graphite, see http://www.whatisnuclear.com/articles/fast_reactor.html (scroll down to the graphs; note that the U235 fission cross section increases faster with decreasing neutron energy than the U238 capture cross section, so even though both are larger at lower energies, U235 fission wins over U238 capture. Also note the nice almost-fractal peaks and valleys (resonances), which very much get in your way when you try to figure anything out from real data. It is a true extreme miracle of actual human rationality that this stuff was figured out sufficiently to build anything).

surely the 2-3 neutrons per fission would mostly hit U238 rather than U235

U-235 has a higher neutron absorption cross-section than U-238, so more U-238 than U-235 doesn't necessarily mean more neutrons hitting U-238 than U-235.

How much higher? Natural uranium, which is what they used in CP-1, is over 99% U238.

Wikipedia under "Neutron cross section" lists U-235 as having a capture cross-section of 60 and a fission cross section of 300, while U-238 has a capture cross-section of 2. This is for thermal neutrons (the cross-section depends on the neutron speed).
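To make the "how much higher?" question concrete, here is a rough calculation of where thermal neutrons get absorbed in natural uranium, using only the cross sections quoted above. It is a sketch under loud assumptions: a U-235 abundance of about 0.7%, purely thermal neutrons, no resonance capture, no absorption in the graphite, no leakage, and no fast fission of U-238. A real pile is messier than this on every count.

```python
# Thermal-neutron cross sections in barns, as quoted in the comment above.
U235_CAPTURE = 60.0
U235_FISSION = 300.0
U238_CAPTURE = 2.0

# Assumption: natural uranium is roughly 0.7% U-235 by atom count, the rest U-238.
U235_FRACTION = 0.007
U238_FRACTION = 1.0 - U235_FRACTION

# Abundance-weighted absorption "weights" (relative reaction rates per uranium atom).
u235_absorption = U235_FRACTION * (U235_CAPTURE + U235_FISSION)
u238_absorption = U238_FRACTION * U238_CAPTURE
total_absorption = u235_absorption + u238_absorption

# Share of absorbed thermal neutrons that end up in U-235 at all.
u235_share = u235_absorption / total_absorption

# Share of absorbed thermal neutrons that cause a U-235 fission.
fission_per_absorption = (U235_FRACTION * U235_FISSION) / total_absorption

print(round(u235_share, 3), round(fission_per_absorption, 3))
```

On these figures a bit more than half of the thermal neutrons absorbed in the uranium go into U-235 despite it being under 1% of the atoms, and a bit under half of all absorptions produce a fission. With 2-3 neutrons released per fission, that is enough to see why a moderated natural-uranium pile can reach k just above 1 at all, though the sketch says nothing about how close the margin is once the ignored losses are included.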

I'm surprised. I guess CP-1 could've been, in effect, mostly empty space filled with U-235 dust. And I'll go ahead and agree that if all non-particle-accelerator pathways to chain reactions bottlenecked through U-235 then Fermi may have been correct to say 10% (though it is still not totally clear why 10% would've been a better estimate than 2% or 50%, but I'm not Fermi). This would then form only the second case I can think of offhand where erroneous scientific pessimism was not in defiance of laws or evidence already known. (The other one is Kelvin's careful calculation that the Sun was probably around 60 million years old, which was wrong, but because of new physics - albeit plausibly in a situation where new physics could've rightly been expected, and where there was evidence from geology. Everything else I can think of offhand is "You can't have a train going at 35mph, people will suffocate!" or "You can't build nanomachines!" so you can see why my priors made me suspicious of Fermi.)

I wonder how low the probability is of obtaining at least 1 such sufficiently stable fissile isotope, if you change fundamental physical constants a little, preserving stars and life (and not making Earth blow up). It may be very low, actually, seeing as U235 does have an unusually long half-life for a fissile isotope.

EDIT: Scratch that, your post is the right response.

If this is right, shouldn't you update more on the example you chose as being the best example turning out to be the other way?

I've currently got a Discussion post running to figure out how much this generalizes.

In a natural uranium fuelled reactor, the actual fuel (at startup) is U235, present in the natural uranium at a concentration of about 0.7%. No U235 in nature = no easy plutonium.

Those two "distinct" pathways both rely on properties of 1 highly unusual nucleus, U235, which is both easy to fission and stable enough to still be around after ~5 billion years. How unusually stable is it? Well, from the date of, say, 40 million years since the explosion of the supernova that formed the Solar System, it was for all intents and purposes the only such isotope left. (Every other fissile isotope was gone not because of fizzling or spontaneous fission but because of alpha decay and such.)

I explained it in greater detail here.

Depends on your opportunity costs.

I'm really confused. Like, really, really confused. Hopefully someone can illuminate this topic for me, because right now, I'm not seeing where this "leverage penalty" comes from. Complexity penalties are pretty obviously a consequence of formalizations of Occam's Razor, in particular Solomonoff Induction, but why does the idea of a "leverage penalty" even exist? It seems like a post hoc justification tacked on in order to somehow deal with the original Pascal's Mugging situation. If I started from the basics of probability theory and computational theory, it seems conceivable to me that given enough time, I might be able to independently arrive at the idea of complexity penalties. It does not, on the other hand, seem likely that I would ever be able to derive this concept of a "leverage penalty" from first principles; it seems like a clever after-the-fact justification.

I do realize, however, that the leverage penalty was proposed by a very smart person (Robin Hanson), and then later discussed by another very smart person (Eliezer), both of whom are much smarter than I am, so it is much more likely that I am the one confused here than that they are actually engaging in after-the-fact rationalization. So my question right now is this: where do "leverage penalties" come from? Could someone take the time to humor an aspiring student of mathematics and explain? Thanks in advance!

(Right now, I'm not sure where leverage penalties come from, but if they do come from somewhere, as opposed to being pulled out of thin air, my bet is on anthropics. If this is true, it wouldn't be surprising, because I find anthropics hellishly confusing most of the time, so it seems reasonable that I would be confused about a concept derived from that area.)

Anthropics would be one way of reading it, yes. Think of it as saying, in addition to wanting all of our Turing machines to add up to 1, we also want all of the computational elements inside our Turing machines to add up to 1 because we're trying to guess which computational element 'we' might be. This might seem badly motivated in the sense that we can only say "Because our probabilities have to add up to 1 for us to think!" rather than being able to explain why magical reality fluid ought to work that way a priori, but the justification for a simplicity prior isn't much different - we have to be able to add up all the Turing machines in their entirety to 1 in order to think. So Turing machines that use lots of tape get penalties to the probability of your being any particular or special element inside them. Being able to affect lots of other elements is a kind of specialness.
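A toy version of that bookkeeping, in case it helps. This is a minimal sketch assuming the simplest possible form of the penalty: a hypothesis that takes K bits to describe gets prior 2^-K, and the chance that you are the one special element who can affect all N others is a further 1/N. The function names and the values of K and N are invented for illustration, and nothing here is meant as the actual formalization under discussion.

```python
from fractions import Fraction

def leverage_penalized_prior(complexity_bits: int, num_elements: int) -> Fraction:
    """Prior that the hypothesis is true AND you are its one special element.

    Assumes a 2^-K complexity penalty times a 1/N leverage penalty.
    """
    return Fraction(1, 2**complexity_bits) * Fraction(1, num_elements)

def expected_impact(complexity_bits: int, num_elements: int) -> Fraction:
    """Expected number of elements affected: N times the penalized prior."""
    return num_elements * leverage_penalized_prior(complexity_bits, num_elements)

K = 20  # hypothetical description length in bits
for N in (10, 10**6, 10**100):
    # The N in the payoff cancels the 1/N in the prior, leaving 2^-K every time.
    print(len(str(N)) - 1, expected_impact(K, N))
```

The point of the sketch is only that the expected impact stays at 2^-K however large N gets, so the sum over hypotheses is bounded by the same thing that bounds the simplicity prior. It does not address the Muggle's problem of what to do once strong evidence arrives.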

I'm confused because I had always thought it would be the exact opposite. To predict your observational history given a description of the universe, Solomonoff induction needs to find you in it. The more special you are, the easier you are to find and thus the easier it is to find your observational history.

Version 2 is the way I had approached Pascal's Mugging until I read this post, and I like the logical uncertainty-based answer. But does this mean I'm not getting flimple utility?

I'll give it to you for five dollars.

You live in a matrix universe, and you also know that matrix lords occasionally come down to offer strange requests to individuals. One of these matrix lords asks for you to give him 5 dollars, or else he will press a red button that has a one in a googol chance of killing a googolplex people. The red button's randomness is not determined by quantum effects. Do you hand over the five dollars in that case?

Intuitively? Yes, of course I do. I don't trust that intuition too strongly, but this thought experiment does make me update a lot about the value of a lot of ideas about Pascal's mugging.

I would have posted the short version to Main and the long version to Discussion.

Isn't it also reasonable in the last case to reevaluate the relative utility of five dollars and of a googolplex lives?

Is it inconsistent in the first two cases that you would believe him (after he shows the extraordinary evidence) if instead of a googolplex lives, he offered to save more than 10?

Here's a point of consideration: if you take Kurzweil's solution, then you can avoid Pascal's mugging when you are an agent, and your utility function is defined over similar agents. However, this solution wouldn't work on, for example, a paperclip maximizer, which would still be vulnerable - anthropic reasoning does not apply over paperclips.

While it might be useful to have Friendly-style AIs be more resilient to P-mugging than simple maximizers, it's not exactly satisfying as an epistemological device.

Isn't there a leverage penalty built into a Kolmogorov complexity prior if the bit string you're trying to generate is a particular agent's sense data? Because the more stuff there is, the more bits are required to locate that agent? And does this solve a problem with just using normal anthropics, where the leverage penalty doesn't help a paperclip maximizer deal with the potential universe that is just it and a bunch of paperclips that could be destroyed, because paperclips aren't anthropic reasoners?

In all examples but the physicist one, what I would believe is that I'm dreaming (counting being a Boltzmann brain or similar as ‘dreaming’). But then again, if I'm dreaming I don't actually lose anything by giving the mugger five dollars, so...