
Newcomb, Bostrom, Calvin: Credence and the strange path to a finite afterlife

7 crmflynn 02 November 2015 11:03PM

This is a bit rough, but I think it is an interesting and potentially compelling idea. To keep this short, and so increase the number of eyes on it, I have only sketched the bare bones of the idea. 

     1)      Empirically, people have varying intuitions and beliefs about causality, particularly in Newcomb-like problems (http://wiki.lesswrong.com/wiki/Newcomb's_problem, http://philpapers.org/surveys/results.pl, and https://en.wikipedia.org/wiki/Irresistible_grace).

     2)      Also, as an empirical matter, some people believe in taking actions after the fact, such as one-boxing, or Calvinist “irresistible grace”, to try to ensure or conform with a seemingly already-determined outcome. This might be out of a sense of retrocausality, performance, moral honesty, etc. What matters is that we know they will act it out, despite its violating common-sense causality. There has been some great work on decision theory on LW about trying to thread this needle well.

     3)      The second disjunct of the simulation argument (http://wiki.lesswrong.com/wiki/Simulation_argument) shows that humanity's decision making is evidentially relevant to what our subjective credence should be that we are in a simulation. That is to say, if we are actively headed toward making simulations, we should increase our credence of being in a simulation; if we are actively headed away from making simulations, through either existential risk or law/policy against it, we should decrease our credence.

     4)      Many, if not most, people would like there to be a pleasant afterlife after death, especially one in which we could be reunited with loved ones.

     5)      There is no reason to believe that simulations which are otherwise nearly identical copies of our world could not contain, after the simulated bodily death of the participants, an extremely long, though finite, "heaven"-like afterlife shared by the simulation's participants.

     6)      Our heading towards creating such simulations, especially if they were capable of nesting simulations, should increase our credence that we exist in such a simulation, and so we should perhaps expect a heaven-like afterlife of long, though finite, duration.

     7)      Those who believe in alternative causality, or retrocausality, in Newcomb-like situations should be especially excited about the opportunity to push the world towards surviving, towards allowing these types of simulations, and towards creating them. By analogy with one-boxing, this would suggest that if they work towards creating simulations with heaven-like afterlives, they might in some sense be “causing” such a heaven to exist for themselves, and even for friends and family who have already died. Such an idea of life after death, and especially of being reunited with loved ones, can be extremely compelling.

     8)      I believe that people matching the above description exist, that is, people who both hold an intuition favoring alternative causality and find such a heaven-like afterlife compelling. Further, the existence of such people, and their associated motivation to try to create such simulations, should increase the credence, even of two-boxing types, that we already live in such a world with a heaven-like afterlife. This is because knowing that a motivated minority wants such simulations should increase our credence that simulations will succeed. From the two-box perspective, this essentially shows that “this probably happened before, one level up”.

     9)      As an empirical matter, I also think that there are people who would find the idea of creating simulations with heaven-like afterlives compelling from a simply altruistic perspective, even if they are not one-boxers. It is a nice thing to do for the future sim people, who could, for example, probabilistically have a much better existence than biological children on earth can; and it is a nice thing to do for both one-boxers and two-boxers in our world, since it increases their credence (and emotional comfort) that there might be a life after death.

     10)   This creates the opportunity for a secular movement in which people work towards creating these simulations, and use this work and its potential success to derive comfort and meaning in their lives. For example, after a loved one’s death, one might donate to a think tank that creates or promotes such simulations, or that works to avoid existential threats, partly symbolically and partly in hope.

     11)   There is at least some room for Pascalian considerations even for two-boxers who allow for some humility in their beliefs. Nozick believed one-boxers will become two-boxers if Box A is raised to $900,000, and two-boxers will become one-boxers if Box A is lowered to $1. Similarly, working towards these simulations, even if you do not find it altruistically compelling, and even if you think that the odds of alternative causality or retrocausality are infinitesimally small, might make sense because the reward could be extremely large, potentially including trillions of lifetimes' worth of time spent in an afterlife “heaven” with friends and family.
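The Pascalian point in 11) is an expected-value comparison: working toward such simulations pays off in expectation whenever the probability p that it matters exceeds cost/reward. A minimal Python sketch; the cost and reward figures (in units of one lifetime) are purely illustrative assumptions, not estimates from this post:

```python
def expected_gain(p, reward, cost):
    """Expected value of working toward the simulations: p * reward - cost."""
    return p * reward - cost


def break_even_probability(reward, cost):
    """Smallest probability p at which expected_gain(p, reward, cost) >= 0."""
    return cost / reward


# Illustrative assumptions, in units of one human lifetime:
REWARD = 1e12  # "trillions of lifetimes" spent in the afterlife
COST = 0.01    # effort diverted toward creating such simulations

p_star = break_even_probability(REWARD, COST)
print(f"break-even probability: {p_star:.1e}")  # about 1e-14
print(expected_gain(1e-12, REWARD, COST) > 0)   # True: one in a trillion clears the bar
```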

Finally, this idea might be worth filling in (I have been doing so in my private notes for over a year, but am a bit shy to debut all of that just yet; even working up the courage to post this was difficult), if only because it is interesting and could be used as a hook to get more people interested in existential risk, including the AI control problem. This is because existential catastrophe is probably the greatest enemy of credence in the future of such simulations, and accordingly of our reasonable credence that such a heaven awaits us after death. A short hook headline like “avoiding existential risk is key to afterlife” can get a conversation going. I can imagine Salon, etc. taking another swipe at it, and in doing so creating publicity that would help in finding more like-minded folks to get involved in the work of MIRI, FHI, CEA, etc. There are also some really interesting ideas about acausal trade, and game theory between higher and lower worlds, as a form of “compulsion” in which higher worlds punish worlds for not creating heaven-containing simulations (thereby affecting their credence as observers of the simulation), in order to reach an equilibrium in which simulations with heaven-like afterlives are universal, or nearly universal. More on that later if this is received well.

Also, if anyone would like to join me in researching, bull-sessioning, or writing about this stuff, please feel free to IM me. Also, if anyone has a really good, non-obvious pin with which to pop my balloon, preferably in a gentle way, it would be really appreciated. I am spending a lot of time and energy on this, so I would like to know if it is fundamentally flawed in some way.

Thank you.

*******************************

November 11 Updates and Edits for Clarification

     1)      There seems to be confusion about what I mean by self-location and credence. A good way to think of this is via the Sleeping Beauty Problem (https://wiki.lesswrong.com/wiki/Sleeping_Beauty_problem).

If I imagine myself as Sleeping Beauty (and who doesn’t?), and I am asked on Sunday what my credence is that the coin will be tails, I will say 1/2. If I am awakened during the experiment without being told which day it is and am asked what my credence is that the coin was tails, I will say 2/3. If I am then told it is Monday, I will update my credence to 1/2. If I am told it is Tuesday, I will update my credence to 1. If someone asks me about my credence of tails two days after the experiment, and I somehow still do not know the days of the week, I will say 1/2. Credence changes with where you are, and with what information you have. As we might be in a simulation, we are somewhere in the “experiment days”, and information can help orient our credence. As humanity potentially has some say in whether or not we are in a simulation, information about how humans make decisions about these types of things can and should affect our credence.
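The credences above follow the "thirder" reading of the problem, in which each possible awakening is weighted by the probability of the coin result that produces it (halfers dispute the 2/3). A minimal Python sketch of that bookkeeping; the `credence_tails` helper and its day filter are illustrative names, not part of the problem statement:

```python
from fractions import Fraction

# Each possible awakening: (coin result, day). Heads -> Monday only; tails -> both days.
AWAKENINGS = [("heads", "Monday"), ("tails", "Monday"), ("tails", "Tuesday")]
COIN = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}


def credence_tails(days=("Monday", "Tuesday")):
    """Thirder credence in tails at an awakening known to fall on one of `days`.

    Each awakening is weighted by the probability of the coin result that
    produces it, then the weights are renormalised over the allowed days.
    """
    weights = {a: COIN[a[0]] for a in AWAKENINGS if a[1] in days}
    total = sum(weights.values())
    return sum(w for (coin, _), w in weights.items() if coin == "tails") / total


print("Sunday:", COIN["tails"])                             # 1/2
print("Awake, day unknown:", credence_tails())              # 2/3
print("Told it is Monday:", credence_tails(("Monday",)))    # 1/2
print("Told it is Tuesday:", credence_tails(("Tuesday",)))  # 1
```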

Imagine Sleeping Beauty is a Less Wrong reader. If she is unfamiliar with the simulation argument and someone asks her about her credence of being in a simulation, she probably answers something like 0.0000000001% (all numbers for illustrative purposes only). If someone shows her the simulation argument, she increases it to 1%. If she stumbles across this blog entry, she increases her credence to 2%, and adds some credence to the additional hypothesis that it may be a simulation with an afterlife. If she sees that a ton of people get really interested in this idea, and start raising funds to build simulations in the future and to lobby governments both for great AI safeguards and for regulation of future simulations, she raises her credence to 4%. If she lives through the AI superintelligence explosion and simulations are being built, but not yet turned on, her credence increases to 20%. If humanity turns them on, it increases to 50%. If there are trillions of them, she increases her credence to 60%. If 99% of simulations survive their own run-ins with artificial superintelligence and produce their own simulations, she increases her credence to 95%. 

     2)      This set of simulations does not need to recreate the current world or any specific people in it. That is a different idea that is not necessary to this argument. As written, the argument is premised on the idea of creating fully unique people. The point would be to increase our credence that we are functionally identical in type to the unique individuals in the simulation. This is done by creating ignorance or uncertainty in simulations, so that the majority of people similarly situated, in a world which may or may not be in a simulation, are in fact in a simulation. This should, in our ignorance, increase our credence that we are in a simulation. The point is about how we self-locate, as discussed in the original article by Bostrom. It is a short 12-page read, and if you have not read it yet, I would encourage it: http://simulation-argument.com/simulation.html. The point I was making about past loved ones was to raise the possibility that the simulations could be designed to transfer people to a separate afterlife simulation, where they could be reunited after dying in the first part of the simulation. This was not about trying to create something for us to upload ourselves into, along with attempted replicas of dead loved ones. Staying in one simulation through two phases, a short life and a relatively long afterlife, also has the advantage of circumventing the teletransportation paradox, as “all of the person” can be moved into the afterlife part of the simulation.  
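In Bostrom's paper, the fraction of all observers with human-type experiences who live in simulations is f_sim = f_P·N̄ / (f_P·N̄ + 1), where f_P is the fraction of human-level civilizations that reach a posthuman stage and N̄ is the average number of ancestor-simulations such a civilization runs; his indifference principle then sets our credence of being simulated roughly equal to f_sim. A small Python sketch of how the two quantities humanity can influence move that fraction (the input values are illustrative only):

```python
def fraction_simulated(f_p, n_bar):
    """Bostrom's f_sim = f_P * N / (f_P * N + 1).

    f_p:   fraction of human-level civilizations that reach a posthuman stage
    n_bar: average number of ancestor-simulations run by such a civilization
    """
    x = f_p * n_bar
    return x / (x + 1)


# Illustrative inputs: existential catastrophe lowers f_p, while a policy of
# building (or banning) simulations raises (or lowers) n_bar.
for f_p, n_bar in [(0.0, 1e6), (0.01, 1.0), (0.01, 1000.0), (0.5, 1e12)]:
    print(f"f_P={f_p:<5} N={n_bar:<8g} f_sim={fraction_simulated(f_p, n_bar):.4f}")
```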

 

Anti-Pascaline satisficer

3 Stuart_Armstrong 14 April 2015 06:49PM

A putative new idea for AI control; index here.

It occurred to me that the anti-Pascaline agent design could be used as part of a satisficer approach.

The obvious way to reduce dangerous optimisation pressure is to make a bounded utility function, with an easily achievable bound, such as giving the agent a utility linear in paperclips that maxes out at 10.

The problem with this is that, if the entity is a maximiser (which it might become), it can never be sure that it's achieved its goals. Even after building 10 paperclips, and an extra 2 to be sure, and an extra 20 to be really sure, and an extra 3^^^3 to be really really sure, and extra cameras to count them, with redundant robots patrolling the cameras to make sure that they're all behaving well, etc... There's still an ε chance that it might have just dreamed this, say, or that its memory is faulty. So it has a current utility of (1-ε)10, and can increase this by reducing ε - hence by building even more paperclips.

Hum... ε, you say? This seems a place where the anti-Pascaline design could help. Here we would use it at the lower bound of utility. The agent currently has probability ε of having utility < 10 (ie it has not built 10 paperclips) and (1-ε) of having utility = 10. Therefore an anti-Pascaline agent with ε lower bound would round this off to 10, discounting the unlikely event that it has been deluded, and thus it has no need to build more paperclips or paperclip-counting devices.

Note that this is an un-optimising approach, not an anti-optimising one, so the agent may still build more paperclips anyway - it just has no pressure to do so.
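A minimal Python sketch of the rounding-off, assuming utility is min(paperclips, 10) and that the only doubt is a probability δ that the agent is deluded and has built nothing. The precision ε and the δ values are illustrative, and `lower_truncated_expectation` is a hypothetical helper that implements only the lower-bound half of the anti-Pascaline design:

```python
from fractions import Fraction


def lower_truncated_expectation(dist, eps):
    """Expected utility after discarding the lowest-eps tail.

    dist is a list of (probability, utility) pairs. Outcomes below the
    smallest utility v with P(u <= v) > eps are raised to v.
    """
    values = sorted({u for _, u in dist})
    floor = min(v for v in values if sum(p for p, u in dist if u <= v) > eps)
    return sum(p * max(u, floor) for p, u in dist)


def plain_expectation(dist):
    return sum(p * u for p, u in dist)


EPS = Fraction(1, 1000)  # anti-Pascaline precision (illustrative)

# delta: chance the agent merely dreamed its 10 paperclips (illustrative).
# Building ever more paperclips and cameras only shrinks delta.
for delta in (Fraction(1, 10**6), Fraction(1, 10**9)):
    dist = [(1 - delta, 10), (delta, 0)]
    print(float(plain_expectation(dist)), lower_truncated_expectation(dist, EPS))
```

The plain expectation keeps creeping toward 10 as δ shrinks, which is the pressure to keep building; the truncated value is already exactly 10 once δ ≤ ε. If δ were larger than ε (say 1/100), the floor would drop to 0 and the truncated expectation would equal the plain one.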

Anti-Pascaline agent

4 Stuart_Armstrong 12 March 2015 02:17PM

A putative new idea for AI control; index here.

Pascal's wager-like situations come up occasionally with expected utility, making some decisions very tricky. This means that events of the tiniest probability could dominate the whole decision - intuitively unobvious, and a big negative for a bounded agent - and that expected utility calculations may fail to converge.

There are various principled approaches to resolving the problem, but how about an unprincipled approach? We could try to bound utility functions, but the heart of the problem is not high utility, but high utility combined with low probability. Moreover, this has to behave sensibly with respect to updating.

 

The agent design

Consider a UDT-ish agent A looking at input-output maps {M} (ie algorithms that could determine every single possible decision of the agent in the future). We allow probabilistic/mixed output maps as well (hence A has access to a source of randomness). Let u be a utility function, and set 0 < ε << 1 to be the precision. Roughly, we'll be discarding the highest (and lowest) utilities whose total probability is below ε. There is no fundamental reason that the same ε should be used for highest and lowest utilities, but we'll keep it that way for the moment.

The agent is going to make an "ultra-choice" among the various maps M (ie fixing its future decision policy), using u and ε to do so. For any M, designate by A(M) the decision of the agent to use M for its decisions.

Then, for any map M, set max(M) to be the lowest number s.t. P(u ≥ max(M)|A(M)) ≤ ε. In other words, if the agent decides to use M as its decision policy, this is the maximum utility that can be achieved if we ignore the highest valued ε of the probability distribution. Similarly, set min(M) to be the highest number s.t. P(u ≤ min(M)|A(M)) ≤ ε.

Then define the utility function u_M^ε, which is simply u, bounded between max(M) and min(M). Now calculate the expected value of u_M^ε given A(M), call this E_ε(u|A(M)).

The agent then chooses the M that maximises E_ε(u|A(M)). Call this the ε-precision u-maximising algorithm.
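For a discrete outcome distribution, these definitions can be sketched in Python. One interpretive assumption: with finitely many outcomes, the "lowest number s.t. P(u ≥ max(M)|A(M)) ≤ ε" sits just above an outcome value, so the sketch clips at that outcome. The `policies` example and its numbers are hypothetical, chosen to resemble a Pascal's mugging:

```python
from fractions import Fraction


def eps_expectation(dist, eps):
    """E_eps(u | A(M)) for one policy M, given as (probability, utility) pairs.

    max(M): largest outcome v whose upper tail P(u >= v) still exceeds eps.
    min(M): smallest outcome v whose lower tail P(u <= v) still exceeds eps.
    u is clipped to [min(M), max(M)] before taking the expectation.
    """
    values = sorted({u for _, u in dist})
    hi = max(v for v in values if sum(p for p, u in dist if u >= v) > eps)
    lo = min(v for v in values if sum(p for p, u in dist if u <= v) > eps)
    return sum(p * min(max(u, lo), hi) for p, u in dist)


def ultra_choice(policies, eps):
    """Pick the policy name whose E_eps is largest."""
    return max(policies, key=lambda name: eps_expectation(policies[name], eps))


# Hypothetical choice: paying 5 buys a one-in-a-billion chance of 10**12.
TINY = Fraction(1, 10**9)
policies = {
    "refuse": [(Fraction(1), 0)],
    "pay": [(TINY, 10**12 - 5), (1 - TINY, -5)],
}

print(sum(p * u for p, u in policies["pay"]))       # plain expectation: 995
print(ultra_choice(policies, Fraction(1, 100)))     # refuse
print(ultra_choice(policies, Fraction(1, 10**10)))  # pay
```

With ε = 1/100 the one-in-a-billion payoff falls in the discarded upper tail, so paying is valued at -5; once ε drops below 10^-9 the payoff is kept and the agent pays.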

 

Stability of the design

The above decision process is stable, in that there is a single ultra-choice to be made, and clear criteria for making that ultra-choice. Realistic and bounded agents, however, cannot calculate all the M in sufficient detail to get a reasonable outcome. So we can ask whether the design is stable for a bounded agent.

Note that this question is underdefined, as there are many ways of being bounded, and many ways of cashing out ε-precision u-maximising into bounded form. Most likely, this will not be a direct expected utility maximisation, so the algorithm will be unstable (prone to change under self-modification). But how exactly it's unstable is an interesting question.

I'll look at one particular situation: one where A was tasked with creating subagents that would go out and interact with the world. These agents are short-sighted: they apply ε-precision u-maximising not to the ultra-choice, but to each individual expected utility calculation (we'll assume the utility gains and losses for each decision are independent).

A has a single choice: what to set ε to for the subagents. Intuitively, it would seem that A would set ε lower than its own value; this could correspond roughly to an agent self-modifying to remove the ε-precision restriction from itself, converging on becoming a u-maximiser. However:

  • Theorem: There are (stochastic) worlds in which A will set the subagent precision to be higher, lower or equal to its own precision ε.

The proof will be by way of illustration of the interesting things that can happen in this setup. Let B be the subagent whose precision A sets.

Let C(p) be a coupon that pays out 1 with probability p. xC(p) simply means the coupon pays out x instead of 1. Each coupon costs ε² utility. This is negligible, and only serves to break ties. Then consider the following worlds:

  • In W1, B will be offered the possibility of buying C(0.75ε).
  • In W2, B will be offered the possibility of buying C(1.5ε).
  • In W3, B will be offered the possibility of buying C(0.75ε), and the offer will be made twice.
  • In W4, B will be offered, with 50% probability, the possibility of buying C(1.5ε).
  • In W5, B will be offered, with 50% probability, the possibility of buying C(1.5ε), and otherwise the possibility of buying 2C(1.5ε).
  • In W6, B will be offered, with 50% probability, the possibility of buying C(0.75ε), and otherwise the possibility of buying 2C(1.5ε).
  • In W7, B will be offered, with 50% probability, the possibility of buying C(0.75ε), and otherwise the possibility of buying 2C(1.05ε).

From A’s perspective, the best input-output maps are: in W1, don’t buy; in W2, buy; in W3, buy both; in W4, don’t buy (because the probability of getting above 0 utility by buying is, from A's initial perspective, 1.5ε/2 = 0.75ε).

W5 is more subtle, and interesting – essentially A will treat 2C(1.5ε) as if it were C(1.5ε) (since the probability of getting above 1 utility by buying is 1.5ε/2 = 0.75ε, while the probability of getting above zero by buying is (1.5ε+1.5ε)/2=1.5ε). Thus A would buy everything offered.

Similarly, in W6, the agent would buy everything, and in W7, the agent would buy nothing (since the probability of getting above zero by buying is now (1.05ε + 0.75ε)/2 = 0.9ε).

So in W1 and W2, the agent can leave the sub-agent precision at ε. In W3, it needs to lower it below 0.75ε. In W4, it needs to raise it above 1.5ε. In W5 it can leave it alone, while in W6 it must lower it below 0.75ε, and in W7 it must raise it above 1.05ε.
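The seven worlds can be checked mechanically. The Python sketch below makes a few assumptions of its own: ε is fixed at 1/1000, coupons offered in the same branch are independent, and max(M) and min(M) are read discretely (clipping at the boundary outcome). It computes A's best buy/don't-buy pattern for each world and compares it with what a short-sighted subagent B would do at a given precision. The world encoding and all function names are hypothetical:

```python
from fractions import Fraction
from itertools import product

EPS = Fraction(1, 1000)  # A's precision (illustrative value)
COST = EPS**2            # price of each coupon, in utility


def eps_expectation(dist, eps):
    """Clip u to [min(M), max(M)] (discrete reading) and take the expectation."""
    values = sorted({u for _, u in dist})
    hi = max(v for v in values if sum(p for p, u in dist if u >= v) > eps)
    lo = min(v for v in values if sum(p for p, u in dist if u <= v) > eps)
    return sum(p * min(max(u, lo), hi) for p, u in dist)


def coupon_dist(coupons):
    """Utility distribution from buying independent coupons, each (payout x, p)."""
    dist = []
    for wins in product([True, False], repeat=len(coupons)):
        prob, util = Fraction(1), -COST * len(coupons)
        for won, (x, p) in zip(wins, coupons):
            prob *= p if won else 1 - p
            util += x if won else 0
        dist.append((prob, util))
    return dist


def policy_dist(world, policy):
    """A world is a list of (branch probability, offers); policy flags each offer."""
    dist, flags = [], iter(policy)
    for branch_p, offers in world:
        bought = [c for c in offers if next(flags)]
        dist += [(branch_p * p, u) for p, u in coupon_dist(bought)]
    return dist


def best_policy(world, eps):
    """A's ultra-choice over every buy/don't-buy pattern (search starts from buying nothing)."""
    n = sum(len(offers) for _, offers in world)
    return max(product([False, True], repeat=n),
               key=lambda pol: eps_expectation(policy_dist(world, pol), eps))


def subagent_policy(world, eps_b):
    """Short-sighted B: judges each offer on its own once it is made."""
    return tuple(eps_expectation(coupon_dist([c]), eps_b) > 0
                 for _, offers in world for c in offers)


HALF = Fraction(1, 2)
WORLDS = {
    "W1": [(1, [(1, EPS * 3 / 4)])],
    "W2": [(1, [(1, EPS * 3 / 2)])],
    "W3": [(1, [(1, EPS * 3 / 4), (1, EPS * 3 / 4)])],
    "W4": [(HALF, [(1, EPS * 3 / 2)]), (HALF, [])],
    "W5": [(HALF, [(1, EPS * 3 / 2)]), (HALF, [(2, EPS * 3 / 2)])],
    "W6": [(HALF, [(1, EPS * 3 / 4)]), (HALF, [(2, EPS * 3 / 2)])],
    "W7": [(HALF, [(1, EPS * 3 / 4)]), (HALF, [(2, EPS * 21 / 20)])],
}

for name, world in WORLDS.items():
    target = best_policy(world, EPS)
    fits = [k for k in ("1/2", "1", "2") if subagent_policy(world, Fraction(k) * EPS) == target]
    print(name, target, "B matches A at eps_B = eps *", fits)
```

With these three test precisions, B agrees with A at ε only in W1, W2 and W5, needs the lower precision ε/2 in W3 and W6, and needs the higher precision 2ε in W4 and W7, consistent with the case analysis above.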

 

Irrelevant information

One nice feature about this approach is that it ignores irrelevant information. Specifically:

  • Theorem: Assume X is a random variable that is irrelevant to the utility function u. If A (before knowing X) has to design successor agents that will exist after X is revealed, then (modulo a few usual assumptions about only decisions mattering, not internal thought processes) it will make these successor agents isomorphic to copies of itself, i.e. ε-precision u-maximising algorithms (potentially with a different way of breaking ties).

These successor agents are not the short-sighted agents of the previous model, but full ultra-choice agents. Their ultra-choice is over all decisions to come, while A's ultra-choice (which is simply a choice) is over all agent designs.

For the proof, I'll assume X is boolean valued (the general proof is similar). Let M be the input-output map A would choose for itself, if it were to make all the decisions itself rather than just designing a subagent. Now, it's possible that M(X) will be different from M(¬X) (here M(X) and M(¬X) are contractions of the input-output map by adding in one of the inputs).

Define the new input-output map M' by defining a new internal variable Y in A, independent of X and with the same distribution (recall that A has access to a source of randomness). Since this variable is new, M is independent of the value of Y. Then M' is defined as M with X and Y permuted. Since both Y and X are equally irrelevant to u, E_ε(u|A(M))=E_ε(u|A(M')), so M' is an input-output map that fulfils ε-precision u-maximising. And M'(X)=M'(¬X), so M' is independent of X.

Now consider the subagent that runs the same algorithm as A, and has seen X. Because of the irrelevance of X, M'(X) will still fulfil ε-precision u-maximising (we can express any fact relevant to u in the form of Zs, with P(Z)=P(Z|X), and then the algorithm is the same).

Similarly, a subagent that has seen ¬X will run M'(¬X). Putting these together, the subagent will expect to run M'(X) with probability P(X) and M'(¬X) with probability P(¬X)=1-P(X).

Since M'(X)=M'(¬X), this whole thing is just M'. So if A creates a copy of itself (possibly tweaking the tie-breaking so that M' is selected), then it will achieve its maximum according to ε-precision u-maximising.

Testing lords over foolish lords: gaming Pascal's mugging

2 Stuart_Armstrong 07 May 2013 06:47PM

There are two separate reasons to reject Pascal's mugger's demands. The first is if you have a system of priors or a method of updating that precludes you from going along with the deal. The second is that if it becomes known that you accept Pascal's mugger situations, people are going to seek you out and take advantage of you.

I think it's useful to keep the two reasons very separate. If Pascal's mugger were a force of nature - a new theory of physics, maybe - then the case for keeping to expected utility maximisation may be quite strong. But when there are opponents, everything gets much more complicated - which is why game theory has thousands of published research papers, while expected utility maximisation is taught in passing in other subjects.

But does this really affect the argument? It means that someone approaching you with a Pascal's mugging today is much less likely to be honest (and much more likely to have simply read about it on Less Wrong). But that's a relatively small shift in probability, in an area where the numbers are already so huge/tiny.

Nevertheless, it seems that "reject Pascal's muggings (and other easily exploitable gambles)" may be a reasonable position to take, even if you agreed with the expected utility calculation. First, of course, you would gain by rejecting all the human attempts to exploit you. But there's another dynamic: the "Lords of the Matrix" are players too. They propose certain deals to you for certain reasons, and fail to propose them to you for other reasons. We can model three kinds of lords:

  1. The foolish lords, who will offer a Pascal's mugging no matter what they predict your reaction will be.
  2. The sadistic lords, who will offer a deal you won't accept.
  3. The testing lords, who will offer a deal you will accept, but push you to the edge of your logic and value system.

Precommitting to rejecting the mugging burns you only with the foolish lords. The sadistic lords won't offer an acceptable deal anyway, and the testing lords will offer you a better deal if you've made such a precommitment. So the trade-off is a loss with (some of) the foolish lords against a gain with the testing lords. Depending on your probability distribution over the lord types, this can be a reasonable thing to do, even if you would accept the impersonal version of the mugging.

Pascal's Muggle (short version)

29 Eliezer_Yudkowsky 05 May 2013 11:36PM

Shortened version of:  Pascal's Muggle:  Infinitesimal Priors and Strong Evidence

One proposal which has been floated for dealing with Pascal's Mugger is to penalize hypotheses that let you affect a large number of people, in proportion to the number of people affected - what we could call perhaps a "leverage penalty" instead of a "complexity penalty".  This isn't just for Pascal's Mugger in particular; it seems required for expected utilities in general to converge when the 'size' of scenarios can grow much faster than their algorithmic complexity.

Unfortunately this potentially leads us into a different problem, that of Pascal's Muggle.

Suppose a poorly-dressed street person asks you for five dollars in exchange for doing a googolplex's worth of good using his Matrix Lord powers - say, saving the lives of a googolplex other people inside computer simulations they're running.

"Well," you reply, "I think that it would be very improbable that I would be able to affect so many people through my own, personal actions - who am I to have such a great impact upon events?  Indeed, I think the probability is somewhere around one over googolplex, maybe a bit less.  So no, I won't pay five dollars - it is unthinkably improbable that I could do so much good!"

"I see," says the Mugger.

A wind begins to blow about the alley, whipping the Mugger's loose clothes about him as they shift from ill-fitting shirt and jeans into robes of infinite blackness, within whose depths tiny galaxies and stranger things seem to twinkle.  In the sky above, a gap edged by blue fire opens with a horrendous tearing sound - you can hear people on the nearby street yelling in sudden shock and terror, implying that they can see it too - and displays the image of the Mugger himself, wearing the same robes that now adorn his body, seated before a keyboard and a monitor.

"That's not actually me," the Mugger says, "just a conceptual representation, but I don't want to drive you insane.  Now give me those five dollars, and I'll save a googolplex lives, just as promised.  It's easy enough for me, given the computing power my home universe offers.  As for why I'm doing this, there's an ancient debate in philosophy among my people - something about how we ought to sum our expected utilities - and I mean to use the video of this event to make a point at the next decision theory conference I attend.   Now will you give me the five dollars, or not?"

"Mm... no," you reply.

"No?" says the Mugger.  "I understood earlier when you didn't want to give a random street person five dollars based on a wild story with no evidence behind it whatsoever.  But surely I've offered you evidence now."

"Unfortunately, you haven't offered me enough evidence," you explain.

"Seriously?" says the Mugger.  "I've opened up a fiery portal in the sky, and that's not enough to persuade you?  What do I have to do, then?  Rearrange the planets in your solar system, and wait for the observatories to confirm the fact?  I suppose I could also explain the true laws of physics in the higher universe in more detail, and let you play around a bit with the computer program that encodes all the universes containing the googolplex people I would save if you just gave me the damn five dollars -"

"Sorry," you say, shaking your head firmly, "there's just no way you can convince me that I'm in a position to affect a googolplex people, because the prior probability of that is one over googolplex.  If you wanted to convince me of some fact of merely 2^-100 prior probability, a mere nonillion to one - like that a coin would come up heads and tails in some particular pattern of a hundred coinflips - then you could just show me 100 bits of evidence, which is within easy reach of my brain's sensory bandwidth.  I mean, you could just flip the coin a hundred times, and my eyes, which send my brain a hundred megabits a second or so - though that gets processed down to one megabit or so by the time it goes through the lateral geniculate nucleus - would easily give me enough data to conclude that this nonillion-to-one possibility was true.  But to conclude something whose prior probability is on the order of one over googolplex, I need on the order of a googol bits of evidence, and you can't present me with a sensory experience containing a googol bits.  Indeed, you can't ever present a mortal like me with evidence that has a likelihood ratio of a googolplex to one - evidence I'm a googolplex times more likely to encounter if the hypothesis is true, than if it's false - because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex.  You know the old saying about how once you assign something probability one, or probability zero, you can't update that probability regardless of what evidence you see?  Well, odds of a googolplex to one, or one to a googolplex, work pretty much the same way."

"So no matter what evidence I show you," the Mugger says - as the blue fire goes on crackling in the torn sky above, and screams and desperate prayers continue from the street beyond - "you can't ever notice that you're in a position to help a googolplex people."

"Right!" you say.  "I can believe that you're a Matrix Lord.  I mean, I'm not a total Muggle, I'm psychologically capable of responding in some fashion to that giant hole in the sky.  But it's just completely forbidden for me to assign any significant probability whatsoever that you will actually save a googolplex people after I give you five dollars.  You're lying, and I am absolutely, absolutely, absolutely confident of that."

"So you weren't just invoking the leverage penalty as a plausible-sounding way of getting out of paying me the five dollars earlier," the Mugger says thoughtfully.  "I mean, I'd understand if that was just a rationalization of your discomfort at forking over five dollars for what seemed like a tiny probability, when I hadn't done my duty to present you with a corresponding amount of evidence before demanding payment.  But you... you're acting like an AI would if it was actually programmed with a leverage penalty on hypotheses!"

"Exactly," you say.  "I'm forbidden a priori to believe I can ever do that much good."

"Why?" the Mugger says curiously.  "I mean, all I have to do is press this button here and a googolplex lives will be saved."  The figure within the blazing portal above points to a green button on the console before it.

"Like I said," you explain again, "the prior probability is just too infinitesimal for the massive evidence you're showing me to overcome it -"

The Mugger shrugs, and vanishes in a puff of purple mist.

The portal in the sky above closes, taking with it the console and the green button.

(The screams go on from the street outside.)

A few days later, you're sitting in your office at the physics institute where you work, when one of your colleagues bursts in through your door, seeming highly excited.  "I've got it!" she cries.  "I've figured out that whole dark energy thing!  Look, these simple equations retrodict it exactly, there's no way that could be a coincidence!"

At first you're also excited, but as you pore over the equations, your face configures itself into a frown.  "No..." you say slowly.  "These equations may look extremely simple so far as computational complexity goes - and they do exactly fit the petabytes of evidence our telescopes have gathered so far - but I'm afraid they're far too improbable to ever believe."

"What?" she says.  "Why?"

"Well," you say reasonably, "if these equations are actually true, then our descendants will be able to exploit dark energy to do computations, and according to my back-of-the-envelope calculations here, we'd be able to create around a googolplex people that way.  But that would mean that we, here on Earth, are in a position to affect a googolplex people - since, if we blow ourselves up via a nanotechnological war or (cough) make certain other errors, those googolplex people will never come into existence.  The prior probability of us being in a position to impact a googolplex people is on the order of one over googolplex, so your equations must be wrong."

"Hmm..." she says.  "I hadn't thought of that.  But what if these equations are right, and yet somehow, everything I do is exactly balanced, down to the googolth decimal point or so, with respect to how it impacts the chance of modern-day Earth participating in a chain of events that leads to creating an intergalactic civilization?"

"How would that work?" you say.  "There's only seven billion people on today's Earth - there's probably been only a hundred billion people who ever existed total, or will exist before we go through the intelligence explosion or whatever - so even before analyzing your exact position, it seems like your leverage on future affairs couldn't reasonably be less than a ten-trillionth part of the future or so."

"But then given this physical theory which seems obviously true, my acts might imply expected utility differentials on the order of 10^(10^100 - 13)," she explains, "and I'm not allowed to believe that no matter how much evidence you show me."
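The exponent in her reply can be checked by working with base-10 exponents: a googolplex (10^(10^100)) cannot be written out, but its exponent 10^100 is an ordinary integer. A minimal Python sketch, assuming the dialogue's figures of a googolplex future people and a leverage of one in ten trillion (10^-13); the function name is hypothetical:

```python
# Work with base-10 exponents: a googolplex is 10**(10**100), which cannot be
# materialized, but its exponent 10**100 is an ordinary (exact) Python int.
GOOGOL = 10**100

def log10_impact(log10_people: int, log10_leverage: int) -> int:
    """Exponent of (people affected) * (fraction of the future you control)."""
    return log10_people + log10_leverage

# A googolplex people, with one-in-ten-trillion (10**-13) leverage.
exponent = log10_impact(GOOGOL, -13)
print(exponent == GOOGOL - 13)
print(len(str(exponent)))  # number of digits in the exponent
```

Adding exponents is the same as multiplying the underlying quantities, which is why the leverage factor only shaves 13 off an exponent that is itself a googol.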


This problem may not be as bad as it looks; a leverage penalty may lead to more reasonable behavior than depicted above, after taking into account Bayesian updating:


Mugger:  "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."

You:  "Nope."

Mugger:  "Why not?  It's a really large impact."

You:  "Yes, and I assign a probability on the order of 1 in 3↑↑↑3 that I would be in a unique position to affect 3↑↑↑3 people."

Mugger:  "Oh, is that really the probability that you assign?  Behold!"

(A gap opens in the sky, edged with blue fire.)

Mugger:  "Now what do you think, eh?"

You:  "Well... I can't actually say this has a likelihood ratio of 3↑↑↑3 to 1.  No stream of evidence that can enter a human brain over the course of a century is ever going to have a likelihood ratio larger than, say, 10^(10^26) to 1 at the absurdly most, assuming one megabit per second of sensory data, for a century, each bit of which has at least a 1-in-a-trillion error probability.  You'd probably start to be dominated by Boltzmann brains or other exotic minds well before then."
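The 10^(10^26) figure is a deliberately loose ceiling. A rough Python check uses the text's one megabit per second for a century. It additionally assumes, which the text does not state, that a bit with error probability eps can shift the odds by at most a factor of (1 - eps)/eps, and that bits combine independently. Under those assumptions it gives a bound far below that ceiling:

```python
import math

SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600  # about 3.16e9 seconds
BITS_PER_SECOND = 1e6    # "one megabit per second of sensory data"
BIT_ERROR_PROB = 1e-12   # "at least a 1-in-a-trillion error probability"

def max_log10_likelihood_ratio(bits: float, error_prob: float) -> float:
    """Upper bound on log10 of the total likelihood ratio.

    Assumption: each bit favors one hypothesis by at most (1 - eps) / eps,
    and bits are independent, so their log ratios simply add.
    """
    per_bit = math.log10((1 - error_prob) / error_prob)
    return bits * per_bit

bits = BITS_PER_SECOND * SECONDS_PER_CENTURY
bound = max_log10_likelihood_ratio(bits, BIT_ERROR_PROB)
print(f"{bits:.3g} bits -> likelihood ratio at most 10^{bound:.3g}")
print(bound < 1e26)  # well inside the text's 10^(10^26)
```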

Mugger:  "So you're not convinced."

You:  "Indeed not.  The probability that you're telling the truth is so tiny that God couldn't find it with an electron microscope.  Here's the five dollars."

Mugger:  "Done!  You've saved 3↑↑↑3 lives!  Congratulations, you're never going to top that, your peak life accomplishment will now always lie in your past.  But why'd you give me the five dollars if you think I'm lying?"

You:  "Well, because the evidence you did present me with had a likelihood ratio of at least a billion to one - I would've assigned less than 10^-9 prior probability of seeing this when I woke up this morning - so in accordance with Bayes's Theorem I promoted the probability from 1/3↑↑↑3 to at least 10^9/3↑↑↑3, which when multiplied by an impact of 3↑↑↑3, yields an expected value of at least a billion lives saved for giving you five dollars."


I confess that I find this line of reasoning a bit suspicious - it seems overly clever - but at least on the level of intuitive-virtues-of-rationality it doesn't seem completely stupid in the same way as Pascal's Muggle.  This muggee is at least behaviorally reacting to the evidence.  In fact, they're reacting in a way exactly proportional to the evidence - they would've assigned the same net importance to handing over the five dollars if the Mugger had offered 3↑↑↑4 lives, so long as the strength of the evidence seemed the same.
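The proportionality can be seen with exact fractions. Since 3↑↑↑3 cannot be represented, this sketch substitutes ordinary large integers for the claimed number of lives. It uses the same small-probability approximation as the Muggee (posterior ≈ prior × likelihood ratio); the function name is hypothetical:

```python
from fractions import Fraction

def expected_lives_saved(n_claimed: int, likelihood_ratio: int) -> Fraction:
    """Expected lives saved by paying, under a leverage penalty.

    Prior of being able to affect n_claimed people: 1 / n_claimed.
    Update: posterior ~= prior * likelihood_ratio (valid while it stays tiny).
    """
    prior = Fraction(1, n_claimed)
    posterior = prior * likelihood_ratio
    return posterior * n_claimed

# Stand-ins for 3↑↑↑3 and 3↑↑↑4: the claimed number cancels out either way.
for n in (10**100, 10**1000):
    print(expected_lives_saved(n, likelihood_ratio=10**9))
```

Multiplying the penalized prior by the impact cancels the claimed number, so the result tracks only the strength of the evidence.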

But I still feel a bit nervous about the idea that Pascal's Muggee, after the sky splits open, is handing over five dollars while claiming to assign probability on the order of 10^9/3↑↑↑3 that it's doing any good.  My own reaction would probably be more like this:


Mugger:  "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."

Me:  "Nope."

Mugger:  "So then, you think the probability I'm telling the truth is on the order of 1/3↑↑↑3?"

Me:  "Yeah... that probably has to follow.  I don't see any way around that revealed belief, given that I'm not actually giving you the five dollars.  I've heard some people try to claim silly things like, the probability that you're telling the truth is counterbalanced by the probability that you'll kill 3↑↑↑3 people instead, or something else with a conveniently exactly equal and opposite utility.  But there's no way that things would balance out that neatly in practice, if there was no a priori mathematical requirement that they balance.  Even if the prior probability of your saving 3↑↑↑3 people and killing 3↑↑↑3 people, conditional on my giving you five dollars, exactly balanced down to the log(3↑↑↑3) decimal place, the likelihood ratio for your telling me that you would "save" 3↑↑↑3 people would not be exactly 1:1 for the two hypotheses down to the log(3↑↑↑3) decimal place.  So if I assigned probabilities much greater than 1/3↑↑↑3 to your doing something that affected 3↑↑↑3 people, my actions would be overwhelmingly dominated by even a tiny difference in likelihood ratio elevating the probability that you saved 3↑↑↑3 people over the probability that you did something equally and oppositely bad to them.  The only way this hypothesis can't dominate my actions - really, the only way my expected utility sums can converge at all - is if I assign probability on the order of 1/3↑↑↑3 or less.  I don't see any way of escaping that part."
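A toy model of why the expected utility sums converge only with a penalty of this kind. Purely for illustration, it assumes hypothesis k has complexity prior 2^-k and claims 2^(2^k) lives, so the claimed payoff outgrows the prior; the names are hypothetical:

```python
from fractions import Fraction

def partial_expected_utility(k_max: int, leverage_penalty: bool) -> Fraction:
    """Sum of prior * payoff over toy hypotheses k = 1..k_max.

    Toy assumptions: hypothesis k has prior 2**-k and claims 2**(2**k) lives,
    standing in for numbers like 3↑↑↑3 that dwarf their description length.
    """
    total = Fraction(0)
    for k in range(1, k_max + 1):
        payoff = 2 ** (2 ** k)
        prior = Fraction(1, 2 ** k)
        if leverage_penalty:
            # Extra 1/payoff factor: chance of being in a position to affect that many.
            prior /= payoff
        total += prior * payoff
    return total

for k in (5, 10):
    naive = partial_expected_utility(k, leverage_penalty=False)
    penalized = partial_expected_utility(k, leverage_penalty=True)
    print(k, naive.numerator.bit_length(), float(penalized))
```

Without the penalty each term is 2^(2^k - k) and the partial sums explode; with it each term is 2^-k and the sum stays below 1.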

Mugger:  "But can you, in your mortal uncertainty, truly assign a probability as low as 1 in 3↑↑↑3 to any proposition whatever?  Can you truly believe, with your error-prone neural brain, that you could make 3↑↑↑3 statements of any kind one after another, and be wrong, on average, about once?"

Me:  "Nope."

Mugger:  "So give me five dollars!"

Me:  "Nope."

Mugger:  "Why not?"

Me:  "Because even though I, in my mortal uncertainty, will eventually be wrong about all sorts of things if I make enough statements one after another, this fact can't be used to increase the probability of arbitrary statements beyond what my prior says they should be, because then my prior would sum to more than 1.  There must be some kind of required condition for taking a hypothesis seriously enough to worry that I might be overconfident about it -"

Mugger:  "Then behold!"

(A gap opens in the sky, edged with blue fire.)

Mugger:  "Now what do you think, eh?"

Me (staring up at the sky):  "...whoa."  (Pause.)  "You turned into a cat."

Mugger:  "What?"

Me:  "Private joke.  Okay, I think I'm going to have to rethink a lot of things.  But if you want to tell me about how I was wrong to assign a prior probability on the order of 1/3↑↑↑3 to your scenario, I will shut up and listen very carefully to what you have to say about it.  Oh, and here's the five dollars, can I pay an extra twenty and make some other requests?"

(The thought bubble pops, and we return to two people standing in an alley, the sky above perfectly normal.)

Mugger:  "Now, in this scenario we've just imagined, you were taking my case seriously, right?  But the evidence there couldn't have had a likelihood ratio of more than 10^(10^26) to 1, and probably much less.  So by the method of imaginary updates, you must assign probability at least 10^-(10^26) to my scenario, which when multiplied by a benefit on the order of 3↑↑↑3, yields an unimaginable bonanza in exchange for just five dollars -"

Me:  "Nope."

Mugger:  "How can you possibly say that?  You're not being logically coherent!"

Me:  "I agree that I'm being incoherent in a sense, but I think that's acceptable in this case, since I don't have infinite computing power.  In the scenario you're asking me to imagine, you're presenting me with evidence which I currently think Can't Happen.  And if that actually does happen, the sensible way for me to react is by questioning my prior assumptions and reasoning which led me to believe I shouldn't see it happen.  One way that I handle my lack of logical omniscience - my finite, error-prone reasoning capabilities - is by being willing to assign infinitesimal probabilities to non-privileged hypotheses so that my prior over all possibilities can sum to 1.  But if I actually see strong evidence for something I previously thought was super-improbable, I don't just do a Bayesian update; I should also question whether I was right to assign such a tiny probability in the first place - whether the scenario was really as complex, or unnatural, as I thought.  In real life, you are not ever supposed to have a prior improbability of 10^-100 for some fact distinguished enough to be written down, and yet encounter strong evidence, say 10^10 to 1, that the thing has actually happened.  If something like that happens, you don't do a Bayesian update to a posterior of 10^-90.  Instead you question both whether the evidence might be weaker than it seems, and whether your estimate of prior improbability might have been poorly calibrated, because rational agents who actually have well-calibrated priors should not encounter situations like that until they are ten billion days old.  Now, this may mean that I end up doing some non-Bayesian updates:  I say some hypothesis has a prior probability of a quadrillion to one, you show me evidence with a likelihood ratio of a billion to one, and I say 'Guess I was wrong about that quadrillion to one thing' rather than being a Muggle about it.  And then I shut up and listen to what you have to say about how to estimate probabilities, because on my worldview, I wasn't expecting to see you turn into a cat.  But for me to make a super-update like that - reflecting a posterior belief that I was logically incorrect about the prior probability - you have to really actually show me the evidence, you can't just ask me to imagine it.  This is something that only logically incoherent agents ever say, but that's all right because I'm not logically omniscient."
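The "ten billion days" figure follows from a one-line bound. Because P(E|H) is at most 1, evidence with a likelihood ratio of 10^10 has P(E|not H) at most 10^-10, so a calibrated agent should rarely see it for a hypothesis it rates at 10^-100. A sketch in Python, assuming (this is not in the text) one such opportunity per day; the function name is hypothetical:

```python
from fractions import Fraction

def max_prob_of_evidence(prior: Fraction, likelihood_ratio: int) -> Fraction:
    """Upper bound on P(E) when E favors H by likelihood_ratio to 1.

    P(E|H) <= 1, so P(E|not H) = P(E|H) / likelihood_ratio <= 1 / likelihood_ratio.
    Hence P(E) = P(E|H) P(H) + P(E|not H) P(not H) <= prior + (1 - prior) / likelihood_ratio.
    """
    return prior + (1 - prior) / likelihood_ratio

p = max_prob_of_evidence(Fraction(1, 10**100), 10**10)
# Assumed: one such chance per day, so the expected wait is at least 1/p days.
days = 1 / p
print(float(p), float(days))
```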


When I add up a complexity penalty, a leverage penalty, and the "You turned into a cat!" logical non-omniscience clause, I get the best candidate I have so far for the correct decision-theoretic way to handle these sorts of possibilities while still having expected utilities converge.

As mentioned in the longer version, this has very little in the way of relevance for optimal philanthropy, because we don't really need to consider these sorts of rules for handling small large numbers on the order of a universe containing 10^80 atoms, and because most of the improbable leverage associated with x-risk charities is associated with discovering yourself to be an Ancient Earthling from before the intelligence explosion, which improbability (for universes the size of 10^80 atoms) is easily overcome by the sensory experiences which tell you you're an Earthling.  For more on this see the original long-form post.  The main FAI issue at stake is what sort of prior to program into an AI.

Pascal's wager

-11 duckduckMOO 22 April 2013 04:41AM


I started this as a comment on "Being Half-Rational About Pascal's Wager is Even Worse", but it's really long, so I'm posting it in Discussion instead.

 

Also, I illustrate here using negative examples (hell and equivalents) for the sake of followability, and I am a little worried about inciting some paranoia, so I am reminding you here that every negative example has an equal and opposite positive partner. For example, Pascal's wager has an opposite where accepting sends you to hell; it also has an opposite where refusing sends you to heaven. I haven't mentioned any positive equivalents or opposites below. Also, all of these possibilities are effectively 0, so don't be worrying.

 

"For so long as I can remember, I have rejected Pascal's Wager in all its forms on sheerly practical grounds: anyone who tries to plan out their life by chasing a 1 in 10,000 chance of a huge pay-off is almost certainly doomed in practice.  This kind of clever reasoning never pays off in real life..."

 

Pascal's wager shouldn't be in the reference class of real life. It is a unique situation that would never crop up in real life as you're using it. In the world in which Pascal's wager is correct, you would still see people who plan out their lives on a 1 in 10000 chance of a huge pay-off fail 9999 times out of 10000. Also, this doesn't work for actually excluding Pascal's wager. If Pascal's wager starts off excluded from the category "real life", you've already made up your mind, so this cannot quite be the actual order of events.

 

In this case, 9999 times out of 10000 you waste your Christianity, and 1 time in 10000 you escape an eternity in hell, which is, to vastly understate it, much more than 10000 times as bad as worshipping God, even counting the sanity it costs to force a change in belief, the damage it does to your psyche to live as a victim of self-inflicted Stockholm syndrome, and any other non-obvious cost. With these premises, choosing to believe in God produces infinitely better consequences on average.

 

Luckily the premises are wrong. 1/10000 is about 1/10000 too high for the relevant probability. Which is:

the probability that the wager or equivalent, (anything whose acceptance would prevent you going to hell is equivalent) is true

MINUS

the probability that its opposite or equivalent, (anything which would send you to hell for accepting is equivalent), is true 

 

1/10000 is also way too high even if you're not accounting for opposite possibilities.
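The subtraction above can be written as an expected-value comparison between accepting and refusing. A minimal sketch, assuming a finite hell_cost (an infinite one would make the subtraction undefined) and a finite belief_cost; all names are hypothetical:

```python
from fractions import Fraction

def ev_of_accepting(p_wager, p_opposite, hell_cost, belief_cost):
    """Expected value of accepting a wager minus that of refusing it.

    p_wager: probability that refusing sends you to hell (the wager is true).
    p_opposite: probability that accepting sends you to hell (its opposite is true).
    hell_cost: disutility of hell, kept finite so the terms can be subtracted.
    belief_cost: finite cost of forcing yourself to believe.
    """
    # Accept: -p_opposite * hell_cost - belief_cost.  Refuse: -p_wager * hell_cost.
    return (p_wager - p_opposite) * hell_cost - belief_cost

print(ev_of_accepting(Fraction(1, 10000), Fraction(1, 10000), 10**100, 1))
print(ev_of_accepting(Fraction(1, 10000), Fraction(0), 10**100, 1) > 0)
```

When the wager and its opposite are equally likely, the hell terms cancel and only the cost of forcing belief remains, so accepting comes out worse.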

 

 

Equivalence here refers to what behaviours it punishes or rewards. I used hell because it is in the most popular wager, but it applies to all wagers. To illustrate: if it's true that there is one god, ANTIPASCAL GOD, and he sends you to hell for accepting any Pascal's wager, then that's equivalent to any Pascal's wager you hear having an opposite (no more "or equivalent"s will be typed, but they still apply) which is true, because if you accept any Pascal's wager you go to hell. Conversely, if PASCAL GOD is the only god and he sends you to hell unless you accept any Pascal's wager, that's equivalent to any Pascal's wager you hear being true.

 

The real trick to defusing Pascal's wager is seeing that such wagers are generally no more likely than their opposites. For example, there are lots of good, fun reasons to assign the Christian Pascal's wager a lower probability than its opposite, even engaging on a Christian level:

 

Hell is a medieval invention/translation error: the eternal torture thing isn't even in modern Bibles.

The "believe or go to hell" rule is hella evil, and it gains credibility from the same source (Christians, not the Bible) who also claim, as a more fundamental belief, that God is good, which directly contradicts the believe-or-hell rule.

The Bible claims that God hates people eating shellfish, taking his name in vain, and jealousy. Apparently taking his name in vain is the only unforgivable sin. So if they're right about the evil stuff, you're probably going to hell anyway.

It makes no sense that God would care enough about your belief and worship to consign people to eternal torture but not enough to show up once in a while.

It makes no sense to reward people for dishonesty.

The evilness really can't be overstated. Eternal torture as a response to a mistake which is at worst due to stupidity (but actually not even that: just a stacked-deck scenario) outdoes pretty much everyone in terms of evilness. It is worse than pretty much every fucked-up thing every other god is reputed to have done, put together. The psychopath in the Bible doesn't come close to coming close.

 

The problem with the general case of religious Pascal's wagers is that people make stuff up (usually unintentionally), and which made-up stuff gains traction has nothing to do with what is true. When both Christianity and Hinduism are taken seriously by millions (as were the Roman/Greek gods, and Viking gods, and Aztec gods, and all sorts of other gods at different times, by large percentages of people), mass religious belief is 0 evidence. At most one religion set (e.g. Greek/Roman, Christian/Muslim/Jewish, etc.) is even close to right, so at least the rest are popular independently of truth.

 

The existence of a religion does not elevate the possibility that the god they describe exists above the possibility that the opposite exists because there is no evidence that religion has any accuracy in determining the features of a god, should one exist.

 

You might intuitively lean towards religions having better than 0 accuracy if a god exists, but remember there's a lot of fictional evidence out there to generalise from. It is a matter of judgement here. There's no logical proof for 0 or worse accuracy (other than it being the default and the lack of evidence), but negative accuracy is a possibility, and you've probably played priest classes in video games or just seen how respected religions are, and so been primed to overestimate religion's accuracy in that hypothetical. Also, if there is a god, it has not shown itself publicly in a very long time, or ever, so it seems to have a preference for not being revealed. Also, humans tend to be somewhat evil and read into others what they see in themselves, and I assume any high-tier god (one that had the power to create and maintain a hell, detect disbelief, preserve immortal souls and put people in hell) would not be evil. Being evil or totally unscrupulous has benefits among humans which a god would not get. I think without bad peers or parents there's no reason to be evil; I think people are mostly evil in relation to other people. So I grant religions a slight positive accuracy in the scenario where there is a god, but it does not exceed the priors against Pascal's wager (another such prior: the gods religions describe are pettily human), or perhaps even the god's desire to stay hidden.

 

Even if God itself whispered Pascal's wager in your ear, there is no incentive for it to actually carry out the threat:

 

There is only one iteration.

AND

These threats aren't being made in person by the deity. They are either second hand or independently discovered so:

The deity has no need to make the threat true in order to state it more believably, as it might if it were an imperfect liar (at a level detectable by humans) that made the threats in person.

The deity has total plausible deniability.

Which adds up to all of the benefits of the threat having already been extracted by the time the punishment is due, and no possibility of a rep hit (which wouldn't matter anyway).

 

So, all else being equal (i.e. unless the god is the god of threats or of Pascal's wagers, whose opposites are equally likely):

 

If God is good (+ev on human happiness -ev on human sadness that sort of thing), actually carrying out the threats has negative value.

If god is scarily-doesn't-give-a-shit-neutral to humans, it still has no incentive to actually carry out the threat and a non zero energy cost.

If God gives the tiniest, most infinitesimal shit about humans, its incentive to actually carry out the threat is negative.

 

If God is evil you're fucked anyway:

The threat gains no power by being true, so the only incentive a God can have for following through is that it values human suffering. If it does, why would it not send you to hell if you believed in it? (remember that the god of commitments is as likely as the god of breaking commitments)
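The case analysis above can be compressed into a toy payoff for following through once the threat has already done its work. This is a sketch of the argument, not a model from the text: cares_for_humans, energy_cost, and suffering are hypothetical parameters, and the threat's strategic value is set to zero because of the one-shot, second-hand, deniable setup:

```python
from dataclasses import dataclass

@dataclass
class God:
    name: str
    cares_for_humans: float   # > 0 good, 0 indifferent, < 0 values human suffering
    energy_cost: float = 1.0  # any positive cost of actually running a hell

def value_of_following_through(god: God, suffering: float = 1e6) -> float:
    """Net value to the god of punishing after the threat has already paid off.

    Strategic value is zero: one iteration, second-hand threat, full deniability,
    so only the god's attitude to suffering and the energy cost remain.
    """
    return -god.cares_for_humans * suffering - god.energy_cost

gods = [
    God("good", 1.0),
    God("indifferent", 0.0),
    God("barely cares", 1e-9),
    God("evil", -1.0),
]
for g in gods:
    print(g.name, value_of_following_through(g) > 0)
```

Only the god that values suffering gets a positive number, and nothing in that term singles out refusers rather than believers.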

 

Despite the increased complexity of a human mind, I think the most likely motivational system for a god which would make it honour the wager (not saying it's at all likely, just that all others are obviously wrong) is that the god thinks like a human and therefore would keep its commitment out of spite or gratitude or some other human reason. So here's why I think that one is wrong. It's generalizing from fictional evidence: humans aren't that homogeneous (and one without peers would be less so), and if a god gains likelihood to keep a commitment from humanness, it also gains a not-designed-to-be-evil-ness that would make it less likely to make evil wagers. It also has no source for spite or gratitude, having no peers. Finally, could you ever feel spite towards a bug? Or gratitude? We are not just ants compared to a god, we're ant-ant-ant-etc-ants.

 

Also, there are reasons why accepting may not keep you out of trouble anyway: bullies don't get nicer when their demands are met. It's often not the suffering they're after but the dominance, at which point the suffering becomes an enjoyable illustration of that dominance. As we are ant-ant-etc-ants, this probability is lower, but the fact that we aren't all already in hell suggests that if God is evil, it is not raw suffering that it values. Hostages are often executed even when the ransom is paid. Even if it is evil, it could be any kind of evil: its preferences cannot have been homogenised by memes and consensus.

 

There's also the rather cool possibility that if human-god is sending people to hell, maybe it's for lack of understanding. If it wants belief, it can take it more effectively than this. If it wants to hurt you, it will hurt you anyway. Perhaps, being peerless, it was never prompted to think through the consequences of making others suffer. Maybe God, in the absence of peers, just needs someone to explain that it's not nice to let people burn in hell for eternity. I for one remember suddenly realising that those other fleshbags hosted people. I figured it out for myself, but if I had grown up alone as the master of the universe, maybe I would have needed someone to explain it to me.

 

Being Half-Rational About Pascal's Wager is Even Worse

18 Eliezer_Yudkowsky 18 April 2013 05:20AM

For so long as I can remember, I have rejected Pascal's Wager in all its forms on sheerly practical grounds: anyone who tries to plan out their life by chasing a 1 in 10,000 chance of a huge payoff is almost certainly doomed in practice.  This kind of clever reasoning never pays off in real life...

...unless you have also underestimated the allegedly tiny chance of the large impact.

For example.  At one critical junction in history, Leo Szilard, the first physicist to see the possibility of fission chain reactions and hence practical nuclear weapons, was trying to persuade Enrico Fermi to take the issue seriously, in the company of a more prestigious friend, Isidor Rabi:

I said to him:  "Did you talk to Fermi?"  Rabi said, "Yes, I did."  I said, "What did Fermi say?"  Rabi said, "Fermi said 'Nuts!'"  So I said, "Why did he say 'Nuts!'?" and Rabi said, "Well, I don't know, but he is in and we can ask him." So we went over to Fermi's office, and Rabi said to Fermi, "Look, Fermi, I told you what Szilard thought and you said 'Nuts!' and Szilard wants to know why you said 'Nuts!'" So Fermi said, "Well… there is the remote possibility that neutrons may be emitted in the fission of uranium and then of course perhaps a chain reaction can be made." Rabi said, "What do you mean by 'remote possibility'?" and Fermi said, "Well, ten per cent." Rabi said, "Ten per cent is not a remote possibility if it means that we may die of it.  If I have pneumonia and the doctor tells me that there is a remote possibility that I might die, and it's ten percent, I get excited about it."  (Quoted in 'The Making of the Atomic Bomb' by Richard Rhodes.)

This might look at first like a successful application of "multiplying a low probability by a high impact", but I would reject that this was really going on.  Where the heck did Fermi get that 10% figure for his 'remote possibility', especially considering that fission chain reactions did in fact turn out to be possible?  If some sort of reasoning had told us that a fission chain reaction was improbable, then after it turned out to be reality, good procedure would have us go back and check our reasoning to see what went wrong, and figure out how to adjust our way of thinking so as to not make the same mistake again.  So far as I know, there was no physical reason whatsoever to think a fission chain reaction was only a ten percent probability.  They had not been demonstrated experimentally, to be sure; but they were still the default projection from what was already known.  If you'd been told in the 1930s that fission chain reactions were impossible, you would've been told something that implied new physical facts unknown to current science (and indeed, no such facts existed).  After reading enough historical instances of famous scientists dismissing things as impossible when there was no physical logic to say that it was even improbable, one cynically suspects that some prestigious scientists perhaps came to conceive of themselves as senior people who ought to be skeptical about things, and that Fermi was just reacting emotionally.  The lesson I draw from this historical case is not that it's a good idea to go around multiplying ten percent probabilities by large impacts, but that Fermi should not have pulled out a number as low as ten percent.

Having seen enough conversations involving made-up probabilities to become cynical, I also strongly suspect that if Fermi had foreseen how Rabi would reply, Fermi would've said "One percent".  If Fermi had expected Rabi to say "One percent is not small if..." then Fermi would've said "One in ten thousand" or "Too small to consider" - whatever he thought would get him off the hook.  Perhaps I am being too unkind to Fermi, who was a famously great estimator; Fermi may well have performed some sort of lawful probability estimate on the spot.  But Fermi is also the one who said that nuclear energy was fifty years off in the unlikely event it could be done at all, two years (IIRC) before Fermi himself oversaw the construction of the first nuclear pile.  Where did Fermi get that fifty-year number from?  This sort of thing does make me more likely to believe that Fermi, in playing the role of the solemn doubter, was just Making Things Up; and this is no less a sin when you make up skeptical things.  And if this cynicism is right, then we cannot learn the lesson that it is wise to multiply small probabilities by large impacts because this is what saved Fermi - if Fermi had known the rule, if he had seen it coming, he would have just Made Up an even smaller probability to get himself off the hook.  It would have been so very easy and convenient to say, "One in ten thousand, there's no experimental proof and most ideas like that are wrong!  Think of all the conjunctive probabilities that have to be true before we actually get nuclear weapons and our own efforts actually made a difference in that!" followed shortly by "But it's not practical to be worried about such tiny probabilities!"  Or maybe Fermi would've known better, but even so I have never been a fan of trying to have two mistakes cancel each other out.

I mention all this because it is dangerous to be half a rationalist, and only stop making one of the two mistakes.  If you are going to reject impractical 'clever arguments' that would never work in real life, and henceforth not try to multiply tiny probabilities by huge payoffs, then you had also better reject all the clever arguments that would've led Fermi or Szilard to assign probabilities much smaller than ten percent.  (Listing out a group of conjunctive probabilities leading up to taking an important action, and not listing any disjunctive probabilities, is one widely popular way of driving down the apparent probability of just about anything.)  Or if you would've tried to put fission chain reactions into a reference class of 'amazing new energy sources' and then assigned it a tiny probability, or put Szilard into the reference class of 'people who think the fate of the world depends on them', or pontificated about the lack of any positive experimental evidence proving that a chain reaction was possible, blah blah blah etcetera - then your error here can perhaps be compensated for by the opposite error of then trying to multiply the resulting tiny probability by a large impact.  I don't like making clever mistakes that cancel each other out - I consider that idea to also be clever - but making clever mistakes that don't cancel out is worse.
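The conjunctive trick mentioned in the parenthesis is easy to see numerically. A sketch with made-up step probabilities (not from the text), treating steps and paths as independent:

```python
import math

def conjunctive(probs):
    """P(every step happens), treating the steps as independent."""
    return math.prod(probs)

def disjunctive(probs):
    """P(at least one path works), treating the paths as independent."""
    return 1 - math.prod(1 - p for p in probs)

steps = [0.8] * 10  # ten individually "likely" steps, 80% each
paths = [0.3] * 5   # five independent routes, 30% each
print(round(conjunctive(steps), 3))  # 0.107
print(round(disjunctive(paths), 3))  # 0.832
```

Listing only the conjunctive chain drives the estimate toward zero; listing only alternative routes drives it toward one. An honest estimate has to look at both.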

On the other hand, if you want a general heuristic that could've led Fermi to do better, I would suggest reasoning that previous-historical experimental proof of a chain reaction would not be strongly expected even in worlds where it was possible, and that to discover a chain reaction to be impossible would imply learning some new fact of physical science which was not already known.  And this is not just 20-20 hindsight; Szilard and Rabi saw the logic in advance of the fact, not just afterward - though not in those exact terms; they just saw the physical logic, and then didn't adjust it downward for 'absurdity' or with more complicated rationalizations.  But then if you are going to take this sort of reasoning at face value, without adjusting it downward, then it's probably not a good idea to panic every time you assign a 0.01% probability to something big - you'll probably run into dozens of things like that, at least, and panicking over them would leave no room to wait until you found something whose face-value probability was large.

I don't believe in multiplying tiny probabilities by huge impacts.  But I also believe that Fermi could have done better than saying ten percent, and that it wasn't just random luck mixed with overconfidence that led Szilard and Rabi to assign higher probabilities than that.  Or to name a modern issue which is still open, Michael Shermer should not have dismissed the possibility of molecular nanotechnology, and Eric Drexler will not have been randomly lucky when it turns out to work: taking current physical models at face value implies that molecular nanotechnology ought to work, and if it doesn't work we've learned some new fact unknown to present physics, etcetera.  Taking the physical logic at face value is fine, and there's no need to adjust it downward for any particular reason; if you say that Eric Drexler should 'adjust' this probability downward for whatever reason, then I think you're giving him rules that predictably give him the wrong answer.  Sometimes surface appearances are misleading, but most of the time they're not.

A key test I apply to any supposed rule of reasoning about high-impact scenarios is, "Does this rule screw over the planet if Reality actually hands us a high-impact scenario?" and if the answer is yes, I discard it and move on.  The point of rationality is to figure out which world we actually live in and adapt accordingly, not to rule out certain sorts of worlds in advance.

There's a doubly-clever form of the argument wherein everyone in a plausibly high-impact position modestly attributes only a tiny potential possibility that their face-value view of the world is sane, and then they multiply this tiny probability by the large impact, and so they act anyway and on average worlds in trouble are saved.  I don't think this works in real life - I don't think I would have wanted Leo Szilard to think like that.  I think that if your brain really actually thinks that fission chain reactions have only a tiny probability of being important, you will go off and try to invent better refrigerators or something else that might make you money.  And if your brain does not really feel that fission chain reactions have a tiny probability, then your beliefs and aliefs are out of sync and that is not something I want to see in people trying to handle the delicate issue of nuclear weapons.  But in any case, I deny the original premise:  I do not think the world's niches for heroism must be populated by heroes who are incapable in principle of reasonably distinguishing themselves from a population of crackpots, all of whom have no choice but to continue on the tiny off-chance that they are not crackpots.

I haven't written enough about what I've begun thinking of as 'heroic epistemology' - why, how can you possibly be so overconfident as to dare even try to have a huge positive impact when most people in that reference class blah blah blah - but on reflection, it seems to me that an awful lot of my answer boils down to not trying to be clever about it.  I don't multiply tiny probabilities by huge impacts.  I also don't get tiny probabilities by putting myself into inescapable reference classes, for this is the sort of reasoning that would screw over planets that actually were in trouble if everyone thought like that.  In the course of any workday, on the now very rare occasions I find myself thinking about such meta-level junk instead of the math at hand, I remind myself that it is a wasted motion - where a 'wasted motion' is any thought which will, in retrospect if the problem is in fact solved, not have contributed to having solved the problem.  If someday Friendly AI is built, will it have been terribly important that someone have spent a month fretting about what reference class they're in?  No.  Will it, in retrospect, have been an important step along the pathway to understanding stable self-modification, if we spend time trying to solve the Lobian obstacle?  Possibly.  So one of these cognitive avenues is predictably a wasted motion in retrospect, and one of them is not.  The same would hold if I spent a lot of time trying to convince myself that I was allowed to believe that I could affect anything large, or any other form of angsting about meta.  It is predictable that in retrospect I will think this was a waste of time compared to working on a trust criterion between a probability distribution and an improved probability distribution.  (Apologies, this is a technical thingy I'm currently working on which has no good English description.)

But if you must apply clever adjustments to things, then for Belldandy's sake don't be one-sidedly clever and have all your cleverness be on the side of arguments for inaction.  I think you're better off without all the complicated fretting - but you're definitely not better off eliminating only half of it.

And finally, I once again state that I abjure, refute, and disclaim all forms of Pascalian reasoning and multiplying tiny probabilities by large impacts when it comes to existential risk.  We live on a planet with upcoming prospects of, among other things, human intelligence enhancement, molecular nanotechnology, sufficiently advanced biotechnology, brain-computer interfaces, and of course Artificial Intelligence in several guises.  If something has only a tiny chance of impacting the fate of the world, there should be something with a larger probability of an equally huge impact to worry about instead.  You cannot justifiably trade off tiny probabilities of x-risk improvement against efforts that do not effectuate a happy intergalactic civilization, but there is nonetheless no need to go on tracking tiny probabilities when you'd expect there to be medium-sized probabilities of x-risk reduction.  Nonetheless I try to avoid coming up with clever reasons to do stupid things, and one example of a stupid thing would be not working on Friendly AI when it's in blatant need of work.  Elaborate complicated reasoning which says we should let the Friendly AI issue just stay on fire and burn merrily away, well, any complicated reasoning which returns an output this silly is automatically suspect.

If, however, you are unlucky enough to have been cleverly argued into obeying rules that make it a priori unreachable-in-practice for anyone to end up in an epistemic state where they try to do something about a planet which appears to be on fire - so that there are no more plausible x-risk reduction efforts to fall back on, because you're adjusting all the high-impact probabilities downward from what the surface state of the world suggests...

Well, that would only be a good idea if Reality were not allowed to hand you a planet that was in fact on fire.  Or if, given a planet on fire, Reality was prohibited from handing you a chance to put it out.  There is no reason to think that Reality must a priori obey such a constraint.

EDIT:  To clarify, "Don't multiply tiny probabilities by large impacts" is something that I apply to large-scale projects and lines of historical probability.  On a very large scale, if you think FAI stands a serious chance of saving the world, then humanity should dump a bunch of effort into it, and if nobody's dumping effort into it then you should dump more effort than currently into it.  On a smaller scale, to compare two x-risk mitigation projects in demand of money, you need to estimate something about marginal impacts of the next added effort (where the common currency of utilons should probably not be lives saved, but "probability of an ok outcome", i.e., the probability of ending up with a happy intergalactic civilization).  In this case the average marginal added dollar can only account for a very tiny slice of probability, but this is not Pascal's Wager.  Large efforts with a success-or-failure criterion are rightly, justly, and unavoidably going to end up with small marginally increased probabilities of success per added small unit of effort.  It would only be Pascal's Wager if the whole route-to-an-OK-outcome were assigned a tiny probability, and then a large payoff used to shut down further discussion of whether the next unit of effort should go there or to a different x-risk.
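To make the marginal-impact comparison concrete, here is a minimal Python sketch. The two success-probability curves, the current budgets, and the $1,000 increment are all made-up assumptions for illustration, not estimates of any real project. The point is only that a sensible comparison looks at the increase in P(ok outcome) from the next unit of effort, and that this increase can be tiny for both projects without either comparison being a Pascal's Wager.

```python
import math

# Hypothetical diminishing-returns curves: probability of an OK outcome as a
# function of total dollars spent on each project. Illustrative numbers only.
def p_project_a(budget: float) -> float:
    return 0.30 * (1.0 - math.exp(-budget / 1e9))


def p_project_b(budget: float) -> float:
    return 0.10 * (1.0 - math.exp(-budget / 1e8))


def marginal_gain(p, budget: float, extra: float) -> float:
    """Increase in P(OK outcome) from adding `extra` dollars at `budget`."""
    return p(budget + extra) - p(budget)


extra = 1_000.0  # the next small unit of effort, in dollars
gain_a = marginal_gain(p_project_a, budget=5e8, extra=extra)
gain_b = marginal_gain(p_project_b, budget=5e8, extra=extra)
best = "A" if gain_a > gain_b else "B"
print(f"A: {gain_a:.2e}  B: {gain_b:.2e}  fund next: {best}")
```

Under these invented curves project A wins on the margin even though both marginal gains are well under one in a million; with different curves the answer could flip.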

Thoughts on a possible solution to Pascal's Mugging

2 Dolores1984 01 August 2012 12:32PM

For those who aren't familiar, Pascal's Mugging is a simple thought experiment that seems to demonstrate an intuitive flaw in naive expected utility maximization.  In the classic version, someone walks up to you on the street, and says, 'Hi, I'm an entity outside your current model of the universe with essentially unlimited capabilities.  If you don't give me five dollars, I'm going to use my powers to create 3^^^^3 people, and then torture them to death.'  (For those not familiar with Knuth's up-arrow notation: each additional arrow iterates the previous operation, so 3^^3 = 3^(3^3) = 7,625,597,484,987, and 3^^^^3 is vastly larger than anything that could ever be written out.)  The idea is that however small your probability is that the person is telling the truth, they can simply state a number that's grossly larger, and when you shut up and multiply, expected utility calculations say you should give them the five dollars, along with pretty much anything else they ask for.
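A short Python sketch of Knuth's up-arrow notation and of the naive expected-utility rule the mugger exploits. Since 3^^^^3 cannot be computed, the sketch uses 2^^5 = 2^65536 as a (comparatively microscopic) stand-in, and it assumes one util per dollar and one util of disutility per victim; both are simplifying assumptions, not part of the original thought experiment.

```python
from fractions import Fraction


def up_arrow(a: int, arrows: int, b: int) -> int:
    """Knuth's up-arrow a ^...^ b with `arrows` arrows, by direct recursion.

    Only feasible for tiny inputs: 3^^^3 and 3^^^^3 cannot be evaluated.
    """
    if arrows == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, arrows - 1, up_arrow(a, arrows, b - 1))


def naive_should_pay(p_truth: Fraction, victims: int, cost: int) -> bool:
    """Naive rule: pay if the expected number of victims outweighs the cost.

    Assumes 1 util per dollar and 1 util of disutility per victim.
    """
    return p_truth * victims > cost


p = Fraction(1, 2 ** 100)      # a very small credence in the mugger
stand_in = up_arrow(2, 2, 5)   # 2^^5 = 2^65536, tiny next to 3^^^^3
print(naive_should_pay(p, stand_in, cost=5))
```

However small the credence `p`, the mugger only has to name a number of victims larger than `cost / p`, and the naive rule says to pay.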

Intuitively, this is nonsense.  However, an AI under construction doesn't have a piece of code that lights up when exposed to nonsense.  Not unless we program one in.  And formalizing why, exactly, we shouldn't listen to the mugger is not as trivial as it sounds.  The actual underlying problem has to do with how we handle arbitrarily small probabilities.  There are a number of variations you could construct on the original problem that present the same paradoxical results.  There are also a number of simple hacks you could undertake that produce the correct results in this particular case, but these are worrying (not to mention unsatisfying) for a number of reasons.

So, with the background out of the way, let's move on to a potential approach to solving the problem which occurred to me about fifteen minutes ago while I was lying in bed with a bad case of insomnia at about five in the morning.  If it winds up being incoherent, I blame sleep deprivation.  If not, I take full credit.   

 

Let's take a look at a new thought experiment.  Let's say someone comes up to you and tells you that they have magic powers, and will make a magic pony fall out of the sky.  Let's say that, through some bizarrely specific priors, you decide that the probability that they're telling the truth (and, therefore, the probability that a magic pony is about to fall from the sky) is exactly 1/2^100.  That's all well and good.

Now, let's say that later that day, someone comes up to you, and hands you a fair quarter and says that if you flip it one hundred times, the probability that you'll get a straight run of heads is 1/2^100.  You agree with them, chat about math for a bit, and then leave with their quarter.  

I propose that the probability value in the second case, while superficially identical to the probability value in the first case, represents a fundamentally different kind of claim about reality than the first case.  In the first case, you believe, overwhelmingly, that a magic pony will not fall from the sky.  You believe, overwhelmingly, that the probability (in underlying reality, divorced from the map and its limitations) is zero.  It is only grudgingly that you inch even a tiny morsel of probability into the other hypothesis (that the universe is structured in such a way as to make the probability non-zero).  

In the second case, you also believe, overwhelmingly, that you will not see the event in question (a run of heads).  However, you don't believe that the probability is zero.  You believe it's 1/2^100.  You believe that, through only the lawful operation of the universe that actually exists, you could be surprised, even if it's not likely.  You believe that if you ran the experiment in question enough times, you would probably, eventually, see a run of one hundred heads.  This is not true for the first case.  No matter how many times somebody pulls the pony trick, a rational agent is never going to get their hopes up.      
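The "run it enough times" intuition can be checked numerically. This Python sketch computes the chance of seeing at least one run of one hundred heads across a given number of independent hundred-flip experiments, using `math.log1p` and `math.expm1` to keep precision at probabilities as small as 2^-100.

```python
import math


def p_at_least_one(p: float, trials: float) -> float:
    """P(at least one success in `trials` independent tries) = 1 - (1 - p)^trials.

    Uses log1p/expm1 so the result stays accurate for p as small as 2**-100.
    """
    return -math.expm1(trials * math.log1p(-p))


p_run = 2.0 ** -100  # one experiment: a fair quarter comes up heads 100 times

for trials in (1e3, 2.0 ** 100, 50 * 2.0 ** 100):
    print(f"{trials:.3g} experiments -> {p_at_least_one(p_run, trials):.3g}")
```

Around 2^100 repetitions give roughly a 63% chance (1 - 1/e) of at least one run. If, as the post argues for the pony, the underlying probability is exactly zero, the same formula returns zero for any number of repetitions.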

 

I would like, at this point, to talk about the notion of metaconfidence.  When we talk to the crazy pony man, and to the woman with the coin, what we leave with are two identical numerical probabilities.  However, those numbers do not represent the sum total of the information at our disposal.  In the two cases, we have differing levels of confidence in our levels of confidence.  And, furthermore, this difference has actual ramifications for what a rational agent should expect to observe.  In other words, even from a very conservative perspective, metaconfidence intervals pay rent.  By treating the two probabilities as identical, we are needlessly throwing away information.  I'm honestly not sure if this topic has been discussed before.  I am not up to date on the literature on the subject.  If the subject has already been thoroughly discussed, I apologize for the waste of time.

Disclaimer aside, I'd like to propose that we push this a step further, and say that metaconfidence should play a role in how we calculate expected utility.  If we have a very small probability of a large payoff (positive or negative), we should behave differently when metaconfidence is high than when it is low.          

From a very superficial analysis, lying in bed, metaconfidence appears to be directional.  A low metaconfidence, in the case of the pony claim, should not increase the probability that the probability of a pony dropping out of the sky is HIGHER than our initial estimate.  It also works the other way as well: if we have a very high degree of confidence in some event (the sun rising tomorrow), and we get some very suspect evidence to the contrary (an ancient civilization predicting the end of the world tonight), and we update our probability downward slightly, our low metaconfidence should not make us believe that the sun is less likely to rise tomorrow than we thought.  Low metaconfidence should move our effective probability estimate against the direction of the evidence that we have low confidence in: the pony is less likely, and the sunrise is more likely, than a naive probability estimate would suggest.
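The post does not give a formula for metaconfidence, so the following is only one hypothetical way to encode the directional rule: treat metaconfidence as a weight in [0, 1] and shrink the naive post-evidence estimate back toward the pre-evidence baseline. The function name, the linear form, and every number below (including a baseline of exactly zero for the pony) are assumptions for illustration.

```python
def metaconfidence_adjusted(baseline: float, estimate: float,
                            metaconfidence: float) -> float:
    """Hypothetical rule: shrink a naive estimate toward the pre-evidence baseline.

    baseline:       probability before the dubious evidence arrived
    estimate:       naive probability after updating on that evidence
    metaconfidence: weight in [0, 1]; 1 keeps the estimate, 0 ignores the evidence
    """
    if not 0.0 <= metaconfidence <= 1.0:
        raise ValueError("metaconfidence must be in [0, 1]")
    return baseline + metaconfidence * (estimate - baseline)


# Pony: assumed baseline of 0, naive estimate 2^-100, very low metaconfidence.
pony = metaconfidence_adjusted(0.0, 2.0 ** -100, metaconfidence=0.01)

# Sunrise: assumed baseline 0.999999, nudged down to 0.999 by the prophecy.
sunrise = metaconfidence_adjusted(0.999999, 0.999, metaconfidence=0.1)

# Fair coin: the 2^-100 comes from well-understood physics, so full metaconfidence
# passes the estimate through unchanged whatever the baseline.
coin = metaconfidence_adjusted(0.0, 2.0 ** -100, metaconfidence=1.0)
print(pony, sunrise, coin)
```

With low metaconfidence the pony becomes less likely and the sunrise more likely than the naive estimate, matching the direction described above, while the coin's probability is untouched. How the adjusted number should then enter the expected-utility calculation is left open in the post.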

So a claim like the pony claim (or Pascal's mugging), in which you have a very low estimated probability and very low metaconfidence, should be treated as dramatically less likely to actually happen, in the real world, than a case in which we have a low estimated probability but very high confidence in that probability.  See the pony versus the coins.  Rationally, we can only mathematically justify so low a confidence in the crazy pony man's claims.  However, in the territory, you can add enough coins that the two probabilities are mathematically equal, and you are still more likely to get a run of heads than you are to have a pony magically drop out of the sky.  I am proposing metaconfidence weighting as a way to get around this issue, and allow our map to more accurately reflect the underlying territory.  It's not perfect, since metaconfidence is still, ultimately, calculated from our map of the territory, but it seems to me, based on my extremely brief analysis, that it is at least an improvement on the current model.

Essentially, this idea is based on the understanding that the numbers that we generate and call probability do not, in fact, correspond to the actual rules of the territory.  They are approximations, and they are perturbed by observation, and our finite data set limits the resolution of the probability intervals we can draw.  This causes systematic distortions at the extreme ends of the probability spectrum, and especially at the small end, where the distortion grows dramatically relative to the probability itself as that probability shrinks.  I believe that the apparently absurd behavior demonstrated by an expected-utility agent exposed to Pascal's mugging is a result of these distortions.  I am proposing we attempt to compensate by filling in the missing information at the extreme ends of the bell curve with data from our model about our sources of evidence, and about the underlying nature of the territory.  In other words, this is simply a way to use our available evidence more efficiently, and I suspect that, in practice, it eliminates many of the Pascal's-mugging-style problems we encounter currently.
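One standard, concrete example of a finite data set limiting resolution at the small end (swapped in here as an illustration; the post does not mention it) is the "rule of three": after observing zero events in n independent trials, the one-sided 95% upper confidence bound on the event's probability is about 3/n. This Python sketch computes the exact bound.

```python
import math


def zero_event_upper_bound(n_trials: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on p after 0 events in n trials.

    Solves (1 - p)^n = alpha for p; for alpha = 0.05 this is close to 3 / n
    (the "rule of three").
    """
    return -math.expm1(math.log(alpha) / n_trials)


for n in (10 ** 3, 10 ** 6, 10 ** 9):
    print(n, zero_event_upper_bound(n), 3 / n)
```

Even a billion event-free trials cannot, on their own, certify a probability below roughly 3 in a billion, many orders of magnitude above 2^-100.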

I apologize for not having worked the math out completely.  I would like to reiterate that it is six thirty in the morning, and I've only been thinking about the subject for about a hundred minutes.  That said, I'm not likely to get any sleep either way, so I thought I'd jot the idea down and see what you folks thought.  Having outside eyes is very helpful, when you've just had a Brilliant New Idea.  

Dealing with the horrible strategy

3 Manfred 11 July 2011 05:16AM

So occasionally this idea comes up that unethical AIs could have leverage over people who think too deeply about the subject - like a balrog ambushing the dwarves and forcing them to build an unfriendly AI.  In this post I attempt to show that this leverage does not work.

Basically, it's a similar problem to Pascal's mugging, except the mugger doesn't even have to exist.  All unfriendly AIs that run on something like updateless/timeless/dispositional decision theory - that is, choosing between winning strategies, not individual steps - might attempt to mug current humans by promising immense future rewards or penalties.  The idea is that if you know a possible AI will do horrible things to friendly AI supporters if it's built, you're less likely to support friendly AI, thus making doing horrible things to you a winning (and horrible) strategy.  The flip side of the coin would be to do really nice things to people who supported your creation, thus increasing the chances of your creation - but if people are smart enough to predict that an AI will do this they would be building it right anyways, so the benefit to the AI from this strategy is probably negligible.

So let's say the chance of someone else building an AI that has the right decision theory and executes the horrible strategy is about one in 10^15.  Let's suppose that the AI can cause you 3^^^3 units of negative utility, somehow.  Well then you should regard supporting any AI that probably won't have the right decision theory to make the horrible promise as costing essentially 3^^^3 utils - a factor of 10^-15 is a drop in the Atlantic!  So don't do it!  Stop sending money to SIAI immediately!  If your name is Eliezer Yudkowsky, it is too late for you already and you should take up knitting!
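To see why a one-in-10^15 factor barely dents the product, it helps to work in base-10 logarithms. This Python sketch uses 3^^4 as a lower bound for 3^^^3 (3^^^3 = 3^^(3^^3) is a tower of 7,625,597,484,987 threes, so it is far larger than 3^^4); the variable names are illustrative only.

```python
import math

# 3^^3 = 3^27 is small enough to compute exactly.
three_up_up_3 = 3 ** 27

# 3^^4 = 3^(3^^3) is far too large to store, but its base-10 log is easy.
log10_three_up_up_4 = three_up_up_3 * math.log10(3)

# 3^^^3 = 3^^(3^^3) is a tower of 7,625,597,484,987 threes, so 3^^4 is a
# drastic underestimate; multiplying by 10^-15 only subtracts 15 from the log.
log10_expected_disutility = log10_three_up_up_4 - 15

print(f"3^^3 = {three_up_up_3}")
print(f"10^-15 * 3^^4 has about {log10_expected_disutility:.4g} digits")
```

Even this drastic underestimate leaves an expected disutility with roughly 3.6 trillion decimal digits.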

Okay, so how do we deal with this possibility?  Why haven't I already closed this browser window?  It turns out that the horrible strategy may not be so great for future AIs after all - we have 4 defenses.

  1. Refusal to be blackmailed is a defense against blackmail.  Mostly this is human evolved awesomeness.  Our brains instinctively do the sorts of things that make the horrible strategy less good.  We don't like blackmail, and knowing that we don't like blackmail, future AIs are less likely to think blackmailing us is a cost-effective strategy.  Go evolution!  You can consciously reinforce this too - knowably being immune to blackmail is a winning strategy, and you should choose winning strategies.
  2. If 3^^^3 is such a big amount of utility, why can't a friendly AI just give us that, totally cancelling the enemy term in the calculation?  Sure, it's harder to imagine pleasure than pain - and imagining wireheading and then going "I wouldn't want that" is a failure to imagine 3^^^3 utility correctly - but hey, I can't imagine 3^^^3 of anything anyhow, so why let our imaginations stop us?  (well, maybe because it's impossible, see (3))
  3. Standard Pascal's mugging defenses - likelihood of bounded utility functions being the biggest.  The worst an AI can do to you doesn't really seem about 3^^^3 times worse than turning the universe, your body included, into a grid of molecular happy faces.  We don't have a little counter in our heads that can be decremented endlessly.  When utilities are bounded, past a certain point you can no longer overcome improbability by planning worse dooms - the more probable friendly AI will likely win out in a calculation of this sort, even ignoring (1).
  4. The 4th defense is the scary defense.  If all else can be predicted to fail for some unknown set of reasons, a "friendly" AI running the same decision theory might do horrible things to people who are influenced by the horrible strategy.  Probably unnecessary anyhow.
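Defense (3) can be illustrated with a toy comparison in Python. The clamp-style bounded utility, the finite stand-in of 1e300 for 3^^^3, and the probabilities (1e-15 for the horrible strategy, 1e-2 for a friendly AI paying off a modest 1e10) are all hypothetical choices, not figures from the post.

```python
def bounded(u: float, cap: float = 1.0) -> float:
    """Hypothetical bounded utility: clamp raw utility into [-cap, cap]."""
    return max(-cap, min(cap, u))


def expected_utility(outcomes, utility=lambda u: u) -> float:
    """Sum of p * utility(u) over (probability, raw_utility) pairs."""
    return sum(p * utility(u) for p, u in outcomes)


# Made-up scenario: a 1e-15 chance of the horrible strategy (1e300 stands in
# for 3^^^3) versus a 1e-2 chance of a friendly AI worth a "mere" 1e10.
scenario = [(1e-15, -1e300), (1e-2, 1e10)]
worse_threat = [(1e-15, -1e308), (1e-2, 1e10)]

unbounded_total = expected_utility(scenario)          # the threat dominates
bounded_total = expected_utility(scenario, bounded)   # the friendly AI dominates
bounded_worse = expected_utility(worse_threat, bounded)
print(unbounded_total, bounded_total, bounded_worse)
```

Unbounded, the tiny-probability threat dominates; once utility is bounded, making the threat worse changes nothing, and the more probable outcome wins.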

I think the outlook is quite positive, really.  These factors mean that when it comes to what actions you take, the horrible strategy should not matter.  Lastly, to any unfriendly AIs reading this: do your worst!  It's probably much less than 3^^^3 utils!

 

-

 

Edited to make my conclusion more obvious.

Pascal's Gift

7 Bongo 25 December 2010 07:42PM

If Omega offered to give you 2^n utils with probability 1/n, what n would you choose?
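For reference, the arithmetic behind the puzzle, sketched in Python with exact fractions: choosing n yields an expected 2^n/n utils, which is nondecreasing in n and strictly increasing from n = 2 onward, so naive expected-utility maximization never settles on a finite n, while the probability of receiving anything at all, 1/n, falls toward zero.

```python
from fractions import Fraction


def expected_utils(n: int) -> Fraction:
    """Expected utility of choosing n: 2^n utils with probability 1/n."""
    return Fraction(2 ** n, n)


def payout_probability(n: int) -> Fraction:
    """Chance of receiving anything at all when choosing n."""
    return Fraction(1, n)


for n in (1, 2, 3, 10, 100):
    print(n, float(expected_utils(n)), float(payout_probability(n)))
```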

This problem was invented by Armok from #lesswrong. Discuss.