Comments on Pascal's Mugging

by [anonymous]

3rd May 2012


There seems to be some continuing debate about whether or not it is rational to appease a Pascal Mugger. Some are saying that due to scope insensitivity and other biases, we really should just trust what decision theory + Solomonoff induction tell us. I have been thinking about this a lot and I'm at the point where I think I have something to contribute to the discussion.

Consider the Pascal Mugging: "Immediately begin to work only on increasing my utility, according to my utility function 'X', from now on, or my powers from outside the matrix will make minus 3^^^^3 utilons happen to you and yours."

Any agent can commit this Pascal's mugging (PM) against any other agent, at any time. A naive decision-theoretic expected-utility optimizer will always appease the mugger. Consider what the world would be like if all intelligent beings were this kind of agent.
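To make the claim about the naive optimizer concrete, here is a minimal sketch of the comparison such an agent would run. The numbers are hypothetical stand-ins: 3^^^^3 is far too large to write down, so a googol is used for the threat, and the credence and the cost of appeasement are assumptions chosen only for illustration.

```python
from fractions import Fraction

def expected_utility(outcomes):
    """Sum of probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Hypothetical stand-ins, not values from the post.
THREAT = 10**100               # utilons the mugger threatens to destroy
P_TRUE = Fraction(1, 10**50)   # credence that the threat is real
COST_OF_APPEASING = 10**6      # utilons lost by serving the mugger instead

def naive_choice(p_true=P_TRUE, threat=THREAT, cost=COST_OF_APPEASING):
    """Pick whichever action has the higher expected utility."""
    eu_refuse = expected_utility([(p_true, -threat), (1 - p_true, 0)])
    eu_appease = expected_utility([(1, -cost)])
    return "appease" if eu_appease > eu_refuse else "refuse"

print(naive_choice())           # huge threat swamps the tiny credence
print(naive_choice(threat=10))  # a small threat is not worth appeasing
```

However small the credence, the mugger can always name a threat large enough to flip the first comparison, which is why this kind of agent always appeases.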

When you see an agent, any agent, your only strategy would be to try to PM it before it PMs you. More likely, you will PM each other simultaneously, in which case the agent which finishes the mugging first 'wins'. If you finish mugging at the same time, the mugger that uses a larger integer in its threat 'wins'. (So you'll use the most compact notation possible and things like, "minus the Busy Beaver function of Graham's number utilons".)

This may continue until every agent in the community/world/universe has been PMed. Or maybe there could be one agent, a Pascal Highlander, who manages to escape being mugged and has his utility function come to dominate...

Except, there is nothing stipulating that the mugging has to be delivered in person. With a powerful radio source, you can PM everyone in your future light-cone unfortunate enough to decode your message, potentially hijacking entire distant civilizations of decision-theory users.

Pascal's mugging doesn't have to be targeted. You can claim to be a Herald of Omega and address your mugging "to whoever receives this transmission".

Another strategy might be to build a self-replicating robot (itself too dumb to be mugged) which has a radio which broadcasts a continuous fully general PM, and send it out into space. Then you commit suicide to avoid the fate of being mugged.

Now consider a hypothetical agent which completely ignores muggers. And mugs them back.

Consider what could happen if we build an AI which is friendly in every possible respect except that it appeases PMers.

To avoid this, you might implement a heuristic that ignores PMs on account of the prior improbability of being able to decide the fate of so many utilons, as Robin Hanson suggested. But an AI using naïve expected utility + SI may well have other failure modes roughly analogous to PM that we won't think of until it's too late. You might get agents to agree to pre-commit to ignore muggers, or to kill them, but to me this seems unstable: a band-aid that doesn't address the heart of the issue. I think an AI which can envision itself being PMed repeatedly by every other agent on the planet and still evaluate appeasement as the lesser evil cannot possibly be a Friendly AI, even if it has some heuristic or ad hoc patch that says it can ignore the PM.
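One way to picture the heuristic mentioned above is as a cap on credence that shrinks with the size of the claim. The sketch below assumes a specific form for that cap (credence at most 1/N for a claim to control N utilons); that form and the function names are assumptions made for illustration, not something the post spells out.

```python
from fractions import Fraction

def penalized_probability(base_probability, stake):
    """Cap the credence at 1/stake (assumed form of the penalty)."""
    return min(Fraction(base_probability), Fraction(1, max(stake, 1)))

def expected_loss_from_refusing(base_probability, stake):
    """Expected utilons lost by ignoring a threat of size `stake`."""
    return penalized_probability(base_probability, stake) * stake

def should_appease(base_probability, stake, cost_of_appeasing):
    """Appease only if ignoring the threat costs more in expectation."""
    return expected_loss_from_refusing(base_probability, stake) > cost_of_appeasing

# With the cap in place, inflating the threat no longer helps the mugger.
print(expected_loss_from_refusing(Fraction(1, 10**50), 10**100))
print(expected_loss_from_refusing(Fraction(1, 10**50), 10**1000))
```

Under this cap the expected loss from refusing never exceeds one utilon, so the mugger cannot win by naming a bigger number. It remains a patch bolted on to the decision procedure, which is the worry raised in the paragraph above.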

Of course there's the possibility that we are in a simulation which is occasionally visited by agents from the mother universe, which really does contain 3^^^^3 utilons/people/dustspecks. I'm not convinced acknowledging this possibility changes anything. There's nothing of value that we, as simulated people, could give our Pascal Mugging simulation overlords. Their only motivation would be that of absolute sadistic sociopaths, but if that's the reality of the multiverse, in the long term we're screwed no matter what we do, even with Friendly AI. And we certainly wouldn't be in any way morally responsible for their actions.

Edit 1: fixed typos

Comments

endoself

I'm not sure what "if this utility function is bounded below in absolute value by an unbounded computable function, then the expected utility of any input is undefined. This implies that a computable utility function will have convergent expected utilities iff that function is bounded" means.

Well, the conclusion is a bit simpler than the rest of the argument, so I'll just explain that. Basically, if the utility function is a computable function and is unbounded, i.e. there are no upper and lower limits on the utilities of possible (given your current knowledge) states of reality, then calculating the expected utility using the Solomonoff prior gives a divergent series (you can think of it like this series, but note that the techniques used to technically assign a sum to that series even though it doesn't have one cannot work here).
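As a toy illustration of the divergence described above (not Solomonoff induction itself, which is uncomputable), the sketch below weights hypothesis k by 2^-k and compares a bounded utility with an unbounded one. The weights and both utility functions are assumptions picked so the arithmetic is easy to follow.

```python
from fractions import Fraction

def partial_sums(utility, n_terms):
    """Partial sums of sum_k 2^-k * utility(k) for k = 1..n_terms."""
    total = Fraction(0)
    sums = []
    for k in range(1, n_terms + 1):
        total += Fraction(1, 2**k) * utility(k)
        sums.append(total)
    return sums

def bounded(k):
    """Toy utility that stays within [-1, 1]."""
    return (-1) ** k

def unbounded(k):
    """Toy utility whose magnitude outgrows the 2^-k weights."""
    return (-3) ** k

for name, u in (("bounded", bounded), ("unbounded", unbounded)):
    sums = partial_sums(u, 20)
    print(name, [float(s) for s in sums[-3:]])
```

The bounded case settles toward a single value, while the unbounded case swings between ever larger positive and negative partial sums, so there is no expected utility to report.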

So confronted with Pascal's mugging it will just spit out an error message?

Worse than that. Confronted with any expected utility calculation, assuming we have a computable and unbounded utility function, there is no answer. Intuitively, you can think of this as due to the fact that every expected utility calculation includes a Pascal's mugging; even if I don't threaten you with powers from beyond the matrix, the probability that I have them anyway isn't zero.

So then what does the AI using that utility function actually decide? Maybe it just crashes?

Well, it's impossible to actually implement Solomonoff induction in our universe, as far as we know, so we couldn't build that AI. We do have the problem that our best current model of inference, which we would like to use as a guide to creating an AI, does not actually answer questions about expected utility, and is thus not that great a guide.

A Friendly AI has to do better than that.

I agree. Until then, what I do in practice is notice my confusion, try to do and promote research on this problem, and then make my expected utility calculations without taking into account Pascal's-mugging-type possibilities (even though I don't have a perfect way of telling what counts as a Pascal's mugging and yes, there have been times when it wasn't obvious).

Viliam_Bur

every expected utility calculation includes a Pascal's mugging; even if I don't threaten you with powers from beyond the matrix, the probability that I have them anyway isn't zero.

My brain is frozen by trying to imagine the full consequences of this.

I guess it all adds up to normality, but let me tell you that this seemingly innocent normality is composed of rather scary things.

[anonymous]

Well, then I would think we need to have a very careful look at systems which define arbitrary minimum and maximum utility values. Do you see no way of making that work either? And if not that, then what? Do we give up on symbolic logic?
