A Thought on Pascal's Mugging

komponisto

For background, see here.

In a comment on the original Pascal's mugging post, Nick Tarleton writes:

[Y]ou could replace "kill 3^^^^3 people" with "create 3^^^^3 units of disutility according to your utility function". (I respectfully suggest that we all start using this form of the problem.)

Michael Vassar has suggested that we should consider any number of identical lives to have the same utility as one life. That could be a solution, as it's impossible to create 3^^^^3 distinct humans. But, this also is irrelevant to the create-3^^^^3-disutility-units form.

Coming across this again recently, it occurred to me that there might be a way to generalize Vassar's suggestion in such a way as to deal with Tarleton's more abstract formulation of the problem. I'm curious about the extent to which folks have thought about this. (Looking further through the comments on the original post, I found essentially the same idea in a comment by g, but it wasn't discussed further.)

The idea is that the Kolmogorov complexity of "3^^^^3 units of disutility" should be much higher than the Kolmogorov complexity of the number 3^^^^3. That is, the utility function should grow only according to the complexity of the scenario being evaluated, and not (say) linearly in the number of people involved. Furthermore, the domain of the utility function should consist of low-level descriptions of the state of the world, which won't refer directly to words uttered by muggers, in such a way that a mere discussion of "3^^^^3 units of disutility" by a mugger will not typically be (anywhere near) enough evidence to promote an actual "3^^^^3-disutilon" hypothesis to attention.

This seems to imply that the intuition responsible for the problem is a kind of fake simplicity, ignoring the complexity of value (negative value in this case). A confusion of levels also appears implicated (talking about utility does not itself significantly affect utility; you don't suddenly make 3^^^^3-disutilon scenarios probable by talking about "3^^^^3 disutilons").

What do folks think of this? Any obvious problems?

For background, see here.

In a comment on the original Pascal's mugging post, Nick Tarleton writes:

[Y]ou could replace "kill 3^^^^3 people" with "create 3^^^^3 units of disutility according to your utility function". (I respectfully suggest that we all start using this form of the problem.)

Michael Vassar has suggested that we should consider any number of identical lives to have the same utility as one life. That could be a solution, as it's impossible to create 3^^^^3 distinct humans. But, this also is irrelevant to the create-3^^^^3-disutility-units form.

What do folks think of this? Any obvious problems?

The way around Pascal's mugging is to have a bounded utility function. Even if you are a paperclip-maximizer, your utility function is not the number of paperclips in the universe, it is some bounded function that is monotonic in the number of paperclips but asymptotes out. You are only linear in paperclips over small numbers of paperclips. This is not due to exponential discounting but because utility doesn't mean anything other than the function that we are maximizing the expected value of. It has an unfortunate namespace collision with the other utility, which is some intuitive quantification of our preferences that is probably closer to something like a description of the trades we would be willing to make. If you are unwilling to be mugged by Pascal's mugger then it simply follows as a mathematical fact that your utility is bounded by something on the order of the reciprocal of the probability that you would be un-muggable at.

For more of a description, see my post here, which originally got downvoted to oblivion because it argued from the position of a lack of knowledge of the VNM utility theorem. The post has since been fixed, and while it is not super-detailed, lays out an argument for why Pascal's mugging is resolved once we stop trying to make our utility functions look intuitive.

Incidentally, Pascal's mugging does lay out a good argument of why we need to be careful about an AGI's utility function; if we make it unbounded then we can get weird behavior indeed.

EDIT: Of course, perhaps I am still wrong somehow and there are unresolvable subtleties that I am missing. But I, at least, am simply unwilling to care about events occurring with probability 10^(-100), regardless of how bad they are.

The way around Pascal's mugging is to have a bounded utility function.

Way around? If my utility function suggests that being mugged by Pascal is the best thing for me to do then I'll be delighted to do it.

Utility functions determine our decisions, not the reverse!

1komponisto15y

The post you're commenting on argues that Pascal's mugging is already solved by merely letting the utility function be bounded by Kolmogorov complexity. Obviously, having it be uniformly bounded also solves the problem, but why resort to something so drastic if you don't need to?