A Thought on Pascal's Mugging
For background, see here.
In a comment on the original Pascal's mugging post, Nick Tarleton writes:
[Y]ou could replace "kill 3^^^^3 people" with "create 3^^^^3 units of disutility according to your utility function". (I respectfully suggest that we all start using this form of the problem.)
Michael Vassar has suggested that we should consider any number of identical lives to have the same utility as one life. That could be a solution, as it's impossible to create 3^^^^3 distinct humans. But, this also is irrelevant to the create-3^^^^3-disutility-units form.
Coming across this again recently, it occurred to me that there might be a way to generalize Vassar's suggestion in such a way as to deal with Tarleton's more abstract formulation of the problem. I'm curious about the extent to which folks have thought about this. (Looking further through the comments on the original post, I found essentially the same idea in a comment by g, but it wasn't discussed further.)
The idea is that the Kolmogorov complexity of "3^^^^3 units of disutility" should be much higher than the Kolmogorov complexity of the number 3^^^^3. That is, the utility function should grow only according to the complexity of the scenario being evaluated, and not (say) linearly in the number of people involved. Furthermore, the domain of the utility function should consist of low-level descriptions of the state of the world, which won't refer directly to words uttered by muggers, in such a way that a mere discussion of "3^^^^3 units of disutility" by a mugger will not typically be (anywhere near) enough evidence to promote an actual "3^^^^3-disutilon" hypothesis to attention.
This seems to imply that the intuition responsible for the problem is a kind of fake simplicity, ignoring the complexity of value (negative value in this case). A confusion of levels also appears implicated (talking about utility does not itself significantly affect utility; you don't suddenly make 3^^^^3-disutilon scenarios probable by talking about "3^^^^3 disutilons").
What do folks think of this? Any obvious problems?
Comments (159)
Is your utility function such that there is some scenario for which you assign -3^^^^3 utils? If so, then the Kolmogorov complexity of "3^^^^3 units of disutility" can't be greater than K(your brain) + K(3^^^^3), since I can write a program to output such a scenario by iterating through all possible scenarios until I find one which your brain assigns -3^^^^3 utils.
A prior of 2^-(K(your brain) + K(3^^^^3)) is not nearly small enough, compared to the utility -3^^^^3, to make this problem go away.
Come to think of it, the problem with this argument is that it assumes that my brain can compute the utility it assigns. But if it's assigning utility according to Kolmogorov complexity (effectively the proposal in the post), that's impossible.
The same issue arises with having probability depend on complexity.
Ok, I think in that case my argument doesn't work. Let me try another approach.
Suppose some stranger appears to you and says that you're living in a simulated world. Out in the real world there is another simulation that contains 3^^^^3 identical copies of a utopian Earth-like planet plus another 3^^^^3 identical copies of a less utopian (but still pretty good) planet.
Now, if you press this button, you'll turn X of the utopian planets into copies of the less utopian planet, where X is a 10^100 digit random number. (Note that K(X) is of order 10^100 which is much larger than K(3^^^^3) and so pressing the button would increase the Kolmogorov complexity of that simulated world by about 10^100.)
What does your proposed utility function say you should do (how much would you pay to either press the button or prevent it being pressed), and why?
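To put a rough number on the K(X) figure above: writing X out in binary takes about log2(10) ≈ 3.32 bits per decimal digit, and for most random X no substantially shorter description exists, so K(X) is roughly 3.3 × 10^100 bits. The sketch below checks that per-digit rate on a random 1,000-digit number (the size is chosen only so it runs instantly). `bits_to_write_out` is an illustrative helper, not anything from the thread, and bit length is only an upper bound on K, tight up to lower-order terms for typical random numbers.

```python
import math
import random

def bits_to_write_out(x: int) -> int:
    # Length of x's plain binary expansion: an upper bound on K(x) (up to small
    # overhead), and close to K(x) for a typical, incompressible random x.
    return x.bit_length()

digits = 1_000
x = random.randrange(10 ** (digits - 1), 10 ** digits)  # a random 1,000-digit number
bits = bits_to_write_out(x)
print(bits, digits * math.log2(10))  # both roughly 3,320 bits
```

Scaled up to 10^100 digits, the same rate gives the "of order 10^100" complexity mentioned above.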
Utility is monotonic, even though complexity isn't. (Thus X downgrades out of the 3^^^^3 wouldn't be as bad as, say, 3^^^3 downgrades.) However, utility is bounded by complexity: the complexity of a scenario with utility N must be at least N. (Asymptotically, of course.)
Probably not, if "you" is interpreted strictly to refer to my current human brain, as opposed to including more complex "enhancements" of the latter.
This requirement (large numbers that refer to sets have large Kolmogorov complexity) is a weaker version of my and RichardKennaway's versions of the anti-mugging axiom. However, it doesn't work for all utility functions; for example, Clippy would still be vulnerable to Pascal's Mugging if using this strategy, since he doesn't care whether the paperclips are distinct.
Hm, that solution seems like the one I gave (ironically, on a Clippy post), where I said that if you're allowed to posit these huge utilities from complex (and thus improbable) hypotheses, you also have to consider hypotheses that are just as complex but give the opposite utility. But in the link I gave, people seemed to find something wrong with it: specifically, that the mugger gives an epsilon of evidence favoring the "you should pay"-supporting hypotheses, making them come out ahead.
So ... what's the deal?
Arranging your probability estimates so that predictions of opposite utility cancel out is one way to satisfy the anti-mugging axiom. It's not the only way to do so, though; you can also require that the prior probabilities of statements (without corresponding opposite-utility statements) shrink at least as fast as utilities grow. There's no rule that says that similar statements with positive and negative utilities have to have the same prior probabilities, unless you introduce it specifically for the purpose of anti-mugging defense.
My favored solution. Incidentally, if your prior shrinks faster, then you can still be vulnerable. The mugger can simply split his offer up into a billion smaller offers, which will avoid the penalty of big offers disproportionately being discounted. So unless you would reject every single mugging offer of any magnitude (in which case isn't that kind of arbitrary?), the faster shrinking doesn't buy you anything.
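The splitting argument can be illustrated with a toy model. The specific prior here is an assumption for illustration, not something the commenter specifies: suppose a promise of u utils is believed with probability c/u^2 (so belief shrinks faster than utility grows), and each sub-offer is judged independently. One offer of U utils is then worth c/U in expectation, while n offers of U/n utils are together worth c·n^2/U, which grows without bound in n. The model only makes sense while c/u^2 stays below 1, i.e. while the pieces remain large.

```python
from fractions import Fraction

def expected_payoff(total_offer: int, pieces: int, c: int = 1) -> Fraction:
    """Toy model (assumption): a promise of u utils is honoured with
    probability c / u**2, and the sub-offers are judged independently."""
    piece_size = Fraction(total_offer, pieces)
    p_honoured = c / piece_size**2
    return pieces * piece_size * p_honoured

print(expected_payoff(10**12, 1))      # one big offer: 1/1000000000000
print(expected_payoff(10**12, 10**9))  # a billion smaller offers: 1000000
```

Splitting the same total into a billion pieces turns a negligible expected payoff into a large one, which is the commenter's point about faster-shrinking priors.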
I believe a set of smaller offers would imply the existence of a statement which aggregates them and violates this formalization of the anti-mugging axiom.
On the other hand, you can potentially be forced to search the space of all functions for the one that diverges, and it might be possible (I don't know whether it is) to mug in a way that makes finding that function computationally hard.
I take the aggregating thing as a constructive proof that that class of priors + utility function is vulnerable; your version just seems to put it another way. We agree on that part, I think.
I believe there is such a rule, which doesn't have to be introduced ad hoc, and which follows from the tenets of algorithmic information theory. Per the reasoning I gave in the linked post, an arbitrary complex conclusion you locate (like the one in Pascal's mugging) necessarily has a corresponding conclusion of equal complexity, but with the right predicate(s) inverted so that the inferred utility is reversed.
Because (by assumption) the conclusion is reached through arbitrary reasoning, disentangled from any real-world observation, you need no additional complexity for a hypothesis that critically inverts the first one. Since no other evidence supports either conclusion, their probability weights are determined by their complexity, and are thus equal.
That's why I don't think you need to introduce this reasoning as an additional axiom. However, as a separate matter (and whether or not you need it as an axiom), I thought this argument was refuted by the fact that the mugger, simply through assertion, introduces an arbitrarily small amount of evidence favoring one hypothesis over its inverse. If it refutes the defense I gave in the link, it should work against the anti-mugging axiom you're using as well.
Thanks for the links; I seem to have missed that post.
There is an idea here, but it's a little muddled. Why should complexity matter for Pascal's mugging?
Well, the obvious answer to me is that, behind the scenes, you're calculating an expected value, for which you need a probability of the antagonist actually following through. More complex claims are harder to carry out, so they have lower probability.
A separate issue is that of having bounded utility, which is possible, but it should be possible to do Pascal's mugging even then, if the expected value of giving them money is higher than the expected value of not.
Anyhow, just "complexity" isn't quite a way around Pascal's mugging. It would be better to do a more complete assessment of the likelihood that the threat is carried out.
Among other things, the ability of the mugger to communicate the threat depends on the complexity of the threat.
This isn't really the limiting reagent in the reaction, though. I can communicate all sorts of awful things (sorry, had to share - it's totally my fault if you end up reading the entire thread) much more easily than I can do them.
Not for things with values in the range of 3^^^^3 -- in such a case the difference between ability-to-communicate and ability-to-carry-out is pretty much negligible. (The complexity of an action with 3^^^^3 units of disutility is right around 3^^^^3, under my proposal.)
Ah shoot, I read this post, and then I read SewingMachine's post, and then I realized my reply to this post was wrong.
I'll repeat my other comment. log(N) is an upper bound for the complexity of N, but complexity of N can be much smaller. Complexity of 3^^^3 is tiny compared to log(3^^^3).
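The gap between log(N) and the complexity of N can be made concrete with a short implementation of Knuth's up-arrow notation, which this thread writes as ^^ and ^^^. This is an illustrative sketch: the recursive definition is the standard one, but `up_arrow` is a hypothetical helper name, and only the small cases are actually evaluated.

```python
import math

def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's a ↑^n b: n = 1 is ordinary exponentiation, and each extra
    arrow iterates the previous operation b times."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

# 3^^3 is small enough to evaluate: 3^(3^3) = 3^27.
t = up_arrow(3, 2, 3)
print(t)  # 7625597484987

# 3^^4 = 3^t is already unwieldy, but its number of decimal digits can be computed.
digits_3_up_up_4 = math.floor(t * math.log10(3)) + 1
print(digits_3_up_up_4)  # about 3.6 trillion digits

# 3^^^3 = 3^^t is hopeless to evaluate, yet this 17-character string pins it down exactly.
description = "up_arrow(3, 3, 3)"
print(len(description))  # 17
```

So log(3^^^3) is astronomically large, while a complete description of 3^^^3 fits in a few dozen bytes.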
Oh, you totally got ninja'd.
Given that there's no definition for the value of a util, arguments about how many utils the universe contains aren't likely to get anywhere.
So let's make it easier. Suppose the mugger asks you for $1, or ey'll destroy the Universe. Suppose we assume the Universe to have 50 quadrillion sapient beings in it, and to last for another 25 billion years ( = 1 billion generations if average aliens have similar generation time to us) if not destroyed. That means the mugger can destroy 50 septillion beings. If we assign an average being's life as worth $100000, then the mugger can destroy $5 nonillion (= 5 * 10^30).
Given that there have been reasonable worries about, e.g., the LHC destroying the Universe, I think the probability that a person can destroy the Universe is rather greater than 1 in 5 nonillion (to explain why it hasn't been done already, assume the Great Filter comes at the stage of industrialization). I admit that the probability of someone with an LHC-level device being willing to destroy the Universe for the sake of $1 would be vanishingly low, but until today I wouldn't have thought someone would kill 6,790 people to protest a blog's comment policy either.
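The back-of-the-envelope figures in this comment can be checked exactly with integer arithmetic. The 25-year generation time is implied by "25 billion years = 1 billion generations"; the constant names are illustrative.

```python
# Exact integer arithmetic for the universe-destruction estimate (illustrative names).
QUADRILLION, BILLION = 10**15, 10**9
SEPTILLION, NONILLION = 10**24, 10**30

beings_alive = 50 * QUADRILLION      # sapient beings in the Universe
years_remaining = 25 * BILLION       # remaining lifetime if not destroyed
years_per_generation = 25            # implied by 25 billion years = 1 billion generations
value_per_being = 100_000            # dollars per life

generations = years_remaining // years_per_generation
beings_total = beings_alive * generations
dollars_total = beings_total * value_per_being

print(generations == BILLION)           # True
print(beings_total == 50 * SEPTILLION)  # True
print(dollars_total == 5 * NONILLION)   # True: $5 nonillion, i.e. 5 * 10^30
```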
Citation needed.
Looking at:
http://en.wikipedia.org/wiki/Safety_of_particle_collisions_at_the_Large_Hadron_Collider
...the defenders are doing a PR whitewash job. They can't even bring themselves to mention probabilities!
Maybe it's because there would be no point in mentioning probabilities smaller than e^(-10^9) (the evidence you get from the fact that the sun still exists; citation), since humans don't deal well with small numbers.
But no, "whitewash job" :P
IMO, this most likely has to do with the perceived difference between "no risk" and "some risk". I am sure the authors were capable of producing a quantitative report - and understood that that is the scientific approach - but sat on any figures they might have had after being instructed about the presentation of the desired conclusion.
This sounds a bit conspiracy-ey. Any evidence for your claims, e.g. a trend of similar papers using probability assessments rather than just stopping at "these collisions have happened a very large number of times and we ain't dead yet"?
Risk assessments are commonly quantitative.
Fair enough. So we might have enough data for the analysis. But "are commonly quantitative" isn't even weak evidence either way - that is to say, this paper being less quantitative doesn't ring any alarm bells per se, since it's not unusual. But we can get evidence by looking closer: are qualitative risk assessments more likely to be "instructed about the desired conclusion" than quantitative ones? What complicating variables can we prune out to try and get the causal relationship whitewash->qualitative?
Basically what I'm trying to communicate is that there are two ways you could convince me this was a fraud: you could have better knowledge of the subject matter than me and demonstrate directly how it was a fraud, or you could have detailed evidence on frauds, good enough to overcome my prior probability that this isn't a fraud. Saying "they were probably able to produce a more quantitative report, but didn't, so it's a fraud" is neither.
I never used the term "fraud". You seem to be reading more into this than was intended. I just think it is funny that an official LHC risk assessment paper presumably designed to reassure fails to come up with any probabilities - and just says: "it's safe". To someone like me, that makes it look as though it is primarily a PR exercise.
IIRC, others have observed this before me - though I don't have the reference handy.
I would classify a supposedly scientific paper that "sat on figures" and "was instructed about the desired conclusion" as a fraud. If you would prefer "whitewash" (a word you did use) instead of "fraud" I would be happy to change in the future.
But the paper was quite a bit longer than "it's safe," seemed quite correct (though particle physics isn't my field), and indeed gave you enough information to calculate approximate probabilities yourself if you wanted to. So to me it looks like you're judging on only a tiny part of the information you actually have.
Does the fact that it doesn't actually say the words "not greater than 1 in 3E22, and that's just calculating using the cosmic rays that have hit the earth in the last 4.5E9 years" mean it should be ignored?
I am most disappointed the Brian Cox quote didn't make it into that article. The quote was actually newsworthy, too.
Lifeboat Foundation fits my criteria of "reasonable", as do some of the commenters here. Even if there's only a one in a million risk of destroying the world, that's still equivalent to killing 6,000 people with probability one; potentially destroying the Universe should require even more caution.
There's not even a one in a million; it's closer to "But there's still a chance, right?"
And you're still dealing in probabilities too small to sensibly calculate in this manner and be saying anything meaningful - "switching on the LHC is equivalent to killing 6,000 people for certain" is a statement that isn't actually sensible when rendered in English, and I don't see another way to render in English your calculated result that switching it on is "equivalent to killing 6,000 people with probability one". But please do enlighten me.
(I realise you're multiplying 6E9 by 1E-6 and asserting that six billion conceptual millionth-of-a-person slivers equals six thousand actual existing people. "Shut up and multiply" doesn't stop me balking at this, and that the result says "switching on the LHC is equivalent to killing 6,000 people for certain" seems to constitute a reductio ad absurdum for however one gets there.)
Rees estimated the probability of the LHC destroying the world at 1 in 50 million, and it would be surprising if he were one of the few people in the world without overconfidence bias, or one of the few people in the world who doesn't underestimate global existential risks.
I assume from the first sentence that you believe an appropriate probability to have for the LHC destroying the world is less than one in a billion. Trusting anyone, even the world scientific consensus, to a one-in-a-billion level of confidence seems excessive to me - the world scientific consensus has been wrong on more than one in every billion issues it thinks it's sure about. If you're working not off the world scientific consensus but off your own intuition, that seems even stranger - if, for example, the LHC will destroy the world if and only if strangelets are stable at 10 TeV, then you just discovered important properties about the stability of strangelets to p < 0.000000001 certainty, which seems like the sort of thing you shouldn't be able to do without any experiments or mathematics. If you're working off of a general tendency for the world not to be destroyed, well, there were five mass extinction events in the past billion years, so ignoring for the moment the tendency of mass extinctions to take multiple years, that means the probability of a mass extinction beginning in any particular year is about 5/billion. If I were to tell you "The human race will become extinct the year the LHC is switched on", would you really tell me "Greater than 80% chance it has nothing to do with the LHC" and go about your business?
I am still uncomfortable with the whole "shut up and multiply" concept too. But I think that's where the "shut up" part comes in. You don't have to be comfortable with it. You don't have to like it. But if the math checks out, you just shut up and keep your discomfort to yourself, because math is math and bad things happen when you ignore it.
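One way to see where the "greater than 80%" figure comes from (a reconstruction, not necessarily the commenter's own calculation): take the LHC's risk at the one-in-a-billion bound and the background rate of mass extinctions at five per billion years. Given that an extinction begins the year the LHC is switched on, and treating the two causes as rare and mutually exclusive (an assumption), the chance it is not the LHC is 5/(1 + 5) = 5/6. Any smaller LHC risk pushes that figure higher still.

```python
BACKGROUND_PER_YEAR = 5 / 1e9  # five mass extinctions per billion years
LHC_RISK = 1e-9                # "less than one in a billion", taken at the bound

def p_not_lhc(lhc_risk: float, background: float) -> float:
    """P(not caused by the LHC | an extinction began that year), treating the
    two causes as rare and mutually exclusive (an assumption)."""
    return background / (lhc_risk + background)

print(p_not_lhc(LHC_RISK, BACKGROUND_PER_YEAR))  # ≈ 0.833, i.e. 5/6
```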
Here we run into the problem of "garbage in, garbage out."
He assigned 50% extinction risk for the 21st century in his book. His overall estimates of risk are quite high.
What your probability discussion there seems to me to be saying is "these numbers are too small to think about in any sensible way, let alone calculate." Trying to think about them closely resembles an argument that the way to deal with technological existential risk is to give up technology and go back to the savannah (caves are too techy).
But the math leads to statements like "switching on the LHC is equivalent to killing 6,000 people for certain", which seems to constitute a reductio ad absurdum of whatever process led to such a sentence.
(You could justify it philosophically, but you're likely to get an engineer's answer: "No it isn't. Here, I'll show you. (click) Now, how many individuals did that just kill?")
One day I would like to open up an inverse casino.
The inverse casino would be full of inverse slot machines. Playing the inverse slot machines costs negative twenty-five cents - that is, each time you pull the bar on the machine, it gives you a free quarter. But once every few thousand bar pulls, you will hit the inverse jackpot, and be required to give the casino several thousand dollars (you will, of course, have signed a contract to comply with this requirement before being allowed to play).
You can also play the inverse lottery. There are ten million inverse lottery tickets, and anyone who takes one will get one dollar. But if your ticket is drawn, you must pay me fifteen million dollars. If you don't have fifteen million dollars, you will have various horrible punishments happen to you until fifteen million dollars worth of disutility have been extracted from you.
If you believe what you are saying, it seems to me that you should be happy to play the inverse lottery, and believe there is literally no downside. And it seems to me that if you refused, I could give you the engineer's answer "Look, (buys ticket) - a free dollar, and nothing bad happened to me!"
And if you are willing to play the inverse lottery, then you should be willing to play the regular lottery, unless you believe the laws of probability work differently when applied to different numbers.
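For the inverse lottery as described (ten million tickets, $1 per ticket, $15 million owed if your ticket is drawn), the expected value per ticket works out as follows, assuming exactly one ticket is drawn. Exact fractions avoid rounding; the constant names are illustrative.

```python
from fractions import Fraction

TICKETS = 10_000_000
PAYOUT_PER_TICKET = 1          # dollars the player receives for taking a ticket
PENALTY_IF_DRAWN = 15_000_000  # dollars the player owes if their ticket is drawn

def inverse_lottery_ev() -> Fraction:
    """Expected dollars per ticket for the player, assuming exactly one ticket is drawn."""
    p_drawn = Fraction(1, TICKETS)
    return PAYOUT_PER_TICKET - p_drawn * PENALTY_IF_DRAWN

print(inverse_lottery_ev())  # -1/2: each "free dollar" costs fifty cents on average
```

The free dollar is real every time, and yet the ticket has negative expected value, which is why "nothing bad happened to me" settles nothing.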
The hedge fund industry called. They want their idea of selling far out-of-the-money options back.
Doesn't this describe the standard response to cars?
Just think of all the low-probability risks cars subsume! Similarly, if you take up smoking you no longer need to worry about radon in your walls, pesticides in your food, air pollution or volcano dust. It's like a consolidation loan! Only dumber.
Sorry, I don't understand. Response to cars?
Most of life is structured as a negative lottery. You get in a car, you get where you're going much faster- but if the roulette ball lands on 00, you're in the hospital or dead. (If it only lands on 0, then you're just facing lost time and property.)
And so some people are mildly afraid of cars, but mostly people are just afraid of bad driving or not being in control- the negative lottery aspect of cars is just a fact of life, taken for granted and generally ignored when you turn the key.
This is plausible and I shall contemplate it.
By the way, and a little bit on topic, I think it's not a coincidence that an inverse casino would be more expensive to run than a regular casino.
What Sewing-Machine said. A solution of the Pascal's mugging problem certainly doesn't imply that existential risks aren't to be worried about!
But Komponisto's idea is not to do with how many utils the universe contains.
Incidentally, it seems to me that if it's possible to make a credible threat to destroy the universe, then our main problem is not Pascal's mugging but the fragility of the universe.
As I understand it, komponisto's idea is that we don't have to worry about Pascal's Mugging because the probability of anyone being able to control 3^^^^3 utils is even lower than one would expect simply looking at the number 3^^^^3, and is therefore low enough to cancel out even this large a number.
What I am trying to say in response is that there are formulations of Pascal's Mugging which do not depend on the number 3^^^^3. The idea that someone could destroy a universe's worth of utils is more plausible than destroying 3^^^^3 utils, and in that case it's not at all obvious that the low probability cancels out the high risk.
Well, it may not be obvious what to do in that case! But the original formulation of the Pascal's Mugging problem, as I understand it, was to formally explain why it is obvious in the case of large numbers like 3^^^^3:
The answer proposed here is that a "friendly" utility function does not in fact allow utility to increase faster than complexity increases.
I don't claim this tells us what to do about the LHC.
A corollary is a necessary condition for friendliness: if the utility function of an AI can take values much larger than the complexity of the input, then it is unfriendly. This kills Pascal's mugging and paperclip maximizers with the same stone. It even sounds simple and formal enough to imagine testing it on a given piece of code.
How does that work at all? They're not measured by the same unit (bits vs. utils), and you can multiply a utility function by a positive constant or add or subtract an arbitrary constant and still have it represent the same preferences.
The concept can be rescued, at least from that objection, by saying instead that there should be some value alpha such that, for any description of a state of the universe, the utility of that state is less than alpha times the complexity of that description. That is, utility is asymptotically bounded by a linear function of complexity.
However, the utility function still isn't up for grabs. If our actual true utility function violates this rule, I don't want to say that an AGI is unfriendly for maximizing it.
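Under the rescued version of the proposal (utility below alpha times complexity), a claim of U utils has complexity of at least U/alpha bits and hence prior weight at most 2^-(U/alpha), so the largest prior-weighted utility it can contribute, U·2^-(U/alpha), shrinks as U grows. A claim whose complexity stays fixed no matter how large a number it names, such as the short phrase "3^^^^3 units of disutility", behaves the opposite way. The sketch compares the two on a log2 scale to avoid floating-point underflow. alpha = 1 and the 100-bit complexity are arbitrary illustrative choices, constants are ignored, and it bounds a single hypothesis's contribution only, not a full expected-utility sum.

```python
import math

def log2_weighted_claim_bounded(utility: float, alpha: float) -> float:
    """log2 of (prior * utility) under the proposal: the claim's complexity K
    is at least utility / alpha bits, so its prior is at most 2^-K."""
    return math.log2(utility) - utility / alpha

def log2_weighted_claim_fixed(utility: float, complexity_bits: float) -> float:
    """Same quantity when the claim's complexity does not grow with the
    utility it names."""
    return math.log2(utility) - complexity_bits

utilities = [10, 100, 1_000, 10**6]
bounded = [log2_weighted_claim_bounded(u, alpha=1.0) for u in utilities]
fixed = [log2_weighted_claim_fixed(u, complexity_bits=100) for u in utilities]
for u, b, f in zip(utilities, bounded, fixed):
    print(f"U={u}: bounded {b:.1f}, fixed {f:.1f}")
```

The bounded column falls steadily while the fixed column climbs, which is the difference between a mugger's number that costs description length and one that does not.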
Of course. The proposal here is that "our actual true utility function" does not violate this rule, since we are not in fact inclined to give in to a Pascalian mugger.
Sometimes, when I go back and read my own comments, I wonder just what goes on in that part of my brain that translates concepts into typed out words when I am not paying it conscious attention.
Anyways, let "our actual true utility function" refer to the utility function that best describes our collective values that we only manage to effectively achieve in certain environments that match the assumptions inherent in our heuristics. Thinking of it this way, one might wonder if Pascalian muggers fit into these environments, and if not, how much does our instinctual reaction to them indicate about our values?
The question and Weissman's answer are good, so this is just a distraction: are utils and bits really thought of as units? The mathematical formalism of e.g. physics doesn't actually have (or doesn't require) units, but you can extract them by thinking about the symmetries of the theory: e.g. distance is measured in the same units vertically and horizontally because the laws of physics stay the same after changing some coordinates. How do people think about this in economics?
I think I agree.
Perhaps one way to state the complexity-of-value thesis would be to say that the utility function should be bounded by Kolmogorov complexity.
It doesn't quite kill Pascal's mugging - the threat does have to have some minimum level of credibility, but that minimum credibility can still be low enough that you should hand over the cash. Pascal's mugging is only killed if the expected utility of handing over the cash is negative. To show this I think you really do need to evaluate the probability to the end.
Neither does it kill paperclip maximizers. A bunch of paperclips requires about log2(N) bits to describe, plus the description of the properties of a paperclip. So the paperclip maximizer can still have a constantly-increasing utility as they make more paperclips, your rule would just bound it to growing like log(N).
Good line of thought though: there may still be something in here.
To "kill Pascal's mugging" one doesn't have to give advice on how to deal with threats generally.
I think that N paperclips takes about complexity-of-N, plus complexity of a paperclip, bits to describe. "Complexity of N" can be much lower than log(N), e.g. complexity of 3^^^3 is smaller than the wikipedia article on Knuth's notation. "3^^^3 paperclips" has very low complexity and very high utility.
Ah, you're right.
But I think that a decision theory is better (better fulfills desiderata of universality, simplicity, etc. etc.) if it treats Pascal's mugging with the same method it uses for other threats.
Why? Is "threat" a particularly "natural" category?
From my perspective, Pascal's mugging is simply an argument showing that a human-friendly utility function should have a certain property, not a special class of problem to be solved.
Hah. Well, we can apply my exact same argument with different words to show why I agree with you:
This will be the case in the scenario under discussion, due to the low probability of the mugger's threat (in the "3^^^^3 disutilons" version), or the (relatively!) low disutility (in the "3^^^^3 persons" version, under Michael Vassar's proposal).
Yes; it would be a "less pure" paperclip maximizer, but still an unfriendly AI.
The rule is (proposed to be) necessary for friendliness, not sufficient by any means.
I think that the more general problem is that if the absolute value of the utility that you attach to a world-state increases faster than its probability, given the current situation, decreases (that is, faster than its complexity grows), then the very possibility of that world-state existing will cause it to hijack the entirety of your utility function (assuming that there are no other world-states in your utility function which go FOOM in a similar fashion).
Of course, utility functions are not constructed to avoid this problem, so I think that it's incredibly likely that each unbounded utility function has at least one world-state which would render it hijackable in such a manner.
Yes, that's exactly the problem.
Well, they had better be, or they will fall victim to it.
You have to choose one of the following: (1) Pascal's Mugging; (2) Scope Insensitivity (bounding utility by improbability); or (3) Wishful Thinking (bounding improbability by utility).
We often call such things a 'problem' yet by very definition it is exactly how it should be. If your utility function genuinely represents your preferences (including preferences with respect to risk) then rejoice in the opportunity to devote all your resources to the possibility in question! If it doesn't then the only 'problem' is that your 'utility function', well, isn't your actual utility function. It's the same problem that you get when you think you like carrots when you really like peaches.
Voluntary dedication is not 'hijacking'.
(Response primarily directed to quoted text and only a response to the parent in as much as it follows the problem frame.)
Agreed.
Our heuristics hijack our volition?
Don't see how your idea defeats this:
Having a bounded utility function defeats that.
See Sewing-Machine's comment. The smallness of the probability isn't fixed, if the probability is controlled by complexity, and complexity controls utility.
More precisely, the probability that the mugger can produce arbitrary amounts of utility is dominated by (the probability that the mugger can produce more than N units of utility), for every N; and as the latter is arbitrarily small for N sufficiently large, the former must be zero.
Without invoking complexity, one can say that an agent is immune to this form of Pascal's mugging if, for fixed I, the quantity P(x amount of utility | I) goes to zero as x grows.
If the agent's utility function is such that "x amount of utility" entails "f(x) amount of complexity," f(x) --> infinity, then this will hold for priors that are sensitive to complexity.
When denizens here say "value is complex" what they mean is something like "the things which humans want have no concise expression". They don't literally mean that a utility counter measuring the extent to which those values are met is difficult to compress. That would not make any sense.
I don't understand what you mean. Say more?
I stumbled across this fix and unfortunately discovered what I consider to be a massive problem with it - it would imply that your utility function is non-computable.
OK. So in order for this to work, it needs to be the case that your prior has the property that: P(3^^^3 disutility | I fail to give him $5) << 1/3^^^3.
Unfortunately, if we have an honest Kolmogorov prior and utility is computable via a Turing machine of complexity << 3^^^3, this cannot possibly be the case. In particular, it is a theorem that for any computable function C (whose Turing machine has complexity K(C)) such that some x has C(x) > N, the Kolmogorov prior over x satisfies P(C(x) > N) ≥ 2^{-K(C) - K(N) - O(1)}. Now, since K(3^^^3) is small, as long as utility is computed by a small Turing machine and it is possible to have 3^^^3 disutility, such a circumstance will not be too unlikely under a Kolmogorov prior.
For those interested, here's how the theorem is proved. I will produce a Turing machine of size K(C) + K(N) + O(1) that outputs an x (in fact, the smallest x) such that C(x) > N. By definition, I can encode C and N in size K(C) + K(N) + O(1). I then have the Turing machine enumerate all x until it finds one with C(x) > N and output that x. Since the Kolmogorov prior gives the output of any program of length L probability at least 2^-L, this provides the lower bound.
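The enumeration in that proof is easy to write down. In this sketch, scenarios are encoded as non-negative integers and `toy_disutility` is a hypothetical stand-in for a computable utility function; both are assumptions for illustration. The point is that the search program's only inputs are C and N, so its length is K(C) + K(N) plus a constant for the loop.

```python
from itertools import count
from typing import Callable

def first_exceeding(C: Callable[[int], int], N: int) -> int:
    """Brute-force the smallest x with C(x) > N.

    The program is just C, N and this fixed loop, hence length
    K(C) + K(N) + O(1). It halts only if such an x exists, as the theorem assumes.
    """
    for x in count():
        if C(x) > N:
            return x
    raise AssertionError("unreachable: count() never ends")

def toy_disutility(x: int) -> int:
    # Hypothetical stand-in for a computable utility function over encoded scenarios.
    return x * x

print(first_exceeding(toy_disutility, 1000))  # 32, since 31*31 = 961 and 32*32 = 1024
```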
I guess the problem is that if you just have a Kolmogorov prior, there is a relatively simple universe that is actually out to get you. In fact, being the shortest computation causing 3^^^3 disutility is actually a pretty simple condition.
The problem, as stated, seems to me like it can be solved by precommitting not to negotiate with terrorists--this seems like a textbook case.
So switch it to Pascal's Philanthropist, who says "I offer you a choice: either you may take this $5 bill in my hand, or I will use my magic powers outside the universe to grant you 3^^^^3 units of utility."
But I'm actually not intuitively bothered by the thought of refusing the $5 in that case. It's an eccentric thing to do, but it may be rational. Can anybody give me a formulation of the problem where taking the magic powers claim seriously is obviously crazy?
The two situations are not necessarily equivalent.
See my most recent response in the Pascal's Mugging thread - taking into account the Mugger's intentions & motives is relevant to the probability calculation.
Having said that, the two situations probably ARE equivalent: in both cases an increasingly high number indicates a higher probability that you are being manipulated.
That can work when the mugger is a terrorist. Unfortunately, most muggers aren't; they're businessmen. Since the 'threat' issue isn't intended to be the salient feature of the question, we can perhaps specify that the mugger would be paid $3 to run the simulation and is just talking to you in the hope of getting a better offer. You do negotiate under those circumstances.
For my part I don't like the specification of the problem as found on the wiki at all:
Quite aside from the 'threat' issue, I just don't care what some schmuck simulates on a Turing machine outside the matrix. That is a distraction.
No responses and a downvote. Clearly I'm missing something obvious.
I wasn't the downvoter (nor the upvoter), and wouldn't have downvoted; but I would suggest considering the abstract version of the problem:
The way around Pascal's mugging is to have a bounded utility function. Even if you are a paperclip maximizer, your utility function is not the number of paperclips in the universe; it is some bounded function that is monotonic in the number of paperclips but asymptotes out. You are only linear in paperclips over small numbers of paperclips. This is not due to exponential discounting, but because "utility" doesn't mean anything other than the function whose expected value we are maximizing. It has an unfortunate namespace collision with the other "utility", which is some intuitive quantification of our preferences that is probably closer to a description of the trades we would be willing to make. If you are unwilling to be mugged by Pascal's mugger, then it simply follows as a mathematical fact that your utility is bounded by something on the order of the reciprocal of the probability at which you become un-muggable (times the utility of the $5 you refuse to hand over).
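As one illustration of a utility that is monotonic in paperclips, roughly linear for small counts, and bounded, here is a Python sketch. The exponential form 1 - exp(-n / scale) and the scale of 10^6 paperclips are arbitrary assumptions, not anything specified above. Under them, a 10^(-100) chance at an astronomically large number of paperclips is worth at most 10^(-100), far less than five paperclips in hand.

```python
import math


def bounded_utility(paperclips: float, scale: float = 1e6) -> float:
    """Illustrative bounded utility: 1 - exp(-paperclips / scale).

    Monotonic in paperclips, about paperclips / scale when paperclips is much
    smaller than scale, and never above 1 however many paperclips there are.
    (The functional form and the scale are arbitrary assumptions.)
    """
    return -math.expm1(-paperclips / scale)  # expm1 keeps precision for small inputs


five_in_hand = bounded_utility(5)                # about 5e-6: still roughly linear
mugger_offer = 1e-100 * bounded_utility(1e300)   # capped at 1e-100 by the bound
print(five_in_hand > mugger_offer)               # True: keep the paperclips
```

The cap, not any discounting of the offer's size, is what makes the mugger's number irrelevant here: naming 1e300 or 1e3000 paperclips changes nothing once the utility has saturated.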
For more of a description, see my post here, which originally got downvoted to oblivion because it was written without knowledge of the VNM utility theorem. The post has since been fixed, and while it is not super-detailed, it lays out an argument for why Pascal's mugging is resolved once we stop trying to make our utility functions look intuitive.
Incidentally, Pascal's mugging does lay out a good argument for why we need to be careful about an AGI's utility function; if we make it unbounded, then we can get weird behavior indeed.
EDIT: Of course, perhaps I am still wrong somehow and there are unresolvable subtleties that I am missing. But I, at least, am simply unwilling to care about events occurring with probability 10^(-100), regardless of how bad they are.
Way around? If my utility function suggests that being mugged by Pascal is the best thing for me to do then I'll be delighted to do it.
Utility functions determine our decisions, not the reverse!
A utility function shouldn't suggest anything. It is simply an abstract mathematical function that is guaranteed to exist by the VNM utility theorem. If you're letting an unintuitive mathematical theorem tell you to do things that you don't want to do, then something is wrong.
Again, the problem is there is a namespace collision between the utility function guaranteed by VNM, which we are maximizing the expected value of, and the utility function that we intuitively associate with our preferences, which we (probably) aren't maximizing the expected value of. VNM just says that if you have consistent preferences, then there is some function whose expected value you are maximizing. It doesn't say that this function has anything to do with the degree to which you want various things to happen.
I seem to be having a lot of trouble getting this point across, so let me try to put it another way: Ignore Kolmogorov complexity, priors, etc. for a moment, and if you can, forget about your utility function and just ask yourself what you would want. Now imagine the worst possible thing that could happen (you can even suppose that both time and space are potentially infinite, so infinitely many people being tortured for infinite extents of time is fine). Let us call this thing X. Suppose that you have somehow calculated that, with probability 10^(-100), the mugger will cause X to happen if you don't pay him $5. Would you pay him? If you would pay him, then why?
I am actually quite interested in the answer to this question, because I am having trouble diagnosing the precise source of my disagreement on this issue. And even though I said to forget about utility functions, if you really think that is the answer to the "why" question, feel free to use them in your argument. As I said, at this point I am most interested in determining why we disagree, because previous discussions with other people suggest that there is some hidden inferential distance afoot.
As an aside, if you wouldn't pay him then the definition of utility implies that u($5) > 10^(-100) u(X), which implies that u(X), and therefore the entire utility function, is bounded.
As was pointed out in the other subthread, you are assuming the conclusion you wish to prove here, viz. that the utility function is (necessarily) bounded.
Fine, I was slightly sloppy in my original proof (not only in the way you pointed out, but also in keeping track of signs). Here is a rigorous version:
Suppose that there is nothing so bad that you would pay $5 to remove a 10^(-100) chance of it happening. Let X be a state of the universe. Normalizing the status quo to utility 0, we get u(-$5) < 10^(-100) u(X), so u(X) > 10^(100) u(-$5). Since u(X) > 10^(100) u(-$5) for all X, u is bounded below.
Similarly, suppose that there is nothing so good that you would pay $5 to have a 10^(-100) chance of it happening. Then u($5) > 10^(-100) u(X) for all X, so u(X) < 10^(100) u($5); hence u is also bounded above.
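Collected in one place, the two bounds read as follows. This sketch assumes the status quo is normalized to utility 0 (so u(-$5) < 0 < u($5)) and reads u($5) as roughly -u(-$5), the value of keeping the $5.

```latex
\begin{align*}
u(-\$5) < 10^{-100}\,u(X) &\;\Longrightarrow\; u(X) > 10^{100}\,u(-\$5)
  && \text{(won't pay to avoid a $10^{-100}$ chance of $X$)}\\
u(\$5) > 10^{-100}\,u(X) &\;\Longrightarrow\; u(X) < 10^{100}\,u(\$5)
  && \text{(won't pay for a $10^{-100}$ chance of $X$)}\\
\text{hence}\quad 10^{100}\,u(-\$5) &< u(X) < 10^{100}\,u(\$5) && \text{for all } X.
\end{align*}
```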
Now I've given proofs that u is bounded both above and below, without looking at argmax u or argmin u (which incidentally probably don't exist even if u is bounded; it is much more likely that u asymptotes out).
My proof is still not entirely rigorous; for instance, u(-$5) and u($5) will in general depend on my current level of income or savings. If you really want me to, I can write everything out completely rigorously, but I've been trying to avoid it because I find that diving into unnecessary levels of rigor only obscures the underlying intuition (and I say this as someone who studies math).
Again, why assume this?
Your question has two possible meanings to me, so I'll try to answer both.
Meaning 1: Why is this a reasonable assumption in the context of the current debate?
Answer: Because if there was something that bad, then you get Pascal's mugged in my hypothetical situation. What I have shown is that either you would give Pascal $5 in that scenario, or your utility function is bounded.
Meaning 2: Why is this a reasonable assumption in general?
Answer: Because things that occur with probability 10^(-100) don't actually happen. Actually, 10^(-100) might be a bit high, but certainly things that occur with probability 10^(-10^(100)) don't actually happen.
You seem not to have understood the post. The worse something is, the more difficult it is for the mugger to make the threat credible. There may be things that are so bad that I (or my hypothetical AI) would pay $5 not to raise their probability to 10^(-100), but such things have prior probabilities that are lower than 10^(-100), and a mugger uttering the threat will not be sufficient evidence to raise the probability to 10^(-100).
We don't need to declare 10^(-100) equal to 0. 10^(-100) is small enough already.
I have to admit that I did find the original post somewhat confusing. However, let me try to make sure that I understood it. I would summarize your idea as saying that we should have u(X) = O(1/p(X)), where u is the utility function and p(X) is our posterior probability for X. Is that correct? Or do you want p to be the prior estimate? Or am I completely wrong?
Yes, p should be the prior estimate. The point being that the posterior estimate is not too different from the prior estimate in the "typical" mugging scenario (i.e. someone says "give me $5 or I'll create 3^^^^3 units of disutility" without specifying how in enough detail).
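The step from "u(X) = O(1/p(X)) with p the prior" to "the mugger's words don't help much" can be checked with a little arithmetic. Suppose u(X) <= c / p(X), and hearing the threat multiplies the odds of X by a likelihood ratio L >= 1. Bayes' rule in odds form then gives a posterior of at most L * p(X), so the expected stake posterior(X) * u(X) is at most L * c, however large a number the mugger names. The Python sketch below uses exact fractions; the function names and the values L = 10^6 and c = 1 are made up for illustration.

```python
from fractions import Fraction


def posterior(prior: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds."""
    post_odds = likelihood_ratio * prior / (1 - prior)
    return post_odds / (1 + post_odds)


def expected_stake_bound(prior: Fraction, likelihood_ratio: Fraction, c: Fraction) -> Fraction:
    """Worst-case posterior(X) * u(X) allowed by u(X) <= c / prior(X)."""
    return posterior(prior, likelihood_ratio) * (c / prior)


# Hypothetical numbers: the threat is 10^6 times likelier to be uttered if it is
# real, and c = 1 in whatever units u is measured in.
L = Fraction(10**6)
c = Fraction(1)
for k in (100, 1_000, 10_000):  # prior 2^-k: ever more complex threatened scenarios
    prior = Fraction(1, 2**k)
    print(k, float(expected_stake_bound(prior, L, c)))
```

Each printed bound stays just under L * c, independent of k: making the threatened scenario bigger (and hence more complex) does not make the threat more pressing.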
This doesn't actually imply that the entire utility function is bounded. It is still possible that u(Y) is infinite, where Y is something that is valued positively.
As an aside we can now consider the possibility of Pascal's Samaritan.
Assume a utility function such that u(Y) is infinite (and neutral with respect to risk). Further assume that you predict that $5 would increase your chance of achieving Y by 1/3^^^3. A Pascal Samaritan can offer to pay you $5 for the opportunity to give you a 90% chance of sending the entire universe into the hell state X. Do you take the $5?
From my reply to komponisto (incidentally, both you and he seem to be making the same objections in parallel, which suggests that I'm not doing a very good job of explaining myself, sorry):
The meaning of a phrase, primarily. And slightly about the proper use of an abstract concept.
A utility function should be a representation of my values. If my values are such that paying a mugger is the best option then I am glad to pay a mugger.
If I were to pay him it would be because I happen to value not having a 10^(-100) chance of X happening more than I value $5.
My utility function quite likely is bounded. Not because that is a way around Pascal's mugging, but simply because that is what the arbitrary value system represented by this particular bunch of atoms happens to be.
Hm... it sounds like we agree on far more than I thought, then.
What I am saying is that my utility function is bounded because it would be ridiculous to be Pascal's mugged, even in the hypothetical universe I created that disobeys komponisto's priors. Put another way, I am simply not willing to seriously consider events at probabilities of, say, 10^(-10^(100)), because such events don't happen. For this same reason, I have a hard time taking anyone seriously who claims to have an unbounded utility function, because they would then care about events that can't happen in a sense at least as strong as the sense that 1 is not equal to 2.
Would you object to anything in the above paragraph? Thanks for bearing with me on this, by the way.
P.S. Am I the only one who is always tempted to write "mugged by Pascal" before realizing that this is comically different from being "Pascal's mugged"?
As far as I know, they do happen. To know that such a number cannot represent an altogether esoteric feature of the universe that can nevertheless be the legitimate subject of infinite value, I would need to know the smallest number that can be assigned to a quantum state.
(This objection is purely tangential. See below for significant disagreement.)
That isn't true. Someone can assign infinite utility to Australia winning the Ashes if that is what they really want. I'd think them rather silly, but that is just my subjective evaluation, nothing to do with maths.
I think you are conflating quantum probabilities with Bayesian probabilities here, but I'm not sure. Unless you think this point is worth discussing further I'll move on to your more significant disagreement.
Hm... I initially wrote a two-paragraph explanation of why you were wrong, then deleted it because I changed my mind. So, I think we are making progress!
I initially thought I accorded disdain to unbounded utility functions for the same reason that I accorded disdain to ridiculous priors. But the difference is that your priors affect your epistemic state, and in the case of beliefs there is only one right answer. On the other hand, there is nothing inherently wrong with being a paperclip maximizer.
I think the actual issue I'm having is that I suspect that most people who claim to have unbounded utility functions would have been unwilling to make the trades implied by this before reading about VNM utility / "Shut up and multiply". So my objection is not that unbounded utility functions are inherently wrong, but that they cannot possibly reflect the preferences of a human.
On this I believe we approximately agree.
The post you're commenting on argues that Pascal's mugging is already solved by merely letting the utility function be bounded by Kolmogorov complexity. Obviously, having it be uniformly bounded also solves the problem, but why resort to something so drastic if you don't need to?
The OP is not living in the least convenient possible world. In particular, let X be the worst thing that could happen. Suppose that at the end of the day you have calculated that X will occur with probability 10^(-100) if you don't pay the mugger $5. Assuming that you wouldn't pay the mugger, it follows by definition of the utility function that u($5) > 10^(-100) u(X). So u(X) < 10^(100) u($5) and is therefore bounded. Since X is the worst thing that could happen, this means that your entire utility function is bounded.
See also my reply to wedrifid where this argument is slightly expanded.
If your utility function is not bounded (below), then there is no "worst thing that could happen."
See my reply to komponisto in the comment above.
I really like this suggestion. One esthetic thing it has going for it: complexity should be a terminal value for human-relatable intelligent agents anyway. It seems gauche for simple pleasures (orgasms, paperclips) to yield unbounded utility.