This is a very good post. The real question that has not explicitly been asked is the following:
How can utility be maximised when there is no maximum utility?
The answer of course is that it can't.
Some of the ideas that are offered as solutions or approximations of solutions are quite clever, but because for any agent you can trivially construct another agent who will perform better, and there are no metrics other than utility itself for determining how much better one agent is than another, solutions aren't even interesting here. Trying to find limits such as storage capacity or computing power only avoids the real problem.
These are simply problems that have no solutions, just as the problem of finding the largest integer has no solution. You can get arbitrarily close, but that's it.
And since I'm at it, let me quote another limitation of utility I very recently wrote about in a comment to Pinpointing Utility:
Assuming you assign utility to lifetime as a function of life quality in such a way that for any constant quality longer life has strictly higher (or lower) utility than shorter life, then either you can't assign any utility to actually infinite immortality, or you can't differentiate between higher-quality and lower-quality immortality, or you can't represent utility as a real number.
Suppose that you die, and God offers you a deal. You can spend 1 day in Hell, and he will give you 2 days in Heaven, and then you will spend the rest of eternity in Purgatory (which is positioned exactly midway in utility between heaven and hell). You decide that it's a good deal, and accept. At the end of your first day in Hell, God offers you the same deal: 1 extra day in Hell, and you will get 2 more days in Heaven. Again you accept. The same deal is offered at the end of the second day.
This isn't a paradox about unbounded utility functions but a paradox about how to do decision theory if you expect to have to make infinitely many decisions. Because of the possible failure of the ability to exchange limits and integrals, the expected utility of a sequence of infinitely many decisions can't in general be computed by summing up the expected utility of each decision separately.
This is like the supremum-chasing Alex Mennen mentioned. It's possible that normative rationality simply requires that your utility function satisfy the condition he mentioned, just as it requires the VNM axioms.
I'm honestly not sure. It's a pretty disturbing situation in general.
I like this point of view.
ETA: A couple commenters are saying it is bad or discouraging that you can't optimize over non-compact sets, or that this exposes a flaw in ordinary decision theory. My response is that life is like an infinitely tall drinking-glass, and you can put as much water as you like in it. You could look at the glass and say, "it will always be mostly empty", or you could look at it and say "the glass can hold an awful lot of water".
You're immortal. Tell Omega any real number r > 0, and he'll give you 1-r utility. On top of that, he will give you any utility you may have lost in the decision process (such as the time wasted choosing and specifying your number). Then he departs. What number will you choose?
This is rather tangential to the point, but I think that by refunding utility you are pretty close to smuggling in unbounded utility. I think it is better to assume away the cost.
An agent who only recognises finitely many utility levels doesn't have this problem. However, there's an equivalent problem for such an agent where you ask them to name a number n, and then you send them to Hell with probability 1/n and Heaven otherwise.
You're immortal. Tell Omega any natural number, and he will give you that much utility.
You could generate a random number using a distribution that has infinite expected value, then tell Omega that number. Your expected utility of following this procedure is infinite.
But if there is a non-zero chance of an Omega existing that can grant you an arbitrary amount of utility, then there must also be a non-zero chance of some Omega deciding on its own at some future time to grant you a random amount of utility using the above distribution, so you've already got...
I don't expect extreme examples to lead to good guidance for non-extreme ones.
Two functions may both approach infinity, and yet have a finite ratio between them.
Hard cases make bad law.
This suggests a new explanation for the Problem of Evil: God could have created a world that had no evil and no suffering which would have been strictly better than our world, but then He could also have created a world that was strictly better than that one and so on, so He just arbitrarily picked a stopping point somewhere and we ended up with the world as we know it.
Depends on how well I can store information in hell. I imagine that hell is a little distracting.
Alternately, how reliably I can generate random numbers when being offered the deal (I'm talking to God here, not Satan, so I can trust the numbers). Then I don't need to store much information. Whenever I lose count, I ask for a large number of dice of N sides where N is the largest number I can specify in time (there we go with bounding the options again - I'm not saying you were wrong). If they all come up 1, I take the deal. Otherwise I reset my count.
The o...
Infinite utilities violate VNM-rationality. Unbounded utility functions do too, because they allow you to construct gambles that have infinite expected utility. For instance, if the utility function is unbounded, then there exists a sequence of outcomes such that for each n, the utility of the nth outcome is at least 2^n. Then the gamble that, for each positive integer n, gives you a 1/2^n chance of getting the nth outcome has infinite expected utility.
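As a worked version of the gamble just described: write o_n for the nth outcome and u for the utility function (both symbols are introduced here only for illustration). The expected utility then diverges term by term:

```latex
\mathbb{E}[u]
  = \sum_{n=1}^{\infty} \frac{1}{2^{n}}\, u(o_n)
  \;\ge\; \sum_{n=1}^{\infty} \frac{1}{2^{n}} \cdot 2^{n}
  = \sum_{n=1}^{\infty} 1
  = \infty
```

Each outcome contributes at least 1 to the sum, so no finite bound on the total exists.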
In the case of utility functions that are bounded but do not have a maximum, the problem is not particularl...
This may be one of those times where it is worth pointing out once again that if you are a utility-maximizer because you follow Savage's axioms then you are not only a utility-maximizer[0], but a utility-maximizer with a bounded utility function.
[0]Well, except that your notion of probability need only be finitely additive.
What should you do?
Figure out that I'm not a perfectly rational agent and go on with the deal for as long as I feel like it.
Bail out when I subjectively can't stand any more of Hell or when I'm fed up with writing lots of numbers on an impossibly long roll of paper.
Of course, these aren't answers that help in developing a decision theory for an AI ...
First, the original question seems incomplete. Presumably the alternative to accepting the deal is something better than guaranteed hell forever, say, 50/50 odds of ending up in either hell or heaven.
Second, the initial evaluation of utilities is based on a one-shot setup, so you effectively precommit to not accepting any new deals which screw up the original calculation, like spending an extra day in hell.
Busy Beaver numbers are extremely uncomputable, so some agents, by pure chance, may be capable of acquiring much greater utility than others.
Pure chance is one path, divine favor is another. Though I suppose to the extent divine favor depends on one's policy, bits of omega begotten of divine favor would show up as a computably-anticipatable consequence, even if omega isn't itself computable. Still, a heuristic you didn't mention: ask God what policy He would adopt in your place.
I've heard hell is pretty bad. I feel like after some amount of time in hell I would break down like people who are being tortured often do and tell God "I don't even care, take me straight to purgatory if you have to, anything is better than this!" TBH, I feel like that might even happen at the end of the first day. (But I'd regret it forever if I never even got to check heaven out at least once.) So it seems extremely unlikely that I would ever end up "accidentally" spending an eternity in hell. d:
In all seriousness, I enjoyed the post.
Alas, the stereotypical images of Heaven and Hell aren't perfectly set up for our thought experiments! I shall complain to the pope.
You're immortal. Tell Omega any real number r > 0, and he'll give you 1-r utility.
This problem is obviously isomorphic to the previous one under the transformation r=1/s and rescaling the utility: pick a number s > 0 and rescale the utility by s/(1-r), both are valid operations on utilities.
There are many paradoxes with unbounded utility functions. For instance, consider whether it's rational to spend eternity in Hell:
Suppose that you die, and God offers you a deal. You can spend 1 day in Hell, and he will give you 2 days in Heaven, and then you will spend the rest of eternity in Purgatory (which is positioned exactly midway in utility between heaven and hell). You decide that it's a good deal, and accept. At the end of your first day in Hell, God offers you the same deal: 1 extra day in Hell, and you will get 2 more days in Heaven. Again you accept. The same deal is offered at the end of the second day.
And the result is... that you spend eternity in Hell. There is never a rational moment to leave for Heaven - that decision is always dominated by the decision to stay in Hell.
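To make the payoff structure concrete, here is a minimal sketch. The per-day utilities (-1 for Hell, +1 for Heaven, 0 for Purgatory) are assumptions chosen only so that Purgatory sits exactly midway, and the function names are hypothetical.

```python
# Assumed per-day utilities: Purgatory sits midway between Heaven and Hell.
HELL, PURGATORY, HEAVEN = -1, 0, 1

def total_utility(days_in_hell):
    """Total utility, relative to Purgatory, of accepting the deal
    `days_in_hell` times and then stopping to collect Heaven."""
    return days_in_hell * HELL + 2 * days_in_hell * HEAVEN

def marginal_gain(days_in_hell):
    """Extra utility from accepting one more deal after `days_in_hell` acceptances."""
    return total_utility(days_in_hell + 1) - total_utility(days_in_hell)
```

Under these assumptions every additional deal adds the same positive amount, so stopping is dominated at each node, yet an agent that never stops never collects any of the Heaven days it has earned.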
Or consider a simpler paradox:
You're immortal. Tell Omega any natural number, and he will give you that much utility. On top of that, he will give you any utility you may have lost in the decision process (such as the time wasted choosing and specifying your number). Then he departs. What number will you choose?
Again, there's no good answer to this problem - any number you name, you could have got more by naming a higher one. And since Omega compensates you for extra effort, there's never any reason to not name a higher number.
It seems that these are problems caused by unbounded utility. But that's not the case, in fact! Consider:
You're immortal. Tell Omega any real number r > 0, and he'll give you 1-r utility. On top of that, he will give you any utility you may have lost in the decision process (such as the time wasted choosing and specifying your number). Then he departs. What number will you choose?
Again, there is no best answer - for any r, r/2 would have been better. So these problems arise not because of unbounded utility, but because of unbounded options. You have infinitely many options to choose from (sequentially in the Heaven and Hell problem, all at once in the other two) and the set of possible utilities from your choices does not possess a maximum - so there is no best choice.
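A small sketch of the bounded case, using exact fractions so that halving never underflows. The function names are hypothetical, and Omega's refund of decision costs is simply assumed away here.

```python
from fractions import Fraction

def payoff(r):
    """Utility Omega pays for naming r (decision costs assumed refunded)."""
    if r <= 0:
        raise ValueError("r must be strictly positive")
    return 1 - r

def better_choice(r):
    """For any allowed r, r/2 is also allowed and pays strictly more."""
    return r / 2

# Halving repeatedly: the payoff keeps rising but never reaches 1.
r = Fraction(1, 10)
for _ in range(5):
    print(r, payoff(r))
    r = better_choice(r)
```

The payoffs are bounded above by 1, but since r = 0 is not allowed, the bound is never attained: the set of achievable utilities has a supremum and no maximum.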
What should you do? In the Heaven and Hell problem, you end up worse off if you make the locally dominant decision at each decision node - if you always choose to add an extra day in Hell, you'll never get out of it. At some point (maybe at the very beginning), you're going to have to give up an advantageous deal. In fact, since giving up once means you'll never be offered the deal again, you're going to have to give up arbitrarily much utility. Is there a way out of this conundrum?
Assume first that you're a deterministic agent, and imagine that you're sitting down for an hour to think about this (don't worry, Satan can wait, he's just warming up the pokers). Since you're deterministic, and you know it, then your ultimate life future will be entirely determined by what you decide right now (in fact your life history is already determined, you just don't know it yet - still, by the Markov property, your current decision also determines the future). Now, you don't have to reach any grand decision now - you're just deciding what you'll do for the next hour or so. Some possible options are:
There are many other options - in fact, there are precisely as many options as you've considered during that hour. And, crucially, you can assign an estimated expected utility to each one. For instance, you might know yourself, and suspect that you'll always do the same thing (you have no self discipline where cake and Heaven are concerned), so any decision apart from immediately rejecting all of God's deals will give you -∞ utility. Or maybe you know yourself, and have great self discipline and perfect precommitments - therefore if you pick a number N in the coming hour, you'll stick to it. Thinking some more may have a certain expected utility - which may differ depending on where you direct your thoughts. And if you know that you can't direct your thoughts - well then they'll all have the same expected utility.
But notice what's happening here: you've reduced the expected utility calculation over infinitely many options to one over finitely many options - namely, all the interim decisions that you can consider in the course of an hour. Since you are deterministic, the infinitely many options don't have an impact: whatever interim decision you follow will uniquely determine how much utility you actually get out of this. And given finitely many options, each with an expected utility, choosing one doesn't give rise to any paradoxes.
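The reduction can be sketched as a plain maximisation over a finite table. The option labels and the numbers attached to them are made up for illustration; only the -∞ for an agent that never stops comes from the discussion above.

```python
def best_interim_decision(estimates):
    """Return the interim decision with the highest estimated expected utility.

    `estimates` maps each decision considered during the hour to the agent's
    own estimate of the utility it ultimately leads to.
    """
    return max(estimates, key=estimates.get)

# Hypothetical estimates for one hour of deliberation.
estimates = {
    "reject all deals now": 0.0,
    "always accept": float("-inf"),   # never leaves Hell
    "commit to stopping after 1000 days": 1000.0,
    "think for another hour": 1200.0,
}

print(best_interim_decision(estimates))
```

Because the table is finite, a maximum always exists, which is exactly what the original infinite option set lacked.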
And note that you don't need determinism - adding stochastic components to yourself doesn't change anything, as you're already using expected utility anyway. So all you need is an assumption of naturalism - that you're subject to the laws of nature, that your decision will be the result of deterministic or stochastic processes. In other words, you don't have 'spooky' free will that contradicts the laws of physics.
Of course, you might be wrong about your estimates - maybe you have more/less willpower than you initially thought. That doesn't invalidate the model - at every hour, at every interim decision, you need to choose the option that will, in your estimation, ultimately result in the most utility (not just for the next few moments or days).
If we want to be more formal, we can say that you're deciding on a decision policy - choosing among the different agents that you could be, the one most likely to reach high expected utility. Here are some policies you could choose from (the challenge is to find a policy that gets you the most days in Hell/Heaven, without getting stuck and going on forever):
But why spend a whole hour thinking about it? Surely the same applies for half an hour, a minute, a second, a microsecond? That's entirely a convenience choice - if you think about things in one second increments, then the interim decision "think some more" is nearly always going to be the dominant one.
The mention of the Busy Beaver number hints at a truth - given the limitations of your mind and decision abilities, there is one policy, among all possible policies that you could implement, that gives you the most utility. More complicated policies you can't implement (which generally means you'd hit a loop and get -∞ utility), and simpler policies would give you less utility. Of course, you likely won't find that policy, or anything close to it. It all really depends on how good your policy finding policy is (and your policy finding policy finding policy...).
That's maybe the most important aspect of these problems: some agents are just better than others. Unlike finite cases where any agent can simply list all the options, take their time, and choose the best one, here an agent with a better decision algorithm will outperform another. Even if they start with the same resources (memory capacity, cognitive shortcuts, etc...) one may be a lot better than another. If the agents don't acquire more resources during their time in Hell, then their maximal possible utility is related to their Busy Beaver number - basically the maximal length that a finite-state agent can survive without falling into an infinite loop. Busy Beaver numbers are extremely uncomputable, so some agents, by pure chance, may be capable of acquiring much greater utility than others. And agents that start with more resources have a much larger theoretical maximum - not fair, but deal with it. Hence it's not really an infinite option scenario, but an infinite agent scenario, with each agent having a different maximal expected utility that they can extract from the setup.
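The link between limited resources and limited utility can be illustrated with a deliberately simplified model: a deterministic agent whose whole memory is one of finitely many states. This is a pigeonhole sketch, not a Busy Beaver computation, and the state encoding and function name are assumptions made for the example.

```python
def days_before_stopping(transition, start, stop_states):
    """Simulate a deterministic finite-state agent, one state update per day.

    Returns the number of days it accepts the deal before stopping, or
    None if it revisits a state (it will then repeat forever and never stop).
    """
    seen = set()
    state = start
    days = 0
    while state not in stop_states:
        if state in seen:
            return None  # repeated state: the agent is stuck in Hell forever
        seen.add(state)
        state = transition[state]
        days += 1
    return days

# A four-state counter that stops in state 3, and a two-state agent that loops.
print(days_before_stopping({0: 1, 1: 2, 2: 3}, 0, {3}))
print(days_before_stopping({0: 1, 1: 0}, 0, {2}))
```

In this model an agent with k states either stops within k - 1 days or never stops at all, so more memory raises the ceiling on how long it can safely hold out.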
It should be noted that God, or any being capable of hypercomputation, has real problems in these situations: they actually have infinitely many options (not a finite set of options for choosing their future policy), and so don't have any solution available.
This is also related to the theoretically optimal agent, AIXI: for any computable agent that approximates AIXI, there will be other agents that approximate it better (and hence get higher expected utility). Again, it's not fair, but not unexpected either: smarter agents are smarter.
What to do?
This analysis doesn't solve the vexing question of what to do - what is the right answer to these kinds of problems? That depends on what type of agent you are, but what you need to do is estimate the maximal integer you are capable of computing (and storing), and endure for that many days. Certain probabilistic strategies may improve your performance further, but you have to put the effort into finding them.