This problem makes more sense if you strip out time and the doubling, and look at this one:
Choose an integer N. Receive N utilons.
This problem has no optimal solution (because there is no largest integer). You can compare any two strategies to each other, but you cannot find a supremum; the closest thing available is an infinite series of successively better strategies, which eventually passes any single strategy.
In the original problem, the options are "don't open the box" or "wait N days, then open the box". The former can be crossed off; the latter has the same infinite series of successively better strategies. (The apparent time-symmetry is a false one, because there are only two time-invariant strategies, and they both lose.)
The way to solve this in decision theory is either to introduce finiteness somewhere that caps the number of possible strategies, or to output an ordering over choices instead of a single choice. The latter seems right; even if you exhibit an infinite sequence of successively better options and prove that each beats the last, you still have to pick something, and lattices seem like a good way to represent the results of partial reasoning.
I've never met an infinite decision tree in my life so far, and I doubt I ever will. It is a property of problems with an unbounded chain of ever-better strategies that they can't be solved optimally, and it doesn't reveal any decision-theoretic inconsistencies that could come up in real life.
Consider this game with a tree structure: you pick an arbitrary natural number, and then your opponent does as well. The player who chose the higher number wins. Clearly you, moving first, cannot win this game: no matter which number you pick, the opponent can simply add one to it. The same works with picking the positive rational number less than 1 that's closest to 1: your opponent adds one to both the numerator and the denominator, and wins.
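For concreteness, here is a minimal sketch (mine, not part of the original comment) that checks both counter-moves with exact rational arithmetic:

    from fractions import Fraction

    def beat_natural(n):
        # Whatever natural number the first player names, n + 1 beats it.
        return n + 1

    def beat_rational(x):
        # For p/q < 1, (p+1)/(q+1) is strictly closer to 1, since
        # 1 - (p+1)/(q+1) = (q-p)/(q+1) < (q-p)/q = 1 - p/q.
        return Fraction(x.numerator + 1, x.denominator + 1)

    x = Fraction(7, 8)
    assert beat_natural(5) > 5
    assert 1 - beat_rational(x) < 1 - x   # 1/9 < 1/8: the second mover wins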
The idea of using a busy beaver function is good, and if you can utilize the entire universe to encode the states of the busy beaver with the largest possible number of states (and a long enough tape), then that constitutes the optimal solution, but that only takes us further out into the realm of fiction.
After considering this problem, what I found was that, surprisingly fast, the specifics of the box's physical abilities and implementation become relevant. I mean, let's say Clippy is given this box, has already decided to wait a mere 1 year from day 1 (365.25 days of doubling), and 1 paperclip is 1 utilon. At some point before the end of that year, the box is promising more paperclips than there are atoms in the visible universe: Clippy stands to gain 2^365.25 paperclips, which is apparently close to 8.9*10^109, while the observable universe is only estimated to contain 10^80 atoms. So to make up for that, let's say the box converts every visible subatomic particle into paperclips instead.
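A quick sanity check of those figures (my arithmetic; the atom count is the usual rough estimate):

    paperclips = 2 ** 365.25          # about 8.9e109 -- matches the figure above
    atoms = 1e80                      # rough estimate for the observable universe
    print(f"{paperclips:.2g}")                        # -> 8.9e+109
    print(f"{paperclips / atoms:.2g} clips per atom") # -> 8.9e+29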
That's just 1 year, and the box has already announced it will convert approximately every visible subatomic particle into pure paperclip bliss!
And then another single doubling (1 year and 1 day) does what? Even if Clippy's utility function is unbounded, it should presumably still link back to some kind of physical state, and at this point the box has to implement increasingly physically impossible ideas just to keep doubling paperclip utility, like:
Breaking t...
Your other option is to sell the box to the highest bidder. That will probably be someone who's prepared to wait longer than you, and will therefore be able to give you a higher price than the utilons you'd have got out of the box yourself. You get the utilons today.
If you can use mixed strategies (i.e. you are not required to be deterministically predictable), you can use the following strategy for the doubling-utility case: every day, toss a coin; if it comes up heads, open the box, otherwise wait another day. The probability of first getting heads on a particular day halves with each subsequent day while the utility doubles, so each day contributes a constant expected utility of 1/2, the series diverges, and you get infinite total expected utility.
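To see what this strategy feels like from the inside, here is a small simulation I sketched (assuming day 1 pays 1 utilon and the payout doubles daily): the mean is theoretically infinite, but the typical run nets almost nothing.

    import random

    def play():
        # Flip a fair coin each day; open the box on the first heads.
        day = 1
        while random.random() >= 0.5:    # tails with probability 1/2: keep waiting
            day += 1
        return 2 ** (day - 1)            # day 1 pays 1 utilon, doubling daily

    results = sorted(play() for _ in range(100_000))
    print("median:", results[len(results) // 2])   # 1 or 2 utilons
    print("mean:  ", sum(results) / len(results))  # unstable, driven by rare long waits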
So I don't really know how utilons work, but here is an example of a utility function which is doubling box-proof. It is bounded; furthermore, it discounts the future by changing the bound for things that only affect the future. So you can get up to 1000 utilons from something that happens today, up to 500 utilons from something that happens tomorrow, up to 250 utilons from something that happens two days from now, and so on.
Then the solution is obvious: if you open the box in 4 days, you get 16 utilons; if you open the box in 5 days, you'd get 32 but you...
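Completing that arithmetic (a sketch; I am assuming, per the comment's numbers, a bound of 1000 halving daily and a box worth 2^d after d days):

    def utility(day):
        box = 2 ** day            # tokens in the box after `day` days
        cap = 1000 / 2 ** day     # the bound, halving each day
        return min(box, cap)

    best = max(range(20), key=utility)
    print(best, utility(best))    # -> 5 31.25: wait five days, then open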
Am I correct in assessing that your solution is to stop when you can no longer comprehend the value in the box? That is, when an additional doubling has no subjective meaning to you? (Until that point, you're not in a state loop, as the value with each doubling provides an input you haven't encountered before.)
I was about to suggest stopping when you have more utilons than your brain has states (provided you could measure such), but then it occurred to me the solutions might be analogous, even if they arrive at different numbers.
I wait until there are so many utilons in the box that I can use them to get two identical boxes and have some utilons left over. Every time a box has more than enough utilons to make two identical boxes, I repeat that step. Any utilons not used to make new boxes are the dividend of the investment.
What if instead of growing exponentially without bound, it decays exponentially to the bound of your utility function?
I think you mean 'asymptotically'.
A way to think about this problem that puts you in near mode is to imagine what the utility might look like. For example:
Day 1: Finding a quarter on the ground
Day 2: A child in Africa getting $5
.....
Day X: Curing cancer
Day X+1: Curing cancer, Alzheimer's, and AIDS.
On one hand, by waiting a day, more people would die of cancer. On the other, by not waiting, you'd doom all those future people to die of AIDS and Alzheimer's.
How exactly do the constant utilons in the box compensate me for how I feel the day after I open the box (I could have doubled my current utility!)? The second day after (I could have quadrupled my current utility!!)? The Nth day after (FFFFFFFFFFFFUUUUU!!!)? I'm afraid the box will rewrite me with a simple routine that says "I have 2^(day-I-opened-the-box - 1) utility! Yay!"
If you return to a state you have already been at, you know you are going to be waiting forever and lose and get nothing.
You seem to be assuming here that returning to a state you have already been at is equivalent to looping your behavior, so that once a Turing machine re-enters a previously instantiated state it cannot exhibit any novel behavior. But this isn't true. A Turing machine can behave differently in the same state provided the input it reads off its tape is different. The behavior must loop only if the combination of Turing machine state...
You have given reasons why requiring bounded utility functions and discounting the future are not adequate responses to the problem if considered individually. But your objection to the bounded utility function response assumes that future utility isn't discounted, and your objection to the discounting response assumes that the utility function is unbounded. So what if we require both that the utility function must be bounded and that future utility must be discounted exponentially? Doesn't that get around the paradox?
If you know the probability distribution P(t) of you dying on day t, then you can solve exactly for optimal expected lifetime utilons out of the box. If you don't know P(t), you can do some sort of adaptive estimation as you go.
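As a sketch of what that exact solution can look like, here is a toy model (the hazard function is mine, purely illustrative): waiting one more day doubles the prize but multiplies your survival probability by (1 - hazard), so you should wait exactly as long as the daily hazard stays below 1/2.

    import math

    def hazard(t):
        # Toy increasing hazard of dying on day t (Gompertz-like shape).
        return min(0.999, 1e-4 * math.exp(0.05 * t))

    def log_expected_utilons(d):
        # Survive d days (probability prod(1 - hazard(t))), then open for 2**d.
        # Comparing in log space keeps 2**d from overflowing.
        return d * math.log(2) + sum(math.log1p(-hazard(t)) for t in range(d))

    best = max(range(400), key=log_expected_utilons)
    print(best)   # ~171, the first day the daily hazard exceeds 1/2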
You could build a machine that opens the box far in the future, at the moment when the machine's reliability starts degrading faster than the utilons increase. This maximizes your expected utility.
Or if you're not allowed to build a machine, you simply do the same with yourself (depending on our model, possibly multiplying by your expected remaining lifespan).
Bringing together what others have said, I propose a solution in three steps:
Adopt a mixed strategy where, for each day, you open the box on that day with probability p. The expected utility of this strategy is the sum of p(1-p)^n 2^n over n = 0, 1, 2, ..., which diverges for any p in the half-open interval (0, 0.5]. In other words, you get infinite EU as long as p is in (0, 0.5]. This is paradoxical, because it means a strategy with a 0.5 risk of ending up with only 1 utilon is as good as any other.
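Checking the divergence claim numerically (a sketch; each term simplifies to p(2(1-p))^n, so the threshold sits exactly at p = 0.5):

    def partial_eu(p, days):
        # Partial sum of p * (1-p)**n * 2**n = p * (2*(1-p))**n.
        return sum(p * (2 * (1 - p)) ** n for n in range(days))

    for p in (0.3, 0.5, 0.7):
        print(p, partial_eu(p, 100))
    # p = 0.3: astronomically large and still growing (diverges)
    # p = 0.5: exactly 50.0, since each day contributes 1/2 (diverges)
    # p = 0.7: approaching p / (2p - 1) = 1.75 (converges)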
Extend the range of our utility function to a number system...
There is a good reason to use a bounded utility function, note -- if the conditions of Savage's theorem hold, the utility function you get from it is bounded.
How long do you wait before opening it? If you never open it, you get nothing (you lose! Good day, sir or madam!) and whenever you take it, taking it one day later would have been twice as good.
When do I "lose" precisely? When I never take it? By happy coincidence 'never' happens to be the very next day after I planned to open the box!
There are no other ways to get utilons.
Is a weakness in your argument. Either you can survive without utilons, which contradicts utility theory, or you wait until your "pre-existing" utilons are used up and you need more to survive.
This suggests a joke solution: Tell people about the box, then ask them for a loan which you will repay with proceeds from the box. Then you can live off the loan and let your creditors worry about solving the unsolvable.
If I am actually immortal, and there is no other way to get utilons, then each day the value of me opening the box is something like:
Value = utilons / future days
Since my future days are supposedly infinite, we are talking about at best an infinitesimal difference between opening the box on day 1 and opening it on day 3^^^^3. There is no actual wrong day to open the box. If that seems implausible, it is because the hypothetical itself is implausible.
Let's say you have a box that has a token in it that can be redeemed for 1 utilon. Every day, its contents double. There is no limit on how many utilons you can buy with these tokens. You are immortal. It is sealed, and if you open it, it becomes an ordinary box. You get the tokens it has created, but the box does not double its contents anymore. There are no other ways to get utilons.
How long do you wait before opening it? If you never open it, you get nothing (you lose! Good day, sir or madam!) and whenever you take it, taking it one day later would have been twice as good.
I hope this doesn't sound like a reductio ad absurdum against unbounded utility functions or not discounting the future, because if it does you are in danger of amputating the wrong limb to save yourself from paradox-gangrene.
What if instead of growing exponentially without bound, it decays exponentially to the bound of your utility function? If your utility function is bounded at 10, what if the first day it is 5, the second 7.5, the third 8.75, etc. Assume all the little details, like remembering about the box, trading in the tokens, etc, are free.
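In numbers (a sketch of the sequence just described): every extra day closes half the remaining gap to the bound, so while there is still no single optimal day, any sufficiently late day gets you within epsilon of 10.

    def boxed_utility(day):
        # Day 1 -> 5, day 2 -> 7.5, day 3 -> 8.75, ...
        return 10 - 10 * 0.5 ** day

    for d in (1, 2, 3, 10, 30):
        print(d, boxed_utility(d))   # creeps up on 10, never reaches it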
If you discount the future using any function that doesn't ever hit 0, then the growth rate of the tokens can be chosen to more than make up for your discounting.
If it does hit 0 at some time T, what if, instead of doubling, the box increases at each growth step by however many raw utilons your discounting adjusts down to 1 at that point, while the intervals between growth steps shrink to nothing? You get a discounted 1 utilon at time T - 1s, another discounted 1 utilon at T - 0.5s, another at T - 0.25s, etc. Suppose you can think as fast as you want, and open the box at arbitrary speed. Also, suppose whatever solution your present self precommits to will be followed by your future self. (Their decision won't be changed by any change in what times they care about.)
EDIT: People in the comments have suggested using a utility function that is both bounded and discounting. If your discounting isn't so strong that it drops to 0 immediately after the present, then you can find some time interval close to the present where the discounting is everywhere nonzero. And if it's nonzero there, you can have a box that disappears (taking all possible utility with it) at the end of that interval, grows the utility at intervals that shrink to nothing as the deadline approaches, and increases the utility-worth of the tokens to compensate for your exact discounting function, so that the discounted utility asymptotically approaches your bound.
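Spelled out, the construction might look like this (the symbols B, T, and D are mine, not from the post: bound B, deadline T, and any positive discount function D):

    def tokens(n, B=100.0, T=10.0, D=lambda t: 0.9 ** t):
        # Opening opportunities crowd up against the deadline T, and the raw
        # token count is inflated to cancel the discount exactly, so the
        # discounted utility at step n is B * (1 - 2**-n), approaching B.
        t_n = T - T * 0.5 ** n
        return B * (1 - 0.5 ** n) / D(t_n)

    for n in (1, 2, 5, 10):
        print(n, tokens(n))          # raw tokens grow; discounted value -> 100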
Here is my solution. You can't assume that your future self will make the optimal decision, or even a good decision. You have to treat your future self as a physical object that your choices affect, and take the probability distribution of what decisions your future self will make, and how much utility they will net you into account.
Think of yourself as a Turing machine. If you do not halt and open the box, you lose and get nothing. No matter how complicated your brain, you have a finite number of states. You want to be a busy beaver and take the most possible time to halt, but still halt.
If, at the end, you say to yourself "I just counted to the highest number I could, counting once per day, and then made a small mark on my skin, and repeated, and when my skin was full of marks, that I was constantly refreshing to make sure they didn't go away...
...but I could let it double one more time, for more utility!"
If you return to a state you have already been at, you know you are going to be waiting forever and lose and get nothing. So it is in your best interest to open the box.
So there is not a universal optimal solution to this problem, but there is an optimal solution for a finite mind.
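As a toy illustration of that bound (my model, setting aside the objection elsewhere in the thread that a tape lets a machine act differently in a repeated state): an agent whose entire configuration fits in k bits can wait at most 2^k - 1 days without revisiting a configuration.

    def best_finite_wait(k_bits):
        # A deterministic agent with at most 2**k configurations loops forever
        # (and never opens the box) as soon as one repeats, so the longest
        # halting wait is to count through all 2**k - 1 nonzero values,
        # one per day, and then open.
        days_waited = 2 ** k_bits - 1
        return 2 ** days_waited      # one doubling per day waited

    print(best_finite_wait(3))       # 7 days of counting -> 128 utilons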
I remember reading a while ago about a paradox where you start with $1, and can trade that for a 50% chance of $2.01, which you can trade for a 25% chance of $4.03, which you can trade for a 12.5% chance of $8.07, etc (can't remember where I read it).
This is the same paradox with one of the traps for wannabe Captain Kirks (using dollars instead of utilons) removed and one of the unnecessary variables (uncertainty) cut out.
My solution also works on that. Every trade is analogous to a day waited to open the box.
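For completeness, here is that trade sequence in code (a sketch using the dollar figures from the comment):

    def trade_sequence(n_trades):
        # Start with $1; each trade doubles the payout plus a cent
        # and halves the probability of keeping anything.
        payout, prob = 1.00, 1.0
        for _ in range(n_trades):
            payout, prob = 2 * payout + 0.01, prob / 2
        return payout, prob, payout * prob

    for n in range(4):
        print(n, trade_sequence(n))
    # The expected value creeps upward (1.0, 1.005, 1.0075, ...) while the
    # win probability collapses toward 0: trading forever wins nothing,
    # exactly like never opening the box.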