I've written before about the difficulty of distinguishing values from errors, from algorithms, and from context. Now I have to add to that list: How can we distinguish our utility function from the parameters we use to apply it?
In my recent discussion post, "Rationalists don't care about the future", I showed that exponential time-discounting, plus some assumptions about physics and knowledge, leads to not caring about the future. Many people responded by saying that, if I care about the future, this shows that my utility function does not use exponential time-discounting.
This response assumes that the shape of my time-discounting function is part of my utility function. In other words, the way you time-discount is one of your values.
By contrast, Eliezer wrote an earlier post saying that we should use human values, but without time-discounting. Eliezer is aware that humans appear to use time-discounting; so his post implicitly claims that the time-discounting function is not one of our values. It's a parameter for how we implement them.
(Some of the arguments Eliezer used were value-based arguments, suggesting that we can use our values to set the parameters that we use to implement our values... I suspect this recursive approach could introduce bogus solutions, like multiplying both sides of an equation by a variable, or worse; but that would take a longer post to address. I will note that some recursive equations do have unique solutions.)
The program of CEV assumes that a transhuman can use some extrapolated version of values currently used by some humans. If that transhuman has a life expectancy of a billion years, it will likely view time discounting differently. Eliezer's post against time discounting suggests, to me, a God-like view of the universe, in which we eliminate time discounting in the same way (and for the same reasons) that many people want to eliminate space-discounting (not caring about far-away people) in contemporary ethics. This is taking an ethical code that evolved agents have, which is constructed to promote the propagation of those agents' genes, and applying it without reference to any particular set of genes. This is also pretty much what folk-morality says a social moral code is. So the idea that you can apply the same utility function from a radically different context is inherent in CEV, and it is common to much public discourse on ethics, which assumes that you can construct a social morality based on the morality we find in individual agents.
On the other hand, I have argued that assuming that social ethics and individual ethics are the same is either merely sloppy thinking or an evolved (or deliberately constructed) lie. People who believe this would probably subscribe to a social-contract theory of ethics. (This view also has problems, beyond the scope of this post.)
I have one heuristic that I think is pretty good for telling when something is not a value: If it's mathematically wrong, it's an error, not a value. So my inclination is to point out that exponential time-discounting is correct. Every other form of time-discounting is dynamically inconsistent: the mere passage of time can reverse your preference between two fixed future outcomes. You can time-discount exponentially; or you can not time-discount at all, as Eliezer suggested; or you can be in error.
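To make that inconsistency concrete, here is a toy numerical sketch (my own made-up rewards and discount rates, nothing from that other post): under exponential discounting, your ranking of two fixed future rewards never changes as they draw nearer; under a hyperbolic-style discount, it can flip.

```python
# Toy illustration (my own numbers): exponential discounting gives
# time-consistent preferences; hyperbolic discounting does not.

def exponential(delay, delta=0.9):
    return delta ** delay

def hyperbolic(delay, k=1.0):
    return 1.0 / (1.0 + k * delay)

def preferred(discount, sooner=(100, 10), later=(110, 11), elapsed=0):
    """Which reward is preferred after `elapsed` units of time have already passed?"""
    (r1, t1), (r2, t2) = sooner, later
    v1 = r1 * discount(t1 - elapsed)
    v2 = r2 * discount(t2 - elapsed)
    return "sooner" if v1 > v2 else "later"

for elapsed in (0, 9):
    print(elapsed,
          "exponential ->", preferred(exponential, elapsed=elapsed),
          "| hyperbolic ->", preferred(hyperbolic, elapsed=elapsed))
# Exponential: the same choice at both times.
# Hyperbolic: the choice flips as the rewards get closer -- a preference reversal.
```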
But my purpose in this post is not to continue the arguments from that other post. It's to point out this additional challenge in isolating what values are. Is your time-discounting function a value, or a value parameter?
As Manfred said, it does not appear that the results of the paper are affected by time discounting.
Let's be a bit more explicit about this. The model in the paper is that an action (or perhaps a pair (action,context)) is represented by a single natural number; this is provided to the environment and it returns a single natural number; the agent feeds that number into its utility function, and out comes a utility. The agent has measured what the environment does in response to a finite set of actions; what it cares about is the expected utility (over computable environments whose behaviour is consistent with the agent's past observations, with some not-necessarily-computable but not too ill-behaved probability distribution on them) of the environment's response to its actions.
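A minimal sketch of that setup as I read it, with made-up environments, a made-up prior, and a toy utility function standing in for the paper's more careful construction:

```python
# Toy rendering of the setup above (my own toy environments and prior, not the
# paper's construction): an environment maps an action (a natural number) to an
# outcome (a natural number), and the agent scores an action by its expected
# utility over the environments consistent with its past observations.

def expected_utility(action, environments, prior, utility, observations):
    """E[utility(env(action))] over environments consistent with the observations."""
    consistent = [(env, p) for env, p in zip(environments, prior)
                  if all(env(a) == o for a, o in observations)]
    total = sum(p for _, p in consistent)
    return sum(p * utility(env(action)) for env, p in consistent) / total

envs = [lambda a: a + 1, lambda a: 2 * a]   # two toy "environments"
prior = [0.5, 0.5]
observations = [(1, 2)]                     # both environments agree on action 1
print(expected_utility(3, envs, prior, utility=lambda o: o, observations=observations))
```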
The paper says that the expected utilities don't exist, if the utility function is unbounded and computable (or merely bounded below in absolute value by an unbounded computable function).
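To get the flavour of why unboundedness causes trouble, here is my own toy picture (not the paper's argument): give prior weight proportional to 1/n^2 to an environment whose response is the outcome n, and let utility grow linearly in the outcome; the expectation then behaves like the harmonic series.

```python
# Toy illustration of the divergence (my own numbers, not the paper's argument):
# environment n gets unnormalised prior weight 1/n^2 and returns outcome n, and
# utility(outcome) = outcome; the expectation behaves like sum_n 1/n, which diverges.

def truncated_expected_utility(N):
    weights = [1.0 / n**2 for n in range(1, N + 1)]
    z = sum(weights)
    return sum(w * n for w, n in zip(weights, range(1, N + 1))) / z

for N in (10, 1000, 100000):
    print(N, truncated_expected_utility(N))   # grows without bound, roughly like log(N)
```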
(Remark: It seems to me that this isn't necessarily fatal; if the agent cannot exactly repeat previous actions, and if it happens that the expected utility difference between two of the agent's actions exists, then it can still decide between actions. However, (1) the "cannot repeat previous actions" condition seems a bit artificial and (2) I'd guess that the arguments in the paper can be adjusted to show that expected utility differences are also divergent. But I could be wrong, and it would be interesting to know.)
So, anyway. How does time discounting fit into this? It seems to me that this is meant to model the immediate response of the environment to the agent's action; time doesn't come into it at all. And the conclusion is that even then -- even without considering the possible infinite future -- the relevant expectations don't exist.
The pathology described in the paper doesn't seem to me to have anything to do with not discounting in time. Turning the consequences of an action into an infinite stream rather than a single result might make things worse, but it can't possibly make them better.
Actually, that's not quite fair. Here's one way it could make them better. One way to avoid the divergence described in the paper is to have a bounded utility function. That seems troublesome to many people. But it's not so unreasonable to say that the utility you attach to what happens in any bounded region of spacetime should be bounded. So maybe there's some mileage to taking the bounded-utility case of this model (where the expectations all exist happily), then representing an agent's actual deliberations as involving (say) some kind of limit as the spacetime region gets larger, and hoping that lim_{larger spacetime region} E[utility] converges even though E[lim_{larger spacetime region} utility] doesn't. Which might be the case; I haven't thought about it carefully enough to have much idea how plausible that is.
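(For what it's worth, here is a standard toy example, mine rather than anything from the paper, of how much hangs on the order of those two operations, so the hoped-for swap isn't automatic:)

```python
import random

# A standard toy example (mine, not specific to this model): let X_N be N with
# probability 1/N and 0 otherwise.  Then E[X_N] = 1 for every N, so lim_N E[X_N] = 1;
# but X_N -> 0 in probability, so "E[lim X_N]" = 0.  The two orders disagree.

def sample_X(N):
    return N if random.random() < 1.0 / N else 0

for N in (10, 100, 1000):
    draws = [sample_X(N) for _ in range(200000)]
    print(N, sum(draws) / len(draws))   # stays near 1, even though most draws are 0
```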
In that scenario, you might well get saved by exponential time-discounting. Or even something weaker like 1/t^2. (Probably not 1/t, though.) But it seems to me that filling in the details is a big job; and I don't think it can possibly be right to assert that time discounting makes the result of the paper go away, without doing that work.
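Here is the crude arithmetic behind that guess, assuming (my assumption, not anything in the paper) that the utility accrued per unit of time is bounded by some constant u_max:

```python
# Crude check (my own arithmetic, not the paper's): if utility per unit time is
# bounded by u_max, the discounted total is at most u_max * sum_t d(t).
# That bound is finite for exponential and 1/t^2 discounting, but not for 1/t.

def partial_sum(d, N):
    return sum(d(t) for t in range(1, N + 1))

for N in (10**3, 10**6):
    print(N,
          partial_sum(lambda t: 0.99 ** t, N),    # -> 99 (geometric series)
          partial_sum(lambda t: 1.0 / t**2, N),   # -> pi^2 / 6, about 1.645
          partial_sum(lambda t: 1.0 / t, N))      # ~ log(N), keeps growing
```

The geometric and 1/t^2 sums settle down as the horizon grows; the 1/t sum just keeps growing, which is why I doubt 1/t discounting would help.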
Hi, downvoter! If you happen not to be PhilGoetz (whose objections I already know from his reply), could you please let me know what you didn't like about what I wrote? Did I make a mistake or express something unclearly?
Thanks.