I've written before about the difficulty of distinguishing values from errors, from algorithms, and from context. Now I have to add to that list: How can we distinguish our utility function from the parameters we use to apply it?
In my recent discussion post, "Rationalists don't care about the future", I showed that exponential time-discounting, plus some assumptions about physics and knowledge, leads to not caring about the future. Many people responded by saying that, if I care about the future, this shows that my utility function does not use exponential time-discounting.
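For concreteness, the discounted sum I have in mind looks like this (my notation; the discount factor and the numbers are illustrative, not from the original post):

```latex
U = \sum_{t=0}^{\infty} \gamma^{t} u_t , \qquad 0 < \gamma < 1.
% The weight on the far future vanishes geometrically, e.g.
\gamma = 0.95 \;\Longrightarrow\; \gamma^{1000} \approx 5 \times 10^{-23},
% which is the sense in which an exponential discounter
% "doesn't care about the future".
```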
This response assumes that the shape of my time-discounting function is part of my utility function. In other words, the way you time-discount is one of your values.
By contrast, Eliezer wrote an earlier post saying that we should use human values, but without time-discounting. Eliezer is aware that humans appear to use time discounting, so his post implicitly claims that the time-discounting function is not one of our values. It's a parameter for how we implement them.
(Some of the arguments Eliezer used were value-based arguments, suggesting that we can use our values to set the parameters that we use to implement our values... I suspect this recursive approach could introduce bogus solutions, like multiplying both sides of an equation by a variable, or worse; but that would take a longer post to address. I will note that some recursive equations do have unique solutions.)
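A toy illustration of both halves of that parenthetical (my own numbers, not anything from Eliezer's post):

```latex
% Multiplying both sides by a variable introduces a bogus solution:
x = 2 \;\xrightarrow{\;\times x\;}\; x^2 = 2x,
\quad \text{solutions } x \in \{0, 2\}.
% Yet some recursive equations do pin down a unique value:
x = \tfrac{x}{2} + 1 \;\Longrightarrow\; x = 2.
```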
The program of CEV assumes that a transhuman can use some extrapolated version of values currently used by some humans. If that transhuman has a life expectancy of a billion years, it will likely view time discounting differently. Eliezer's post against time discounting suggests, to me, a God-like view of the universe, in which we eliminate time discounting in the same way (and for the same reasons) that many people want to eliminate space-discounting (not caring about far-away people) in contemporary ethics.

This takes an ethical code that evolved agents have, constructed to promote the propagation of those agents' genes, and applies it without reference to any particular set of genes. This is also pretty much what folk morality says a social moral code is. So the idea that you can apply the same utility function from a radically different context is inherent in CEV, and is common to much public discourse on ethics, which assumes that you can construct a social morality based on the morality we find in individual agents.
On the other hand, I have argued that the assumption that social ethics and individual ethics are the same is either merely sloppy thinking, or an evolved (or deliberately constructed) lie. Someone who holds this view would probably subscribe to a social-contract theory of ethics. (This view also has problems, beyond the scope of this post.)
I have one heuristic that I think is pretty good for telling when something is not a value: if it's mathematically wrong, it's an error, not a value. So my inclination is to point out that exponential time-discounting is correct. All other forms of time-discounting lead to inconsistencies, in that a non-exponential discounter will predictably reverse its own earlier preferences simply because time has passed. You can time-discount exponentially; or you can not time-discount at all, as Eliezer suggested; or you can be in error.
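Here is a minimal sketch of that inconsistency, with made-up reward values and discount parameters (nothing here is from either post):

```python
# Sketch: preference reversal under hyperbolic vs. exponential
# discounting. All numbers are illustrative assumptions.
# Choice: a small reward at time `delay`, or a large reward 5 steps later.

def hyperbolic(value, delay, k=1.0):
    return value / (1.0 + k * delay)

def exponential(value, delay, gamma=0.9):
    return value * gamma ** delay

small, large = 10.0, 25.0

for delay in (0, 20):  # decide up close vs. 20 steps in advance
    for name, f in (("hyperbolic", hyperbolic), ("exponential", exponential)):
        prefer = "small" if f(small, delay) > f(large, delay + 5) else "large"
        print(f"{name:12s} delay={delay:2d}: prefers the {prefer} reward")

# Hyperbolic: prefers "small" at delay=0 but "large" at delay=20 --
# the agent predictably reverses its earlier choice as the rewards
# draw near. Exponential: prefers "large" at both delays, because
# gamma**delay factors out of the comparison.
```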
But my purpose in this post is not to continue the arguments from that other post. It's to point out this additional challenge in isolating what values are. Is your time-discounting function a value, or a value parameter?
I can't agree with any of your caveats. (This is not the same as saying that I think everything in PdB's paper is correct. I haven't looked at it carefully enough to have an opinion on that point.)
"my second counterexample still looks solid to me ... It only applies if the number of ... is uncountable":
The function U in PdB's paper doesn't take integers as arguments, it takes infinite sequences of "perceptions". Provided there are at least two possible perceptions at each time step, there are uncountably many such sequences. How are you proposing to change the model to make the domain of U countable, while still making it be a model of agents acting in time?
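The standard diagonal argument shows this (a sketch, nothing specific to PdB's setup):

```latex
% Given any countable list s^{(1)}, s^{(2)}, \dots of perception
% sequences, with at least two perceptions available at each step,
% build d by choosing d_n to differ from the n-th sequence at step n:
d_n \neq s^{(n)}_n \ \text{for all } n
\;\Longrightarrow\;
d \notin \{\, s^{(1)}, s^{(2)}, \dots \,\},
% so no countable list exhausts the domain of U.
```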
One way would be to use finite instead of infinite sequences. That corresponds to considering only a finite number of time-steps. (In that case, obviously temporal discounting isn't going to have any impact on the convergence of anything.)
Another possibility would be to use a utility function that's zero almost everywhere. Aside from the obvious remark that this seems unlikely to be a good model for real agents' (actual or idealized) utility functions, I think (but this is just pure intuition; I haven't tried to construct a proof or looked for refutations) that a utility function satisfying PdB's conditions about being bounded away from zero by a computable function is necessarily nonzero on an uncountable set of arguments.
"It is impossible to construct a probability distribution satisfying his requirements"":
I think you are mistaken about what those requirements are. What he needs is a probability measure on the space of environments; he doesn't insist that each individual possible environment have nonzero probability. What he does insist on is a nonzero probability for each possible computable environment (note: an environment is a function from agent action-sequences to agent perception-sequences), and there are only countably many of those.
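One standard way to get such a measure is a Solomonoff-style weighting over an enumeration of the computable environments. The sketch below is my own; the enumeration is a hypothetical stand-in and the weights 2^-(n+1) are one convenient choice, not necessarily PdB's exact construction:

```python
# Sketch: a prior giving every computable environment nonzero
# probability. "enumerate_environments" is a hypothetical stand-in
# for an enumeration of (programs computing) environments.

from itertools import islice

def enumerate_environments():
    # In reality: enumerate all programs in some prefix-free encoding.
    # Here we only yield placeholder labels.
    n = 0
    while True:
        yield f"env_{n}"
        n += 1

def prior():
    for n, env in enumerate(enumerate_environments()):
        yield env, 2.0 ** -(n + 1)  # positive weight for every environment

# Countably many terms, each with nonzero probability, total mass 1:
print(sum(p for _, p in islice(prior(), 50)))  # ~ 1.0
```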
"It doesn't prove that the expected utility isn't bounded."
I think it does, actually, although he hasn't said so explicitly. He has this noncomputable function B, and he uses the assertion that B(n) > rho(n) infinitely often when rho is computable. But then consider, say, 2^n rho(n), which is also computable; B(n) / rho(n) is then > 2^n infinitely often. Unless I'm confused (which I might be) I think this leads not only to the conclusion (which he states) that the absolute values of the terms in his series are >=1 infinitely often, but that you can find an infinite subsequence of the terms that grows faster than 2^n, or indeed faster than any computable function.
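Spelling the step out (my reconstruction; the paper states only the B(n) > rho(n) property):

```latex
\rho \text{ computable}
\;\Longrightarrow\;
\rho'(n) := 2^{n}\rho(n) \text{ computable},
% so PdB's property applies to rho' as well:
B(n) > \rho'(n) = 2^{n}\rho(n) \ \text{infinitely often}
\;\Longrightarrow\;
\frac{B(n)}{\rho(n)} > 2^{n} \ \text{infinitely often}.
```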
In any case, if you try to sum an infinite series with infinitely many terms of absolute value > 1, then the range of values in your partial sums depends on the order of summation. In particular, even if the U(k) were bounded, that wouldn't enable you to say things like "the expected utility is a divergent series but its value oscillates only between -13 and +56" unless there were One True Order of summation for the terms of the series. Which I can't see that there is or could be.
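A toy illustration of that order-dependence, using a made-up series rather than the actual terms from the paper:

```python
# When infinitely many terms have absolute value >= 1, partial sums
# depend on the order of summation. The series +1, -1, +1, -1, ...
# is made up for illustration; it is not PdB's series.

def partial_sums(terms, n=12):
    s, out = 0, []
    for t in terms[:n]:
        s += t
        out.append(s)
    return out

alternating = [(-1) ** k for k in range(100)]     # +1, -1, +1, -1, ...
front_loaded = sorted(alternating, reverse=True)  # all the +1s first

print(partial_sums(alternating))   # [1, 0, 1, 0, ...] -- bounded oscillation
print(partial_sums(front_loaded))  # [1, 2, 3, ...] -- climbs steadily

# Same multiset of terms, different orders, different partial-sum
# behavior: a claim like "the divergent sum oscillates between -13
# and +56" is only meaningful relative to one chosen order.
```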
The domain of U must not only be countable; it must be finite. Remember, an agent needs to compute this sum once for every possible action! Both the number of possible worlds, and the num...