I've written before about the difficulty of distinguishing values from errors, from algorithms, and from context. Now I have to add to that list: How can we distinguish our utility function from the parameters we use to apply it?
In my recent discussion post, "Rationalists don't care about the future", I showed that exponential time-discounting, plus some assumptions about physics and knowledge, leads to not caring about the future. Many people responded by saying that, if I care about the future, this shows that my utility function does not use exponential time-discounting.
This response assumes that the shape of my time-discounting function is part of my utility function. In other words, the way you time-discount is one of your values.
By contrast, Eliezer wrote an earlier post saying that we should use human values, but without time-discounting. Eliezer is aware that humans appear to use time discounting. Therefore, this implicitly claims that the time-discounting function is not one of our values. It's a parameter for how we implement them.
(Some of the arguments Eliezer used were value-based arguments, suggesting that we can use our values to set the parameters that we use to implement our values... I suspect this recursive approach could introduce bogus solutions, like multiplying both sides of an equation by a variable, or worse; but that would take a longer post to address. I will note that some recursive equations do have unique solutions.)
The program of CEV assumes that a transhuman can use some extrapolated version of values currently used by some humans. If that transhuman has a life expectancy of a billion years, it will likely view time discounting differently. Eliezer's post against time discounting suggests, to me, a God-like view of the universe, in which we eliminate time discounting in the same way (and for the same reasons) that many people want to eliminate space-discounting (not caring about far-away people) in contemporary ethics. This is taking an ethical code that evolved agents have, which is constructed to promote the propagation of those agents' genes, and applying it without reference to any particular set of genes. This is also pretty much what folk-morality says a social moral code is. So the idea that you can apply the same utility function from a radically different context is inherent in CEV, and is common to much public discourse on ethics, which assumes that you can construct a social morality based on the morality we find in individual agents.
On the other hand, I have argued that assuming that social ethics and individual ethics are the same is either merely sloppy thinking, or an evolved (or deliberately constructed) lie. People who accept that argument would probably subscribe to a social-contract theory of ethics. (This view also has problems, beyond the scope of this post.)
I have one heuristic that I think is pretty good for telling when something is not a value: If it's mathematically wrong, it's an error, not a value. So my inclination is to point out that exponential time-discounting is correct. All other forms of time-discounting lead to inconsistencies. You can time-discount exponentially; or you can not time-discount at all, as Eliezer suggested; or you can be in error.
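The consistency claim can be illustrated with a minimal sketch. It assumes a hyperbolic curve, 1 / (1 + k * delay), as the example of a non-exponential discounter; the reward sizes, the rate of 0.9, and all function names are hypothetical choices made for illustration, not taken from the earlier post. An agent chooses between a small reward after `wait` steps and a larger one a step later. Under exponential discounting the choice does not depend on `wait`; under the hyperbolic curve it flips as the rewards move further away.

```python
def exponential(delay, rate=0.9):
    """Exponential discount factor: rate ** delay."""
    return rate ** delay


def hyperbolic(delay, k=1.0):
    """Hyperbolic discount factor: 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


def prefers_later(discount, wait, small=10.0, large=15.0):
    """True if `large` at time wait + 1 beats `small` at time wait, judged from time 0."""
    return large * discount(wait + 1) > small * discount(wait)


# Evaluate the same choice at several distances into the future.
for name, discount in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    choices = [prefers_later(discount, wait) for wait in (0, 2, 5, 10)]
    print(name, choices)
```

For the exponential agent the ratio of the two discounted rewards is the same at every `wait`, so the preference never reverses. The hyperbolic agent takes the small reward when it is immediate but prefers the large one when both are distant, which is the kind of inconsistency meant above.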
But my purpose in this post is not to continue the arguments from that other post. It's to point out this additional challenge in isolating what values are. Is your time-discounting function a value, or a value parameter?
Hypothesis space is not mysterious; the paper is only about computing the expected value of all of a possible set of infinite series. Look on page 4, at Theorem 1:
If U is completely unbounded from above on D, then E(U(x1..n psi(Q,p,y1..n, x1..n)) | gamma_Q(y1..n) = x1..n) is either undefined or positive infinity.
E is the expected value of the utility function U. x1..n psi(...) is a sequence of values of x ranging from one to infinity, where each x_t is a perception at time t. Gamma is what the agent has perceived of the past. U(x1..infinity | agent's perception up to time n) computes the sum of utility U(x1..t) as t goes to infinity for one particular infinite series. E(U) is the expected value over all possible such series.
I assume this means the expected value of the sum, rather than the expected point the series converges on. The latter would mean our decision-making agent cared only about the infinite future, and not at all about the present. Making it a sum means the agent does no time-discounting.
The reason this value is unbounded is that each U(x_t) is unbounded. If we use the term "unbounded" to mean "completely free to vary in any way", then the theorem's results would hold regardless of time-discounting. But the paper does not use the term "unbounded" in that way - and it should not. It uses the term "computably unbounded from above" to mean (Definition 1),
Two observations about this definition:
Point A is strange, and I consider it a bug in the paper. It isn't just saying that all sequences of events have "unbounded" utility; it requires, by definition, that U(d=x1..t) goes to positive infinity as t goes to infinity for all possible sequences of events. In other words, our utility always increases to infinity no matter what we do. This is a problem both because it rules out any cases where we care what our agent does, and also because it assumes something at least as strong as the result the entire paper is trying to prove.
If we correct this, to read that for every s in X^p there exists some d in X^N such that U(s) <= U(d) (or if we don't, but let's fix it anyway), then we can construct a sequence that is "unbounded" according to the definition, but that is exponentially bounded, meaning that we can find some n, p such that U(x1..n) = c^n, and for all t > n, U(x1..t) < c^t. Given that all possible sequences U(x1), U(x2), ... conform to this - and I expect that any real-world sequence of environments and utilities can be made to conform to that requirement - then, with exponential time-discounting with a base b larger than c, the sum of the infinite series converges. That is, E(U(x1) + U(x2)/b + U(x3)/b^2 + ...) converges, because each discounted term is smaller than the corresponding term of a geometric series with ratio c/b < 1.
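A minimal numerical sketch of this convergence argument follows. It treats the worst case, in which the utility at step t equals its exponential bound c^t, and divides the t-th term by b^(t-1). The symbol b for the discount base, the function name, and the values c = 2, b = 3 are assumptions chosen for illustration. It handles a single deterministic series, not the expectation over all series.

```python
def discounted_partial_sum(c, b, n_terms):
    """Sum of c**t / b**(t - 1) for t = 1..n_terms (utility at its bound c**t)."""
    total = 0.0
    for t in range(1, n_terms + 1):
        utility = c ** t              # worst case allowed by the exponential bound
        total += utility / b ** (t - 1)
    return total


# Discount base larger than c: the partial sums approach c / (1 - c / b).
print(discounted_partial_sum(2.0, 3.0, 50))
print(discounted_partial_sum(2.0, 3.0, 200))

# Discount base equal to c: every term equals c, so the sum grows without bound.
print(discounted_partial_sum(2.0, 2.0, 50))
print(discounted_partial_sum(2.0, 2.0, 200))
```

With b > c the terms form a geometric series with ratio c/b, so the partial sums settle near c / (1 - c/b). With b = c they grow linearly in the number of terms, which is why the discount base has to be strictly larger than c.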
I would rather have seen the paper address the more interesting and relevant question of under what conditions the expectation of a bounded utility function can converge or diverge (if you don't do time-discounting, as in the paper). Even the expected utility of a utility function bounded between -1 and 1 can diverge when you're computing the sum - and you should be computing either the sum, or a time-discounted sum; no other computations are relevant to decision theory.
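The point about bounded utilities can be sketched with two hypothetical utility streams, both confined to [-1, 1]. The streams, the discount factor of 0.5, and the helper name are illustrative assumptions. The sketch again uses single deterministic series, not expectations over a distribution of series.

```python
from itertools import accumulate, cycle, islice, repeat


def partial_sums(utilities, n_terms, discount=1.0):
    """Running sums of discount**t * u_t over the first n_terms terms (t starts at 0)."""
    weighted = (u * discount ** t
                for t, u in enumerate(islice(utilities, n_terms)))
    return list(accumulate(weighted))


# Constant utility 1 at every step: the undiscounted sum grows without bound.
print(partial_sums(repeat(1.0), 1000)[-1])

# Alternating +1, -1: the undiscounted partial sums oscillate and never settle.
print(partial_sums(cycle([1.0, -1.0]), 6))

# The same constant stream with a discount factor of 0.5 converges.
print(partial_sums(repeat(1.0), 1000, discount=0.5)[-1])
```

The first stream diverges to infinity and the second has no limit at all, even though no single term leaves [-1, 1]. Only the discounted sum settles on a finite value.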
You appear to be describing a different paper, probably this one by the same author. The paper cited by endoself doesn't have a page 4 (its pages are numbered 0..3), has no theorem on its page 3, doesn't use the notation you describe, doesn't contain a Definition 1, and in fact seems to have little in common with whatever it is you're talking about.