Edit: Added clarification of the limit in response to gwern's comment.
For recent examples, see this post by MileyCyrus, or this post from XiXiDu (where I reply with unbounded utility functions, which is not the general solution).
I encountered this issue again while reading through a fascinating discussion thread on John Baez's blog from earlier this year where Greg Egan jumped in with a "Yudkowsky/Bostrom" criticism:
The Yudkowsky/Bostrom strategy is to contrive probabilities for immensely unlikely scenarios, and adjust the figures until the expectation value for the benefits of working on — or donating to — their particular pet projects exceed the benefits of doing anything else. Combined with the appeal to vanity of “saving the universe”, some people apparently find this irresistible, but frankly, their attempt to prescribe what rational altruists should be doing with their time and money is just laughable, and it’s a shame you’ve given it so much air time.
In short, Egan is indirectly accusing SIAI and FHI of Pascal's Mugging (among other things): something serious indeed. Egan in particular presents the following (presumably Yudkowsky) quote as evidence:
Anyway: In terms of expected utility maximization, even large probabilities of jumping the interval between a universe-history in which 95% of existing biological species survive Earth’s 21st century, versus a universe-history where 80% of species survive, are just about impossible to trade off against tiny probabilities of jumping the interval between interesting universe-histories, versus boring ones where intelligent life goes extinct, or the wrong sort of AI self-improves.
Yudkowsky responds with his Pascal's Wager Fallacy Fallacy, and points out that in fact he agrees there is no case for investing in defense against highly improbable existential risks:
And I don’t think the odds of us being wiped out by badly done AI are small. I think they’re easily larger than 10%. And if you can carry a qualitative argument that the probability is under, say, 1%, then that means AI is probably the wrong use of marginal resources – not because global warming is more important, of course, but because other ignored existential risks like nanotech would be more important. I am not trying to play burden-of-proof tennis. If the chances are under 1%, that’s low enough, we’ll drop the AI business from consideration until everything more realistic has been handled.
The rest of the thread makes for an entertaining read, but the takeaway I'd like to focus on is the original source of Egan's criticism: the apparent domination of immensely unlikely scenarios of immensely high utility.
It occurred to me that the expected value of any action - properly summed over subsets of integrated futures - necessarily converges to zero as the probability of those considered subsets goes to zero. Critically, this convergence occurs for *all* utility functions, as it is not dependent on any particular utility assignments. Alas, LW is vast enough that there may be little new left under the sun: in researching this idea, I encountered an earlier form of it in a post by SilasBart here, as well as some earlier attempts by RichardKennaway, Komponisto, and jimrandomh.
Now that we've covered the background, I'll jump to the principle:
The Infinitesimal Probability Utility Convergence Principle (IPUP): For any action A, utility function U, and a subset of possible post-action futures F, EU(F) -> 0 as p(F) -> 0.
In Pascal's Mugging scenarios we are considering possible scenarios (futures) that have some low probability. It is important to remember that rational agents compute expected reward over all possible futures, not just the one scenario we may be focusing on.
The principle can be formalized in the theoretical context of perfect omniscience-approaching agents running on computers approaching infinite power.
The AIXI formalization provides a simple mathematical model of such agents. Its single-line equation has a concise English summary:
If the environment is modeled by a deterministic program q, then the future perceptions ...o_k r_k...o_m r_m = U(q, a_1..a_m) can be computed, where U is a universal (monotone Turing) machine executing q given a_1..a_m. Since q is unknown, AIXI has to maximize its expected reward, i.e. average r_k+...+r_m over all possible future perceptions created by all possible environments q that are consistent with past perceptions. The simpler an environment, the higher is its a-priori contribution 2^-l(q), where simplicity is measured by the length l of program q. AIXI effectively learns by eliminating Turing machines q once they become inconsistent with the progressing history. Since noisy environments are just mixtures of deterministic environments, they are automatically included.
AIXI is just a mathematical equation. We must be very careful in mapping it to abstract scenarios lest we lose much in translation. It is best viewed as a family of agent-models: the reward observations it seeks to maximize could be anything.
When one ponders "What would AIXI/Omega do?", there are a couple of key points to keep in mind:
- AIXI-like models (probably) simulate the entire infinitely branching multiverse from the beginning of time to infinity (as particular simulation programs). This is often lost in translation.
- AIXI-like models compute that simulation (the infinite totality of existence) not once, but for each of an infinite number of programs (corresponding to what we would call universal physics: theories of everything) in parallel. Thus AIXI computes (in parallel) the entire Tegmark multiverse: every possible universe that could exist in principle.
- AIXI 'learns' by eliminating sub-universes (and theories) that do not perfectly agree with its observation history to date. Of course this is only ever a finite reduction: it never collapses the multiverse from an infinite set into a finite set.
- AIXI finally picks an action A that maximizes expected reward. It computes this measure by summing, over each observation-valid universe (computed by a particular theory-program, as in the first point) in the multiverse ensemble (the second point), the total accumulated reward in the sub-universes branching off from that action, weighted by a scoring term for each valid universe that decreases exponentially with the theory's program length.
In other words, the perfectly rational agent considers everything that could possibly happen as a consequence of its action in every possible universe it could be in, weighted by an exponential penalty against high-complexity universes.
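To make the weighting concrete, here is a minimal toy sketch of the summary above. It is not AIXI: the hypothesis format, the three example universes, their lengths and their reward values are all made up for illustration, and the sum is left unnormalized since only the comparison between actions matters.

```python
from fractions import Fraction

def expected_reward(hypotheses, history, action):
    """Sum the reward of `action` over hypotheses consistent with `history`,
    each weighted by 2^-length (a toy stand-in for the 2^-l(q) prior)."""
    total = Fraction(0)
    for length_bits, predict, reward in hypotheses:
        # Binary acceptance: a single mismatched bit eliminates the hypothesis.
        if any(predict(t) != bit for t, bit in enumerate(history)):
            continue
        total += Fraction(1, 2 ** length_bits) * reward(action)
    return total

# Hypothetical toy universes:
# (program length in bits, observation predictor, reward function).
HYPOTHESES = [
    (3, lambda t: 1, lambda a: 1 if a == "keep" else 0),     # short: mugger is lying
    (5, lambda t: t % 2, lambda a: 50),                      # contradicts HISTORY
    (10, lambda t: 1, lambda a: 100 if a == "pay" else 0),   # long: mugger is honest
]

HISTORY = [1, 1]
best = max(["keep", "pay"], key=lambda a: expected_reward(HYPOTHESES, HISTORY, a))
```

With these invented numbers the long "honest mugger" program pays 100 times more, but its 2^-10 weight leaves it below the short program's contribution, so `best` is "keep". Different made-up rewards would of course flip the result; the sketch only shows the mechanics of the weighted sum.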
Here is a sketch of how the limit convergence (IPUP above) can be derived: When considering a possible action A, such as giving $5 to a Pascal Mugger, an optimal agent considers all possible dependent futures for all possible physics-universes. As we advance into scenarios of infinitesimal probability, we are advancing up the complexity ladder into increasingly chaotic universes which feature completely random rewards which approach positive/negative infinity. As we advance into this regime of infinitesimal probability, causality itself breaks down completely and expected reward of any action goes to zero.
The convergence principle can be derived from the program length prior 2^-l(q). An agent which has accumulated P perception bits so far can fully explain those perceptions by completely random programs of length P, thus 2^-P forms a probability limit at which the agent's perceptions start becoming irrelevant, and chaotic non-causal physics dominate. Chaos should dominate expected reward for actions where p(A) << 2^-P.
Thinking as limited humans, we impose abstractions and collapse all extremely similar (to us) futures. All the tiny random quantum-dependent variations of a particular future that correspond to "giving the Mugger $5" we collapse into a single set of futures, to which we assign a probability based on counting the subinstances in that set as a fraction of the whole.
AIXI does not do this: it actually computes each individual future path.
But as we can't hope to think that way, we have to think in terms of probability categorizations. Fine. Imagine collapsing any futures that are sufficiently indistinguishable such that humans would consider them identical: described by the same natural language. We then get subsets of futures which we assign probabilities as relative size measures.
Now consider ranking all of those future-sets in decreasing probability order. Most of the early list is dominated by Mugger is (joking/lying/crazy/etc). Farther down the list you get into scenarios where we do live in a multi-level Simulation (AIXI only ever considers itself in some simulation), but the Mugger is still (joking/lying/crazy/etc).
By the time you get down the list to scenarios described where the Mugger says "Or else I will use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people" and what the Mugger says actually happens, we are almost certainly down in infinitesimal probability land.
Infinitesimal probability land is a weird place. It is a regime where the physics that we commonly accept is wrong - which is to say simply that the exponential complexity penalty no longer rules out ultra-complex universes. It is dominated by chaos: universes of every possible fancy, where nothing is what it seems, where everything you possibly thought is completely wrong, where there is no causality, etc.
At the complete limit of improbability, we just get universes where our entire observation history is completely random - generated by programs more complex than our observations. You give the mugger $5 and the universe simply dissolves in white noise and nothing happens (or god appears and gives you infinite heaven, or infinite hell, or the speed of light goes to zero, or a black hole forms near your nose, or the Mugger turns into jellybeans, etc. etc., an infinite number of stories, over which the net reward summation necessarily collapses to zero.)
Remember that AIXI doesn't consider the mugger's words as 'evidence'; they are simply observations. In the more complex universes they are completely devoid of meaning, as causality itself collapses.
There are a number of utility terms in the AIXI equation. The utility function is evaluated for every hypothesis/program/universe, forward-evaluated over all future action paths, giving one best utility for just that universe, and the total expected utility is then the sum over all valid universes weighted by their complexity penalty.
By 'mean of the utility function', I meant the mean of the utility function over all possible universes rather than just valid universes. The validity constraint forces the expected utility to diverge from the mean of the utility function - it must for the agent to make any useful decisions!
So the total expected utility is not normally the mean utility, but it reduces to it in the case where the observation filter is removed.
My entire post concerns the subset of universes with probabilities approaching 1/infinity, corresponding to programs with length going to infinity. The high probability scenarios (shorter program universes) don't matter in mugger scenarios; we categorically assume they all have boring, extremely low utilities (the mugger is joking/lying/crazy).
In AIXI-models, hypothesis acceptance is not probabilistic, it is completely binary: a universe program either perfectly fits the observation history or it does not. If even 1 bit is off, the program is ignored.
It's unfortunate I started using N for program length in my prior post; that was a mistake, as L was the term for program length in the EU equation. L (program length) matters because of the Solomonoff prior's complexity penalty: 2^-L.
This simply comes from the fact that an observation history O can at most filter out only a fraction of the space of programs that are longer than it.
For example, start with an empty observation history O: {}. Clearly, this filters nothing. The space of valid programs of length L, for any L, is simply all possible programs of length L, which is expected to be a set of around 2^L in size. The sum over all programs for L going to infinity is thus the space of everything, the full Tegmark. In this case, the expected utility is simply the mean of the utility function over the full Tegmark.
Now consider O:{1}. We have cut out exactly half of the program space. O:{11} cuts out 3/4 of the Tegmark, and in general an observation history with length(O) filters the universe space down to 2^-length(O) of its previous size, removing a fraction 1 - 2^-length(O) of the possible universes - but there are an infinite number of total universes.
Now, let's say we are ONLY interested in the contribution of universes of a certain prior likelihood (corresponding to a certain program length). These are the subsets of the Tegmark with programs P where length(P) = L for some L. This is a FINITE, enumerable set.
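The halving can be checked by brute force in a toy model. The sketch below assumes the simplest possible "program": a bitstring that just outputs its own bits, so a program is valid when its leading bits equal O. That is an illustrative assumption, not how a real universal machine interprets programs, and `surviving_fraction` is a name invented here.

```python
from itertools import product

def surviving_fraction(observations, program_length):
    """Fraction of bitstring 'programs' of the given length whose
    leading bits match the observation history."""
    obs = tuple(observations)
    matches = sum(
        1
        for program in product((0, 1), repeat=program_length)
        if program[:len(obs)] == obs
    )
    return matches / 2 ** program_length

if __name__ == "__main__":
    for history in ([], [1], [1, 1], [1, 1, 0]):
        print(history, surviving_fraction(history, 6))
```

Each additional observed bit halves the surviving fraction, whatever the program length, as long as the programs are at least as long as the history.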
Then for JUST the subset of universes with length(P)=L, there are 2^L universes in this set. For an observation history O with length(O) > L, it is not guaranteed that there are any valid programs that match the observation history. It could be 1, could be 0.
However, for length(P) > length(O) + C, for some small C, valid programs are absolutely guaranteed. Specifically, for some constant C there are programs which simply directly encode random strings which happen to align with O. This set of programs corresponds to 'chaos'.
Now consider the limit behavior as complexity goes to infinity. For any fixed observation history with length(O), as length(P) goes to infinity, the chaos set grows at the maximum possible rate, with 2^length(P), and dominates (because the chaos programs just fill extra length with any random bits).
In particular, for observation set O and the subset of universes with length(P)=L, there are expected to be roughly 2^-(length(O)+C) * 2^L observationally valid chaos universes. This simplifies to 2^(L-length(O)-C) valid chaos universes.
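As a sanity check on that count, here is a small sketch comparing the formula with a brute-force enumeration. It assumes a toy encoding in which a chaos program is a fixed C-bit header, followed by the bits of O, followed by arbitrary padding; the header bits and function names are hypothetical.

```python
from itertools import product

def chaos_count_formula(program_length, observation_length, overhead):
    """Expected number of valid chaos programs: 2^(L - length(O) - C)."""
    return 2.0 ** (program_length - observation_length - overhead)

def chaos_count_brute_force(program_length, observations, header):
    """Count bitstrings of the given length that start with header + observations."""
    prefix = tuple(header) + tuple(observations)
    return sum(
        1
        for program in product((0, 1), repeat=program_length)
        if program[:len(prefix)] == prefix
    )

if __name__ == "__main__":
    header, obs = (1, 0), (1, 1, 0)   # C = 2, length(O) = 3 (made-up values)
    for length in (4, 5, 6, 8, 10):
        print(length,
              chaos_count_formula(length, len(obs), len(header)),
              chaos_count_brute_force(length, obs, header))
```

Below L = length(O) + C the brute-force count is zero while the formula gives a fraction less than one, which matches the reading of the formula as an expected count; from that point on the two agree and double with every extra bit.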
So when length(O)+C > L, there are unlikely to be any valid chaos universes. So the expected utility over this subset, EU[L], will be averaged over a small number of universes, possibly even 1 (if there are any at all that match O), or none. But as L grows larger than length(O)+C, the chaos universes suddenly appear (guaranteed) and their number grows exponentially with L, and the expected utility over that exponentially growing set quickly converges to the mean of the utility function (because the chaos universes are random).
Assuming a utility function with positive/negative bounds normalized around zero, the convergence should be to zero.
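A rough numerical illustration of that convergence, under a strong simplifying assumption: each chaos universe's utility is modeled as an independent uniform draw from [-1, 1]. This is a stand-in for "the chaos universes are random", not something derived from AIXI. The function name, the seed and the parameter values are invented for the sketch, and when L is below length(O)+C it simply averages over a single universe.

```python
import random

def mean_utility_over_chaos(program_length, observation_length, overhead, seed=0):
    """Average a bounded, zero-centered utility over the
    2^(L - length(O) - C) chaos universes of a given program length."""
    rng = random.Random(seed)
    count = 2 ** max(program_length - observation_length - overhead, 0)
    # Assumption: utilities in chaos universes are i.i.d. uniform on [-1, 1].
    return sum(rng.uniform(-1.0, 1.0) for _ in range(count)) / count

if __name__ == "__main__":
    for length in (10, 14, 18, 22, 26):
        print(length, mean_utility_over_chaos(length, 8, 2))
```

Under this assumption the spread of the average shrinks roughly like 1/sqrt(count), so the printed means drift toward zero as L grows. If the utilities were correlated or heavy-tailed, this simple picture would not necessarily hold.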
Okay. In that case there are two reasons that mugger hypotheses are still important: the unupdated expected utility is not necessarily anywhere near the naive tail-less expected utility and that while the central limit theorem shows that updating based on observation...