Edit:  Added clarification of the limit in response to gwern's comment.

Recently I've found several instances of this Pascal Mugging/Wager problem come up, and I'm somewhat surprised that 1.) a number of people apparently still consider it a real issue, and 2.) the rest don't rely on a standard goto defense.

For recent examples, see this post by MileyCyrus, or this post from XiXiDu (where I reply with unbounded utility functions, which is not the general solution).

I encountered this issue again while reading through a fascinating discussion thread on John Baez's blog from earlier this year where Greg Egan jumped in with a "Yudkowsky/Bostrom" criticism:

The Yudkowsky/Bostrom strategy is to contrive probabilities for immensely unlikely scenarios, and adjust the figures until the expectation value for the benefits of working on — or donating to — their particular pet projects exceed the benefits of doing anything else. Combined with the appeal to vanity of “saving the universe”, some people apparently find this irresistible, but frankly, their attempt to prescribe what rational altruists should be doing with their time and money is just laughable, and it’s a shame you’ve given it so much air time.

In short, Egan is indirectly accusing SIAI and FHI of Pascal Mugging(among else): something serious indeed.  Egan in particular presents the following (presumably Yudkowsky) quote as evidence:

Anyway: In terms of expected utility maximization, even large probabilities of jumping the interval between a universe-history in which 95% of existing biological species survive Earth’s 21st century, versus a universe-history where 80% of species survive, are just about impossible to trade off against tiny probabilities of jumping the interval between interesting universe-histories, versus boring ones where intelligent life goes extinct, or the wrong sort of AI self-improves.

Yudkowsky responds with his Pascal's Wager Fallacy Fallacy, and points out that in fact he agrees there is no case for investing in defense against highly improbable existential risks:  

And I don’t think the odds of us being wiped out by badly done AI are small. I think they’re easily larger than 10%. And if you can carry a qualitative argument that the probability is under, say, 1%, then that means AI is probably the wrong use of marginal resources – not because global warming is more important, of course, but becauseother ignored existential risks like nanotech would be more important. I am not trying to play burden-of-proof tennis. If the chances are under 1%, that’s low enough, we’ll drop the AI business from consideration until everything more realistic has been handled.

The rest of the thread makes for an entertaining read, but the takeaway I'd like to focus on is the original source of Egan's criticism: the apparent domination of immensely unlikely scenarios of immensely high utility.

It occurred to me that the expected value of any action - properly summed over subsets of integrated futures - necessarily converges to zero as the probability of those considered subsets goes to zero.  Critically this convergence occurs for *all* utility functions, as it is not dependent on any particular utility assignments.  Alas LW is vast enough that there may be little new left under the sun: In researching this idea, I encountered an earlier form of it in a post by SilasBart here, as well as some earlier attempts by RichardKennaway, Komponisto, and jimrandomh.

Now that we've covered the background, I'll jump to the principle:

The Infinitesimal Probability Utility Convergence Principle (IPUP):  For any action A, utility function U, and a subset of possible post-action futures F, EU(F) -> 0 as p(F) -> 0. 

In Pascal's Mugging scenarios we are considering possible scenarios (futures) that have some low probability.  It is important to remember that rational agents compute expected reward over all possible futures, not just the one scenario we may be focusing on.

The principle can be formalized in the theoretical context of perfect omniscience-approaching agents running on computers approaching infinite power.

The AIXI formalization provides a simple mathematical model of such agents.  It's single line equation has a concise English summary:

 If the environment is modeled by a deterministic program q, then the future perceptions ...okrk...omrm = U(q,a1..am) can be computed, where U is a universal (monotone Turing) machine executing q given a1..am. Since q is unknown, AIXI has to maximize its expected reward, i.e. average rk+...+rm over all possible future perceptions created by all possible environments q that are consistent with past perceptions. The simpler an environment, the higher is its a-priori contribution 2-l(q), where simplicity is measured by the length l of program qAIXI effectively learns by eliminating Turing machines q once they become inconsistent with the progressing history. Since noisy environments are just mixtures of deterministic environments, they are automatically included. 

AIXI is just a mathematical equation.  We must be very careful in mapping it to abstract scenarios lest we lose much in translation.  It is best viewed as a family of agent-models, the reward observations it seeks to maximize could be anything.

When one ponders: "What would AIXI/Omega do?"  There are a couple of key points to keep in mind:

  1. AIXI like models (probably) simulate the entire complete infinitely branching multiverse from the beginning of time to infinity (as particular simulation programs).  This is often lost in translation.
  2. AIXI like models compute 1 (the infinite totality of existence), not once, but for each of an infinite number of programs (corresponding to what we would call universal physics: theories of everything) in parallel.  Thus AIXI computes (in parallel) the entire Tegmark multiverse: every possible universe that could exist in principle.
  3. AIXI 'learns' by eliminating sub-universes (and theories) that do not perfectly agree with it's observation history to date.  Of course this is only ever a finite reduction, it never collapses the multiverse from an infinite set into a finite set.
  4. AIXI finally picks an action A that maximizes expected reward.  It computes this measure by summing over, for each observation-valid universe (computed by a particular theory-program 1) in the multiverse ensemble (2), the total accumulated reward in the sub-universes branching off from that action, weighted by a scoring term for each valid universe that decreases with the negative exponent of the theory's program length. 

 

In other words the perfectly rational agent considers everything that could possibly happen as a consequence of it's action in every possible universe it could be in, weighted by an exponential penalty against high-complexity universes.

Here is a sketch of how the limit convergence (IPUP above) can be derived:  When considering a possible action A, such as giving $5 to a Pascal Mugger, an optimal agent considers all possible dependent futures for all possible physics-universes.  As we advance into scenarios of infinitesimal probability, we are advancing up the complexity ladder into increasingly chaotic universes which feature completely random rewards which approach positive/negative infinity.  As we advance into this regime of infinitesimal probability, causality itself breaks down completely and expected reward of any action goes to zero.

The convergence principle can be derived from the program length prior 2^-l(q).  An agent which has accumulated P perception bits so far can fully explain those perceptions by completely random programs of length P, thus 2^-l(P) forms a probability limit at which the agent's perceptions start becoming irrelevant, and chaotic non-causal physics dominate.  Chaos should dominate expected reward for actions where p(A) <<  2^-l(P).

In the the vast vast majority of futures, the agent gives $5 to the Mugger and nothing much of extreme importance happens (the agent is lying or crazy, etc.)

Thinking as a limited human, we impose abstractions and collapse all extremely similar (to us) futures.  All the tiny random quantum-dependent variations of a particular future correspond to "giving the Mugger $5" we collapse into a single set of futures which we assign a probability to based on counting the subinstances in that set as a fraction of the whole.  

AIXI does not do this: it actually computes each individual future path.

But as we can't hope to think that way, we have to think in terms of probability categorizations.  Fine.  Imagine collapsing any futures that are sufficiently indistinguishable such that humans would consider them identical: described by the same natural language.  We then get subsets of futures which we assign probabilities as relative size measures.  

Now consider ranking all of those future-sets in decreasing probability order.  Most of the early list is dominated by Mugger is (joking/lying/crazy/etc).  Farther down the list you get into scenarios where we do live in a multi-level Simulation (AIXI only ever considers itself in some simulation), but the Mugger is still (joking/lying/crazy/etc).  

By the time you get down the list to scenarios described where the Mugger says "Or else I will use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people" and what the Mugger says actually happens, we are almost certainly down in infinitesimal probability land.

Infinitesimal probability land is a wierd place.  It is a regime where the physics that we commonly accept is wrong - which is to say simply that the exponential complexity penalty no longer rules out ultra-complex universes.  It is dominated by chaos: universes of every possible fancy, where nothing is as what it seems, where everything you possibly thought is completely wrong, where there is no causality, etc. etc.  

At the complete limit of improbability, we just get universes where our entire observation history is completely random - generated by programs more complex than our observations.  You give the mugger $5 and the universe simply dissolves in white noise and nothing happens (or god appears and gives you infinite heaven, or infinite hell, or the speed of light goes to zero, or a black hole forms near your nose, or the Mugger turns into jellybeans, etc. etc., an infinite number of stories, over which the net reward summation necessarily collapses to zero.)

Remember AIXI doesn't consider the mugger's words as 'evidence', they are simply observations.  In the more complex universes they are completely devoid of meaning, as causality itself collapses.


 


 


 

New to LessWrong?

New Comment
19 comments, sorted by Click to highlight new comments since: Today at 4:34 PM

That was very evocative. But you leave unanswered the basic point: does the probability dive fast enough to overcome the reward of the universes where you pay the mugger and he delivers?

Yes good point, I didn't consider how fast the solomonoff prior weight goes down in relation to reward, but I did mention that eventually everything is dominated by programs longer than your observation bit-sequence.

So perhaps that is a good route for tightening the zero limit. For example a 30yr old human has an observation history of perhaps 10^16 bits, which sets an upper bound on the zero limit at probabilities around 1/2^10^16. (not sure what that is in knuth notation?)

But again it has nothing to do with the mugger's words or claimed reward: causality breakdown occurs when the programs are longer than the observation history. At this limit the expected reward of any action is zero.

I'll edit the OP to include your point and how the zero limit depends on observation history.

nothing to do with the mugger's words or claimed reward

Yvain's rebuttal in one of the threads you linked to is pretty good, and can be cashed out in terms of programs simulating worlds and sampling the simulations to generate observations for AIXI. You will not get a complexity penalty in bits anywhere close to the size of your observation history.

Disclaimer: one shouldn't actually hand over money to a mugger in such cases. Human preferences can usually be better summed up in terms of bounded utility functions (which may assign [bounded] utility to infinite quantities of stuff), and there would be less implausible ways to convert $5 into vast utility than handing it over to such a character.

Yvain's argument doesn't hold water, it's just an intuition pump. The mugger is irrelevant and the threat is irrelevant, it's the tiny probabilities that matter. There is a minimum probability regime beyond which expected rewards goes to zero.

Do you think an AI extrapolating human preferences should also have a bounded utility function with a not very high bound? Do you think such an AI should give in to muggers?

Do you think an AI extrapolating human preferences should also have a bounded utility function with a not very high bound?

That phrasing suggests a certain structure, which I am suspicious of. For instance, we can have a bounded utility function which is a sum of terms reflecting different features of the world. We can have one bounded term for finite amounts of physical goods (with diminishing utility with increasing quantity), another for producing infinite goods (paperclips or happy people, whatever), another for average welfare in the multiverse according to some measure, and so forth.

Do you think such an AI should give in to muggers?

It's quite hard to come up with a plausible AI utility function that would give in to such a mugger, since conditional on the possibility of producing lots of happy people (or whatever) there are better uses for the $5 in arranging that, e.g. high-energy physics research, well-orchestrated attempts to estimate and influence any beings running our world as a simulation, etc.

We can have one bounded term for finite amounts of physical goods (with diminishing utility with increasing quantity), another for producing infinite goods (paperclips or happy people, whatever)

So producing 10 happy people would sometimes outweigh producing infinite happy people? That sounds suspicious. Is that idea written up somewhere?

It's quite hard to come up with a plausible AI utility function that would give in to such a mugger, since conditional on the possibility of producing lots of happy people (or whatever) there are better uses for the $5 in arranging that, e.g. high-energy physics research, well-orchestrated attempts to estimate and influence any beings running our world as a simulation, etc.

I think you're not assuming the least convenient possible world here. Consider a very smart AI that has only two options: giving $5 to the mugger, or using the money to feed one starving African kid. Should it pay the mugger?

So producing 10 happy people would sometimes outweigh producing infinite happy people? That sounds suspicious. Is that idea written up somewhere?

A lottery with a chance of producing 10 happy people could sometimes outweigh a lottery with a payoff of infinite happy people, but in a direct comparison of certain payoffs producing infinite happy people includes producing 10 happy people.

I think you're not assuming the least convenient possible world here. Consider a very smart AI that has only two options: giving $5 to the mugger, or using the money to feed one starving African kid. Should it pay the mugger?

That's not inconvenient enough. Feeding a starving African kid probabilistically displaces efforts by folk like Bill Gates or Giving What We Can enthusiasts to reducing existential risk and increasing our likelihood of surviving to exploit wacky physics. But I'll iron-man the dilemma to have infinite certainty that the African kid's life will have no other effects on anything, the incentive effects of encouraging people to perform Pascal's muggings are nil, etc. I'll assume that I'm a causal decision theorist (otherwise in a Big World choosing to save the child can mean that my infinite counterparts do so as well, for an infinite payoff).

After that iron-manning, my current best guess is that I would prefer the AI feed the kid, unless the Mugging was more credible than it is described as in, e.g. Nick Bostrom's write-up.

For recent examples, see this post by MileyCyrus, or this post from XiXiDu (where I reply with unbounded utility functions, which is not the general solution).

I can't parse the parenthetical comment.
Even clicking through, I'm not sure what you mean.

Your IUPU is trivial if the utility function is bounded. I don't see what role AIXI plays in this post, except to smuggle in the hypothesis of bounded utility function. And for you to whine that people aren't thinking clearly.

That you reject unbounded utility functions (from the XiXiDu comment) really belongs in this post. Many people don't. Peter de Blanc shows that unbounded utility functions have much worse problems, but I don't think that convinces many people to give them up.

I hadn't yet read de Blanc's paper about unbounded utility functions.

When writing the post I was considering a limit for all utility functions: bounded and unbounded. Reading his paper it now seems that was an obvious mistake, a limit should hold only for the bounded variety.

Peter de Blanc shows that unbounded utility functions have much worse problems, but I don't think that convinces many people to give them up.

What could then?

I just read through the linked paper by Peter De Blanc (thanks), but I don't see your point. The paper shows:

"After making some plausible assumptions (and maybe one not-so-plausible assumption), we show that if the utility function is unbounded, then the expected utility of any policy is undefined."

If De Blanc's conclusion is correct it only shows that unbounded utility functions result in potentially unbounded/undefined expected utilities. The limit I explore in the OP should still hold for bounded utility functions, regardless of how they are bounded.

I've run a bunch of simulations calculating expected utility over different distributions and the ones that seem most like reality, ie fat tails of utility that wouldn't converge if the utility function were unbounded, seem to be rather similar to the standard random walk, which is very hard to not think obvious in retrospect. If you have a mugger scenario where the high-utility possibilities are chaotic in the sense described in the post, +/-(typical utility magnitude)*P(typical crazy scenario)*sqrt(# of crazy scenarios) therefore seems like a qualitatively correct model. This is greater in absolute value than the naive model +/-(typical utility magnitude)*P(typical crazy scenario), so I don't think that this 'solves' the thought experiment. I can provide more mathematical details if anything is unclear.

I'm not following your math. Consider an agent on cycle 0, or an agent fed only pure white noise. It's observations don't limit the space of universes, so It's computed utilities will then be random samples from the utility function over the space of all programs, and should then converge to the mean of the utility function by the central limit theorem.

Simplified AIXI type expected utility for an action A and observation history O is something like: EU(A) = SUM[L=[0,+inf]] ( 2^-L PROGSUM(L, A, O)) PROGSUM(L, A, O) = SUM[P, length(P)==L] ( T(P(A),O) U(P(A)))

PROGSUM sums the utility for every program of length L. The T term is 1 for programs that match the observation history, 0 for those that don't. It filters out hypothesizes.

Mapping the concept of probability on to this, considering scenarios of low probability such as 2^-N, where N is very large in comparison to the length(O), is considering the subsets of the equation starting at programs of length N. For very large N in comparison to length(O), the T terms filters less and less of the hypothesis space, and the result converges again to an infinite random sampling of the utility function, and thus the mean of that function.

Specifically for N > length(O), T can at most filter, and thus contribute, 2^-(N - length(0)) of the hypothesis space, and it's contribution thus goes to zero as N grows.

it's computed utilities will then be random samples from the utility function over the space of all programs, and should then converge to the mean of the utility function by the central limit theorem.

Well the mean of the utility function is just the expected utility. The way I'm approaching this is to ask whether most of the expected utility comes from high probability events or low probability ones - does the distribution look basically the same if we cut off the tails? We are specifically referring to distributions where even some small events from the tails - the 'mugger hypotheses' - would make large contributions if they were considered individually. Then, if we model them as 'positive mugger hypotheses' and 'negative mugger hypotheses', the expected absolute value of the contribution from the tails just gets larger.

Simplified AIXI type expected utility for an action A and observation history O is something like [equation].

That is unnormalized, but otherwise accurate enough.

For very large N in comparison to length(O), the T terms filters less and less of the hypothesis space, and the result converges again to an infinite random sampling of the utility function, and thus the mean of that function.

Do you mean "less of the hypothesis space" in a relative or absolute sense? I don't think either would hold. Your observations have some probability P(T|N) to retain a hypothesis of length N. I don't see why this would depend that strongly on the value of N. In that case, the ratio of unfalsified to falsified hypotheses of length N would be about the same for all N. How did you get the number 2^-(N - length(0)) as a limit on the amount of the hypothesis space that is filtered (and do you mean retained by the filter or removed by the filter when you say 'filtered').

it's computed utilities will then be random samples from the utility function over the space of all programs, and should then converge to the mean of the utility function by the central limit theorem.

Well the mean of the utility function is just the expected utility.

There are number of utility terms in the AIXI equation. The utility function is evaluated for every hypothesis/program/universe forward evaluated for all future action paths, giving one best utility for just that universe, and the total expected utility is then the sum over all valid universes weighted by their complexity penalty.

By 'mean of the utility function', I meant the mean of the utility function over all possible universes rather than just valid universes. The validity constraint forces the expected utility to diverge from the mean of the utility function - it must for the agent to make any useful decisions!

So the total expected utility is not normally the mean utility, but it reduces to it in the case where the observation filter is removed.

The way I'm approaching this is to ask whether most of the expected utility comes from high probability events or low probability ones

My entire post concerns the subset of universes with probabilities approaching 1/infinity, corresponding to programs with length going to infinity. The high probability scenarios (shorter program universes) don't matter in mugger scenarios, we categorically assume they all have boring extremely low utilities (the mugger is jokin/lying/crazy).

Your observations have some probability P(T|N) to retain a hypothesis of length N. I don't see why this would depend that strongly on the value of N.

In AIXI-models, hypothesis acceptance is not probabilistic, it is completely binary: a universe program either perfectly fits the observation history or it does not. If even 1 bit is off, the program is ignored.

It's unfortunate I started using N for program length in my prior post, that was a mistake, L was the term for program length in the EU equation. L (program length) matters because of the solomonoff prior complexity penalty: 2^-L.

How did you get the number 2^-(L - length(O)) as a limit on the amount of the hypothesis space that is filtered (and do you mean retained by the filter or removed by the filter when you say 'filtered').

This simply comes from the fact that an observation history O can at most filter out only a fraction of the space of programs that are longer than it.

For example, start with an empty observation history O: {}. Clearly, this filters nothing. The space of valid programs of length L, for any L, is simply all possible programs of length L, which is expected to be a set of around 2^L in size. The sum over all programs for L going to infinity is thus the space of everything, the full Tegmark. In this case, the expected utility is simply the mean of the utility function over the full Tegmark.

Now consider O:{1}. We have cut out exactly half of the program space. O:{11}, cuts out 3/4th of the tegmark, and in general an observation history with length(O) filters the universe space down to 2^-length(O) of it's previous size, removing 1 - 2^-length(O) possible universes - but there are an infinite number of total universes.

Now, let's say we are ONLY interested in the contribution of universes of a certain prior likelihood (corresponding to a certain program length). These are the subsets of the tegmark with programs P where length(P) = L for some L. This is a FINITE, enumerable set.

Then for JUST the subset of universes with length(P)=L, there are 2^L universes in this set. For an observation history O with length(O) > L, it is not guaranteed that there are any valid programs that match the observation history. It could be 1, could be 0.

However, for length(P) > length(O) + C, for some small C, valid programs are absolutely guaranteed. Specifically for some constant C there are programs which simply directly encode random strings which happen to align with O. This set of programs correspond to 'chaos'.

Now consider the limit behavior as complexity goes to infinity. For any fixed observation history with length(O), as length(P) goes to infinity, the chaos set grows at the maximum possible rate, with 2^length(P), and dominates (because the chaos programs just fill extra length with any random bits).

In particular, for observation set O and the subset of universes with length(P)=L, there are expected to be roughly 2^-(length(O)+C) * 2^L observationally valid chaos universes. This simplifies to 2^(L-length(O)-C) valid chaos universes.

So when length(O)+C > L, there are unlikely to be any valid chaos universes. So the expected utility over this subset, EU[L], will be averaged over a small number of universes, possibly even 1 (if there are any at all that match O), or none. But as L grows larger than length(O)+C, the chaos universes suddenly appear (guaranteed) and their number grow exponentially with L, and the expected utility over that exponentially growing set quickly converges to the mean of the utility function (because the chaos universes are random).

Assuming a utility function with positive/negative bounds normalized around zero, the convergence should be to zero.

By 'mean of the utility function', I meant the mean of the utility function over all possible universes rather than just valid universes. The validity constraint forces the expected utility to diverge from the mean of the utility function - it must for the agent to make any useful decisions!

Okay. In that case there are two reasons that mugger hypotheses are still important: the unupdated expected utility is not necessarily anywhere near the naive tail-less expected utility and that while the central limit theorem shows that updating based on observations is unlikely to produce a shift in the utility of the tails that is large relative to the bounds on the utility function, it will still be large relative to the actual utility.

The way I'm approaching this is to ask whether most of the expected utility comes from high probability events or low probability ones

My entire post concerns the subset of universes with probabilities approaching 1/infinity, corresponding to programs with length going to infinity. The high probability scenarios (shorter program universes) don't matter in mugger scenarios, we categorically assume they all have boring extremely low utilities (the mugger is jokin/lying/crazy).

The utility of the likely scenarios is essential here. If we don't take into account the utility of $5, we have no obvious reason not to pay the mugger. The ratio of the utility differences of various action due to the likely hypotheses and due to the high-utility hypotheses is what is important.

Your observations have some probability P(T|N) to retain a hypothesis of length N. I don't see why this would depend that strongly on the value of N.

In AIXI-models, hypothesis acceptance is not probabilistic, it is completely binary: a universe program either perfectly fits the observation history or it does not. If even 1 bit is off, the program is ignored.

That is a probability (well really a frequency) taken over all hypotheses of length N (or L if you prefer).

It's unfortunate I started using N for program length in my prior post, that was a mistake, L was the term for program length in the EU equation. L (program length) matters because of the solomonoff prior complexity penalty: 2^-L.

The space of valid programs of length L, for any L, is simply all possible programs of length L, which is expected to be a set of around 2^L in size.

Well, an O(1) factor less, since otherwise our prior measure would diverge, but you don't have to write it explicitly; when working with Kolmogorov complexity, you expect everything to be within a constant factor.

Now consider O:{1}. We have cut out exactly half of the program space. O:{11}, cuts out 3/4th of the tegmark, and in general an observation history with length(O) filters the universe space down to 2^-length(O) of it's previous size, removing 1 - 2^-length(O) possible universes - but there are an infinite number of total universes.

No, not quite. Observations are not perfectly informative. If someone wanted to optimally communicate their observations, they would use such a system, but a real observation will not be perfectly optimized to rule out half the hypothesis space. We are reading bits from the output of the program, not its source code!

However, for length(P) > length(O) + C, for some small C, valid programs are absolutely guaranteed. Specifically for some constant C there are programs which simply directly encode random strings which happen to align with O. This set of programs correspond to 'chaos'.

I don't think this set behaves how you think it behaves. 1 - 2^-length(O) of this set will be ruled out, but there are more programs that have with more structure than "print this string" that don't get falsified, since they actually have enough structure to reproduce our observation (about K(O) bits) and they use the leftover bits to encode various unobservable things that might have high utility.

Looking at you conclusions, you can actually replace l(O) with K(O) and everything qualitatively survives.

The utility of the likely scenarios is essential here. If we don't take into account the utility of $5, we have no obvious reason not to pay the mugger.

No, not necessarily. It could be an arbitrarily small cost: the mugger could say just look at me for a nanosecond, and this tiny action of almost no cost could still not be worthwhile.

If AIXI can not find a full observation history O matching program P which generates a future we would describe as (mugger really does have matrix powers and causes massive negative reward) under the constraints that length(P) < length(O), then AIXI's expected utility decision for the mugger futures goes to zero . The length(P) < length(O) is a likelihood bound.

AIXI essentially stops considering theories beyond some upper improbability (much longer than it's observation history).

but a real observation will not be perfectly optimized to rule out half the hypothesis space.

For AIXI, each observation rules out exactly half of the hypothesis space, because it's hypothesis space is the entirety of everything.

there are more programs that have with more structure than "print this string" that don't get falsified, since they actually have enough structure to reproduce our observation (about K(O) bits) and they use the leftover bits to encode various unobservable things that might have high utility

No - this is a contradiction. The programs of K(O) bits are the first valid universes, and by the definition/mapping of the mugger problem to AIXI-logic, those correspond to the mundane worlds where the mugger is [joking,lying,crazy]. If the program is valid and it is K(O) bits, then the leftover bits can't matter - as you said yourself they are unobservable! And any unobservable bits are thus unavailable to the utility function.

Moreover, they are necessarily just repeats, if the program is K(O) bits, then it has appeared far earlier than length(O) in the ensemble, and is some mundane low utility universe.

I thought you were referring only to unbounded utility functions. If utility is bounded, your statement EU(F) -> 0 as p(F) -> 0 is true (though the meaning of the notation might be confusing; expectations are supposed to be taken over the entire probability space), but I don't think it gives reason to believe that everything will cancel out nicely, though it is possible that my intuition is distorted by thinking about busy-beaver size bounds, which are unlikely to be implemented in real agents anyways.