(This reminded me of a couple of arguments I've had in the past, in person. I think you are missing something. But I've previously failed to communicate this thing. I hope I'm not misinterpreting your point, and sorry if this comment comes across as frustrated at some points.)
For anything like the malignity argument to work, we need this kind of "gap" to exist -- a gap between the power needed to actually use the UP (or the speed prior, or whatever), and the power needed to merely "understand them well enough for control purposes."
Maybe such a gap is possible! It would be very interesting if so.
Such a gap is so common that I'm worried I've missed your point. There is ~always a range of algorithms that solve the same problem with very different levels of efficiency. This is clearly true of algorithms that predict the next bit. You are correct that a malign hypothesis needs to be running an algorithm that is more efficient than the outer algorithm.
Suppose that there is some search process that is looking through a collection of things, and you are an element of the collection. Then, in general, it's difficult to imagine how you (just you) can reason about the whole search in such a way as to "steer it around" in your preferred direction.
I think this is easy to imagine. I'm an expert who is among 10 experts recruited to advise some government on making a decision. I can guess some of the signals that the government will use to choose who among us to trust most. I can guess some of the relative weaknesses of fellow experts. I can try to use this to manipulate the government into taking my opinion more seriously. I don't need to create a clone government and hire 10 expert clones in order to do this.
If you are powerful enough to reason about the search (and do this well enough for steering), then in some sense the search is unnecessary -- one could delete all the other elements of the search space, and just consult you about what the search might have done.
It's true that if an induction algorithm is maximally compute-efficient, then it shouldn't have daemon problems, because there is no way for a daemon to predict better than the alternative hypotheses. But... the actual algorithms that we build aren't usually compute-optimal, so there's a risk they will find more compute-efficient algorithms internally. (I'm not sure I understood what you're saying here, so tell me if this is a non sequitur.)
The argument is about the content of the "actual" UP, not the content of some computable approximation.
If the reasoning beings are considering -- and trying to influence -- some computable thing that isn't the UP, we need to determine whether this thing has the right kind of relationship to the UP (whatever that means) for the influences upon it to "bubble up to" the UP itself.
You seem to be saying the behavior of a computable approximation is unrelated to the behavior of the idealization? Like, of course there are differences between reality and idealizations. Paul mentions a few times in the original post that the connection to real world algorithms and consequences isn't clear. But I think you're missing the whole point of doing theory work about idealized models. Theorizing about idealizations is easier (and possible). A common pattern for theorists is to work out the consequences of an idealized theory and then, when applying it to reality, to approximately adjust for the most important differences between the theory and reality.
A good example is worst case runtime analysis. It's very useful for predicting real world runtime. In some situations, it's far too pessimistic (or optimistic!). But in those situations there's always a reason: some factor that the worst case analysis isn't taking into account. And with some familiarity with the field, you get to know these factors, and how and when you can correct for them when transferring your knowledge to the real world.
Back to induction, specifically this line:
some computable thing that isn't the UP, we need to determine whether this thing has the right kind of relationship to the UP
The reasoning beings inside the hypothesis are trying to make their computable approximation as similar to the UP as possible. Sure, they might make mistakes, or be limited in some way by what algorithms are possible. But, in our lack of knowledge about the details, we don't have to get stuck in confusion about how exactly they might create an approximation. We can just assume they did a good job (as an idealization/approximation). This is the standard approximate way of predicting the consequences of competent agents. This is part of making an idealized theory.
If we later discover that all algorithms that are designed to predict the next bit in the real world have a particular property (i.e. are biased toward fast computations), then we can redo the malign prior theory in light of that knowledge. Maybe you think "biased toward fast computations" is obviously true of all induction algorithms? (it's definitely not true, consider the runtime of theories in physics).
The "UP is malign" idea is the idea of optimization daemons applied directly to Solomonoff inductors. (Note the line: "When heavy optimization pressure on a system crystallizes it into an optimizer—especially one that’s powerful, or more powerful than the previous system, or misaligned with the previous system—we could term the crystallized optimizer a “daemon” of the previous system".)
If you wanted to see how optimization daemons could show up in more practical algorithms, you'll probably end up at RFLO (although there are other situations where optimization daemons can show up that don't quite fit the RFLO description).
I hope I'm not misinterpreting your point, and sorry if this comment comes across as frustrated at some points.
I'm not sure you're misinterpreting me per se, but there are some tacit premises in the background of my argument that you don't seem to hold. Rather than responding point-by-point, I'll just say some more stuff about where I'm coming from, and we'll see if it clarifies things.
You talk a lot about "idealized theories." These can of course be useful. But not all idealizations are created equal. You have to actually check that your idealization is good enough, in the right ways, for the sorts of things you're asking it to do.
In physics and applied mathematics, one often finds oneself considering a system that looks like
We quantify the size of the additional nuance with a small parameter ε. If ε is literally 0, that's just the base system, but we want to go a step further: we want to underst...
Fwiw I think this is basically correct, though I would phrase the critique as "the hypothetical is confused" rather than "the argument is wrong." My sense is that arguments for the malignity of uncomputable priors just really depend on the structure of the hypothetical: how is it that you actually have access to this uncomputable prior, if it's an approximation what sort of approximation is it, and to what extent will others care about influencing your decisions in situations where you're using it?
Here's my understanding of the whole thing:
This all seems to check out to me. Admittedly I didn't actually confirm this with any proponents of the argument, though.
(Probably also worth stating that I don't think the MUP is in any way relevant to real life. AI progress doesn't seem to be on the track where it features AGIs that use big dumb "where am I?" modules. E.g., if an AGI is born of anything like an RL-trained LLM, it seems unlikely that its "where am I?" reasoning would be naive in the relevant sense. It'd be able to "manually" filter out universes with malign consequentialists, given good decision theory. You know, like we can.
The MUP specifically applies to highly abstract agent-foundations designs where we hand-code each piece, that currently don't seem practically tractable at all.)
Thanks.
I admit I'm not closely familiar with Tegmark's views, but I know he has considered two distinct things that might be called "the Level IV multiverse":
(I'm getting this from his paper here.)
In particular, Tegmark speculates that the computable universe is "distributed" following the UP (as you say in your final bullet point). This would mean e.g. that one shouldn't be too surprised to find oneself li...
Probably also worth stating that I don't think the MUP is in any way relevant to real life.
I think it's relevant because it illustrates an extreme variant of a very common problem, where "incorrectly specified" priors can cause unexpected behavior. It also illustrates the daemon problem, which I expect to be very relevant to real life.
A more realistic and straightforward example of the "incorrectly specified prior" problem: If the prior on an MCTS value head isn't strong enough, it can overfit to value local instrumental goals too highly. Now your overall ...
Here's a simple argument that simulating universes based on Turing machine number can give manipulated results.
Say we lived in a universe much like this one, except that:
So we send a rocket to the center of the universe and leave a plaque saying "the answer to all your questions is Spongebob". Now any aliens in other universes that simulate our universe and ask "what's in the center of that universe at time step 10^1000?" will see the plaque, search elsewhere in our universe for the reference, and watch Spongebob. We've managed to get aliens outside our universe to watch Spongebob.
I feel like it would be helpful to speak precisely about the universal prior. Here's my understanding.
It's a partial probability distribution over bit strings. It gives a non-zero probability to every bit string, but these probabilities add up to strictly less than 1. It's defined as follows:
That is, describe Turing machines by a binary code, and assign each one a probability based on the length of its code, such that those probabilities add up to exactly 1. Then magically run all Turing machines "to completion". For those that halt leaving a bitstring on their tape, attribute the probability of that Turing machine to that bitstring. Now we have a probability distribution over bitstrings, though the probabilities add up to less than one because not all of the Turing machines halted.
You cannot compute this probability distribution, but you can compute lower bounds on the probabilities of its bitstrings. (The Nth lower bound is the probability distribution you get from running the first N TMs for N steps.)
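To make the "Nth lower bound" concrete, here is a minimal sketch. It does not run real Turing machines: `TOY_MACHINES` is a made-up table in which each entry just records a prefix-free code, the step at which the machine halts (or `None` if it never does), and its output. The codes, halting times, and outputs are all assumptions chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy "machines": (prefix-free code, step at which it halts or None, output).
# The weights 2^-len(code) of these five codes add up to exactly 1.
TOY_MACHINES = [
    ("0",    3,     "00"),
    ("10",   None,  None),   # never halts, so its weight is never credited to anything
    ("110",  50,    "01"),
    ("1110", 7,     "00"),
    ("1111", 10**6, "11"),   # halts, but only after a very long run
]

def lower_bound(n):
    """Mass credited to each output after running the first n machines for n steps each."""
    mass = defaultdict(Fraction)
    for code, halts_at, output in TOY_MACHINES[:n]:
        if halts_at is not None and halts_at <= n:
            mass[output] += Fraction(1, 2 ** len(code))
    return dict(mass)

for n in (3, 50, 10**6):
    bound = lower_bound(n)
    print(n, bound, "total:", sum(bound.values()))
```

In this toy table the bounds only ever grow as N grows, the total never exceeds 3/4 (the missing 1/4 belongs to the machine that never halts), and the output "11" gets no credit at all until N reaches a million.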
Call a TM that halts poisoned if its output is determined as follows:
If we approximate the universal prior, the probability contribution of poisoned TMs will be precisely zero, because we don't have nearly enough compute to simulate a poisoned TM until it halts. However, if there's an outer universe with dramatically more compute available, and it's approximating the universal prior using enough computational power to actually run the poisoned TMs, they'll affect the probability distribution of the bitstrings, making bitstrings with the messages they choose to leave behind more likely.
So I think Paul's right, actually (not what I expected when I started writing this). If you approximate the UP well enough, the distribution you see will have been manipulated.
The universal distribution/prior is lower semi-computable, meaning there is one Turing machine that can approximate it from below, converging to it in the limit. Also, there is a probabilistic Turing machine that induces the universal distribution. So there is a rather clear sense in which one can “use the universal distribution.” Of course in practice different universes would use more or less accurate versions with more or fewer compromises for efficiency. I think your basic argument holds up insofar as there isn’t a clear mechanism for precise manipulation through the universal distribution. It’s conceivable that some high level actions such as “make it very clear that we prefer this set of moral standards in case anyone with cosmopolitan values simulates our universe” would be preferred based on the malign-universal prior argument.
The universal distribution/prior is lower semi-computable, meaning there is one Turing machine that can approximate it from below, converging to it in the limit. Also, there is a probabilistic Turing machine that induces the universal distribution. So there is a rather clear sense in which one can “use the universal distribution.”
Thanks for bringing this up.
However, I'm skeptical that lower semi-computability really gets us much. While there is a TM that converges to the UP, we have no (computable) way of knowing how close the approximation is at any...
Instead of inspecting all programs in the UP, just inspect all programs with length less than n. As n becomes larger and larger, this covers more and more of the total probability mass in the UP, and the mass covered this way approaches 1. What to do about the non-halting programs? Well, just run all the programs for m steps, I guess. I think this is the approximation of the UP that is implied.
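A small sketch of the "mass covered by programs shorter than n" part, under one specific assumption: programs are written in a hypothetical prefix-free code where program i is i ones followed by a zero. Other codes would give different numbers, but the same trend toward 1.

```python
from fractions import Fraction

def code_for(i):
    """Hypothetical prefix-free code: program i is written as i ones followed by a zero."""
    return "1" * i + "0"

def mass_below_length(n):
    """Total prior weight 2^-len(code) of all programs whose code is shorter than n bits."""
    total = Fraction(0)
    i = 0
    while len(code_for(i)) < n:
        total += Fraction(1, 2 ** len(code_for(i)))
        i += 1
    return total

for n in (2, 5, 11):
    print(n, mass_below_length(n))
```

Note that this counts every program shorter than n, including ones that never halt, so it is only a ceiling on how much mass running each of them for m steps could ever attribute to outputs.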
In order to be “UP-like” in a relevant way, this procedure will have to involve running TMs, and the set of TMs that might be run needs to include the same TM that implements our beings and their world.
Why? The procedure just needs to do some reasoning, constrained by the UP and the outer TM. And then UP-beings can just simulate this fast reasoning without problems of self-simulation.
Yes, an AI that practically uses the UP may fail to predict whether UP-beings simulate it in the center of their universe or on the boundary. But the point is that the more correct the AI is in its reasoning, the more control UP-beings have.
Or you can refrain from creating an AI that thinks about the UP. But that's denying the assumption.
We (i.e. "reasoning beings in computable universes") can influence the UP, but we can't reason about it well enough to use that influence. Meanwhile, we can reason about things that are more like the speed prior -- but we can't influence them.
Did one of these can/can't pairs get flipped?
I have actually never properly understood the universal prior argument in the first place, and just seeing this post helped me understand parts of it, so thank you for writing it!
I'll admit, my mental image for "our universe + hypercomputation" is a sort of webnovel premise, where we're living in a normal computable universe until one day by fiat an app poofs into existence on your phone that lets you enter a binary string or file and instantaneously get the next bit with minimum description length in binary lambda calculus. Aside from the initial poofing and every usage of the app, the universe continues by its normal rules.
But there are probably simpler universes (by some handwavy standard) out there that allow enough hypercomputation that they can have agents querying minimum description length oracles, but not so much that agents querying MDL oracles can no longer be assigned short codes.
In a 2016 blog post, Paul Christiano argued that the universal prior (hereafter "UP") may be "malign." His argument has received a lot of follow-up discussion, e.g. in
among other posts.
This argument never made sense to me. The reason it doesn't make sense to me is pretty simple, but I haven't seen it mentioned explicitly in any of the ensuing discussion.
This leaves me feeling like either I am misunderstanding the argument in a pretty fundamental way, or that there is a problem with the argument that has gotten little attention from the argument's critics (in which case I don't understand why).
I would like to know which of these is the case, and correct my misunderstanding if it exists, hence this post.
(Note: In 2018 I wrote a comment on the original post where I tried to state one of my objections to the argument, though I don't feel I expressed myself especially well there.)
UP-using "universes" and simulatable "universes"
The argument for malignity involves reasoning beings, instantiated in Turing machines (TMs), which try to influence the content of the UP in order to affect other beings who are making decisions using the UP.
Famously, the UP is uncomputable.
This means the TMs (and reasoning beings inside the TMs) will not be able to use[1] the UP themselves, or simulate anyone else using the UP. At least not if we take "using the UP" in a strict and literal sense.
Thus, I am unsure how to interpret claims (which are common in presentations of the argument) about TMs "searching for universes where the UP is used" or the like.
For example, from Mark Xu's "The Solomonoff Prior is Malign":
Or, from Christiano's original post:
What exactly are these "universes" that are being searched over? We have two options:
Option 1 seems hard to square with the talk about TMs "searching for" universes or "simulating" universes. A TM can't do such things to the universes of option 1.
Hence, the argument is presumably about option 2.
That is, although we are trying to reason about the content of the UP itself, the TMs are not "searching over" or "simulating" or "reasoning about" the UP or things containing the UP. They are only doing these things to some other object, which has some (as-yet unspecified) connection to the UP, such as "approximating" the UP in some sense.
But now we face some challenges, which are never addressed in presentations of the argument:
If the reasoning beings are considering -- and trying to influence -- some computable thing that isn't the UP, we need to determine whether this thing has the right kind of relationship to the UP (whatever that means) for the influences upon it to "bubble up to" the UP itself.
In other words, the TMs can affect the UP, but it doesn't seem like they have the resources to figure out what sorts of effects they prefer and disprefer. And on the other hand, there may be something for which they can do this preference reasoning, but we haven't established that they can affect that other thing.
Some thoughts that one might have
What sort of thing is this not-UP -- the thing that the TMs can simulate and search over?
I don't know; I have never seen any discussion of the topic, and haven't thought about it for very long. That said, here are a few seemingly obvious points about it.
On slowdown
Suppose that we have a TM, with a whole world inside it, and some reasoning beings inside that world.
These beings are aware of some computable, but vaguely "UP-like," reasoning procedure that they think is really great.
In order to be "UP-like" in a relevant way, this procedure will have to involve running TMs, and the set of TMs that might be run needs to include the same TM that implements our beings and their world.
(This procedure needs to differ from the UP by using a computable weighting function for the TMs. It should also be able to return results without having to wait for eternity as the non-halting TMs do their not-halting. The next section will say more about the latter condition.)
Now they want to search through computable universes (by simulation) to look for ones where the UP-esque procedure is being used.
What does it look like when they find one? At this point, we have
Each level of nesting incurs some slowdown relative to just running the "relevant" part of the thing that is being nested, because some irrelevant stuff has to come along for the ride.
It takes many many clock-ticks of the outer TM to advance the copy of it several levels down, because we have to spend a lot of time on irrelevant galaxies and on other TMs involved in the procedure.
(There is also an extra "constant factor" from the fact that we have to wait for the outer TM to evolve life, etc., before we get to the point where it starts containing a copy at all.)
So I don't see how the guys in the outer TM would be able to advance their simulation up to the point where something they can control is being "read off," without finding that in fact this read-off event occurred in their own distant past, and hence is no longer under their control.
To riff on this: the malignity argument involves the fact that the UP puts high weight on simple TMs, but doesn't care about speed, so it may put high weight on TMs that do very long-running things like simulating universes that simulate other universes.
Fine -- but once we start talking about a universe that is simulating itself (in order to reason about UP-like objects that involve it), speed starts to matter for a different reason. If you are simulating yourself, it is always with some slowdown, since you contain parts other than the simulator. You'll never be able to "catch up with yourself" and, e.g., read your own next action off of the simulation rather than choosing it in the ordinary manner.
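The "never catch up" point can be put as toy arithmetic. The sketch below assumes, purely for illustration, that each level of nesting costs a constant slowdown factor greater than 1 and that there is a fixed startup delay before the copy starts running; neither the function names nor the constant-factor model come from the argument itself, and a real slowdown need not be constant.

```python
def inner_time(outer_ticks, slowdown, depth=1):
    """Simulated time reached `depth` levels down after `outer_ticks` of the outermost TM.

    `slowdown` (> 1) is an assumed constant cost factor per level of nesting.
    """
    return outer_ticks / slowdown ** depth

def ticks_to_reach(target_time, slowdown, depth=1, startup=0):
    """Outer ticks needed before the copy `depth` levels down reaches `target_time`.

    `startup` is an assumed fixed delay before the copy exists at all
    (waiting for the outer TM to evolve life, and so on).
    """
    return startup + target_time * slowdown ** depth

# With a 10x slowdown per level, a copy two levels down reaches its own tick 100
# only at outer tick 10,000 (plus any startup delay).
print(ticks_to_reach(100, slowdown=10, depth=2))
print(ticks_to_reach(100, slowdown=10, depth=2, startup=500))
```

Under these assumptions `inner_time(t, slowdown)` is strictly less than `t` for every positive `t`, so any event the simulation shows at its tick T is one that the simulator itself already passed at its own tick T.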
It's possible that there are ways around this objection, even if it's valid in principle. For instance, maybe the reasoning beings can make inferences about the future behavior of the procedure-users, jumping ahead of the slow simulation.
It's easy to imagine how this might work for "finding the output channel," since you can just guess that a channel used once will be reused. But it would be much harder to decide what one's preferred output actually is at "future" points not yet reached in the simulation; here one would effectively need to do futurism about the world in which the procedure is being used, probably on an extremely long time horizon.
On efficiency
There are results showing that the UP (or Solomonoff Induction) is in some sense optimal. So it is easy to wind up thinking that, if some procedure is a good idea, it must be (in some sense) an "approximation of" these things.
But the kind of "approximation" involved does not look (in hand-wavey terms) like the ideal thing (UP or SI), plus some unbiased "approximation noise."
The ways that one would deviate from the ideal, when making a practically useful procedure, have certain properties that the ideal itself lacks. In the hand-wavey statistical analogy, the "noise" is not zero-mean.
I noted above that the "UP-like procedure" will need to use a computable weighting function. So, this function can't be Kolmogorov complexity.
And indeed, if one is designing a procedure for practical use, one probably wouldn't want anything like Kolmogorov complexity. All else being equal, one doesn't want to sit around for ages waiting for a TM to simulate a whole universe, even if that TM is "simple." One probably wants to prioritize TMs that can yield answers more quickly.
As noted above, in practice one never has an infinite amount of time to sit around waiting for TMs to (not) halt, so any method that returns results in finite time will have to involve some kind of effective penalty on long-running TMs.
But one may wish to be even more aggressive about speed than simply saying "I'm only willing to wait this long, ignore any TM that doesn't halt before then." One might want one's prior to actively prefer fast TMs over slow ones, even within the range of TMs fast enough that you're willing to wait for them. That way, if at any point you need to truncate the distribution and only look at the really high-mass TMs, the TMs you are spared from running due to the truncation are preferentially selected to be ones you don't want to run (because they're slow).
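Here is a minimal sketch of how a speed-sensitive weighting changes which TMs survive truncation. The three candidates and their numbers are invented for illustration. The speed-penalized weight adds log2 of the runtime to the description length, in the spirit of Levin-style time-bounded complexity; it is not claimed to be the exact definition of the speed prior.

```python
import math

# Hypothetical candidates: (name, code length in bits, steps needed to produce an answer).
CANDIDATES = [
    ("simulate_whole_universe", 20, 10**30),
    ("lookup_table",            40, 10**2),
    ("compact_heuristic",       30, 10**4),
]

def log2_weight_length_only(length, steps):
    """Log2-weight that depends on description length alone, ignoring runtime."""
    return -length

def log2_weight_speed_penalized(length, steps):
    """Log2-weight that charges for description length plus log2 of runtime (an assumed form)."""
    return -(length + math.log2(steps))

def ranking(log2_weight):
    """Candidate names sorted from heaviest to lightest under the given weighting."""
    ordered = sorted(CANDIDATES, key=lambda c: log2_weight(c[1], c[2]), reverse=True)
    return [name for name, _length, _steps in ordered]

# Truncate each distribution to its two heaviest candidates.
print(ranking(log2_weight_length_only)[:2])
print(ranking(log2_weight_speed_penalized)[:2])
```

With these made-up numbers, truncating the length-only weighting keeps the simple-but-slow universe simulator at the top of the list, while truncating the speed-penalized weighting drops it and keeps the two fast candidates.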
These points are not original, of course. Everyone talks about the speed prior.
But now, return to our reasoning beings in a TM, simulating a universe, which in turn uses a procedure that's great for practical purposes.
The fact that the procedure is "great for practical purposes" is crucial to the beings' motivation, here; they expect the procedure to actually get used in practice, in the world they're simulating. They expect this because they think it actually is a great idea -- for practical purposes -- and they expect the inner creatures of the simulation to notice this too.
Since the procedure is great for practical purposes, we should expect that it prioritizes efficiently computable TMs, like the speed prior does.
But this means that TMs like the "outer TM" in which our beings live -- which are simple (hence UP cares about them) but slow, having to simulate whole universes with irrelevant galaxies and all before they can get to the point -- are not what the "great for practical purposes" procedure cares about.
Once again: the malignity argument involves the fact that the UP puts high weight on simple TMs, but doesn't care about speed. This is true of the UP. But it is a count against using the UP, or anything like it, for practical purposes.
And so we should not expect the UP, or anything like it, to get used in practice by the kinds of entities we can simulate and reason about.
We (i.e. "reasoning beings in computable universes") can influence the UP, but we can't reason about it well enough to use that influence. Meanwhile, we can reason about things that are more like the speed prior -- but we can't influence them.
The common thread
It feels like there is a more general idea linking the two considerations above.
It's closely related to the idea I presented in "When does rationality-as-search have nontrivial implications?"
Suppose that there is some search process that is looking through a collection of things, and you are an element of the collection. Then, in general, it's difficult to imagine how you (just you) can reason about the whole search in such a way as to "steer it around" in your preferred direction.
If you are powerful enough to reason about the search (and do this well enough for steering), then in some sense the search is unnecessary -- one could delete all the other elements of the search space, and just consult you about what the search might have done.
As stated this seems not quite right, since you might have some approximate knowledge of the search that suffices for your control purposes, yet is "less powerful" than the search as a whole.
For anything like the malignity argument to work, we need this kind of "gap" to exist -- a gap between the power needed to actually use the UP (or the speed prior, or whatever), and the power needed to merely "understand them well enough for control purposes."
Maybe such a gap is possible! It would be very interesting if so.
But this question -- which seems like the question on which the whole thing turns -- is not addressed in any of the treatments I've seen of the malignity argument. Instead, these treatments speak casually of TMs "simulating universes" in which someone is "using" the UP, without addressing where in the picture we are to put the "slack" -- the use of merely-approximate reasoning -- that is necessary for the picture to describe something possible at all.
What am I missing?
[1] For simplicity, I mostly avoid mentioning Solomonoff Induction in this post, and refer more broadly to "uses" of the UP, whatever these may be.