In a 2016 blog post, Paul Christiano argued that the universal prior (hereafter "UP") may be "malign." His argument has received a lot of follow-up discussion, e.g. in
- Mark Xu's The Solomonoff Prior is Malign
- Charlie Steiner's The Solomonoff prior is malign. It's not a big deal.
among other posts.
This argument never made sense to me. The reason it doesn't make sense to me is pretty simple, but I haven't seen it mentioned explicitly in any of the ensuing discussion.
This leaves me feeling like either I am misunderstanding the argument in a pretty fundamental way, or that there is a problem with the argument that has gotten little attention from the argument's critics (in which case I don't understand why).
I would like to know which of these is the case, and correct my misunderstanding if it exists, hence this post.
(Note: In 2018 I wrote a comment on the original post where I tried to state one of my objections to the argument, though I don't feel I expressed myself especially well there.)
UP-using "universes" and simulatable "universes"
The argument for malignity involves reasoning beings, instantiated in Turing machines (TMs), which try to influence the content of the UP in order to affect other beings who are making decisions using the UP.
Famously, the UP is uncomputable.
This means the TMs (and reasoning beings inside the TMs) will not be able to use[1] the UP themselves, or simulate anyone else using the UP. At least not if we take "using the UP" in a strict and literal sense.
Thus, I am unsure how to interpret claims (which are common in presentations of the argument) about TMs "searching for universes where the UP is used" or the like.
For example, from Mark Xu's "The Solomonoff Prior is Malign":
In particular, this suggests a good strategy for consequentialists: find a universe that is using a version of the Solomonoff prior that has a very short description of the particular universe the consequentialists find themselves in.
Or, from Christiano's original post:
So the first step is getting our foot in the door—having control over the parts of the universal prior that are being used to make important decisions.
This means looking across the universes we care about, and searching for spots within those universe where someone is using the universal prior to make important decisions. In particular, we want to find places where someone is using a version of the universal prior that puts a lot of mass on the particular universe that we are living in, because those are the places where we have the most leverage.
Then the strategy is to implement a distribution over all of those spots, weighted by something like their importance to us (times the fraction of mass they give to the particular universe we are in and the particular channel we are using). That is, we pick one of those spots at random and then read off our subjective distribution over the sequence of bits that will be observed at that spot (which is likely to involve running actual simulations).
What exactly are these "universes" that are being searched over? We have two options:
- They are not computable universes. They permit hypercomputation that can leverage the "actual" UP, in its full uncomputable glory, without approximation.
- They are computable universes. Thus the UP cannot be used in them. But maybe there is some computable thing that resembles or approximates the UP, and gets used in these universes.
Option 1 seems hard to square with the talk about TMs "searching for" universes or "simulating" universes. A TM can't do such things to the universes of option 1.
Hence, the argument is presumably about option 2.
That is, although we are trying to reason about the content of the UP itself, the TMs are not "searching over" or "simulating" or "reasoning about" the UP or things containing the UP. They are only doing these things to some other object, which has some (as-yet unspecified) connection to the UP, such as "approximating" the UP in some sense.
But now we face some challenges, which are never addressed in presentations of the argument:
- The argument is about the content of the "actual" UP, not the content of some computable approximation.
If the reasoning beings are considering -- and trying to influence -- some computable thing that isn't the UP, we need to determine whether this thing has the right kind of relationship to the UP (whatever that means) for the influences upon it to "bubble up to" the UP itself.
- The behavior of the TMs obviously affects the UP. But it's not so obvious that the behavior of the TMs can affect the other, UP-related thing that the TMs are able to simulate.
In other words, the TMs can affect the UP, but it doesn't seem like they have the resources to figure out what sorts of effects they prefer and disprefer. And on the other hand, there may be something for which they can do this preference reasoning, but we haven't established that they can affect that other thing.
Some thoughts that one might have
What sort of thing is this not-UP -- the thing that the TMs can simulate and search over?
I don't know; I have never seen any discussion of the topic, and haven't thought about it for very long. That said, here are a few seemingly obvious points about it.
On slowdown
Suppose that we have a TM, with a whole world inside it, and some reasoning beings inside that world.
These beings are aware of some computable, but vaguely "UP-like," reasoning procedure that they think is really great.
In order to be "UP-like" in a relevant way, this procedure will have to involve running TMs, and the set of TMs that might be run needs to include the same TM that implements our beings and their world.
(This procedure needs to differ from the UP by using a computable weighting function for the TMs. It should also be able to return results without having to wait for eternity as the non-halting TMs do their not-halting. The next section will say more about the latter condition.)
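To make the two conditions concrete, here is a toy sketch of a "UP-like but computable" procedure. Everything in it is an assumption for illustration: the "TMs" are Python generators that yield once per step, the weights are arbitrary fixed numbers standing in for some computable weighting function, and `budget` is a hypothetical cap on how long the procedure is willing to wait.

```python
def run_with_budget(program, budget):
    """Run a generator-based toy program for at most `budget` steps.

    Returns the program's output if it finishes in time, otherwise None.
    (Toy programs here always return strings, so None unambiguously
    means "did not finish within the budget".)
    """
    gen = program()
    for _ in range(budget):
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value
    return None


def halts_quickly():
    for _ in range(3):
        yield
    return "1"


def slow_zero():
    for _ in range(1000):
        yield
    return "0"


def never_halts():
    while True:
        yield


# Hypothetical (weight, program) pairs; the weights are made up.
PROGRAMS = [(0.5, halts_quickly), (0.25, never_halts), (0.25, slow_zero)]


def budgeted_mixture(programs, budget):
    """Weight each output by the total weight of programs that produced it
    within the budget, then renormalize over the programs that finished."""
    totals = {}
    for weight, program in programs:
        output = run_with_budget(program, budget)
        if output is None:
            continue  # ran out of budget: this program is simply ignored
        totals[output] = totals.get(output, 0.0) + weight
    mass = sum(totals.values())
    return {out: w / mass for out, w in totals.items()} if mass > 0 else {}


print(budgeted_mixture(PROGRAMS, budget=10))
print(budgeted_mixture(PROGRAMS, budget=2000))
```

With a small budget only the fast program contributes; with a larger one the slow program is counted too, and the non-halting one never is. The answer this procedure gives depends on the budget in a way that the answer of the real UP does not.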
Now they want to search through computable universes (by simulation) to look for ones where the UP-esque procedure is being used.
What does it look like when they find one? At this point, we have
- A TM, which I'll call the "outer" TM, containing...
  - ...a universe that includes our reasoning beings, and a bunch of irrelevant galaxies and stuff, along with...
    - ...one special part that is simulating a second universe, which (the second universe) includes a bunch of irrelevant galaxies and stuff, along with...
      - ...one special part that implements the UP-like procedure, and thus runs a bunch of TMs that aren't the same as the outer TM, along with...
        - ...one special part that is simply the outer TM again (and from here on the whole thing repeats indefinitely, with more slowdown every time we go around the loop)
Each level of nesting incurs some slowdown relative to just running the "relevant" part of the thing that is being nested, because some irrelevant stuff has to come along for the ride.
It takes many many clock-ticks of the outer TM to advance the copy of it several levels down, because we have to spend a lot of time on irrelevant galaxies and on other TMs involved in the procedure.
(There is also an extra "constant factor" from the fact that we have to wait for the outer TM to evolve life, etc., before we get to the point where it starts containing a copy at all.)
So I don't see how the guys in the outer TM would be able to advance their simulation up to the point where something they can control is being "read off," without finding that in fact this read-off event occurred in their own distant past, and hence is no longer under their control.
To riff on this: the malignity argument involves the fact that the UP puts high weight on simple TMs, but doesn't care about speed, so it may put high weight on TMs that do very long-running things like simulating universes that simulate other universes.
Fine -- but once we start talking about a universe that is simulating itself (in order to reason about UP-like objects that involve it), speed starts to matter for a different reason. If you are simulating yourself, it is always with some slowdown, since you contain parts other than the simulator. You'll never be able to "catch up with yourself" and, e.g., read your own next action off of the simulation rather than choosing it in the ordinary manner.
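The slowdown can be made concrete with a toy model. This is only a sketch under assumptions that are not part of the argument itself: each level needs a fixed number of ticks before it contains a copy at all (`startup`, a hypothetical parameter), and afterwards it devotes a fixed fraction of its ticks to advancing that copy (`share`, also hypothetical, and strictly less than 1 because of the irrelevant galaxies and the other TMs).

```python
def inner_ticks(outer_ticks, startup, share, depth):
    """Ticks completed by the copy `depth` levels down after the outer TM
    has run for `outer_ticks` ticks, in the toy model described above."""
    elapsed = float(outer_ticks)
    for _ in range(depth):
        # A level contributes nothing to its copy until the copy exists,
        # and then only a fraction `share` of its ticks advance the copy.
        elapsed = max(0.0, elapsed - startup) * share
    return elapsed


# How far behind is the nested copy of the outer TM as time goes on?
for t in (1_000, 10_000, 100_000):
    copy_time = inner_ticks(t, startup=100, share=0.5, depth=2)
    print(t, copy_time, t - copy_time)
```

In this model the copy's clock is strictly behind the outer clock at every positive time, and the gap widens as the outer TM runs longer. Any event the beings can observe in the copy corresponds to a moment that is already in their own past.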
It's possible that there are ways around this objection, even if it's valid in principle. For instance, maybe the reasoning beings can make inferences about the future behavior of the procedure-users, jumping ahead of the slow simulation.
It's easy to imagine how this might work for "finding the output channel," since you can just guess that a channel used once will be re-used again. But it would be much harder to decide what one's preferred output actually is at "future" points not yet reached in the simulation; here one would effectively need to do futurism about the world in which the procedure is being used, probably on an extremely long time horizon.
On efficiency
There are results showing that the UP (or Solomonoff Induction) is in some sense optimal. So it is easy to wind up thinking that, if some procedure is a good idea, it must be (in some sense) an "approximation of" these things.
But the kind of "approximation" involved does not look (in hand-wavey terms) like the ideal thing (UP or SI), plus some unbiased "approximation noise."
The ways that one would deviate from the ideal, when making a practically useful procedure, have certain properties that the ideal itself lacks. In the hand-wavey statistical analogy, the "noise" is not zero-mean.
I noted above that the "UP-like procedure" will need to use a computable weighting function. So, this function can't be Kolmogorov complexity.
And indeed, if one is designing a procedure for practical use, one probably wouldn't want anything like Kolmogorov complexity. All else being equal, one doesn't want to sit around for ages waiting for a TM to simulate a whole universe, even if that TM is "simple." One probably wants to prioritize TMs that can yield answers more quickly.
As noted above, in practice one never has an infinite amount of time to sit around waiting for TMs to (not) halt, so any method that returns results in finite time will have to involve some kind of effective penalty on long-running TMs.
But one may wish to be even more aggressive about speed than simply saying "I'm only willing to wait this long, ignore any TM that doesn't halt before then." One might want one's prior to actively prefer fast TMs over slow ones, even within the range of TMs fast enough that you're willing to wait for them. That way, if at any point you need to truncate the distribution and only look at the really high-mass TMs, the TMs you are spared from running due to the truncation are preferentially selected to be ones you don't want to run (because they're slow).
These points are not original, of course. Everyone talks about the speed prior.
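As a toy illustration of the contrast, the sketch below compares a pure simplicity weighting with a speed-sensitive one. The assumptions are loud ones: the two "machines" and their numbers are invented, and the speed-sensitive weight uses a Levin-style cost (description length plus the log of the running time) as a stand-in, which is not claimed to be the exact definition of the speed prior.

```python
import math

# Hypothetical toy machines: (name, description length in bits, steps to answer).
MACHINES = [
    ("simple_but_slow", 20, 2**60),   # short rules, but simulates a whole universe
    ("complex_but_fast", 40, 2**10),  # longer description, answers quickly
]


def simplicity_weight(length_bits):
    """Weight that depends only on description length, ignoring running time."""
    return 2.0 ** -length_bits


def speed_penalized_weight(length_bits, steps):
    """Levin-style stand-in: each doubling of running time costs one extra bit."""
    return 2.0 ** -(length_bits + math.log2(steps))


for name, length_bits, steps in MACHINES:
    print(name,
          simplicity_weight(length_bits),
          speed_penalized_weight(length_bits, steps))
```

Under the simplicity weighting the slow machine gets about a million times more weight than the fast one (2^-20 versus 2^-40). Under the speed-penalized weighting the ordering flips (2^-80 versus 2^-50), so the slow, universe-simulating machine is the one that gets discounted.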
But now, return to our reasoning beings in a TM, simulating a universe, which in turn uses a procedure that's great for practical purposes.
The fact that the procedure is "great for practical purposes" is crucial to the beings' motivation, here; they expect the procedure to actually get used in practice, in the world they're simulating. They expect this because they think it actually is a great idea -- for practical purposes -- and they expect the inner creatures of the simulation to notice this too.
Since the procedure is great for practical purposes, we should expect that it prioritizes efficiently computable TMs, like the speed prior does.
But this means that TMs like the "outer TM" in which our beings live -- which are simple (hence UP cares about them) but slow, having to simulate whole universes with irrelevant galaxies and all before they can get to the point -- are not what the "great for practical purposes" procedure cares about.
Once again: the malignity argument involves the fact that the UP puts high weight on simple TMs, but doesn't care about speed. This is true of the UP. But it is a count against using the UP, or anything like it, for practical purposes.
And so we should not expect the UP, or anything like it, to get used in practice by the kinds of entities we can simulate and reason about.
We (i.e. "reasoning beings in computable universes") can influence the UP, but we can't reason about it well enough to use that influence. Meanwhile, we can reason about things that are more like the speed prior -- but we can't influence them.
The common thread
It feels like there is a more general idea linking the two considerations above.
It's closely related to the idea I presented in "When does rationality-as-search have nontrivial implications?"
Suppose that there is some search process that is looking through a collection of things, and you are an element of the collection. Then, in general, it's difficult to imagine how you (just you) can reason about the whole search in such a way as to "steer it around" in your preferred direction.
If you are powerful enough to reason about the search (and do this well enough for steering), then in some sense the search is unnecessary -- one could delete all the other elements of the search space, and just consult you about what the search might have done.
As stated this seems not quite right, since you might have some approximate knowledge of the search that suffices for your control purposes, yet is "less powerful" than the search as a whole.
For anything like the malignity argument to work, we need this kind of "gap" to exist -- a gap between the power needed to actually use the UP (or the speed prior, or whatever), and the power needed to merely "understand them well enough for control purposes."
Maybe such a gap is possible! It would be very interesting if so.
But this question -- which seems like the question on which the whole thing turns -- is not addressed in any of the treatments I've seen of the malignity argument. Instead, these treatments speak casually of TMs "simulating universes" in which someone is "using" the UP, without addressing where in the picture we are to put the "slack" -- the use of merely-approximate reasoning -- that is necessary for the picture to describe something possible at all.
What am I missing?
[1] For simplicity, I mostly avoid mentioning Solomonoff Induction in this post, and refer more broadly to "uses" of the UP, whatever these may be.
Cool, it sounds like we basically agree!
I'm not sure of this. It seems at least possible that we could get an equilibrium where everyone does use the unfiltered UP (in some part of their reasoning process), trusting that no one will manipulate them because (a) manipulative behavior is costly and (b) no one has any reason to expect anyone else will reason differently from them, so if you choose to manipulate someone else you're effectively choosing that someone else will manipulate you.
Perhaps I'm misunderstanding you. I'm imagining something like choosing one's own decision procedure in TDT, where one ends up choosing a procedure that involves "the unfiltered UP" somewhere, and which doesn't do manipulation. (If your procedure involved manipulation, so would your copy's procedure, and you would get manipulated; you don't want this, so you don't manipulate, nor does your copy.) But you write
whereas it seems to me that TDT/FDT-style reasoning is precisely what allows us to "naively" trust the UP, here, without having to do the hard work of "filtering." That is: this kind of reasoning tells us to behave so that the UP won't be malign; hence, the UP isn't malign; hence, we can "naively" trust it, as though it weren't malign (because it isn't).
More broadly, though -- we are now talking about something that I feel like I basically understand and basically agree with, and just arguing over the details, which is very much not the case with standard presentations of the malignity argument. So, thanks for that.