You are viewing revision 1.9.0, last edited by Eliezer Yudkowsky

Of course, I never wrote the “important” story, the sequel about the first amplified human. Once I tried something similar. John Campbell’s letter of rejection began: “Sorry—you can’t write this story. Neither can anyone else.”... “Bookworm, Run!” and its lesson were important to me. Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It’s a problem writers face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity—a place where extrapolation breaks down and new models must be applied—and the world will pass beyond our understanding. -- Vernor Vinge, True Names and other Dangers, p. 47.

Vingean unpredictability is a key part of how we think about a consequentialist intelligence which we believe is smarter than us in a domain. In particular, we usually think we can't predict exactly what a smarter-than-us agent will do, because if we could predict that, we would be that smart ourselves (Vinge's Law).

If you could predict exactly what action Deep Blue would take on a chessboard, you could play as well as Deep Blue by making whatever move you predicted Deep Blue would make. It follows that Deep Blue's programmers necessarily sacrificed their ability to intuitively predict Deep Blue's exact moves in advance, in the course of creating a superhuman chessplayer.
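Vinge's Law can be made concrete with a toy sketch (the position and move values below are invented for illustration; a real chess engine is of course not a lookup table): a "mirror" player who can predict the expert's exact move, and simply plays it, is move-for-move exactly as strong as the expert.

```python
# Hypothetical evaluations of the legal moves in some position.
value = {"e4": 30, "d4": 25, "c4": 10, "Nf3": 28}

def expert_move(legal_moves):
    """The 'expert' deterministically plays the highest-valued move."""
    return max(legal_moves, key=value.get)

def mirror_move(legal_moves, predict):
    """A player who can forecast the expert's exact move just plays it."""
    return predict(legal_moves)

legal_moves = ["e4", "d4", "c4", "Nf3"]

# The mirror player's move is identical to the expert's in every position,
# so it plays exactly as well -- which is why exact prediction of a
# smarter player's moves would make you that smart yourself.
assert mirror_move(legal_moves, expert_move) == expert_move(legal_moves)
```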

But this doesn't mean Deep Blue's programmers were confused about the criterion by which Deep Blue chose actions. Deep Blue's programmers still knew in advance that Deep Blue would try to win rather than lose chess games. They knew that Deep Blue would try to steer the chess board's future into a particular region that was high in Deep Blue's preference ordering over chess positions. This rebuts the claim that we are epistemically helpless and can know nothing about beings smarter than ourselves. We can predict the consequences of Deep Blue's moves better than we can predict the moves themselves.

Our ability to think about agents smarter than ourselves is not limited to knowing a particular goal and predicting its achievement. If we found a giant alien machine that seemed very well-designed, we might be able to infer the aliens were superhumanly intelligent even if we didn't know the aliens' ultimate goals. Properties like distributing energy efficiently, elements that are hard and maintain their shape, etcetera, are recognizable as convergent instrumental strategies; if we can recognize that an alien machine is efficiently harvesting and distributing energy, we might recognize it as an intelligently designed artifact in the service of some goal even if we don't know the goal.

Noncontainment of belief within the action probabilities

A key aspect of reasoning under Vingean uncertainty is that, due to our lack of logical omniscience, our beliefs about the consequences of the agent's actions are not fully contained in our probability distribution over the agent's actions.

Suppose that on each turn of a chess game playing against Deep Blue, I ask you to put a probability distribution on Deep Blue's possible chess moves. If you are a rational agent you should be able to put a well-calibrated probability distribution on these moves - most trivially, by assigning every legal move an equal probability (if Deep Blue has 20 legal moves, and you assign each move 5% probability, you are guaranteed to be well-calibrated).
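The calibration claim is easy to check with a small simulation (a toy, using a random stand-in for whatever rule actually generates the moves): if every position has 20 legal moves and each is assigned 5%, then exactly one prediction in twenty comes true, no matter how the mover actually chooses.

```python
# Why the uniform distribution is trivially well-calibrated: each
# position contributes 20 predictions at 5%, and exactly one of those
# 20 moves is actually played, so the "5%" predictions come true
# exactly 5% of the time -- calibrated, though maximally uninformative.
import random

random.seed(0)
predictions = []  # pairs of (assigned probability, did it happen?)
for position in range(1000):
    legal_moves = list(range(20))
    actual = random.choice(legal_moves)  # stand-in for any move rule at all
    for move in legal_moves:
        predictions.append((0.05, move == actual))

hits = sum(happened for _, happened in predictions)
frequency = hits / len(predictions)
print(frequency)  # 0.05 exactly
```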

Now imagine a randomized game player RandomBlue that, on each round, draws randomly from the probability distribution you'd assign to Deep Blue's move from the same chess position. On every turn, your belief about where you'll observe RandomBlue move is equivalent to your belief about where you'd see Deep Blue move. But your belief about the probable end of the game is very different. (This is only possible due to your lack of logical omniscience: you lack the computing resources to map out the complete sequence of expected moves from your beliefs about each position.)
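A toy model makes the divergence vivid (an assumption-laden sketch, not a chess engine: "worth" here is a hidden per-move score you can't see, so your predictive distribution over the strong player's move is uniform, and that is exactly the distribution RandomBlue samples from). The per-turn move predictions are identical for both players, but the optimizer's choices correlate with the hidden worths while RandomBlue's don't, so the game totals come apart.

```python
# Same per-move beliefs, very different expected endgames.
import random

random.seed(1)
TURNS, N_GAMES, N_MOVES = 30, 2000, 5

def play(optimize):
    """Total worth accumulated over one game by either player."""
    total = 0.0
    for _ in range(TURNS):
        worths = [random.random() for _ in range(N_MOVES)]  # hidden from you
        if optimize:
            # Deep-Blue-like: picks the genuinely best move each turn.
            move = max(range(N_MOVES), key=lambda i: worths[i])
        else:
            # RandomBlue: samples your uniform predictive distribution.
            move = random.randrange(N_MOVES)
        total += worths[move]
    return total

deep = sum(play(True) for _ in range(N_GAMES)) / N_GAMES
rand = sum(play(False) for _ in range(N_GAMES)) / N_GAMES
print(round(deep, 1), round(rand, 1))  # optimizer pulls far ahead
```

The gap appears even though, turn by turn, you would quote the same move probabilities for both players; the information lost is in the correlations your bounded reasoning can't cash out.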

In particular, we could draw the following contrast between your reasoning about Deep Blue and your reasoning about RandomBlue:

  • When you see Deep Blue make a move to which you assigned a low probability, you think the rest of the game will go worse for you than you expected (that is, Deep Blue will do better than you previously expected).
  • When you see RandomBlue make a move that you assigned a low probability (i.e., a low probability that Deep Blue would make that move in that position), you expect to beat RandomBlue sooner than you previously expected (things will go worse for RandomBlue than your previous average expectation).

This reflects our belief in something like the instrumental efficiency of Deep Blue. When we estimate the probability that Deep Blue makes a move $a$, we're estimating the probability that, as Deep Blue estimated each move $b$'s expected probability of winning $u(b)$, Deep Blue found $u(a) > u(b)$ for every alternative move $b$ (neglecting the possibility of exact ties, which is unlikely with deep searches and floating-point position-value estimates). If Deep Blue picks $a$ instead of our predicted move $b$, we know that Deep Blue estimated $u(a) > u(b)$, and in particular that Deep Blue estimated $u(a)$ to exceed its estimate for every other move. This could be because the expected worth of $b$ to Deep Blue was less than we expected, but for the low-probability move $a$ to be better than all other moves as well implies that $a$ had an unexpectedly high value relative to our own estimates. Thus, when Deep Blue makes a very unexpected move, we mostly expect that Deep Blue saw an unexpectedly good move, one better than what we thought was the best available move.
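The argument compresses into one formula (the notation $u(\cdot)$ is mine, introduced for this gloss): writing $u(b)$ for Deep Blue's internal estimate of move $b$'s probability of winning, our probability that Deep Blue plays a move $a$ is

$$\Pr(\text{Deep Blue plays } a) \;=\; \Pr\!\big(\forall\, b \neq a:\ u(a) > u(b)\big).$$

Observing an improbable move $a$ is therefore evidence that $u(a)$ was higher than we expected, not merely that the alternatives were worse.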

In contrast, when RandomBlue makes an unexpected move, we think the random number generator happened to land on a move to which we had justly assigned low worth, and hence we expect to defeat RandomBlue faster than we otherwise would have.
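The opposite update directions fall out of a small Bayesian toy model (my own illustration, with move worths drawn uniformly on $[0, 1]$ to stand in for your uncertainty): seeing the optimizer play a move raises your posterior estimate of that move's worth, while seeing RandomBlue play the same move tells you nothing, since its pick ignores worth.

```python
# Monte Carlo posterior over the worth of "move 0", conditioned on
# who chose it: an optimizer's choice is evidence of high worth; a
# random chooser's is not.
import random

random.seed(2)
N_MOVES, SAMPLES = 5, 200_000

post_opt = []   # worth of move 0, given the optimizer chose it
post_rand = []  # worth of move 0, given RandomBlue chose it
for _ in range(SAMPLES):
    worths = [random.random() for _ in range(N_MOVES)]
    if worths.index(max(worths)) == 0:      # optimizer picked move 0
        post_opt.append(worths[0])
    if random.randrange(N_MOVES) == 0:      # RandomBlue picked move 0
        post_rand.append(worths[0])

opt_mean = sum(post_opt) / len(post_opt)     # rises well above the prior 0.5
rand_mean = sum(post_rand) / len(post_rand)  # stays near the prior 0.5
print(round(opt_mean, 2), round(rand_mean, 2))
```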

Features of Vingean reasoning

Some key features of reasoning under Vingean uncertainty are as follows:

  • We may find ourselves more confident of the predicted consequences of an action than of the predicted action.
  • We may be more sure about the agent's instrumental strategies than its goals.
  • Due to our lack of logical omniscience, our beliefs about the system's relation to the environment are not contained strictly within our probability distribution over the system's probable next outputs.
  • We update on the probable consequence of an action, and on the probable consequences of other actions not taken, after observing that the agent actually outputs that action.
  • If there is a compact way to describe the consequences of the agent's actions, we might try to infer that this is a goal of the agent and infer similar consequences in the future, even without being able to predict the agent's specific next actions.

Our expectation of Vingean unpredictability in a domain may break down if the domain is extremely simple and sufficiently closed. For example, tic-tac-toe is small enough that optimal play can be computed out completely, so even a superintelligent tic-tac-toe player's moves would hold no mystery for us.

Cognitive uncontainability

Vingean unpredictability is one of the core reasons to expect cognitive uncontainability in sufficiently intelligent agents.

Vingean reflection

Vingean unpredictability implies that when an agent is considering constructing a future version of itself, or an agent in the environment, its approval of that future version or subagent can't be conditioned on knowing that agent's exact policies or actions. Deep Blue's programmers decided to run Deep Blue without knowing Deep Blue's exact moves against Kasparov, having nonetheless arrived at strong abstract beliefs that Deep Blue was 'trying' to win, that Deep Blue would reason well about which actions would achieve that end, etcetera.

This is our general template for the notion of Vingean reflection: making predictions about the consequence of operating a system in an environment, while knowing only relatively abstract facts about the system. Formalizing this on a deeper level than "Oh, well, I guess the agent wants G and is a pretty good reasoner so it'll probably get G" is the goal of tiling agents theory.

MIRI suspects that Vingean reflection would form the basis for any agent approving self-modifications, or constructing improved successors of itself, in a robust and reliable way. In other words, understanding Vingean reflection is the key to constructing AIs that do robust self-modification.