
Summary: A strongly uncontainable agent's best solution strategies often go through causal domains we can't model; we would not be able to see them as solutions in advance.

This is a definitional page. For related propositions, see below.

Definition.

Suppose somebody from the 10th century were asked how somebody from the 20th century might cool their house. They would be able to understand the problem and offer some solutions, maybe even clever ones ("locate your house someplace with cooler weather", "divert water from the stream to flow through your living room"). But the 20th century's actual solution, air conditioning, is not available to them as a solution, not just because they don't think fast enough or aren't clever enough, but because an air conditioner takes advantage of physical laws they don't know about. Even if they somehow randomly imagined an air conditioner's exact blueprint, they wouldn't expect that design to operate as an air conditioner until they were told about the relation of pressure to temperature, how electricity can power a compressor motor, and so on.

By definition, a strongly uncontainable agent can conceive of strategies that go through causal domains you can't currently model, and it has options that access those strategies; it may therefore execute high-value solutions to which you would not assign high expected value without being told further background facts.

At least in this sense, the 20th century is 'strongly uncontainable' relative to the 10th century on the problem of home cooling: we can cool homes using a strategy that a 10th-century observer would not recognize in advance as a solution.
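As a loose way of making the definition more explicit, the contrast can be written out symbolically. This is only a sketch, in notation not used elsewhere on this page: $S_A$ for the agent's option set, $M_A$ and $M_H$ for the agent's and the human observer's causal models respectively, and $U$ for the value actually achieved.

$$\exists\, s \in S_A :\quad \mathbb{E}_{M_A}[U \mid s] \text{ is high} \quad\text{while}\quad \mathbb{E}_{M_H}[U \mid s] \text{ is low,}$$

with the gap arising because $s$ routes through causal mechanisms represented in $M_A$ but missing from $M_H$, so no amount of search conducted inside $M_H$ would surface $s$ as a promising solution.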

Arguably, most real-world problems, if we addressed them today using the full power of modern science and technology (i.e., if we were willing to spend a lot of money on technology and perhaps run a prediction market on the relevant facts), would have solutions that a 10th-century observer could not have recognized in advance as solutions.

We can imagine a cognitively powerful agent being strongly uncontainable in some domains but not others. Since even the most cognitively powerful agent is containable on a formal game of tic-tac-toe (the domain is fully known and too simple to admit surprising strategies), strong uncontainability cannot be a universal property of an agent across all formal and informal domains.

General arguments.

Arguments in favor of strong uncontainability tend to revolve around either:

  • The richness and partial unknownness of a particular domain. (E.g. human psychology seems very complicated and has a lot of unknown pathways and known exploits that seem very surprising; therefore we should expect strong uncontainability on the domain of human psychology.)
  • Outside-view induction on previous ability advantages derived from cognitive advantages. (The 10th century couldn't contain the 20th century even though all parties involved were biological Homo sapiens; what makes us think we're the first generation to have the real true laws of the universe in our minds?)

Arguments against strong uncontainability tend to revolve around:

  • The apparent knownness of a particular domain. (E.g., since we have observed the rules of chemistry with extreme solidity and know their origin in the underlying molecular dynamics, even an arbitrarily powerful agent should not be able to find a way of turning lead into gold using non-radioactive chemical reagents.)
  • Backward reasoning from the Fermi Paradox, which gives us weak evidence bounding the capabilities of the most powerful agents possible in our universe. (E.g., even though there might be surprises remaining in our standard model of physics, any surprise that yields faster-than-light travel to a previously uncontacted point makes the Fermi Paradox harder to explain.)

Key propositions.

  • Is there some restriction of input-output channels and of other environmental interactions such that:
    • The richness of the 'human psychology' domain is averted;
    • Remaining causal interactions with the outside universe have an option set too small and flat to contain interesting options.
  • How solid is our current knowledge of the physical universe?
    • To what extent should we expect advanced agents (e.g., machine superintelligences a million years from now) to be boundable using our present physical understanding?
    • Can we reasonably rule out unknown physical domains being accessed by a boxed AI?
  • Are there useful domains conceptually closed to humans' internal understanding?
  • Will a machine superintelligence have 'power we know not' in the sense that it can't be explained to us even after we've seen it (except in the trivial sense that we could simulate another mind understanding it using external storage and Turing-like rules), as with a chimpanzee encountering an air conditioner?
  • What is the highest reasonable probability that could, under optimal conditions, be assigned to having genuinely contained an AI inside a box? Is it more like 20% or 80%?