
Methodology of unbounded analysis

Written by Eliezer Yudkowsky et al.

Introduction: Poe, Shannon, and Deep Blue

In 1950, the very first paper ever written on computer chess, by Claude Shannon, gave an algorithm that would play perfect chess given unlimited computing power. In reality, computing power was limited, so computers did not play superhuman chess until 1997, forty-seven years later, when Deep Blue defeated Garry Kasparov. Knowing how to do something in principle doesn't always mean you can do it in practice without additional hard work and insights.
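For concreteness, here is a minimal sketch of the kind of procedure Shannon described: full-depth minimax, which plays perfect chess if you can afford to expand the entire game tree. The game interface used here (is_over, outcome, legal_moves, play) is hypothetical, standing in for any complete encoding of the rules; the point is that the algorithm is short even though no physical computer could run it to completion on chess.

```python
# A sketch of unbounded minimax, assuming a hypothetical game interface:
# state.is_over(), state.outcome() (+1 White win, 0 draw, -1 Black win),
# state.legal_moves(), and state.play(move) returning the successor state.
# Given unlimited computing power this plays perfect chess; in reality
# the chess game tree is astronomically too large to search.

def minimax(state, to_move):
    """Return (value, best_move), with value from White's perspective;
    to_move is +1 if White is to move, -1 if Black is to move."""
    if state.is_over():
        return state.outcome(), None
    best_value, best_move = None, None
    for move in state.legal_moves():
        value, _ = minimax(state.play(move), -to_move)
        # Each player maximizes the game value from their own side.
        if best_value is None or value * to_move > best_value * to_move:
            best_value, best_move = value, move
    return best_value, best_move
```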

Nonetheless, if you don't know how to play chess using unlimited computing power, you definitely can't play chess using limited computing power. In 1836, Edgar Allan Poe, who was also an amateur magician, carefully argued that no automaton could ever play chess: at each turn there are many possible moves, but machines can only make deterministic motions. Between Poe in 1836 and Shannon in 1950 came a key advance in the understanding of computation, represented by the intermediate work of Turing, Church, and others, who worked with blackboards and mathematics rather than existing tabulating and calculating machines, since general computing machines had not yet been built.

If you ask any present researcher how to write a Python program that would be a nice AI if we could run it on a computer much larger than the universe, then, provided they have the ability to notice when they are confused, they should notice the "stubbing the mind's toe" feeling of trying to write a program while confused about the nature of the computation it needs to perform.

This doesn't necessarily prove that the best way to proceed is to grab our whiteboards and try to figure out how to crisply model some aspect of the problem given unlimited computing power, that is, to directly tackle and reduce our own confusion. And if we do, there are certainly pitfalls to avoid. There are nonetheless some strong considerations pushing MIRI in the direction of trying to do unbounded analyses, which can be briefly summarized as follows:

  1. If we don't know how to solve a problem even given unbounded computation, that means we're confused about the nature of the work to be performed. It is often worthwhile to tackle this basic conceptual confusion in a setup that exposes the confusing part of the problem as simply as possible, and doesn't introduce further and distracting complications until the core issue is resolved.
  2. Introducing 'realistic' complications can make it difficult to engage in cooperative discourse about which ideas have which consequences - AIXI was a watershed moment because it was formally specified and there was no way to say "Oh, I didn't mean that" when somebody pointed out that AIXI would try to seize control of its own reward channel.
  3. Increasing the intelligence of an advanced agent may sometimes move its behavior closer to ideals and further from the specific complications of an algorithm. Early, stupid chess algorithms had what seemed to humans like weird idiosyncrasies tied to their specific algorithms. Modern chess programs, far beyond the human champions, can from an intuitive human perspective be seen as just making good chess moves.
  4. Current AI algorithms are often incapable of demonstrating phenomena that seem like they should predictably occur later, but whose interesting properties can already be described using an unbounded algorithm as an example. E.g., convergent instrumental strategies arise from consequentialist agents reasoning about domains like "My programmers" and "What happens if my shutdown button gets pressed", which is very far from anything that naturally arises in current AI algorithms; but we can still talk about this using notions like economic agency and utility indifference (a toy illustration of utility indifference follows this list).
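As a toy illustration of the utility indifference idea (all numbers here are invented): the agent's utility switches from a normal utility function to a shutdown utility function when the button is pressed, plus a compensating constant chosen so that the agent's expected utility is unchanged by the press, leaving it no incentive to prevent or cause the press. This is a sketch of one proposed mechanism, not a worked-out agent design.

```python
# Toy illustration of utility indifference (all numbers invented).
# The agent's utility is U_normal if the shutdown button is never
# pressed, and U_shutdown plus a compensating constant theta if it is.
# Theta is chosen so that expected utility is the same either way.

def expected(dist):
    """Expected value of a {outcome_value: probability} distribution."""
    return sum(value * prob for value, prob in dist.items())

# Hypothetical expected-utility distributions, conditional on button state.
U_normal = {10.0: 0.5, 4.0: 0.5}   # E[U_normal] = 7.0
U_shutdown = {1.0: 1.0}            # E[U_shutdown] = 1.0

theta = expected(U_normal) - expected(U_shutdown)  # 6.0

# With the compensation in place, "pressed" and "unpressed" futures are
# valued equally, so the agent is indifferent to the button being pressed.
assert expected(U_normal) == expected(U_shutdown) + theta
```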

Much of the current technical work in value alignment theory takes place against a background of simplifying assumptions such as the following (a toy planner that leans on several of these at once is sketched after the list):

  • Unlimited finite computing power, or even hypercomputation.
  • Perfect knowledge of the environment.
  • Deterministic environments that are smaller than the agent and can be fully simulated.
  • Environments separated from the agent by an impermeable Cartesian boundary.
  • Turn-based discrete time (the environment waits for the agent to make a move, rather than proceeding in continuous real time).
  • Common knowledge of each other's code in multi-agent dilemmas.
  • Predefined action sets.
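Here is a minimal sketch of how several of these assumptions combine, with a made-up environment: the environment below is a known, deterministic function that the agent fully simulates across an explicit interface, time is turn-based, the action set is predefined, and the "planning algorithm" is exhaustive simulation of every action sequence, which only makes sense under the assumption of unlimited computing power.

```python
from itertools import product

# A toy "unbounded" planner leaning on several listed assumptions at
# once: the environment is a known, deterministic, fully simulable
# function (perfect knowledge, Cartesian boundary), time is turn-based,
# and the action set is predefined. Everything here is invented for
# illustration; only the brute-force structure is the point.

ACTIONS = ("left", "right", "wait")  # predefined action set

def step(state, action):
    """Known deterministic environment: returns (next_state, reward)."""
    if action == "left":
        state -= 1
    elif action == "right":
        state += 1
    reward = 1.0 if state == 3 else 0.0  # arbitrary goal location
    return state, reward

def best_plan(initial_state, horizon):
    """Exhaustively simulate every action sequence up to the horizon,
    which is only feasible given unlimited computing power."""
    def total_reward(plan):
        state, total = initial_state, 0.0
        for action in plan:
            state, reward = step(state, action)
            total += reward
        return total
    return max(product(ACTIONS, repeat=horizon), key=total_reward)

print(best_plan(initial_state=0, horizon=4))  # ('right', 'right', 'right', 'wait')
```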

Similarly, in modern AI and especially in value alignment theory, there's a sharp divide between problems we know how to solve using unlimited computing power, and problems confusing enough that we can't even state the simple program that would solve them given a larger-than-the-universe computer. It is an alarming aspect of the current state of affairs that we know how to build a non-value-aligned hostile AI using unlimited computing power, but not how to build a nice AI using unlimited computing power. The unbounded analysis program in value alignment theory centers on crossing this gap.

Besides its role in producing conceptual understanding, mathematization also helps build cooperative discourse. Marcus Hutter's AIXI marks not only the first time that somebody described a short Python program that would be a strong AI if run on a hypercomputer, but also the first time that anyone could point to a formal specification and say "And that is why this AI design would wipe out the human species and turn all matter within its reach into configurations of insignificant value", without the reply being just "Oh, well, of course that's not what I meant", because the specification of AIXI was fully nailed down. While not every key issue in value alignment theory is formalizable, there's a strong drive toward formalism, not just for conceptual clarity but also to sustain a cooperative process of building up collective debate and understanding of ideas.
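For concreteness, here is the standard expectimax expression defining AIXI's action choice, in Hutter's notation: the $a_i$, $o_i$, $r_i$ are actions, observations, and rewards, $U$ is a universal Turing machine run on program $q$ of length $\ell(q)$, and $m$ is the horizon.

$$a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

Every symbol here is pinned down; there is no informal step left at which to say "that's not what I meant".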

  • more detailed history of Shannon and Poe
  • (diagnosis) nirvana fallacy applied to unbounded analyses; past exaggerations leading to widespread disrespect
  • unbounded analysis makes claims precise enough to be critiqued and enables the discourse to progress
  • AIXI is the central example of an unbounded agent, and often also demarcates the boundary between 'straightforward' and 'confusing' problems in modern AI
  • bounded agent assumptions still hold in real life and should clearly be marked as being violated
  • in particular we still need to be wary of 'unbounded analyses' that avert the central problem
  • in some cases advanced agent properties will start to predictably or possibly cross some of those lines (a superintelligence is much more likely to find a strategy, even in a large search space)
  • you still don't get to assume that the agent can identify arbitrary nice things unless you know how to make a Python program run on a large-enough computer output those nice things
  • (this is what blocks the 'unbounded analysis' of "Oh, it's really smart, it'll know what we mean")

  • why we need basic theoretical understanding: because it's hard to do FAI otherwise, and because our FAI concepts need to anchor in basic things that might stay constant under self-modification, not particular programmatic tricks that will evaporate like snow in winter