Introduction: Poe, Shannon, and Deep Blue

In 1950, the very first paper ever written on computer chess, by Claude Shannon, gave the algorithm that would play perfect chess given unlimited computing power. In reality, computing power was limited, so computers did not play superhuman chess until 47 years after that. Knowing how to do something in principle doesn't always mean you can do it in practice without additional hard work and insights.

Nonetheless, if you don't know how to play chess using unlimited computing power, you definitely can't play chess using limited computing power. In 1830, Edgar Allen Poe, who was also an amateur magician, carefully argued that no automaton could ever play chess, since at each turn there are many possible moves, but machines can only make deterministic motions. Between Poe in 1830 and Shannon in 1950 there was a key increase in the understanding of computer chess, represented by the intermediate work of Turing, Church, and others - working with blackboards and mathematics rather than existing tabulating and calculating machines, since general computing machines had not yet been built.

If you ask any present researcher how to write a Python program that would be a nice AI if we could run it on a computer much larger than the universe, and they have the ability to notice when they are confused, they should notice the "stubbing the mind's toe" feeling of trying to write a program when we're confused about the nature of the computations we need to perform.

This doesn't necessarily prove that we can best proceed by grabbing our whiteboards and trying to figure out how to crisply model some aspects of the problem given unlimited computing power - try to directly tackle and reduce our own confusion. And if you do, there are certainly pitfalls to avoid. There's nonetheless some strong considerations pushing MIRI in the direction of trying to do unbounded analyses, which can be briefly summarized as follows:

If we don't know how to solve a problem even given unbounded computation, that means we're confused about the nature of the work to be performed. It is often worthwhile to tackle this basic conceptual confusion in a setup that exposes the confusing part of the problem as simply as possible, and doesn't introduce further and distracting complications until the core issue is resolved.

Introducing 'realistic' complications can make it difficult to engage in cooperative discourse about which ideas have which consequences - AIXI was a watershed moment because it was formally specified and there was no way to say "Oh, I didn't mean that" when somebody pointed out that AIXI would try to seize control of its own reward channel.

Increasing the intelligence of an advanced agent may sometimes move its behavior closer to ideals and further from specific complications of an algorithm. Early, stupid chess algorithms had what seemed to humans like weird idiosyncracies tied to their specific algorithms. Modern chess programs, far beyond the human champions, can from an intuitive human perspective be seen as just making good chess moves.

Current AI algorithms are often incapable of demonstrating future phenomena that seem like they should predictably occur later, and whose interesting properties seem like they can be described using an unbounded algorithm as an example. E.g. convergent instrumental strategies arise from consequentialist agents reasoning about difficult domains like "My programmers have beliefs about whether I'll achieve their goals" and "What happens if they decide to press my shutdown button?", which is very far from something that naturally arises in current AI algorithms; but we can still talk about this using abstract notions like economic agency and Utility indifference. Similarly, if you're not allowed to do brute-force proof searches, then modern AI algorithms are very far from arriving at abstract beliefs about alternate versions of their own code.

Unbounded analysis in computer science

Much of the current technical work in value alignment theory takes place against a background of simplifying assumptions such as:

Unlimited finite computing power, or even hypercomputation.

Perfect knowledge of the environment.

Deterministic environments that are smaller than the agent and can be fully simulated.

Environments separated from the agent by an impermeable Cartesian boundary.

Turn-based discrete time (the environment waits for the agent to make a move, rather than proceeding in continuous real time).

Common knowledge of each other's code in multi-agent dilemmas.

Predefined action sets.

All of these properties simplify the scenario, at the expense of realism.

As background for the state of mind in modern computer science, it should be remembered that individual computer scientists may sometimes be tempted to exaggerate how much their algorithm solves or how realistic it is - while not everyone gives into that temptation, at least some computer scientists do so at least some of the time. This means that there are a number of 'unbounded' analyses floating around which don't solve the real-world problem, may not even shed very much light on the real-world problem, and whose authors praise them as nearly complete solutions but for some drudge-work of mere computation. A reasonable reaction is to be suspicious of unbounded analyses, and scrutinize them closely to make sure that they haven't assumed away the conceptual core of the problem - if you try to model image recognition and you assume that all the images are 2 pixels by 2 pixels, you haven't simplified the problem, you've eliminated it and shed no useful light for people trying to solve the realistic problem.

This is a real problem. At MIRI, we could indeed, if asked in private, point to several researchers in near-neighboring fields who we think exaggerated what their unbounded algorithms did - e.g., "finds the fastest provably correct algorithm after longer than the age of the universe" becomes "finds the fastest algorithm" in the researcher's informal description. Or when the entire interesting core question is what to do about events that cross the boundary between the AI's interior and exterior, like dropping an anvil on your own head, this event is designated using a particular Greek letter and it's assumed that its expected utility is already known. In this case, formalization hasn't shed light on the core conceptual confusion, but just swept that conceptual confusion under a rug.

Again, as a matter of cultural background in computer science, you can see some researchers trying to play up the loftier, more mathematical nature of conceptual analyses as showing off more intellectual power. And you can find a reaction to that particular game in the form of computer scientists who say that all the real work, all the useful work, is done with real computers and algorithms that run well today in the messy real world. This isn't entirely wrong either, and it's truer in some parts of computer science than others - we are well past the point where it makes sense to analyze a robotic car using anything resembling first-order logic; with good realistic algorithms that do all the key work, the time for unrealistic algorithms may well have passed.

If, on the other hand, you're in a part of computer science that's still living in the Edgar Allen Poe era of "Literally nobody knows how to write down a program that solves the problem even given unlimited computing power and every other possible simplifying assumption that doesn't eliminate the problem entirely*", then you are perhaps not in the sort of realm where the mathematicians have nothing left to do - or so MIRI would argue, at any rate. It would be even better if we had bounded solutions, to be sure, but to criticize confusion-reducing work on the grounds that it fails to solve the entire and realistic problem would be the nirvana fallacy when no more properly realistic solution as yet exists.

This doesn't mean that doing unbounded analysis will be pitfall-free, or that you can make any possible simplifying assumption without destroying the core of the problem. The whole point of the Vingean reflection program and tiling agents theory is that, for example, we should not assume that agents can exactly model their own successors or self-modifications - even if we say that we can assume unlimited computing power, we should still assume that the future version of the agent has more computing power than the past version (or has seen new sensory data) and therefore can't be predicted exactly in advance. The interesting problem is how to obtain abstract trust in an agent under conditions of Vingean uncertainty where we can't know exactly what a smarter agent will do because to know exactly what it would do we would need to be at least that smart ourselves. To make the 'unbounded' assumption that we could exactly predict what that agent will do, and judge exactly what consequence would follow, would assume away the central confusing core of the problem. But allowing for very large computers bigger than the universe, where the other agent has an even bigger computer, does not assume away this confusing core; or at least, so long as we don't know how to solve even this unbounded form of the problem, it's safe to say that this problem formulation exposes some key confusion.

Given the modern social landscape of computer science, it does happen that debates about appropriate levels of abstraction can take on the heated, personal nature of, say, Republican-Democratic national politics in the United States. For someone who has taken on the identity of opposing the folly of over-abstraction, there will be a tempting ad-hominem available to depict those airy folk who only write Greek letters and never try their ideas in running code, as being ignorant of the pitfalls of Greek letters, ignorant of the advantages of running code, believing that all realistic programs are mere drudge-work, etcetera. And while not everyone gives into the temptation of that ad-hominem, some people do, just like those who give into the temptation to exaggerate the power of unbounded analyses, and there is a well-known selection effect where debates can be dominated by the speaking volume of those who give in the most to such temptations.

Understanding the pitfalls of Greek letters and the benefits of running code is certainly part of being good at unbounded analysis and doing the right kind of unbounded analysis. It was great when we realized that PatrickLaVictoire's modal agents formalism could be run in quadratic time and that we could just write down the agents and run them. It's great when you can give a lecture and take audience suggestions for new modal agents during Q&A and compete them against each other as fast as you can type them. It's helpful to further theoretical development for all the obvious reasons: it's easier to make a mistake on a whiteboard than in a computer, and it can be just faster to try ideas as rapidly as you can type them into a prompt, gaining an inexpensive concrete intuition for the field.

That said, as fun as it was to do that with modal agents, and as much as we'd love to be able to do it for everything (except paperclip maximizers, we'd rather NOT have running code of those), there's a lot of places we can't do that yet. And even when life isn't that convenient, it's sometimes possible to make progress anyway. We'd never have gotten to the point of running code for modal agents in the first place, if we hadn't spent years before then pushing on the whiteboard stages of logical decision theory.

And this doesn't seem like a very unusual state of affairs for computer science as a whole, either. The historical reality is that there often is a lot of whiteboard scribbling that comes first - not always, but sometimes, and not nearly rarely enough for it to be unusual.

Even so, in technical work on value alignment, the relative proportion of whiteboard scribbling to running code is unusual for modern computer science. This doesn't happen randomly or because people who work on value alignment are afraid of computers (yes, there's researchers in the field who've worked with real-world machine learning code, etcetera); it happens because there's systematic reasons pushing in that direction.

Four of those reasons were briefly summarized above, and five of them will be considered at greater length below.

Reasons for the prevalence of unbounded analysis in value alignment theory

Unbounded analysis is often the most practical way to expose and solve conceptual confusions.

(writing in progress)

That said, if a version of Vingean reflection using unrealistic logical agents in unrealistic logical worlds does seem to expose a core problem that we don't know how to solve using unlimited computing power, it may make sense to ask, explicitly, "Okay, how would I solve the simplest version of this scenario that seems to expose the core problem?" rather than launching right into adding all the complications of probabilistic reasoning, realistic computational bounds, action in realtime, etcetera.

Similarly, in modern AI and especially in value alignment theory, there's a sharp divide between problems we know how to solve using unlimited computing power, and problems which are confusing enough that we can't even state the simple program that would solve them given a larger-than-the-universe computer. It is an alarming aspect of the current state of affairs that we know how to build a non-value-aligned hostile AI using unbounded computing power but not how to build a nice AI using unlimited computing power. The unbounded analysis program in value alignment theory centers on crossing this gap.

Besides the role of mathematization in producing conceptual understanding, it also helps build cooperative discourse. Marcus Hutter's AIXI marks not only the first time that somebody described a short Python program that would be a strong AI if run on a hypercomputer, but also the first time that anyone was able to point to a formal specification and say "And that is why this AI design would wipe out the human species and turn all matter within its reach into configurations of insignificant value" and the reply wasn't just "Oh, well, of course that's not what I meant" because the specification of AIXI was fully nailed down. While not every key issue in value alignment theory is formalizable, there's a strong drive toward formalism not just for conceptual clarity but also to sustain a cooperative process of building up collective debate and understanding of ideas.

A facile way of putting it would be that we're often trying to consider phenomena that just don't happen in weak modern AI algorithms, but seem like they should predictably happen with more advanced algorithms later, and we can try to model those future phenomena by introducing unlimited computing power. A more precise and still facile way of putting it would be that philanthropic work in this sector often consists in trying to identify problems that seem unlikely to be solved in the course of addressing immediate, commercial pressures to get today's AI running today. If you can exhibit a problem in running code of today, or if you expect it to appear in running code of 10 years later but still before AIs are smart enough to produce pivotal catastrophes, then it's likely to be the kind of problem that enormous well-funded commercial projects will solve incidentally in the course of pursuing their own immediate incentives. The dangerous problems are those that spring up for the first time with very smart AIs, or the kind of problem where early cheap patches fail in very smart AIs - they're dangerous precisely because immediate incentives don't force people to solve them unless they've used foresight and seen the bullet coming before any bullet hits. A lot of the pressure towards running code comes from the social pressure to deliver immediate benefits and the epistemic desirability of obtaining lots of definite observations; but where current running code seems sufficient to expose and solve a problem in value alignment theory, that's probably not where a philanthropic group should be focusing its own work.

- more detailed history of Shannon and Poe - (diagnosis) nirvana fallacy applied to unbounded analyses, past exaggerations leading to widespread disrespect - unbounded analysis makes claims precise enough to be critiqued and enables the discourse to progress - AIXI is the central example of an unbounded agent, and often also demarcates the boundary between 'straightforward' and 'confusing' problems in modern AI - bounded agent assumptions still hold in real life and should clearly be marked as being violated - in particular we still need to be wary of 'unbounded analyses' that avert the central problem - in some cases advanced agent properties will start to predictably or possibly cross some of those lines (a superintelligence is much more likely to find a strategy, even in a large search space) - you still don't get to assume that the agent can identify arbitrary nice things unless you know how to make a Python program run on a large-enough computer output those nice things - (this is what blocks the 'unbounded analysis' of "Oh, it's really smart, it'll know what we mean")

- why we need basic theoretical understanding - because it's hard to do FAI otherwise - because our FAI concepts need to anchor in basic things that might stay constant under self-modification, not particular programmatic tricks that will evaporate like snow in winter

LESSWRONG
Wikitags
LW

Methodology of unbounded analysis

Introduction: Poe, Shannon, and Deep Blue

Unbounded analysis in computer science

Reasons for the prevalence of unbounded analysis in value alignment theory

Unbounded analysis is often the most practical way to expose and solve conceptual confusions.