You say you agree, and then talk about something entirely unrelated. I suspect I failed to communicate my point. That’s okay, let’s try again.
If you want a general intelligence running on unbounded computational hardware, that’s what AIXI (and its approximations) gives you. But I hope I am not making a controversial statement if I claim that the behavior of AIXI is qualitatively different from human-like general intelligence. There is absolutely nothing about human-like intelligence I can think of which resembles brute-force search over the space of programs for a universal Turing machine. If you want to study human-like intelligence, you will learn nothing of value by looking at AIXI.
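For concreteness, here is the standard statement of AIXI’s action rule (Hutter’s usual formulation from the literature, not anything new to this thread):

```latex
% AIXI's action rule with horizon m (Hutter's formulation): expectimax
% over future observation/reward sequences, weighted by a Solomonoff-style
% mixture over all programs q for a universal Turing machine U that are
% consistent with the interaction history so far. \ell(q) is the length
% of program q, so simpler environment-programs get exponentially more weight.
a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \bigl[\, r_t + \cdots + r_m \,\bigr]
      \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Every operator in that expression enumerates over all programs or all percept sequences. Nothing in it is incremental or heuristic, which is the contrast I’m drawing below.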
For the purpose of this discussion I’m going to make broad, generalizing assumptions about how human-like intelligence works. Please forgive me and don’t nitpick the purposely oversimplified details. General intelligence arises primarily out of the neocortex, which is a network of hundreds of millions of cortical columns, each of which can be thought of as a learned heuristic. Both the behavior of columns and the connections between them change over time via some learning process.
Architectures like CogPrime similarly consist of networks of learned behaviors / heuristic programs connected in a probabilistic graph model. Both the heuristics and the connections between them can be learned over time, as can the underlying rules (a difference from the biological model that allows the AI to change its own architecture over time).
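Here is a deliberately toy sketch of the kind of “network of learned heuristics” I have in mind. Every name in it is invented for illustration; it is not a model of the neocortex or of CogPrime, just the minimal shape of the idea:

```python
import random

class Heuristic:
    """A small learned rule: percept -> proposed action, plus a usefulness score."""
    def __init__(self, name, rule):
        self.name = name
        self.rule = rule
        self.score = 0.0  # running estimate of how useful this heuristic has been

    def propose(self, percept):
        return self.rule(percept)

class HeuristicNetwork:
    def __init__(self, heuristics):
        self.heuristics = heuristics
        # link weights: how strongly one heuristic's firing recruits another
        self.links = {(a.name, b.name): random.random()
                      for a in heuristics for b in heuristics if a is not b}

    def act(self, percept):
        # act on the proposal of the currently best-scored heuristic
        best = max(self.heuristics, key=lambda h: h.score)
        return best, best.propose(percept)

    def update(self, heuristic, reward):
        # The update process, not any single heuristic, is where the system's
        # character lives: scores and links never stop moving.
        heuristic.score += 0.1 * (reward - heuristic.score)
        for other in self.heuristics:
            if other is not heuristic:
                self.links[(heuristic.name, other.name)] *= 0.99  # slow undirected decay

# Drift in miniature: run percepts through and reinforce what worked.
net = HeuristicNetwork([
    Heuristic("flee",  lambda p: "run" if p == "threat" else "wait"),
    Heuristic("greet", lambda p: "wave" if p == "friend" else "wait"),
])
for percept, reward in [("threat", 1.0), ("friend", 0.0), ("threat", 1.0)]:
    h, action = net.act(percept)
    net.update(h, reward)
```

Note that nothing interesting lives in any single `Heuristic`; the system’s behavior is in the update loop, which keeps reshaping both scores and links. That loop is the “drift” I describe next.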
In these models of human-like thinking, the generality of general intelligence is not represented in any particular snapshot of its state. Rather, it lies in the fact that its behavior drifts over time, in sometimes directed and sometimes undirected ways. The drift is the source of generality. The heuristics are learned ways of guiding that drift in productive directions, but (1) it is the update process that gives generality, not the guiding heuristics; (2) it is an inherently unconstrained process; and (3) the abstract function being approximated by the heuristic network is constantly changing.
One way of looking at this is to say that a human-like general intelligence is not a static machine able to solve any problem, but a dynamic machine able to solve only certain problems at any given point in time, yet able to drift through problem-solving space in response to its percepts. And the space of specialized problem solvers is sufficiently connected that a human-like intelligence can move from its current state to become any other specialized problem solver in reasonable time, a process we call learning.
One of the stated research objectives of MIRI is learning how to build a “reliable” / steadfast agent. I’ll go out on a limb and say it: the above description of human intelligence, if true, is evidence that a steadfast human-like general intelligence is a contradiction in terms. This is what I meant by the comparison to El Dorado: you are searching for something for whose existence there is no a priori evidence.
Maybe there are other architectures for general problem solving which look nothing like the neocortex or integrative AGI designs like CogPrime. But so far the evidence is lacking...
> If you want a general intelligence running on unbounded computational hardware, that’s what AIXI is
I disagree. AIXI does not in fact solve the problem. It leaves many questions (of logical uncertainty, counterfactual reasoning, naturalized induction, etc.) unanswered, even in the unbounded case. (These points are touched upon in the technical agenda, and will be expanded upon in one of the forthcoming papers mentioned.) My failure to communicate this is probably why my previous comment looked like a non-sequitur; sorry about that. I am indeed claiming t...
I'm pleased to announce the release of Aligning Superintelligence with Human Interests: A Technical Research Agenda, written by Benja and me (with help and input from many, many others). This document summarizes and motivates MIRI's current technical research agenda.
I'm happy to answer questions about this document, but expect slow response times, as I'm travelling for the holidays. The introduction of the paper is included below. (See the paper for references.)
The characteristic that has enabled humanity to shape the world is not strength, not speed, but intelligence. Barring catastrophe, it seems clear that progress in AI will one day lead to the creation of agents meeting or exceeding human-level general intelligence, and this will likely lead to the eventual development of systems which are "superintelligent" in the sense of being "smarter than the best human brains in practically every field" (Bostrom 2014). A superintelligent system could have an enormous impact upon humanity: just as human intelligence has allowed the development of tools and strategies that let humans control the environment to an unprecedented degree, a superintelligent system would likely be capable of developing tools and strategies that give it extraordinary power (Muehlhauser and Salamon 2012). In light of this potential, it is essential to use caution when developing artificially intelligent systems capable of attaining or creating superintelligence.
There is no reason to expect artificial agents to be driven by human motivations such as lust for power, but almost all goals can be better met with more resources (Omohundro 2008). This suggests that, by default, superintelligent agents would have incentives to acquire resources currently being used by humanity. (Can't we share? Likely not: there is no reason to expect artificial agents to be driven by human motivations such as fairness, compassion, or conservatism.) Thus, most goals would put the agent at odds with human interests, giving it incentives to deceive or manipulate its human operators and resist interventions designed to change or debug its behavior (Bostrom 2014, chap. 8).
Care must be taken to avoid constructing systems that exhibit this default behavior. In order to ensure that the development of smarter-than-human intelligence has a positive impact on humanity, we must meet three formidable challenges: How can we create an agent that will reliably pursue the goals it is given? How can we formally specify beneficial goals? And how can we ensure that this agent will assist and cooperate with its programmers as they improve its design, given that mistakes in the initial version are inevitable?
This agenda discusses technical research that is tractable today, which the authors think will make it easier to confront these three challenges in the future. Sections 2 through 4 motivate and discuss six research topics that we think are relevant to these challenges. Section 5 discusses our reasons for selecting these six areas in particular.
We call a smarter-than-human system that reliably pursues beneficial goals "aligned with human interests" or simply "aligned." To become confident that an agent is aligned in this way, it will not suffice to have a practical implementation that merely seems to meet the challenges outlined above. It is also necessary to gain a solid theoretical understanding of why that confidence is justified. This technical agenda argues that there is foundational research approachable today that will make it easier to develop aligned systems in the future, and describes ongoing work on some of these problems.
Of the three challenges, the one giving rise to the largest number of currently tractable research questions is the challenge of finding an agent architecture that will reliably pursue the goals it is given—that is, an architecture which is alignable in the first place. This requires theoretical knowledge of how to design agents which reason well and behave as intended even in situations never envisioned by the programmers. The problem of highly reliable agent designs is discussed in Section 2.
The challenge of developing agent designs which are tolerant of human error has also yielded a number of tractable problems. We argue that smarter-than-human systems would by default have incentives to manipulate and deceive the human operators. Therefore, special care must be taken to develop agent architectures which avert these incentives and are otherwise tolerant of programmer error. This problem and some related open questions are discussed in Section 3.
Reliable, error-tolerant agent designs are only beneficial if they are aligned with human interests. The difficulty of concretely specifying what is meant by "beneficial behavior" implies a need for some way to construct agents that reliably learn what to value (Bostrom 2014, chap. 12). A solution to this "value learning" problem is vital; attempts to start making progress are reviewed in Section 4.
Why these problems? Why now? Section 5 answers these questions and others. In short, the authors believe that there is theoretical research which can be done today that will make it easier to design aligned smarter-than-human systems in the future.
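As a toy illustration of the flavor of the "value learning" problem mentioned above (my own sketch using a Bradley-Terry-style choice model, not anything from the paper itself): given observed human choices between pairs of outcomes, infer which candidate utility function best explains them.

```python
import math

# Toy illustration (not from the paper): infer which candidate utility
# function best explains observed human choices between outcome pairs,
# under a Bradley-Terry-style model where
#   P(chosen over rejected) = sigmoid(u(chosen) - u(rejected)).
# All outcomes, utilities, and numbers below are made up.

candidate_utilities = {
    "u_paperclips": lambda o: o["paperclips"],
    "u_welfare":    lambda o: o["human_welfare"],
}

# Each observation: (outcome the human chose, outcome the human rejected).
observations = [
    ({"paperclips": 9,  "human_welfare": 5},
     {"paperclips": 10, "human_welfare": 1}),
    ({"paperclips": 0,  "human_welfare": 8},
     {"paperclips": 6,  "human_welfare": 2}),
]

def log_likelihood(u, data):
    total = 0.0
    for chosen, rejected in data:
        diff = u(chosen) - u(rejected)
        total += -math.log(1.0 + math.exp(-diff))  # log sigmoid(diff)
    return total

best = max(candidate_utilities,
           key=lambda name: log_likelihood(candidate_utilities[name], observations))
print(best)  # -> u_welfare: these choices are better explained by welfare
```

The real problem is of course vastly harder (the outcome space is not hand-enumerated, the human choice model is unknown, and the stakes of misspecification are high), which is the point of Section 4.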