Outside View(s) and MIRI's FAI Endgame

Wei Dai


28th Aug 2013


On the subject of how an FAI team can avoid accidentally creating a UFAI, Carl Shulman wrote:

If we condition on having all other variables optimized, I'd expect a team to adopt very high standards of proof, and recognize limits to its own capabilities, biases, etc. One of the primary purposes of organizing a small FAI team is to create a team that can actually stop and abandon a line of research/design (Eliezer calls this "halt, melt, and catch fire") that cannot be shown to be safe (given limited human ability, incentives and bias).

In the history of philosophy, there have been many steps in the right direction, but virtually no significant problems have been fully solved, in the sense that philosophers can agree that some proposed idea is the last word on a given subject. An FAI design involves making many explicit or implicit philosophical assumptions, many of which may then become fixed forever as governing principles for a new reality. They'll end up being the last words on their subjects, whether we like it or not. Given the history of philosophy, and applying the outside view, how can an FAI team possibly reach "very high standards of proof" regarding the safety of a design? But if we can foresee that they can't, then what is the point of aiming for that predictable outcome now?

Until recently I hadn't paid much attention to the discussions here about inside view vs. outside view, because they have tended to focus on the applicability of these views to the problem of predicting an intelligence explosion. It seemed obvious to me that outside views can't possibly rule out intelligence explosion scenarios, and that even a small probability of a future intelligence explosion would justify a much higher level of investment in preparing for that possibility than we currently see. But given that the inside vs. outside view debate may also be relevant to the "FAI Endgame", I read up on Eliezer's and Luke's most recent writings on the subject... and found them to be unobjectionable. Here's Eliezer:

On problems that are drawn from a barrel of causally similar problems, where human optimism runs rampant and unforeseen troubles are common, the Outside View beats the Inside View.

Does anyone want to argue that Eliezer's criteria for using the outside view are wrong, or don't apply here?

And Luke:

One obvious solution is to use multiple reference classes, and weight them by how relevant you think they are to the phenomenon you're trying to predict.

[...]

Once you've combined a handful of models to arrive at a qualitative or quantitative judgment, you should still be able to "adjust" the judgment in some cases using an inside view.

These ideas seem harder to apply, so I'll ask for readers' help. What reference classes should we use here, in addition to past attempts to solve philosophical problems? What inside view adjustments could a future FAI team make, such that they might justifiably overcome the (most obvious-to-me) outside view's conclusion that they're very unlikely to possess complete and fully correct solutions to a diverse range of philosophical problems?




Wei Dai

I'll just respond to part of your comment since I'm busy today. I'll respond to the rest later or when we meet.

I don't see why a singleton is necessary to avert value drift in any case; they seem mostly orthogonal. Is there a simple argument here?

I'm not sure this argument is original to me; I may have gotten it from Nick Bostrom or someone else. When I said "value drift" I meant value drift of humanity in aggregate, not necessarily value drift of any individual. Different people's values differ in how difficult they are to transmit, and some people simply think their values are easy to transmit, for example those who think they should turn the universe into hedonium, or should maximize "complexity". Competitive evolution will favor such people (in the sense of maximizing their descendants/creations), since they can take advantage of new AGI or other progress more quickly than those who think their values are harder to transmit.

I think there's an additional argument: people with shorter planning horizons will take advantage of new AGI progress more quickly, because they don't particularly mind failing to transmit their values into the far future and just care about short-term benefits like gaining academic fame.

ESRogs

I'll respond to the rest later or when we meet.

Did you talk about this at the recent workshop? If you're willing to share publicly, I'd be curious about the outcome of this discussion.

paulfchristiano

Yes, if it is impossible to remain in control of AIs then you will have value drift, and yes, a singleton can help with this in the same way it can help with any technological risk, namely by blocking adoption of the offending technology. So I concede they aren't completely orthogonal, in the sense that any risk from progress can be better addressed by a singleton plus slow progress. (This argument is structurally identical to the argument for danger from progress in biology, in physics, or even in early conventional explosives.) But this is a very far cry from "can only be prevented by building a singleton."

To restate how the situation seems to me: you say "the problems are so hard that any attempt to solve them is obviously doomed," and I am asking for some indication that this is the case besides intuition and a small number of not-very-representative examples, which seems unlikely to yield a very confident solution. Eliezer makes a similar claim; you two disagree about how likely Eliezer is to solve the problems, but not about how likely the problems are to get solved by people who aren't Eliezer. I don't understand either of your arguments very well. It seems like both of you are correct to disagree with the mainstream by identifying a problem and noticing that it may be an unusually challenging one, but I don't see why either of you is so confident.

To isolate a concrete disagreement: if there were an intervention that sped up the onset of serious AI safety work twice as much as it sped up the arrival of AI, I would tentatively consider that a positive (and if it sped up the onset of serious AI work ten times as much as it sped up the arrival of AI, it would seem like a clear win; I previously argued that 1.1x as much would also be a big win, but Carl convinced me to increase the cutoff with a very short discussion). You seem to be saying that you would consider it a loss at any ratio, because speeding up the arrival of AI is so much worse t
