Wei_Dai comments on Three Approaches to "Friendliness" - Less Wrong

Post author: Wei_Dai 17 July 2013 07:46AM


Comment author: Wei_Dai 14 April 2015 03:20:28AM 1 point

Is there some other reason to expect failure to be catastrophic?

I'm not pointing to any specific reason; I just expect that, in general, failures when dealing with large amounts of computing power can easily be catastrophic. You have theoretical arguments for why they won't be, given a specific design, but again I am skeptical of such arguments in general.

Comment author: paulfchristiano 14 April 2015 03:59:38PM 1 point

I agree there is some risk that cannot be removed with either theoretical arguments or empirical evidence. But why is it greater for this kind of AI than any other, and in particular than white-box metaphilosophical or normative AI?

Normative AI seems like by far the worst, since:

  1. it generally demonstrates a treacherous turn if you make an error, and
  2. it must work correctly across a range of unanticipated environments.

So in that case we have particular concrete reasons to think that empirical testing won't be adequate, in addition to the general concern that empirical testing and theoretical argument are never sufficient. To me, white-box metaphilosophical AI seems somewhere in between.

(One complaint is that I just haven't given an especially strong theoretical argument. I agree with that, and I hope that whatever systems people actually use, they are backed by something more convincing. But the current state of the argument seems like it can't point in any direction other than in favor of black-box designs, since we don't yet have any arguments at all that any other kind of system could work.)