
Adele_L comments on Goal retention discussion with Eliezer - Less Wrong Discussion

56 Post author: MaxTegmark 04 September 2014 10:23PM



Comment author: Adele_L 05 September 2014 03:32:33AM 6 points

"It always seemed to me that this strategy had the fatal flaw that we would not be able to tell if the AI was really already superintelligent and was just playing dumb and telling us what we wanted to hear so that we would let it loose, or if the AI really was just learning."

In addition to that fatal flaw, the quote above seems to me to suggest another fatal flaw in the "raising an AI" strategy: the time window in which the AI's utility function remains malleable would be limited. As soon as any part of the AI figures out how to do consequentialist reasoning about code, the "critical period" during which we could still mould its utility function would be over. Is this the right way of thinking about it, or is this line of thought waaaay too amateurish?

This problem is essentially what MIRI has been calling corrigibility. A corrigible AI is one that understands and accepts that it or its utility function may still be incomplete or flawed, and so cooperates with attempts to correct it.