pjeby comments on What can you do with an Unfriendly AI? - Less Wrong

Post author: paulfchristiano 20 December 2010 08:28PM

Comment author: pjeby 21 December 2010 02:28:23PM 2 points

"ask AI copy 1 for permission for all future actions. Never modify AI copy 1's behavior."

I don't think you've noticed that this is just moving the fundamental problem to a different place. For example, you haven't specified things like:

  • Don't lie to AI 1 about your actions

  • Don't persuade AI 1 to modify itself

  • Don't find loopholes in the definition of "AI 1" or "modify"

etc., etc. If you could enforce all these things over superintelligent self-modification, you'd already have solved the general FAI problem.

IOW, what you propose isn't actually a reduction of anything, AFAICT.

Comment author: DanArmak 21 December 2010 03:47:27PM 0 points

I noticed this but didn't explicitly point it out. My point was that when paulfchristiano said:

If the AI has a simple goal---press the button---then I think it is materially easier for the AI to modify itself while preserving the button-pressing goal [...] the problem is difficult, but I don't think it is in the same league as friendliness

he was also assuming that he could handle your objections, e.g. that his AI wouldn't find a loophole in the definition of "pressing a button". So the problem he described was not, in fact, simpler than the general problem of FAI.