drethelin comments on Open thread, September 2-8, 2013 - Less Wrong Discussion

Post author: David_Gerard 02 September 2013 02:07PM

Comment author: drethelin 09 September 2013 04:30:40AM 0 points

You'd have to actively stop it from doing so. An AI is just code: if the AI has the ability to write code, it has the ability to self-modify.

Comment author: [deleted] 09 September 2013 04:42:47AM 0 points

An AI is just code: if the AI has the ability to write code, it has the ability to self-modify.

If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify. This second ability is what I'm proposing to get rid of. See my other comment.

Comment author: drethelin 09 September 2013 04:44:36AM 4 points

If an AI can't modify its own code, it can just write a new AI that can.

Comment author: Vaniver 09 September 2013 04:58:39PM 1 point

If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify.

Unpack the word "itself."

(This is basically the same response as drethelin's, except it highlights the difficulty of drawing clear delineations between the different kinds of impact the AI can have on the world. Even if version A doesn't alter itself, it still alters the world, and it may do so in a way that brings about version B (either directly or indirectly), so it would help if it knew how to design B.)

Comment author: [deleted] 10 September 2013 03:27:58AM 0 points

Well, I'm imagining the AI as being composed of three distinct parts: a decision subroutine (give it a set of options and it picks one), a thinking subroutine (give it a question and it tries to determine the answer), and a belief database. So when I say "the AI can't modify itself", what I mean more specifically is "none of the options given to the decision subroutine will be something that involves changing the AI's code, or changing its beliefs in unapproved ways".

So perhaps "the AI could write some code" (meaning that the thinking subroutine creates a piece of code inside the belief database), but "the AI can't replace parts of itself with that code" (meaning that the decision subroutine can't make a decision to alter any of the AI's subroutines or beliefs).
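In code, the separation I have in mind might look roughly like this minimal sketch. Everything here is illustrative: the class names, and in particular the modifies_self flag, are stand-ins I'm inventing for however options would actually get classified, not details from the proposal above.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    description: str
    modifies_self: bool  # would acting on this alter the AI's code or beliefs?


@dataclass
class AI:
    # Belief database: the thinking subroutine may deposit arbitrary
    # artifacts here, including code, but only as inert data.
    beliefs: dict = field(default_factory=dict)

    def think(self, question: str) -> str:
        """Thinking subroutine: tries to answer a question. Any code it
        writes is stored in the belief database, never installed."""
        answer = f"candidate answer (possibly code) for: {question!r}"
        self.beliefs[question] = answer
        return answer

    def decide(self, options: list[Option]) -> Option:
        """Decision subroutine: options that would change the AI's own
        subroutines or beliefs are filtered out before any choice."""
        permitted = [o for o in options if not o.modifies_self]
        if not permitted:
            raise RuntimeError("no permissible options to choose from")
        return permitted[0]  # stand-in for a real choice rule
```

The property this sketch is meant to capture is that decide never acts on an option that points back at the AI's own subroutines or belief database; think can produce code, but only as data.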

Now, certainly an AI that's out of the box would, in theory, be able to, say, find a computer and upload some new code onto it, and that would amount to self-modification. I'm assuming we're going to first make the AI safe and then let it out of the box, rather than the other way around.