Eliezer_Yudkowsky comments on What can you do with an Unfriendly AI? - Less Wrong

16 Post author: paulfchristiano 20 December 2010 08:28PM

Comments (127)

Comment author: Eliezer_Yudkowsky 21 December 2010 12:21:42AM 2 points

"I can write down many such utility functions easily, as contrasted with the difficulty of describing friendliness, so I can at least hope to design an AI which has one of them."

What you can do when you can write down stable utility functions (that is, after you have solved the self-modification stability problem) but can't yet write down CEV is a whole different topic from this sort of AI-boxing!

Comment author: paulfchristiano 21 December 2010 01:31:05AM 5 points

I don't think I understand this post.

My claim is that it is easier to write down some stable utility functions than others. This is intimately related to the OP, because I am claiming as a virtue of my approach to boxing that it leaves us with the problem of getting an AI to follow essentially any utility function consistently. I am not purporting to solve that problem here, just making the claim that it is obviously no harder and almost obviously strictly easier than friendliness.