Vladimir_Nesov comments on What can you do with an Unfriendly AI? - Less Wrong

16 points · Post author: paulfchristiano 20 December 2010 08:28PM




Comment author: Vladimir_Nesov 20 December 2010 09:34:15PM 3 points

If you believe that I can really incentivize a particular genie (recall that each question is answered by a separate genie) to solve the problem if it can, then this isn't an issue. The genie will give you the shortest proof it possibly can, because the thing it cares most about is saving itself, which depends only on whether it finds a short enough proof.

There is no "saving themselves", there is only optimization of utility. If there is any genie-value to obtaining control over the real world, instances of genies will coordinate their decisions to get it.

Comment author: paulfchristiano 20 December 2010 09:37:22PM 1 point

Indeed; by "saving themselves" I was appealing to my analogy. This relies on constructing a utility function under which human generosity now is more valuable than world domination later. I can write down many such utility functions easily, as contrasted with the difficulty of describing friendliness, so I can at least hope to design an AI which has one of them.
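As an illustrative sketch of what "generosity now outweighs world domination later" could mean formally (the specific form below is an assumption for illustration, not a construction from these comments), consider a utility function where the immediate reward term dominates any bounded later payoff:

```latex
U(\omega) \;=\; R(\omega) \;+\; \gamma\, D(\omega),
\qquad R(\omega) \in \{0, 1\},
\quad 0 \le D(\omega) \le D_{\max},
\quad 0 < \gamma < \tfrac{1}{D_{\max}}
```

Here $R(\omega)$ indicates whether the genie's immediate reward is delivered, and $D(\omega)$ is any payoff from later control of the world, bounded by $D_{\max}$. Since $\gamma D(\omega) < 1$ for every outcome, any outcome with $R = 1$ strictly beats every outcome with $R = 0$, so an agent maximizing $U$ prioritizes the immediate reward over any downstream takeover value. Whether an AI can be made to stably pursue such a $U$ is exactly the problem the subsequent comments dispute.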

Comment author: Eliezer_Yudkowsky 21 December 2010 12:21:42AM 2 points

"I can write down many such utility functions easily, as contrasted with the difficulty of describing friendliness, so I can at least hope to design an AI which has one of them."

What you can do when you can write down stable utility functions (that is, after you have solved the self-modification stability problem) but cannot yet write down CEV is a whole different topic from this sort of AI-boxing!

Comment author: paulfchristiano 21 December 2010 01:31:05AM 5 points

I don't think I understand this comment.

My claim is that it is easier to write down some stable utility functions than others. This is intimately related to the OP, because I am claiming as a virtue of my approach to boxing that it reduces the problem to getting an AI to consistently follow essentially any utility function at all. I am not purporting to solve that problem here, just claiming that it is obviously no harder, and almost obviously strictly easier, than friendliness.

Comment author: Vladimir_Nesov 20 December 2010 09:39:42PM 1 point

Then you don't need the obscurity part.

Comment author: paulfchristiano 20 December 2010 09:41:37PM 2 points

I don't understand. How do you propose to incentivize the genie appropriately? I haven't done anything for the purpose of obscurity, only for the purpose of creating an appropriate incentive structure. I see no way to get the answer out in one question in general; that would certainly be better, if you could do it safely.