Alexei comments on What can you do with an Unfriendly AI? - Less Wrong

Post author: paulfchristiano 20 December 2010 08:28PM




Comment author: Alexei 20 December 2010 09:35:13PM 2 points

"A standard trick reveals that knowing whether a problem has a solution is almost as helpful as knowing the solution. Here is a (very inefficient) way to use this ability."

I figure the AI will be smart enough to recognize the strategy you are using. In that case, it can choose not to cooperate and output a malicious solution. If the different AIs you are running are similar enough, it's not improbable that all of them will come to the same conclusion. In fact, I suspect there is a sort of convergence toward the most "unfriendly" output they can give. If that's the case, all UFAIs will give the same output.

Comment author: paulfchristiano 20 December 2010 09:40:04PM 3 points

It can choose not to cooperate, but it will only do so if that's what it wants to do. The genie I have described wants to cooperate, and an AGI of any of the forms I have described would want to cooperate. You can instead claim that I can't build an AGI with any easily controlled utility function at all, but that is a much harder claim.