Jack comments on What can you do with an Unfriendly AI? - Less Wrong

16 Post author: paulfchristiano 20 December 2010 08:28PM


Comments (127)


Comment author: Jack 21 December 2010 01:18:26AM 0 points

As I understand it, this method is designed to work for constraint satisfaction problems, where we can easily detect false positives. You're right that one possibility is that all the genies that can't find solutions go on strike just to make us check all the yeses (which would make this process no better than a brute-force search, right?). Maybe there needs to be a second punishment, worse than death, to give them an incentive not to lie.

Comment author: paulfchristiano 21 December 2010 01:46:41AM 3 points

A genie who can't find a solution has literally no agency. There is nothing he can say to the filter which will cause it to say "yes," because the filter itself checks to see if the genie has given a proof. If the genie can't find a proof, the filter will always say "no." I don't quite know what going on strike would entail, but certainly if all of the genies who can't find solutions collectively have 0 influence in the world, we don't care if they strike.
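To make the "no agency" point concrete, here is a minimal sketch, assuming the problem is Boolean satisfiability. The names `filter_says_yes`, `clauses`, and `claimed` are hypothetical and not from the post. The filter ignores everything the genie says except a candidate assignment, checks that assignment itself, and emits a single bit.

```python
from typing import Any, List

Clause = List[int]  # e.g. [1, -2] means (x1 OR NOT x2)


def filter_says_yes(clauses: List[Clause], claimed: Any) -> bool:
    """Return True only if `claimed` is a verified satisfying assignment.

    `claimed` is whatever the genie submitted. Anything that is not a
    dict mapping variable numbers to booleans is treated as "no proof".
    """
    if not isinstance(claimed, dict):
        return False
    for clause in clauses:
        satisfied = False
        for lit in clause:
            value = claimed.get(abs(lit))
            # A positive literal needs True, a negative literal needs False.
            if isinstance(value, bool) and value == (lit > 0):
                satisfied = True
                break
        if not satisfied:
            return False
    return True
```

If the formula has no satisfying assignment, no input makes this function return True, so a genie without a solution cannot change the output by striking, lying, or pleading.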

Comment author: Jack 21 December 2010 01:52:00AM 0 points

Okay, that makes sense. What about computation time limits? A genie that knows it can't give an answer would wait as long as possible before saying anything.

Comment author: paulfchristiano 21 December 2010 01:54:06AM 1 point

I mention timing in the post; the AI gets some fixed interval, at the end of which the filter outputs whether or not it has a proof. If you can't change what the filter says, then you don't get to affect the world.
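The fixed interval can be sketched roughly as follows. This is an illustration under assumptions, not the post's implementation: `run_with_fixed_interval`, `genie`, and `verify` are hypothetical names, and a thread stands in for whatever isolation a real setup would need. The filter always waits out the whole interval, so answering early or stalling changes nothing about when the bit appears.

```python
import threading
import time
from typing import Any, Callable, List


def run_with_fixed_interval(genie: Callable[[], Any],
                            verify: Callable[[Any], bool],
                            interval_s: float) -> bool:
    """Give the genie exactly `interval_s` seconds, then output one bit."""
    submitted: List[Any] = []  # filled in only if the genie answers in time

    def worker() -> None:
        submitted.append(genie())

    deadline = time.monotonic() + interval_s
    threading.Thread(target=worker, daemon=True).start()

    # Always sleep until the deadline, even if the genie finished early,
    # so the timing of the output carries no information.
    time.sleep(max(0.0, deadline - time.monotonic()))

    snapshot = list(submitted)  # anything arriving after this is ignored
    return bool(snapshot) and verify(snapshot[0])
```

A genie that stalls past the deadline simply gets a "no", the same result as a genie that submits nothing.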