paulfchristiano comments on What can you do with an Unfriendly AI? - Less Wrong
Your comment made me re-read the post more carefully. I had on first reading assumed that a truthful answer was rewarded (whether yes or no) and a lying answer was punished. If a yes is rewarded and a no is punished, and our AI genies are so afraid of termination that they would never give a 'no' where they could give a 'yes', why wouldn't they all give us 'yes'?
Recall the filter between the AI and the world. The AI doesn't directly say "yes/no." The AI gives a proof to the filter, which then says either "yes, the AI found a proof" or "no, the AI didn't." So they do all say 'yes' if they can, but the only way to get a 'yes' out of the filter is to hand it a proof that checks out. I will modify the post to be clearer.
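
To make the filter concrete, here is a minimal sketch of the idea, not the setup from the post itself. It assumes a toy stand-in for "proof": a satisfying assignment for a CNF formula, checked by a hypothetical `make_sat_verifier`. The names `filter_output` and `make_sat_verifier` are illustrative inventions. The point is only that the outside world sees one bit, and that bit is "yes" exactly when the submitted candidate passes verification.

```python
from typing import Callable, Optional


def filter_output(candidate: Optional[object],
                  verify: Callable[[object], bool]) -> str:
    """Release a single bit to the outside world, never the candidate itself.

    Anything other than a candidate that verifies cleanly counts as "no".
    """
    if candidate is None:
        return "no"
    try:
        ok = verify(candidate)
    except Exception:
        # A malformed submission that crashes the checker is treated as no proof.
        ok = False
    return "yes" if ok is True else "no"


def make_sat_verifier(clauses):
    """Hypothetical checker: clauses are lists of nonzero ints (DIMACS-style).

    A positive literal k means variable k is True; -k means variable k is False.
    """
    def verify(assignment):
        if not isinstance(assignment, dict):
            return False
        # Every clause needs at least one literal satisfied by the assignment.
        return all(
            any(assignment.get(abs(lit)) is (lit > 0) for lit in clause)
            for clause in clauses
        )
    return verify


# (x1 or x2) and (not x1 or x2)
verify = make_sat_verifier([[1, 2], [-1, 2]])

print(filter_output({1: False, 2: True}, verify))   # valid assignment
print(filter_output({1: True, 2: False}, verify))   # violates second clause
print(filter_output("trust me, it's true", verify)) # not a proof at all
```

In this sketch an AI that wants a 'yes' has no lever to pull except producing a candidate that actually verifies; a bare assertion, a malformed submission, or an incorrect assignment all come out as "no".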