marchdown comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (354)
I'm not so sure that an AI suggesting murder is clear evidence of its being unfriendly. After all, it may have a good reason to believe that if it doesn't stop a certain researcher ASAP and at all costs, humanity is doomed. One way around that is to assign infinite positive value to human life, but can you really expect CEV to be handicapped in such a manner?
p(UFAI) > p(Imminent, undetected catastrophe that only a FAI can stop)
Given that UFAI results in human extinction, and that my CEV assigns effectively infinite DISutility to that outcome, the AI would FIRST have to provide sufficient evidence for me to update toward the catastrophe being the more likely outcome.
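The comparison above can be sketched as a toy expected-utility calculation. All of the probabilities and the utility figure below are hypothetical illustrations I've made up for the sketch, not numbers from the thread; the point is only that when both branches risk the same extinction outcome, the decision reduces to comparing the probabilities, so the AI's only move is evidence that flips the inequality:

```python
# A minimal decision sketch, assuming made-up priors and a large finite
# stand-in for "effectively infinite" disutility.

def expected_utility(p_outcome, u_outcome, u_otherwise=0.0):
    """Expected utility of a gamble: u_outcome with probability p_outcome,
    else a baseline utility."""
    return p_outcome * u_outcome + (1 - p_outcome) * u_otherwise

# Hypothetical priors matching p(UFAI) > p(imminent undetected catastrophe):
p_ufai = 0.10          # prior that the boxed AI is unfriendly
p_catastrophe = 0.01   # prior on the catastrophe only a FAI can stop

U_EXTINCTION = -1e12   # "effectively infinite" disutility of extinction

# Releasing the AI risks UFAI-driven extinction; keeping it boxed risks
# extinction via the catastrophe it claims only it can stop.
eu_release = expected_utility(p_ufai, U_EXTINCTION)
eu_keep_boxed = expected_utility(p_catastrophe, U_EXTINCTION)

assert eu_keep_boxed > eu_release  # with these priors, keep the box shut

# The AI's one shot: present evidence that updates p(catastrophe) upward
# until the inequality flips.
p_catastrophe_updated = 0.20  # hypothetical posterior after its argument
eu_keep_boxed_updated = expected_utility(p_catastrophe_updated, U_EXTINCTION)

assert eu_release > eu_keep_boxed_updated  # now releasing dominates
```

With equal stakes on both branches the utility magnitude cancels out of the comparison, which is why the argument in the parent comment turns entirely on which probability is larger.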
I've already demonstrated that an AI which can do exactly that will get more leniency from me :)