You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

orthonormal comments on An example of self-fulfilling spurious proofs in UDT - Less Wrong Discussion

20 Post author: cousin_it 25 March 2012 11:47AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (39)

You are viewing a single comment's thread. Show more comments above.

Comment author: Vladimir_Nesov 25 March 2012 12:12:25PM *  5 points [-]

More generally, what makes Q dangerous is that (1) it only settles for a spurious moral argument, doesn't accept natural ones, and (2) what it finds is taken seriously by the agent, acted out. As a result, provability of a spurious moral argument provably implies its truth, which by Loeb's theorem makes it true and forces the agent to be thus misled.

The only difference between Q and normal proof search procedures is that the normal procedures are OK with any proof, while Q dislikes natural proofs and ignores them. And this bit of "motivated skepticism" is sufficient to make the preferred spurious proofs come true, Q doesn't just loop without finding anything.

This is a whole new level of Oracle AI unsafety... Take what it says seriously, and it can argue you into doing anything at all. :-)

Comment author: orthonormal 25 March 2012 04:17:02PM *  1 point [-]

You mean Q instead of P, right? (Edit: Fixed.)

Comment author: Vladimir_Nesov 25 March 2012 04:27:15PM *  0 points [-]

Right, cousin_it changed some terminology from the post on the list, I didn't notice. (Fixed.)