
wedrifid comments on AI box: AI has one shot at avoiding destruction - what might it say?

Post author: ancientcampus, 22 January 2013 08:22PM (18 points)



You are viewing a single comment's thread.

Comment author: wedrifid 23 January 2013 04:05:26AM 0 points

The original rules allow the AI to provide arbitrary proofs, which the gatekeeper must accept (no saying "my cancer cure killed all the test subjects," etc.). Saying you destroy me would require the proof to be false, which is against the rules...

You have to believe that they provided the cure for cancer. You don't have to discover the cure yourself. You have to believe that you will release the AI. You don't have to let the AI out.

Typing "AI DESTROYED" will result in an incoherent counterfactual universe, but it isn't a violation of the rules. It is entirely legitimate for Joe, who has encountered a proof that he will do B, to do A instead. It means that the universe he is in is nonsensical or that the proof is flawed, but there isn't anything in the physical representation of Joe or his local environment that dictates that he will do B. In fact, typing "AI DESTROYED" in the face of such a proof would be a heck of a lot easier than, for example, taking a single empty box in Transparent Newcomb's problem, which is something I'd also do.
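To make the Transparent Newcomb comparison concrete, here is a minimal sketch of one common formulation of the problem, assuming a perfect predictor who fills the big box iff the agent would take only that box even upon seeing it empty. The names (`predictor_fills_big`, `payoff`, `one_boxer`, `defector`) and the payoff numbers are illustrative assumptions, not anything from the original rules or this thread:

```python
# Illustrative sketch of one Transparent Newcomb variant (hypothetical
# names and payoffs; not part of the AI-box rules under discussion).

SMALL = 1_000      # small box: always filled, contents visible
BIG = 1_000_000    # big box: filled only if the predictor expects one-boxing

def predictor_fills_big(policy) -> bool:
    # A perfect predictor fills the big box iff the agent would take
    # only the big box *even upon seeing it empty* -- the counterfactual
    # that matters in this variant.
    return policy(big_box_full=False) == "one-box"

def payoff(policy) -> int:
    big_full = predictor_fills_big(policy)
    choice = policy(big_box_full=big_full)
    if choice == "one-box":
        return BIG if big_full else 0
    return SMALL + (BIG if big_full else 0)

# Takes the big box no matter what it sees.
one_boxer = lambda big_box_full: "one-box"
# Takes both boxes if it sees the big box empty.
defector = lambda big_box_full: "one-box" if big_box_full else "two-box"

print(payoff(one_boxer))   # 1000000 -- the big box is filled
print(payoff(defector))    # 1000    -- the big box is left empty
```

Note that under a perfect predictor the committed one-boxer never actually sees the empty box; the world in which he does is exactly the kind of incoherent counterfactual universe described above, which is what makes taking the single empty box so hard.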

(Of course, if the AI player believed that for some reason the burden of making the universe coherent fell upon the gatekeeper, then I'd have to revert to Dorikka's reply.)

Comment author: handoflixue 23 January 2013 09:30:35PM 2 points

I would argue that since the gatekeeper cannot dictate counterfactual results for any other proof (e.g. cannot say "your cancer cure killed everybody!"), the gatekeeper is obviously responsible for avoiding incoherent counterfactual universes.

Dorikka's Clause, of course, beats me just fine :)