
Stuart_Armstrong comments on The Ultimate Testing Grounds - Less Wrong Discussion

6 Post author: Stuart_Armstrong 28 October 2015 05:08PM


Comment author: Luke_A_Somers 29 October 2015 11:57:38AM 0 points

On 'certain to fail': what if the AI would have pursued a plan X that requires only abilities it actually has, but only because it believed it had ability Y, an ability it lacks but which you made it think it has, and Y is only needed in a contingency that turns out not to arise?

By analogy, for a human: "I'll ask so-and-so out, and if e says no, I'll leave myself a note and use my forgetfulness potion on both of us so things don't get awkward."

Except that for a world-spanning AI, the realizable parts of the contingency table could involve wiping out humanity.

So we're going to need to test at the level of intentions, or to sandbox.

Comment author: Stuart_Armstrong 29 October 2015 04:39:41PM 0 points

A better theory of counterfactuals, one that can deal with events of zero probability, could help here.