
Luke_A_Somers comments on The Ultimate Testing Grounds - Less Wrong Discussion

Post author: Stuart_Armstrong, 28 October 2015 05:08PM

Comment author: Luke_A_Somers, 29 October 2015 11:57:38AM, 0 points

On 'certain to fail': what if the AI would pursue a plan X that requires only abilities it actually has, but would do so only because it believes it has ability Y, an ability it lacks but that you made it think it has, and Y is only needed in a contingency that turns out not to arise?

Like for a human: "I'll ask so-and-so out, and if e says no, I'll leave myself a note and use my forgetfulness potion on both of us so things don't get awkward."

Only for a world-spanning AI, the realizable parts of that contingency table could involve wiping out humanity.

So we're going to need to test at the level of intentions, or sandbox it.
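
A minimal toy sketch of this failure mode (my illustration, not from the comment; the planner logic and all names are made up): a planner adopts plan X only when its believed abilities cover every branch of X's contingency table, so a false belief about Y changes which plan gets chosen even though the branch actually executed uses only real abilities.

# Toy illustration: plan X's executed branch needs only real abilities,
# but X is selected only because the agent believes it has fallback Y.
def choose_plan(believed_abilities):
    """Adopt plan X only if believed abilities cover every branch of
    its contingency table; otherwise fall back to a safe default."""
    plan_x_requirements = {
        "main_branch": {"ask"},                    # what actually gets executed
        "contingency": {"forgetfulness_potion"},   # branch for the case that never arises
    }
    needed = set().union(*plan_x_requirements.values())
    return "X" if needed <= believed_abilities else "safe_default"

real_abilities = {"ask"}                            # what the agent can really do
believed = real_abilities | {"forgetfulness_potion"}  # we made it think it has Y

print(choose_plan(believed))        # -> "X": chosen because of the believed fallback Y
print(choose_plan(real_abilities))  # -> "safe_default": without the false belief

# A test that observes only the executed main branch sees actions the agent
# can really perform, so the false belief about Y stays invisible -- until
# the contingency branch becomes realizable.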

Comment author: Dagon, 29 October 2015 10:31:06PM, 0 points

This is a good point. The theory of the second best implies that if you take an optimal recommendation and remove any one of its conditions, there is no guarantee that the remaining components are still optimal without the missing one.

Comment author: Stuart_Armstrong, 30 October 2015 08:37:27AM, 0 points

For links, you need to put the text in "[" brackets and the link in "(" parentheses ;-)
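
For instance, to get a clickable link one would write something like this (the URL here is just a placeholder):

[theory of the second best](http://example.com/second-best)

which renders as the text "theory of the second best" pointing at that URL.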

Comment author: Dagon, 30 October 2015 04:08:21PM, 0 points

Thanks, fixed.

Comment author: Stuart_Armstrong, 29 October 2015 04:39:41PM, 0 points

A better theory of counterfactuals - that can deal with events of zero probability - could help here.
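
To see why events of zero probability are the sticking point, recall the standard ratio definition of conditional probability (textbook material, not something from the thread):

P(A | B) = P(A and B) / P(B)

This is undefined whenever P(B) = 0, so ordinary conditioning gives no answer to "what would the agent have done had contingency B arisen?" when B was guaranteed not to arise.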