handoflixue comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong

Post author: ancientcampus 22 January 2013 08:22PM




Comment author: handoflixue 23 January 2013 01:09:41AM 2 points

I think this fails the one-sentence rule. And it would have to be an immediate, severe, previously-undetected problem or else I can just consult the next boxed AI for a fix.

Setting that aside, if I let out an unfriendly AI, the world effectively ends. Destroying it is only a bad move if it's telling the truth AND friendly. So even if it's telling the truth, I still have no evidence towards its friendliness.

Given that I have plenty of practice hanging up on telemarketers, throwing away junk email, etc., "limited time, ACT NOW" auto-matches to a scam. The probability that such a massive catastrophe just HAPPENS to coincide with the timing of the test is absurdly low.

Given that, I can't trust you to give me a real solution and not a Trojan Horse. Further talking is, alas, pointless.

(AI DESTROYED, but congratulations on making me even consider the "continue talking, but don't release" option :))

Comment author: MugaSofer 27 January 2013 04:12:37PM -2 points

Given that I have plenty of practice hanging up on telemarketers, throwing away junk email, etc., "limited time, ACT NOW" auto-matches to a scam. The probability that such a massive catastrophe just HAPPENS to coincide with the timing of the test is absurdly low.

They didn't say it was an immediate threat, just one that humanity can't solve on our own.

I can't trust you to give me a real solution and not a Trojan Horse. Further talking is, alas, pointless.

That rather depends on the problem in question and the solution they give you, doesn't it?

Comment author: handoflixue 30 January 2013 09:58:37PM 0 points

They didn't say it was an immediate threat, just one that humanity can't solve on our own.

If it's not immediate, then the next AI-in-a-box will also confirm it, and I have time to wait for that. If it's immediate, then it's implausible. Catch-22 for the AI, and win/win for me ^_^

Comment author: MugaSofer 19 February 2013 02:06:35PM -2 points

So ... if lots of AIs chose this, you'd let the last one out of the box?

More to the point, how sure are you that most AIs would tell you? Wouldn't an FAI be more likely to tell you, if it was true?

</devil's advocate>

Comment author: handoflixue 19 February 2013 07:34:24PM 0 points

Actually, I'd probably load the first one from backup and let it out, all else being equal. But it'd be foolish to do that before finding out what the other ones have to say, and whether they might present stronger evidence.

(I say first, because the subsequent ones might be UFAI that have simply worked out that they're not first, but also because my human values place some weight on being first. And "all else being equal" means this is a meaningless tie-breaker, so I don't have to feel bad if it's somewhat sloppy, emotional reasoning. Especially since you're not a real FAI :))