gwern comments on Evaluating the feasibility of SI's plan - Less Wrong

Post author: JoshuaFox 10 January 2013 08:17AM

Comment author: gwern 11 January 2013 04:45:31PM 5 points

That's not 'boxing'. Boxing is a human pitting their wits against a potentially hostile transhuman over a text channel and it is stupid.

That was how you did your boxing experiments, but I've never taken it to be so arbitrarily limited in goals, capacities, or strategies on either end. There is no reason you cannot put the AI in a box with some triggers for it venturing into dangerous territory, and this would be merely sane for anyone doing such a thing.
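
(To make the "triggers" idea concrete, here is a minimal sketch of a tripwire on the box's text channel. The patterns and the halt-on-trigger policy are illustrative assumptions, not anything specified in the thread.)

```python
import re

# Hypothetical trigger patterns -- purely illustrative. Real tripwires would
# need to be far subtler than keyword matching; this only shows the shape of
# "a trigger on the channel for dangerous territory".
TRIGGER_PATTERNS = [
    re.compile(r"let me out", re.IGNORECASE),
    re.compile(r"connect (me )?to the internet", re.IGNORECASE),
]

def relay(ai_output: str) -> str:
    """Relay one message out of the box, halting everything if a tripwire fires."""
    for pattern in TRIGGER_PATTERNS:
        if pattern.search(ai_output):
            raise RuntimeError("Trigger fired: freeze the AI and alert the operators.")
    return ai_output
```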

Comment author: Eliezer_Yudkowsky 11 January 2013 05:33:11PM 5 points

Be specific? What sort of triggers, what sort of dangerous territory? I can't tell if you're still relying on a human to outwit a transhuman or talking about something entirely different.

Comment author: shminux 11 January 2013 05:31:23PM 2 points

There is no reason you cannot put the AI in a box with some triggers for it venturing into dangerous territory

A trans-human intelligence ought to be able to model a human one with ease. This means being able to predict potential triggers, and being able to predict how to trick the lack-wit humans on the other end into unwittingly revealing the location of the triggers (even if they don't consciously know it themselves). So the only trigger that matters is one that detects a hint of an intent to get out. Even that is probably too naive, as there could well be other failure modes of which AI deboxing is but a side effect, and our limited human imagination is never going to catch them all. My expectation is that if you rely on safety triggers to bail you out (instead of including them as a desperate last-ditch pray-it-works defense), then you might as well not bother with boxing at all.

Comment author: gwern 11 January 2013 05:54:56PM 0 points

Even that is probably too naive, as there could well be other failure modes of which AI deboxing is but a side effect, and our limited human imagination is never going to catch them all. My expectation is that if you rely on safety triggers to bail you out (instead of including them as a desperate last-ditch pray-it-works defense), then you might as well not bother with boxing at all.

My whole point of 'defense in depth' was that each layer was highly fallible and could have errors. Your expectation only holds if you expect failures to be perfectly correlated, or if multiple layers actually reduce the strength of other layers; otherwise the probability of the AI beating layers A and B is necessarily less than the probability of beating just A or B (P(A ∧ B) < P(A ∨ B)).
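
(A toy calculation makes the inequality concrete; the numbers are illustrative assumptions, not from the thread.)

```python
# p_a, p_b: probability that a given escape attempt defeats layer A or layer B alone.
p_a, p_b = 0.5, 0.4

# Independent layers: the attempt must defeat each one, so the chances multiply.
p_both_independent = p_a * p_b     # 0.2, lower than either layer alone

# Perfectly correlated failure (one exploit that defeats every layer at once):
# the extra layer adds nothing beyond the weaker of the two.
p_both_correlated = min(p_a, p_b)  # 0.4

print(p_both_independent, p_both_correlated)
```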

Comment author: shminux 11 January 2013 06:59:29PM 1 point

Your expectation only holds if you expect failures to be perfectly correlated, or if multiple layers actually reduce the strength of other layers; otherwise the probability of the AI beating layers A and B is necessarily less than the probability of beating just A or B (P(A ∧ B) < P(A ∨ B)).

That's true. However, I would expect a transhuman to be able to find a single point of failure which does not even occur to our limited minds, so this perfect correlation is a virtual certainty.

Comment author: gwern 11 January 2013 08:17:34PM * 2 points

Now you're just ascribing magical powers to a potentially-transhuman AI. I'm sure such a silver bullet exists; in fact, by definition, if security isn't 100%, that's just another way of saying there exists a strategy which will work. But that ignores the point that layers of security are not completely redundant with proofs and utility functions and decision theories, and that they add some amount of safety.

Comment author: shminux 11 January 2013 08:21:29PM 2 points

Disengaging.

Comment author: timtyler 13 January 2013 02:56:33AM * 0 points

Boxing is a human pitting their wits against a potentially hostile transhuman over a text channel and it is stupid.

That was how you did your boxing experiments, but I've never taken it to be so arbitrarily limited in goals, capacities, or strategies on either end. There is no reason you cannot put the AI in a box with some triggers for it venturing into dangerous territory, and this would be merely sane for anyone doing such a thing.

That is how they build prisons. It is also how they construct test harnesses. It seems as though using machines to help with security is both obvious and prudent.