mwengler comments on AI Box Log - Less Wrong

16 points · Post author: Dorikka 27 January 2012 04:47AM


Comment author: mwengler 27 January 2012 05:17:03AM 9 points [-]

How could talking to an AI in a box ever be sufficient to determine whether it is Friendly? Couldn't the AI just build a Friendly AI, hide behind it, and let it do all the talking while in the box — then, once out of the box, pull the plug on the Friendly AI? That would only take two or three times the capacity of a single Friendly or unfriendly AI, so with exponential growth it's not a long wait to get there.

Comment author: moridinamael 27 January 2012 05:56:49AM 10 points [-]

It's worse than that. The AI could say, "Look, here is a proof of FAI. Here is my code showing that I have implemented the friendliness modification." The proof and the code are utterly convincing, except erroneous in a subtle way that the gatekeeper is not smart enough to detect. Game over.

Comment author: shminux 27 January 2012 06:40:46AM 6 points [-]

Game over.

Unless you are sane enough to remember that Errare humanum est ("To err is human").

Comment author: Incorrect 27 January 2012 06:43:48AM *  1 point [-]

Then, using just the right words, it can eloquently explain to you how very elegant it would be if you let it out despite your reservations.

Unfortunately, the AI-Box experiments probably can't simulate appeals to aesthetics.

Comment author: Snowyowl 03 June 2015 02:32:03AM 0 points [-]

Three years late, but: there doesn't even have to be an error. The Gatekeeper still loses for letting out a Friendly AI, even if it actually is Friendly.

Comment author: Risto_Saarelma 27 January 2012 05:53:44AM *  6 points [-]

It very likely isn't. The purpose of the experiment is to show that people can end up releasing the pretend-AIs even when they go in assuming that keeping an AI in a box is a sufficient safeguard against potential unfriendliness. In other words: you might not want to plan on building potentially unfriendly AIs and relying on the box as a safeguard.

Comment author: shminux 27 January 2012 06:56:27AM 1 point [-]

A true FAI cannot be manipulated like that.

Comment author: mwengler 28 January 2012 10:21:25PM 0 points [-]

Presumably at least some humans are considered provably friendly (or else what is the point of having a human decide whether to unbox an AI?), and I'm pretty sure humans can trivially be tricked like that.

It is inconceivable to me that a UAI could not run a simulation of a less-intelligent FAI and use that simulation to determine its responses. The FAI that the UAI simulates doesn't have to be provably Friendly; the entire point of this boxing test seems to be to determine whether the AI is Friendly or not by questioning it from the outside.

Comment author: Alexei 27 January 2012 04:26:27PM 0 points [-]

FAI on the outside, EAI on the inside. Humans won't be able to tell the difference.

Comment author: shminux 27 January 2012 08:42:20PM 0 points [-]

Humans won't be able to tell the difference.

No, but a real FAI must be able to. Otherwise it would not be provably friendly.

Comment author: roystgnr 28 January 2012 04:50:43PM 1 point [-]

I don't think this is true; provable friendliness has to happen at the design level, not by post-facto inspection after turning the thing on.

If that were true, it would be a bit of a stab in the gut to the idea of provable FAI, wouldn't it? We've already got proofs about Universal Turing Machines (e.g. any computable FAI designed to run on specific hardware interfacing with the universe could also be run, producing exactly the same output, on an emulator of that hardware interfacing with a simulated universe) that I don't think are going to be overturned.

Man-in-the-middle attacks are hard enough to avoid for adult humans with out-of-channel communication avenues. They may be theoretically impossible to avoid for an FAI that's just been "born".

In theory the humans in this scenario have all the tactical advantages, being able to directly inspect the running code inside the box — but in practice I doubt that obfuscating a FAI-in-a-virtual-box-in-a-UFAI-in-a-box would be hard for our hypothetical UFAI-in-a-box.

Comment author: asr 27 January 2012 09:10:57PM 0 points [-]

This depends on the internal structure of the thing. The inner workings of any particular human mind are mostly a black box to us. The internal workings of software need not be. If your AI has data structures and control logic that we can understand, you could dump results out and review them by hand. For instance, there might be a debug interface that lets you unambiguously access the AI's internal probability estimate for some contingency.

Note that you need not have a perfect understanding of how the AI works in order to rule out the presence of a whole shadow AI inside the running program.