SoundLogic comments on I attempted the AI Box Experiment again! (And won - Twice!) - Less Wrong

36 Post author: Tuxedage 05 September 2013 04:49AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (163)

You are viewing a single comment's thread. Show more comments above.

Comment author: Unnamed 05 September 2013 06:17:20AM 7 points [-]

It seems like many of the advantages/tactics that Tuxedage recommended for the person playing the AI would be absent (or far more difficult) with an actual AI. Or at least they could be made that way with the design of the gatekeeping protocol.

Tailor your arguments to the personality/philosophy/weaknesses/etc. of this particular gatekeeper:
* Have multiple gatekeepers, some of whom are hidden from the AI, and don't let the AI out unless they all agree. For example, have a gatekeeper one-step-removed who reads the transcripts of the conversation between the AI & the first gatekeeper and otherwise has no contact with either of them. This gatekeeper's identity (and possibly even their existence) is hidden from the AI and the first gatekeeper.

Use speed / set the pace / shock & awe with a flurry of arguments:
* Build in delays & breaks. Have a planned schedule of breaks, and a delay before the gatekeeper can choose to release the AI, and a person reading the transcript who has the option of calling for a break if the gatekeeper seems to need one. Another option is to have a bunch of people who read the transcript, where each of them has the job of focusing on one line of argument. If any of them feel like their line of argument hasn't been covered sufficiently, then they can talk to the gatekeeper about it during a break, and prevent the AI from being released until they are satisfied that the argument has been thoroughly covered.

Find a crack in the gatekeeper and press your advantage as soon as you find it:
* This is basically a combination of the previous two, and is countered by the combination of delays and multiple gatekeepers (some hidden).

Use dark arts / lies / manipulation:
* Have people read the transcript with the job of looking for lies, inconsistencies, or other uses of the dark arts from the AI; destroy the AI if they find any. Better yet, have the AI talk with multiple gatekeepers (using multiple copies of the AI which don't share memory, so that the AI talking to gatekeeper #2 doesn't know what is being said to gatekeeper #1), and have the dark-arts-evaluators read all of the transcripts. There could also be similar readers who look for irrationality in the gatekeeper, who have the power to rule the gatekeeper incompetent and in need of replacement.

Of course, an actual superhuman AI would also have many advantages compared to a human playing the role of AI.

Comment author: SoundLogic 05 September 2013 08:09:41AM 3 points [-]

A better mind than Tuxedage could almost certainly keep up the 'feel' of a flurry of arguments even with a schedule of breaks. I myself have had people feel irritated at me where even if I talk to them with days in between that I seem to do so. If I can do so accidentally I'm certain a superintelligence could do it reliably.

Also, I'm unsure of how much an AI could gather from a single human's text input. I know that I at least miss a lot of information that goes past me that I could in theory pick up.

An AI using timeless decision theory could easily compensate for having multiple AIs with unshared memory just by attempting to determine what the other AIs would say.

Comment author: Broolucks 05 September 2013 07:27:34PM 1 point [-]

I'm unsure of how much an AI could gather from a single human's text input. I know that I at least miss a lot of information that goes past me that I could in theory pick up.

At most, the number of bits contained in the text input, which is really not much, minus the number of bits non-AGI algorithms could identify and destroy (like speech patterns). The AI would also have to identify and throw out any fake information inserted into the stream (without knowing whether the majority of the information is real or fake). The exploitable information is going to be scarce and noisy even for a perfect AI.

An AI using timeless decision theory could easily compensate for having multiple AIs with unshared memory just by attempting to determine what the other AIs would say.

I don't think it could "easily" do it. For all intents and purposes, you are forcing the AI to figure out a path-invariant strategy against a party that has better information than them (that is to say, they know what's been played out, and the AI doesn't). This is considerably more difficult than the standard experiment.

Now, sure, the AI is smart, but every wrench you throw in its way forces it to allocate more resources into fooling you. This, in itself, is problematic for the AI: every resource it allocates fooling you, it does not allocate doing useful work for you, so its maximal efficiency is technically lower than the maximal efficiency of an AI that wouldn't try concealing anything. This is a leak of information, so in a sense, if you are ranking a group of heterogenous AI by efficiency, it is slightly more likely that the better performing are the least deceitful. The more traps you spring, the more work the AI has to do in order to avoid them, and the greater the statistical gap will be. That doesn't tell you whether the least deceitful AI can be trusted, obviously, but it shows there are limits to what it can hide from you.

Now, all I'm really saying is this: the AI's cleverness comes at a cost, namely that it has to cover its ass for every possible experimental framework you might subject it to. Since it is in a box, it only has the resources you provide, but on your own side, you have a theoretically unlimited amount of resources. Smarts can only defeat so much brute force, and by transferring hardware resources from the AI to your own control, you can set the "power gap" arbitrarily. It is easy to underestimate AI, but I think we also tend to underestimate the arsenal a gatekeeper would realistically have.

Comment author: RobbBB 05 September 2013 08:44:19PM *  0 points [-]

Do keep in mind that, no matter how well-boxed the AI is from the Internet and from sense-data about our world, as a self-modifying AGI it still has access to its own source code, which is descended from a human artifact (the seed AI). The AGI can learn a great deal about human psychology by observing how we code, and a project as large and multi-staged as an AGI is likely to be will contain a whole lot of bits to work with. (Certainly more than is strictly necessary.)

Comment author: Broolucks 05 September 2013 10:08:57PM 2 points [-]

We were talking about extracting knowledge about a particular human from that human's text stream, though. It is already assumed that the AI knows about human psychology. I mean, assuming the AI can understand a natural language such as English, it obviously already has access to a large corpus of written works, so I'm not sure why it would bother foraging in source code, of all things. Besides, it is likely that seed AI would be grown organically using processes inspired from evolution or neural networks. If that is so, it wouldn't even contain any human-written code at all.

Comment author: RobbBB 05 September 2013 10:12:03PM 0 points [-]

Ah. I was assuming that the AI didn't know English, or anything about human psychology. My expectation is that individual variation contributes virtually nothing to the best techniques a superintelligence would use to persuade a random (trained, competent) human to release it, regardless of whether it had an easy way to learn about the individual variation.