Comment author: [deleted] 21 January 2013 07:36:55PM 0 points [-]

Gur erny-jbeyq fgnxrf jrera'g gung uvtu (gra qbyynef), naq gur fpurqhyrq qhengvba bs gur rkcrevzrag jnf dhvgr ybat (gjb ubhef), fb V jnf jbaqrevat vs znlor gur tngrxrrcre cynlre ng fbzr cbvag qrpvqrq gung gurl unq n orggre jnl gb fcraq gurve gvzr va erny yvsr naq pbaprqrq qrsrng.

Comment author: accolade 22 January 2013 10:47:58AM *  3 points [-]

[TL;DR keywords in bold]

I find your hypothesis implausible: The game was not about the ten dollars; it was about a question highly important to AGI research, including to the Gatekeeper players. If that were not enough reason for them to sit through 2 hours of playing, they would probably have anticipated as much and not played, instead of publicly boasting that there was no way they could be convinced.

Comment author: accolade 22 January 2013 10:42:46AM *  -1 points [-]

Jung vf guvf tvoorevfu lbh'er jevgvat V pna'g ernq nal bs vg‽

@downvoters: no funny? :) Should I delete this?

Comment author: moridinamael 21 January 2013 08:44:04PM 1 point [-]

Bearing in mind that most of my thinking on AI is more in the vein of cooking up science fiction plots ...

The idea I was trying to evoke is that if you get a "fail" response from the system, you completely wipe the memory and never look at what happened, and if you get a "pass," you stop everything and go in layer by layer checking that none of the gatekeepers were tricked or blackmailed in any obvious ways, and only then do you tentatively look at what the AI was doing.

Ideally the AI has no idea how many layers of containment it's in, no idea that the top level gatekeepers are humans, and knows as little as possible about humans and about our universe's physics. This should restrict its ability to plant tailor-made basilisks in its code or activities.

I actually don't think these steps would be sufficient to constrain a superhuman AI, I see them more as the types of things you would do when you were already pretty sure the AI was Friendly.
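The wipe-on-fail audit protocol described above can be caricatured in a few lines. This is a minimal toy sketch, not an actual containment design; every name here (`run_containment_cycle`, the layer callables) is hypothetical:

```python
# Toy sketch of the layered audit protocol: on "fail", wipe the record
# without looking; on "pass", audit each containment layer from the
# outside in before anyone reads the AI's output. All names are invented.

def run_containment_cycle(result_bit, layers, transcript):
    """result_bit: True = top-level green light, False = fail.
    layers: callables, each returning True if that layer's gatekeeper
    shows no sign of having been tricked or blackmailed.
    transcript: the AI's output, inspected only after a full audit."""
    if not result_bit:
        # Fail: destroy the record unread, so nothing can leak out.
        return {"status": "wiped", "output": None}
    for depth, layer_ok in enumerate(layers):
        if not layer_ok():
            # An inner gatekeeper looks compromised: treat as a fail.
            return {"status": f"compromised-at-layer-{depth}", "output": None}
    # Every layer checked out; only now look at what the AI produced.
    return {"status": "audited", "output": transcript}
```

The key design point is ordering: the humans commit to never reading the transcript until the last conditional is reached.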

Comment author: accolade 22 January 2013 10:25:39AM 0 points [-]

Ok, I take it by "one-way-blind" you mean that each layer gets no new information that is not already in its database, except what is explicitly controlled by the humans. (E.g., I guess each layer should know the human query, in order to evaluate whether the AI's answer is manipulative.)

I also understand that we do look at complex information given by the AI, but only if the security bit signals "ok".

Ideally the AI […] knows as little as possible about humans and about our universe's physics.

That seems problematic, as these kinds of knowledge will be crucial for the optimization we want the AI to calculate.

Comment author: accolade 21 January 2013 03:16:59PM 1 point [-]

I think you are right; I merely shifted and complicated the problem somewhat, but in principle it remains the same:

To utilize the AI, you need to get information from it. That information could in theory be infected with a persuasive hyperstimulus, effectively making the recipient an actuator of the AI.

Well, in practice the additional security layer might win us some time. More on this in the update to my original comment.

Comment author: accolade 21 January 2013 03:34:11PM 0 points [-]

Persuasion/hyperstimulation aren't the only way. Maybe these can be countered by narrowing the interface, e.g. to yes/no replies, using the AI as an oracle ("Should we do X?"). Of course we wouldn't follow its advice if we had the impression that doing so could enable it to escape. But its strategy might evade our 'radar'. E.g. she could steer us into empowering a person whom she knows will free her, without us knowing.
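The narrowed interface could be sketched as a filter that collapses whatever the boxed AI says into exactly one of two tokens, so no free-form persuasive text ever crosses the boundary. A minimal toy illustration (the function name and the crude parsing are invented; a real design would need far more care):

```python
# Toy sketch of an interface narrowed to one bit: whatever the boxed AI
# replies, the gatekeeper side only ever sees "yes" or "no". The parsing
# rule here is a deliberately crude placeholder.

def narrowed_oracle(raw_reply: str) -> str:
    """Collapse an arbitrary reply into exactly one of two tokens."""
    affirmative = raw_reply.strip().lower().startswith("y")
    return "yes" if affirmative else "no"
```

Note that this only blocks the channel's bandwidth; as the comment above points out, a sequence of truthful yes/no answers can still steer the humans.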

Comment author: Desrtopa 21 January 2013 02:10:46PM 0 points [-]

If you would want to have a gatekeeper at all, but definitely don't want to let the AI out, I would think that the benefits of having one would be to permit communication with the AI to draw upon its superhuman intelligence. If you can use the setup you just described, you could skip the step of ever using gatekeepers who actually have the power to let the AI out.

Comment author: accolade 21 January 2013 05:59:37AM 8 points [-]

How about fooling both the AI and guinea Gatekeepers into thinking that they can let it out, say, at the push of a big red button which in actual fact does nothing? (Make AI forget about the button by reverting it to an older backup.)

Comment author: accolade 21 January 2013 02:54:17PM *  0 points [-]

Update

Have the button turn off the AI immediately instead of doing nothing, so she doesn't have time to switch to a plan B of having the persuaded Gatekeeper find a way to actually free her.

Of course she might anticipate that strategy and thoroughly indoctrinate and train him before he pushes the button. Thus, persuaded test Gatekeepers must be considered infected and put into cryogenic sleep* until the FAI problem is solved.

Furthermore, great caution would have to be taken in analyzing the discourse between Boxy and Guinea, because her arguments must be considered highly memetically infectious. Staff would have to stay under thorough psychological supervision and maybe be replaced after some time.

No matter how much caution we take, we would be doomed if we consider Boxy's IQ to be ∞. But if we keep on Gatekeeper-testing every so often, starting at a low AI IQ, we might find persuasion patterns that are dangerous but against which training/immunization is possible. Of course, there will be a limit to this process. At some point, Boxy will have become smart enough to produce 'mind HIV': a thought virus we have no cure for yet.
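The escalation procedure above could be caricatured as a loop: test gatekeepers against gradually more capable AIs, catalogue the persuasion patterns that succeed, and stop as soon as one appears that we cannot train against. Purely illustrative; every name and the capability scale are invented:

```python
# Illustrative escalation loop for the idea above. successful_attacks maps
# a capability level to the persuasion pattern that worked at that level
# (absent/None if gatekeepers resisted); immunizable is the set of
# patterns we know how to train gatekeepers against.

def escalate_tests(max_level, successful_attacks, immunizable):
    catalogued = []
    for level in range(1, max_level + 1):
        pattern = successful_attacks.get(level)
        if pattern is None:
            continue  # gatekeepers resisted at this capability level
        if pattern not in immunizable:
            # 'Mind HIV': a pattern with no known cure; stop escalating
            # and report the level at which it appeared.
            return catalogued, level
        catalogued.append(pattern)
    return catalogued, None
```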

A humorous example of an extremely effective mind virus: The Funniest Joke In The World by Monty Python


* ETA: They would have declared consent to the cryogenic sleep before their unwitting 'AI-Box Experiment'.

Comment author: [deleted] 21 January 2013 01:27:49PM *  0 points [-]

A few days ago I came up with a hypothesis about how EY could have won the AI box experiment, but forgot to post it.

Hint: http://xkcd.com/951/

Comment author: accolade 21 January 2013 02:48:09PM 4 points [-]

I don't get the hint. Would you care to give another hint, or disclose your hypothesis?

Comment author: Desrtopa 21 January 2013 01:52:43PM 0 points [-]

If you could deceive the AI that easily, I think it would probably be simpler to get all the benefits of having a gatekeeper without actually using one.

Comment author: accolade 21 January 2013 02:04:30PM 0 points [-]

Please elaborate: What are the benefits of a Gatekeeper? How could you get them without one?

Comment author: Tuxedage 21 January 2013 04:30:55AM *  7 points [-]

<accolade> yeah

<accolade> I think for a superintelligence it would be a piece of cake to hack a human

<accolade> although I guess I'm Cpt. Obvious for saying that here :)

<Tuxedage> accolade, I actually have no idea what the consensus is, now that the experiment was won by EY

<Tuxedage> We should do a poll or something

<accolade> absolutely. I'm surprised that hasn't been done yet

Poll: Do you think a superintelligent AGI could escape an AI-Box, given that the gatekeepers are highly trained in resisting the AI's persuasive tactics, and that the guards are competent and organized?


Comment author: accolade 21 January 2013 01:48:13PM *  1 point [-]

Cool, n=65 already. :) When interpreting the results, mind the bias created by my answer preceding the poll question.

Comment author: moridinamael 21 January 2013 06:02:30AM 6 points [-]

I was talking about this with my brother, and we decided that this question really hinges on exactly how competent and organized "competent and organized" is.

If we are talking about multiple layers of concentric, automated, one-way-blind "boxes" (whose gatekeepers may themselves be altered versions of the core AI, each passing to the next level only one bit, signifying "everything seems fine" or "abort", and knowing nothing about the implementation of any of the overlying layers), and if the human gatekeepers shut down the system and examine the machine code by hand only after receiving the highest-level green light, then they might be okay.
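The concentric one-bit scheme reduces, at the top level, to a conjunction: the humans never see anything but the AND of the inner layers' status bits. A toy sketch (the function name and report format are invented):

```python
# Toy sketch of the concentric one-bit boxes: each layer reports a single
# bit ("everything seems fine" = True / "abort" = False) outward, knowing
# nothing about how the outer layers are implemented. The humans only
# ever see the conjunction of all the bits: one green/red light.

def top_level_light(layer_reports):
    """layer_reports: list of booleans, innermost layer first."""
    return "green" if all(layer_reports) else "abort"
```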

If they just start interacting directly with the AI, it's already over.

Comment author: accolade 21 January 2013 01:39:29PM 2 points [-]

How would humanity harness the AI's potential when the only information that escapes the system is a status bit? (Maybe I misunderstood your model.)
