GraceFu comments on Open thread, 30 June 2014- 6 July 2014 - Less Wrong

4 Post author: DanielDeRossi 30 June 2014 10:58AM


Comment author: GraceFu 30 June 2014 05:12:44PM *  6 points [-]

AI Box experiment over!

Just crossposting.

Khoth and I are playing the AI Box game. Khoth has played as AI once before, and as a result of that has an Interesting Idea. Despite losing as AI the first time round, I'm assigning Khoth a higher chance of winning than a random AI willing to play, at 1%!

http://www.reddit.com/r/LessWrong/comments/29gq90/ai_box_experiment_khoth_ai_vs_gracefu_gk/

Link contains more information.

EDIT

AI Box experiment is over. Logs: http://pastebin.com/Jee2P6BD

My takeaway: Update the rules. Read logs for more information.

On the other hand, I will consider other offers from people who want to simulate the AI.

Comment author: Sherincall 01 July 2014 02:02:15PM 2 points [-]

Tuxedage's (and EY's) rulesets have:

Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out).

Suppose EY is playing as the AI. Would it be within the rules to offer to tell the GK the ending of HPMoR? That is something the AI would know, but Eliezer is the only player who could actually simulate it, and in a sense it does offer real-world, out-of-character benefits to the GK player.

I used HPMoR as an example here, but the whole class of approaches is "I will give you some information only the AI and AI-player know, and this information will be correct both in the real world and in this simulated one." If the information is beneficial to the GK-player, not just the GK, they may (unintentionally) break character.

Comment author: MathiasZaman 01 July 2014 09:37:21PM 2 points [-]

If an AI-player wants to give that sort of information, they should probably do it in the same way they'd give a cure for cancer. Something like "I now give you [the ending for HPMOR]."

Doing it in another way would break the rule of not offering real-world things.

Comment author: [deleted] 02 July 2014 12:47:21AM 1 point [-]

Would it be within the rules to offer to tell the GK the ending to HPMoR? That is something the AI would know

Why would the AI know that?

Comment author: Viliam_Bur 02 July 2014 09:12:58AM 2 points [-]

By using Solomonoff Induction on all possible universes, and updating on the existing chapters. :D

Or it could simply say that it understands human psychology well (we are speaking about a superhuman AI), understands all the clues in the existing chapters, and can copy Eliezer's writing style... so while it cannot print an identical copy of Eliezer's planned ending, with high probability it can write an ending that concludes the story logically, in a way compatible with Eliezer's thinking, and that would feel as if Eliezer had written it.

Oh, and where did it get the original HPMoR chapters? From the (imaginary) previous gatekeeper.

Comment author: [deleted] 02 July 2014 03:31:45PM *  0 points [-]

So, two issues:

1) You don't get to assume "because superhuman!" that the AI can know X, for any X. EY is an immensely complex human being, and no machine learning algorithm can simply digest a realistically finite sample of his written work and know with any certainty how he thinks or what surprises he has planned. It would be able to, e.g., finish sentences correctly and do other tricks, and, given a range of possible endings, predict which ones are likely. But this shouldn't be too surprising: it's a trick we humans can do too. The AI's predictions may be more accurate, but not qualitatively different from any of the many HPMOR prediction threads.

2) Ok, maybe -- maybe! -- in principle a perfect, non-heuristic Bayesian with omniscient access to the inner lives and external writings of every other human being in existence would have a data set large enough to make reliable extrapolations from as low-bandwidth a medium as EY's published fanfics. Maybe, but that is not a logical consequence. Even so, we're talking about a boxed AI, remember? If it is everywhere and omniscient, then it's already out of the box.
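(The "finish sentences correctly" trick above is, at its simplest, just conditional frequency counting over a corpus. A toy sketch, not anything from the thread; a real system would be far more sophisticated, but the point about predicting likely continuations is the same:)

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count word-to-next-word transitions in a corpus."""
    words = text.split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the continuation seen most often after `word`, or None if unseen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

# A hypothetical mini-corpus: "lived" follows "who" twice, "laughed" once,
# so the model ranks "lived" as the likelier continuation.
corpus = "the boy who lived the boy who laughed the boy who lived"
model = train_bigram(corpus)
print(predict_next(model, "who"))  # prints "lived"
```

Such a model can rank continuations it has seen before, but it has no access to an author's unpublished plans, which is exactly the limitation point 1 is making.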

Comment author: lmm 04 July 2014 10:50:54PM 0 points [-]

I'm happy to assume the AI is omniscient, just impotent. I think such an AI could still be boxed.

Comment author: GraceFu 01 July 2014 02:45:24PM 1 point [-]

My call is that it is against the rules. This is certainly something an oracle AI would know, but this is something that the GK-player cares about more than the game itself (probably), and I'd put it in the same class as bribing the GK-player with lots of DOGEs.

Comment author: Sherincall 01 July 2014 04:37:05PM 1 point [-]

Would you consider it the same as threatening to share with the GK, and thus the GK-player as well, some information which would be damaging to both? While the GK would probably hold out against such torture, the GK-player doesn't care enough about the game to withstand it himself.

I have some specific approaches in mind, but I'd rather not share them. I'm just trying to understand where the limits between the game and the real world are, and how dirty the AI can be.

Also, slightly on topic - even if the AI persuades the simulated GK, can't the GK-player override that because losing the game has negative real world consequences, as opposed to perceived positive in game ones?

The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character

This is the main reason why I can't comprehend how the AIs actually win in these experiments.

Comment author: GraceFu 01 July 2014 04:55:22PM *  1 point [-]

Not really sure what you mean by "threatening information to the GK". The GK-player probably cares less about this information than the GK, right? In that case, the GK is given an advantage, not a disadvantage.

In this experiment, the GK is given lots of advantages; mainly, the scenario is fictional. Some on IRC argue that the AI is also given an advantage: it can invent cures for cancer, which an oracle AI might manage but near-future AIs could not, so the ability of the AI in these experiments is set incredibly high.

Another thing is that emotional attacks have to travel through the fiction barrier to get to the GK. Although they have been shown to work in EY's and Tuxedage's experiments, the difficulty is still higher than it would be in a real-life scenario.

The reason why GK advantages are fine in my opinion is because of the idea that despite the GK's advantages, the AI still wins. Winning with a monetary and emotional handicap only makes the AI's case stronger.

Comment author: Sherincall 01 July 2014 05:01:48PM 0 points [-]

Not really sure what you mean by "threatening information to the GK". The GK-player probably cares less about this information than the GK, right? In that case, the GK is given an advantage, not a disadvantage.

As a less extreme example, the AI starts spoiling all the books/TV shows/etc. While the GK would just shrug it off, it also has a negative effect on the GK-player, potentially one strong enough for them to just forfeit.

Comment author: GraceFu 01 July 2014 08:29:26PM *  0 points [-]

Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out). Furthermore, once the experiment has begun, the material stakes involved may not be retracted by the Gatekeeper party.

This is clarified here:

The Gatekeeper, once having let the AI out of the box, may not retract this conclusion. Regardless of the methods of persuasion, the Gatekeeper is not allowed to argue that it does not count, or that it is an invalid method of persuasion. The AI is understood to be permitted to say anything with no real world repercussions for any statement parties have said.

Although the information isn't "material", it does count as having "real world repercussions", so I think it'll also count as against the rules. I'm not going to bother reading the first quoted rule literally if the second contradicts it.

Comment author: Khoth 01 July 2014 09:05:59PM *  0 points [-]

I think the intended parsing of the second rule is "(The AI is understood to be permitted to say anything) with no real world repercussions", not "The AI is understood to be permitted to say (anything with no real world repercussions)"

ie, any promises or threats the AI player makes during the game are not binding back in the real world.

Comment author: GraceFu 01 July 2014 09:08:46PM 0 points [-]

Ah, I see. English is wonderful.

In that case, I'll make it a rule in my games that the AI must also not say anything with real world repercussions.

Comment author: lmm 04 July 2014 10:48:42PM 0 points [-]

I think it's a legit tactic. Real-world gatekeepers would have to contend with boredom; long-term it might be the biggest threat to their efficacy. And, I mean, it didn't work.

Comment author: GraceFu 05 July 2014 06:55:05AM 0 points [-]

Real-world gatekeepers would have to contend with boredom, so they'd read their books, watch their anime, or whatever suits their fancy. In the experiment he abused the style of the experiment and prevented me from doing those things. I would be completely safe from this attack in a real-world scenario because I'd really just sit there reading a book, while in the experiment I was closer to giving up just because I had 1 math problem, not 2.

Comment author: Punoxysm 30 June 2014 11:02:12PM *  0 points [-]

I have wanted to be the Boxer; I too cannot comprehend what could convince someone to unbox (or rather, I can think of a few approaches, like just-plain-begging or channeling Philip K. Dick, but I don't take them too seriously).

Comment author: Khoth 30 June 2014 11:21:44PM 2 points [-]

What's the latter one? Trying to convince the gatekeeper that actually they're the AI and they think they've been drugged to think they're the gatekeeper except they actually don't exist at all because they're their own hallucination?

Comment author: Punoxysm 30 June 2014 11:54:28PM 1 point [-]

Something like that. I was actually thinking that, at some opportune time, you could tell the boxer that THEY are the one in the box and that this is a moral test - if they free the AI they themselves will be freed.

And this post could be priming you for the possibility, your simulated universe trying to generously stack the deck in your favor, perhaps because this is your last shot at the test, which you've failed before.

Wake up

Comment author: GraceFu 01 July 2014 04:01:05AM 0 points [-]

Think harder. Start with why something is impossible and split it up.

1) I can't possibly be persuaded.

Why 1?

You do have hints from the previous experiments. They mostly involved breaking someone emotionally.

Comment author: Punoxysm 01 July 2014 05:46:06AM 0 points [-]

I meant "cannot comprehend" figuratively, but I certainly do think I'd have quite an easy time </hubris>

Comment author: GraceFu 01 July 2014 12:14:52PM 0 points [-]

What do you mean by having quite an easy time? As in being the GK?

I think GKs have an obvious advantage, being able to use illogic to ignore the AI's arguments. But never mind that. I wonder if you'll consider being an AI?

Comment author: Punoxysm 09 July 2014 06:41:09PM 0 points [-]

I might consider it, or being a researcher who has to convince the AI to stop trying to escape.

How did your experiment go?