prase comments on I attempted the AI Box Experiment (and lost) - Less Wrong

47 Post author: Tuxedage 21 January 2013 02:59AM

Comments (244)

Comment author: prase 21 January 2013 05:59:22PM *  1 point [-]

I realise that it isn't polite to say that, but I don't see sufficient reasons to believe you. That is, given the apparent fact that you believe in the importance of convincing people about the danger of failing gatekeepers, the hypothesis that you are lying about your experience seems more probable than the converse. Publishing the log would make your statement much more believable (of course, not with every possible log).

(I assign high probability to the ability of a super-intelligent AI to persuade the gatekeeper, but rather low probability to the ability of a human to do the same against a sufficiently motivated adversary.)

Comment author: MixedNuts 23 January 2013 06:47:09PM 3 points [-]

We played. He lost. He came much closer to winning than I expected, though he overstates how close more often than he understates it. The tactic that worked best attacked a personal vulnerability of mine, but analogues are likely to exist for many people.

Comment author: prase 24 January 2013 12:10:58AM 0 points [-]

For the record, I didn't think that if he made the story up, he would do so without a credible agreement that you would verify his claims.

Comment author: Tuxedage 21 January 2013 11:20:51PM 3 points [-]

I do apologize for the lack of logs (I'd like to publish them, but we agreed beforehand not to), and I admit you have a valid point -- it's entirely possible that this experiment was faked. But if I really wanted to fake the experiment in order to convince people about the dangers of failing gatekeepers, wouldn't it be better for me to say I had won? After all, I lost this experiment.

Comment author: Qiaochu_Yuan 22 January 2013 10:28:42AM *  4 points [-]

if I really wanted to fake the experiment in order to convince people about the dangers of failing gatekeepers, wouldn't it be better for me to say I had won? After all, I lost this experiment.

If you really had faked this experiment, you might have settled on a lie which is not maximally beneficial to you, and then you might use exactly this argument to convince people that you're not lying. I don't know if this tactic has a name, but it should. I've used it when playing Mafia, for example; as Mafia, I once attempted to lie about being the Detective (who I believe was dead at the time), and to do so convincingly I sold out one of the other members of the Mafia.

Comment author: FluffyC 22 January 2013 06:45:23PM *  1 point [-]

I don't know if this tactic has a name, but it should.

I've heard it called "Wine In Front Of Me" after the scene in The Princess Bride.

That Scene

Comment author: [deleted] 22 January 2013 12:13:02PM *  0 points [-]

If you really had faked this experiment, you might have settled on a lie which is not maximally beneficial to you, and then you might use exactly this argument to convince people that you're not lying.

In this venue, you shouldn't say things like this without giving your estimate for P(fail|fake) / P(fail).

Comment author: Qiaochu_Yuan 22 January 2013 12:36:01PM 0 points [-]

I'm not sure I know what you mean by "fail." Can you clarify what probabilities you want me to estimate?

Comment author: ESRogs 22 January 2013 08:10:47PM 1 point [-]

P(claims to have lost | faked experiment) / P(claims to have lost)

Comment author: Qiaochu_Yuan 22 January 2013 11:12:19PM 0 points [-]

On the order of 1. I don't think it's strong evidence either way.
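The "on the order of 1" answer can be made concrete with a small Bayes'-rule sketch. All numbers below are hypothetical illustrations, not anything the commenters stated; the point is only that a likelihood ratio of 1 leaves the prior unchanged:

```python
# Sketch of the Bayes-factor reasoning in this subthread, with made-up numbers.
# If P(claims loss | faked) / P(claims loss) is "on the order of 1", then
# observing the reported loss barely moves the probability the story is faked.

def posterior_fake(prior_fake, p_loss_given_fake, p_loss_given_real):
    """Bayes' rule: P(faked | claims to have lost)."""
    num = p_loss_given_fake * prior_fake
    den = num + p_loss_given_real * (1 - prior_fake)
    return num / den

prior = 0.10  # hypothetical prior that the experiment was faked
# Likelihood ratio of exactly 1: claiming a loss is equally likely either way,
# so the posterior equals the prior and the claim is uninformative.
post = posterior_fake(prior, p_loss_given_fake=0.5, p_loss_given_real=0.5)
print(round(post, 3))
```

With a ratio above or below 1, the same function shows the claim counting as (weak) evidence for or against the faking hypothesis, which is the tradeoff prase describes below.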

Comment author: accolade 22 January 2013 12:07:33PM *  0 points [-]

If the author assumes that most people would put considerable (probabilistic) trust even in an assertion of having won, then he would not maximize his influence on general opinion by bluffing that he had almost won. This is amplified by the fact that a claimed actual AI win is more viral.

Lying is further discouraged by the risk that the other party will sing.

Comment author: Qiaochu_Yuan 22 January 2013 12:39:25PM *  0 points [-]

Agree that lying is discouraged by the risk that the other party will sing, but lying - especially in a way that isn't maximally beneficial - is encouraged by the prevalence of arguments that bad lies are unlikely. The game theory of bad lies seems like it could get pretty complicated.

Comment author: prase 22 January 2013 06:37:46PM 0 points [-]

if I really wanted to fake the experiment in order to convince people about the dangers of failing gatekeepers, wouldn't it be better for me to say I had won?

A win is a stronger claim; a tight loss is a more believable claim. There's a tradeoff to be made, and it is not a priori clear which variant serves the goal better.

Comment author: Swimmy 21 January 2013 07:53:03PM 0 points [-]
Comment author: prase 22 January 2013 06:39:12PM *  0 points [-]

Could you please elaborate on the point you are trying to make?

Comment author: Swimmy 23 January 2013 03:38:49AM *  0 points [-]

Most people don't usually make these kinds of elaborate things up. Prior probability for that hypothesis is low, even if it might be higher for Tuxedage than it would be for an average person. People do actually try the AI box experiment, and we had a big thread about people potentially volunteering to do it a while back, so prior information suggests that LWers do want to participate in these experiments. Since extraordinary claims are extraordinary evidence (within limits), Tuxedage telling this story is good enough evidence that it really happened.

But on a separate note, I'm not sure the prior probability for this being a lie would necessarily be higher just because Tuxedage has some incentive to lie. If it is found out to be a lie, the cause of FAI might be significantly hurt ("they're a bunch of nutters who lie to advance their silly religious cause"). Folks on Rational Wiki watch this site for things like that, so Tuxedage also has some incentive to not lie. Also more than one person has to be involved in this lie, giving a complexity penalty. I suppose the only story detail that needs to be a lie to advance FAI is "I almost won," but then why not choose "I won"?

Comment author: prase 24 January 2013 12:01:51AM 0 points [-]

Most people don't usually make these kinds of elaborate things up. Prior probability for that hypothesis is low, even if it might be higher for Tuxedage than it would be for an average person.

Most people don't report about these kinds of things either. The correct prior is not the frequency of elaborate lies among all statements of an average person, but the frequency of lies among the relevant class of dubious statements. Of course, what constitutes the relevant class may be disputed.

Anyway, I agree with Hanson that it is not low prior probability that makes a claim dubious in the relevant sense, but rather the fact that the speaker may be motivated to say it for reasons independent of its truth. In such cases, I don't think the claim is extraordinary evidence, and I consider this to be such a case. Probably not much more can be said without writing down the probabilities, which I'd prefer not to do, but I am willing to if you insist.

I suppose the only story detail that needs to be a lie to advance FAI is "I almost won," but then why not choose "I won"?

In order to allow this argument.

Comment author: ArisKatsaris 24 January 2013 12:33:29AM -1 points [-]

When talking about games without an explicit score, "I almost won" is a very fuzzy phrase which can be translated to "I lost" without real loss of meaning.

I don't think there's any point in treating the "almost victory" as anything other than a defeat, for either the people who believe or disbelieve him.

Comment author: prase 25 January 2013 12:38:58AM 1 point [-]

If I am interested in the question of whether winning is possible in the game, "almost victory" and "utter defeat" have very different meanings for me. Why would I need an explicit score?