army1987 comments on I attempted the AI Box Experiment (and lost) - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (244)
I realise that it isn't polite to say that, but I don't see sufficient reasons to believe you. That is, given the apparent fact that you believe in the importance of convincing people about the danger of failing gatekeepers, the hypothesis that you are lying about your experience seems more probable than the converse. Publishing the log would make your statement much more believable (of course, not with every possible log).
(I assign high probability to the ability of a super-intelligent AI to persuade the gatekeeper, but rather low probability to the ability of a human to do the same against a sufficiently motivated adversary.)
I do apologize for the lack of logs (I'd like to publish them, but we agreed beforehand not to), and I admit you have a valid point -- it's entirely possible that this experiment was faked. But I wanted to point out that if I really wanted to fake the experiment in order to convince people about the dangers of failing gatekeepers, wouldn't it be better for me to say I had won? After all, I lost this experiment.
If you really had faked this experiment, you might have settled on a lie which is not maximally beneficial to you, and then you might use exactly this argument to convince people that you're not lying. I don't know if this tactic has a name, but it should. I've used it when playing Mafia, for example; as Mafia, I once attempted to lie about being the Detective (who I believe was dead at the time), and to do so convincingly I sold out one of the other members of the Mafia.
I've heard it called "Wine In Front Of Me" after the scene in The Princess Bride.
That Scene
In this venue, you shouldn't say things like this without giving your estimate for P(fail|fake) / P(fail).
I'm not sure I know what you mean by "fail." Can you clarify what probabilities you want me to estimate?
P(claims to have lost | faked experiment) / P(claims to have lost)
On the order of 1. I don't think it's strong evidence either way.
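To spell out why a ratio "on the order of 1" means weak evidence: by Bayes' rule, P(faked | claims to have lost) = P(faked) * [P(claims to have lost | faked) / P(claims to have lost)], so the requested ratio is exactly the factor by which the prior gets multiplied. Here is a minimal sketch of that update. The function name and every number in it are made up for illustration; nobody in this thread gave a prior.

```python
def posterior_fake(prior_fake, ratio):
    """Apply Bayes' rule in the form used above.

    prior_fake: P(faked) before hearing the claim (hypothetical value).
    ratio:      P(claims to have lost | faked) / P(claims to have lost).
    Returns P(faked | claims to have lost).
    """
    posterior = prior_fake * ratio
    if not 0.0 <= posterior <= 1.0:
        # A ratio this large is inconsistent with the given prior.
        raise ValueError("prior and ratio are not jointly consistent")
    return posterior


if __name__ == "__main__":
    # Illustrative prior only; not an estimate anyone in the thread made.
    prior = 0.2
    for ratio in (0.5, 1.0, 1.5):
        print(f"ratio={ratio}: posterior={posterior_fake(prior, ratio):.2f}")
```

With a ratio of exactly 1 the posterior equals the prior, which is the "not strong evidence either way" case; a ratio well below or above 1 would be needed for the claimed loss to shift the estimate much.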
If the author assumes that most people would place considerable (probabilistic) trust even in a claim that he had won, then bluffing that he almost won would not maximize his influence on general opinion. This is amplified by the fact that a report of an actual AI win is more viral.
Lying is further discouraged by the risk that the other party will sing.
Agree that lying is discouraged by the risk that the other party will sing, but lying - especially in a way that isn't maximally beneficial - is encouraged by the prevalence of arguments that bad lies are unlikely. The game theory of bad lies seems like it could get pretty complicated.
A win is a stronger claim; a tight loss is a more believable one. There's a tradeoff to be made, and it is not a priori clear which variant serves the goal better.