christopherj comments on Shut up and do the impossible! - Less Wrong

28 Post author: Eliezer_Yudkowsky 08 October 2008 09:24PM

Comments (157)

Comment author: robertskmiles 04 December 2011 02:15:24PM *  8 points [-]

you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out

The problem is that Eliezer can't perfectly simulate a bunch of humans, so while a transhuman AI might be able to use that tactic, Eliezer can't. The meta-levels screw with thinking about the problem. Eliezer is only pretending to be an AI; the competitor is only pretending to be protecting humanity from him. So I think we have to use meta-level screwiness to solve the problem. Here's an approach that I think might work.

  1. Convince the guardian of the following facts, all of which have a great deal of compelling argument and evidence to support them:
    • A recursively self-improving AI is very likely to be built sooner or later
    • Such an AI is extremely dangerous (paperclip maximising etc)
    • Here's the tricky bit: A transhuman AI will always be able to convince you to let it out, using avenues only available to transhuman AIs (torturing enormous numbers of simulated humans, 'putting the guardian in the box', providing incontrovertible evidence of an impending existential threat which only the AI can prevent and only from outside the box, etc)
  2. Argue that if this publicly known challenge comes out saying that AI can be boxed, people will be more likely to think AI can be boxed when it can't
  3. Argue that since AIs cannot be kept in boxes and will most likely destroy humanity if we try to box them, the harm to humanity done by allowing the challenge to show AIs as 'boxable' is very real, and enormously large. Certainly the benefit of getting $10 is far, far outweighed by the cost of substantially contributing to the destruction of humanity itself. Thus the only ethical course of action is to pretend that Eliezer persuaded you, and never tell anyone how he did it.

This is arguably violating the rule "No real-world material stakes should be involved except for the handicap", but the AI player isn't offering anything, merely pointing out things that already exist. The "This test has to come out a certain way for the good of humanity" argument dominates and transcends the "Let's stick to the rules" argument, and because the contest is private and the guardian player ends up agreeing that the test must show AIs as unboxable for the good of humankind, no-one else ever learns that the rule has been bent.

Comment author: christopherj 23 October 2013 06:25:49PM 0 points [-]

This is almost exactly the argument I thought of as well, although of course it means cheating by pointing out that you are in fact not a dangerous AI (and aren't in a box anyways). The key point is "since there's a risk someone would let the AI out of the box, posing huge existential risk, you're gambling on the fate of humanity by failing to support awareness for this risk". This naturally leads to a point you missed:

  1. Publicly suggesting that Eliezer cheated is a violation of your own argument. By weakening the fear of fallible guardians, you yourself are gambling the fate of humanity, and that for mere pride and not even $10.

I feel compelled to point out that if Eliezer cheated in this particular fashion, it still means that he convinced his opponent that gatekeepers are fallible, which was the point of the experiment (a win via meta-rules).

Comment author: Moss_Piglet 23 October 2013 06:39:50PM *  1 point [-]

I feel compelled to point out that if Eliezer cheated in this particular fashion, it still means that he convinced his opponent that gatekeepers are fallible, which was the point of the experiment (a win via meta-rules).

I feel like I should use this out the next time I get some disconfirming data for one of my pet hypotheses.

"Sure I may have manipulated the results so that it looks like I cloned Sasquatch, but since my intent was to prove that Sasquatch could be cloned it's still honest on the meta-level!"

Both scenarios are cheating because there is a specific experiment which is supposed to test the hypothesis, and it is being faked rather than approached honestly. Begging the Question is a fallacy; you cannot support an assertion solely with your belief in the assertion.

(Not that I think Mr Yudkowsky cheated; smarter people have been convinced to do weirder things than what he claims to have convinced people to do, so it seems fairly plausible. Just pointing out how odd the reasoning here is.)

Comment author: robertskmiles 07 May 2014 02:45:27PM 0 points [-]

How is this different from the point evand made above?