RobbBB comments on A few misconceptions surrounding Roko's basilisk - Less Wrong

Post author: RobbBB 05 October 2015 09:23PM (39 points)


Comment author: RobbBB 06 October 2015 11:26:41AM 8 points

"One might think that the possibility of CEV punishing people couldn't possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous."

This paragraph is not an Eliezer Yudkowsky quote; it's Eliezer quoting Roko. (The "ve" should be a tip-off.)

This is evidence that Yudkowsky believed, if not that Roko's argument was correct as it was, that at least it was plausible enough that could be developed in [sic] a correct argument, and he was genuinely scared by it.

If you kept going with your initial Eliezer quote, you'd have gotten to Eliezer himself saying he was worried a blackmail-type argument might work, though he didn't think Roko's original formulation worked:

"Again, I deleted that post not because I had decided that this thing probably presented a real hazard, but because I was afraid some unknown variant of it might, and because it seemed to me like the obvious General Procedure For Handling Things That Might Be Infohazards said you shouldn't post them to the Internet."

According to Eliezer, he had three separate reasons for the original ban: (1) he didn't want any additional people (beyond the one Roko cited) to obsess over the idea and get nightmares; (2) he was worried there might be some variant on Roko's argument that worked, and he wanted more formal assurances that this wasn't the case; and (3) he was just outraged at Roko. (Including outraged at him for doing something Roko thought would put people at risk of torture.)

What better place to discuss possible failure modes of an AI design? [...] Yelling and trying to sweep it under the rug was irresponsible.

There are lots of good reasons Eliezer shouldn't have banned R̶o̶k̶o̶ discussion of the basilisk, but I don't think this is one of them. If the basilisk was a real concern, that would imply that talking about it put people at risk of torture, so this is an obvious example of a topic you initially discuss in private channels and not on public websites. At the same time, if the basilisk wasn't risky to publicly discuss, then that also implies that it was a transparently bad argument and therefore not important to discuss. (Though it might be fine to discuss it for fun.)

Roko's original argument, though, could have been stated in one sentence: 'Utilitarianism implies you'll be willing to commit atrocities for the greater good; CEV is utilitarian; therefore CEV is immoral and dangerous.' At least, that's the version of the argument that has any bearing on the conclusion 'CEV has unacceptable moral consequences'. The other arguments are a distraction: 'utilitarianism means you'll accept arbitrarily atrocious tradeoffs' is a premise of Roko's argument rather than a conclusion, and 'CEV is utilitarian in the relevant sense' is likewise a premise. A more substantive discussion would have explicitly hashed out (a) whether SIAI/MIRI people wanted to construct a Roko-style utilitarian, and (b) whether this looks like one of those philosophical puzzles that needs to be solved by AI programmers vs. one that we can safely punt if we resolve other value learning problems.

I think we agree that's a useful debate topic, and we agree Eliezer's moderation action was dumb. However, I don't think we should reflexively publish 100% of the risky-looking information we think of so we can debate everything as publicly as possible. ('Publish everything risky' and 'ban others whenever they publish something risky' aren't the only two options.) Do we disagree about that?

Comment author: philh 06 October 2015 02:16:18PM 8 points

There are lots of good reasons Eliezer shouldn't have banned Roko

IIRC, Eliezer didn't ban Roko, just discussion of the basilisk, and Roko deleted his account shortly afterwards.

Comment author: RobbBB 06 October 2015 07:00:32PM 1 point

Thanks, fixed!

Comment author: V_V 09 October 2015 08:57:06PM -1 points

(2) he was worried there might be some variant on Roko's argument that worked, and he wanted more formal assurances that this wasn't the case;

I don't think we are in disagreement here.

There are lots of good reasons Eliezer shouldn't have banned R̶o̶k̶o̶ discussion of the basilisk, but I don't think this is one of them. If the basilisk was a real concern, that would imply that talking about it put people at risk of torture, so this is an obvious example of a topic you initially discuss in private channels and not on public websites.

The basilisk could be a concern only if an AI that would carry out that type of blackmail were built. Once Roko discovered it, if he thought it was a plausible risk, then he had a selfish reason to prevent such an AI from being built. But even if he was completely selfless, he could reason that somebody else might think of that argument, or something equivalent, and make it public; hence it was better to surface it sooner rather than later, allowing more time to prevent that design failure.

Also, I'm not sure what private channels you are referring to. It's not as if there is a secret Google Group of all potential AGI designers, is there?
Privately contacting Yudkowsky or SIAI/SI/MIRI wouldn't have worked. Why would Roko trust them to handle that information correctly? Why would he believe that they had leverage over, or even knowledge about, arbitrary AI projects that might end up building an AI with that particular failure mode?
LessWrong was at that time the primary forum for discussing AI safety issues. There was no better place to raise that concern.

Roko's original argument, though, could have been stated in one sentence: 'Utilitarianism implies you'll be willing to commit atrocities for the greater good; CEV is utilitarian; therefore CEV is immoral and dangerous.'

It wasn't just that. It was an argument against the combination of utilitarianism AND a decision theory that takes "acausal" effects into account (e.g. any theory that one-boxes in Newcomb's problem). Since both utilitarianism and one-boxing were popular positions on LessWrong, it was reasonable to discuss their possible failure modes on LessWrong.

Comment author: Houshalter 08 October 2015 02:20:28PM -1 points

There are lots of good reasons Eliezer shouldn't have banned R̶o̶k̶o̶ discussion of the basilisk, but I don't think this is one of them. If the basilisk was a real concern, that would imply that talking about it put people at risk of torture, so this is an obvious example of a topic you initially discuss in private channels and not on public websites. At the same time, if the basilisk wasn't risky to publicly discuss, then that also implies that it was a transparently bad argument and therefore not important to discuss. (Though it might be fine to discuss it for fun.)

As I understand Roko's motivation, it was to convince people that we should not build an AI that would do basilisks. Not to spread infohazards for no reason. That is definitely worthy of public discussion. If he really believed in the basilisk, then it's rational for him to do everything in his power to stop such an AI from being built, and convince other people of the danger.

Roko's original argument, though, could have been stated in one sentence: 'Utilitarianism implies you'll be willing to commit atrocities for the greater good; CEV is utilitarian; therefore CEV is immoral and dangerous.'

My understanding is that the issue is with Timeless Decision Theory, and AIs that can do acausal trade. An AI programmed with classical decision theory would have no issues. And most rejections of the basilisk I have read are basically "acausal trade seems wrong or weird", so they basically agree with Roko.

Comment author: RobbBB 08 October 2015 07:09:24PM 1 point

My understanding is that the issue is with Timeless Decision Theory, and AIs that can do acausal trade.

Roko wasn't arguing against TDT. Roko's post was about acausal trade, but the conclusion he was trying to argue for was just 'utilitarian AI is evil because it causes suffering for the sake of the greater good'. But if that's your concern, you can just post about some variant on the trolley problem. If utilitarianism is risky because a utilitarian might employ blackmail and blackmail is evil, then there should be innumerable other evil things a utilitarian would also do that require less theoretical apparatus.

As I understand Roko's motivation, it was to convince people that we should not build an AI that would do basilisks. Not to spread infohazards for no reason.

On Roko's view, if no one finds out about basilisks, the basilisk can't blackmail anyone. So publicizing the idea doesn't make sense, unless Roko didn't take his own argument all that seriously. (Maybe Roko was trying to protect himself from personal blackmail risk at others' expense, but this seems odd if he also increased his own blackmail risk in the process.)

Possibly Roko was thinking: 'If I don't prevent utilitarian AI from being built, it will cause a bunch of atrocities in general. But LessWrong users are used to dismissing anti-utilitarian arguments, so I need to think of one with extra shock value to get them to do some original seeing. This blackmail argument should work -- publishing it puts people at risk of blackmail, but it serves the greater good of protecting us from other evil utilitarian tradeoffs.'

(... Irony unintended.)

Still, if that's right, I'm inclined to think Roko should have tried to post other arguments against utilitarianism that don't (in his view) put anyone at risk of torture. I'm not aware of him having done that.

Comment author: Houshalter 09 October 2015 07:23:58AM 0 points

Roko wasn't arguing against TDT. Roko's post was about acausal trade, but the conclusion he was trying to argue for was just 'utilitarian AI is evil because it causes suffering for the sake of the greater good'. But if that's your concern, you can just post about some variant on the trolley problem. If utilitarianism is risky because a utilitarian might employ blackmail and blackmail is evil, then there should be innumerable other evil things a utilitarian would also do that require less theoretical apparatus.

Ok that makes a bit less sense to me. I didn't think it was against utilitarianism in general, which is much less controversial than TDT. But I can definitely still see his argument.

When people talk about the trolley problem, they don't usually imagine that they might be the ones tied to the second track. The deeply unsettling thing about the basilisk isn't that the AI might torture people for the greater good. It's that you are the one who is going to be tortured. That's a pretty compelling case against utilitarianism.

On Roko's view, if no one finds out about basilisks, the basilisk can't blackmail anyone. So publicizing the idea doesn't make sense, unless Roko didn't take his own argument all that seriously.

Roko found out. It disturbed him greatly. So it absolutely made sense for him to try to stop the development of such an AI any way he could. By telling other people, he made it their problem too and converted them to his side.

Comment author: gjm 09 October 2015 10:08:12AM 1 point

It's that you are the one who is going to be tortured. That's a pretty compelling case against utilitarianism.

It doesn't appear to me to be a case against utilitarianism at all. "Adopting utilitarianism might lead to me getting tortured, and that might actually be optimal in utilitarian terms, therefore utilitarianism is wrong" doesn't even have the right shape to be a valid argument. It's like "If there is no god then many bad people will prosper and not get punished, which would be awful, therefore there is a god." (Or, from the other side, "If there is a god then he may choose to punish me, which would be awful, therefore there is no god" -- which has a thing or two in common with the Roko basilisk, of course.)

he made it their problem too and converted them to his side.

Perhaps he hoped to. I don't see any sign that he actually did.

Comment author: Houshalter 09 October 2015 10:21:19AM 0 points

"Adopting utilitarianism might lead to me getting tortured, and that might actually be optimal in utilitarian terms, therefore utilitarianism is wrong" doesn't even have the right shape to be a valid argument.

You are strawmanning the argument significantly. I would word it more like this:

"Building an AI that follows utilitarianism will lead to me getting tortured. I don't want to be tortured. Therefore I don't want such an AI to be built."

Perhaps he hoped to. I don't see any sign that he actually did.

That's partially because EY fought against it so hard and even silenced the discussion.

Comment author: gjm 09 October 2015 01:43:03PM 1 point

I would word it more like this

So there are two significant differences between your version and mine. The first is that mine says "might" and yours says "will", but I'm pretty sure Roko wasn't by any means certain that that would happen. The second is that yours ends "I don't want such an AI to be built", which doesn't seem to me like the right ending for "a case against utilitarianism".

(Unless you meant "a case against building a utilitarian AI" rather than "a case against utilitarianism as one's actual moral theory"?)

Comment author: Houshalter 10 October 2015 10:04:29AM 1 point

The first is that mine says "might" and yours says "will", but I'm pretty sure Roko wasn't by any means certain that that would happen.

I should have mentioned that it's conditional on the Basilisk being correct. If we build an AI that follows that line of reasoning, then it will torture. If the basilisk isn't correct for unrelated reasons, then this whole line of reasoning is irrelevant.

Anyway, the exact certainty isn't too important. You use the word "might" as if the probability of your being tortured were really small, as if the AI would only do it in really obscure scenarios. And you are just as likely to be picked for torture as anyone else.

Roko believed that the probability was much higher, and therefore worth worrying about.

The second is that yours ends "I don't want such an AI to be built", which doesn't seem to me like the right ending for "a case against utilitarianism".

Unless you meant "a case against building a utilitarian AI" rather than "a case against utilitarianism as one's actual moral theory"?

Well, the AI is just implementing the conclusions of utilitarianism (again, conditional on the basilisk argument being correct). If you don't like those conclusions, and if you don't want AIs to be utilitarian, then do you really support utilitarianism?

It's a minor semantic point though. The important part is the practical consequences for how we should build AI. Whether or not utilitarianism is "right" is more subjective and mostly irrelevant.

Comment author: gjm 10 October 2015 09:42:33PM 0 points

Roko believed that the probability was much higher

All I know about what Roko believed about the probability is that (1) he used the word "might" just as I did and (2) he wrote "And even if you only think that the probability of this happening is 1%, ..." suggesting that (a) he himself probably thought it was higher and (b) he thought it was somewhat reasonable to estimate it at 1%. So I'm standing by my "might" and robustly deny your claim that writing "might" was strawmanning.

if you don't want AIs to be utilitarian

If you're standing in front of me with a gun and telling me that you have done some calculations suggesting that on balance the world would be a happier place without me in it, then I would probably prefer you not to be utilitarian. This has essentially nothing to do with whether I think utilitarianism produces correct answers. (If I have a lot of faith in your reasoning and am sufficiently strong-minded then I might instead decide that you ought to shoot me. But my likely failure to do so merely indicates typical human self-interest.)

The important part is the practical consequences for how we should build AI.

Perhaps so, in which case calling the argument "a case against utilitarianism" is simply incorrect.

Comment author: Houshalter 11 October 2015 04:16:32AM 0 points

Roko's argument implies the AI will torture. The probability you think his argument is correct is a different matter. Roko was just saying that "if you think there is a 1% chance that my argument is correct", not "if my argument is correct, there is a 1% chance the AI will torture."

This really isn't important though. The point is, if an AI has some likelihood of torturing you, you shouldn't want it to be built. You can call that self-interest, but that's admitting you don't really want utilitarianism to begin with. Which is the point.

Anyway this is just steel-manning Roko's argument. I think the issue is with acausal trade, not utilitarianism. And that seems to be the issue most people have with it.