Comment author: V_V 10 October 2015 03:12:56PM *  0 points [-]

If you're saying 'LessWrongers think there's a serious risk they'll be acausally blackmailed by a rogue AI', then that seems to be false. That even seems to be false in Eliezer's case,

Is it?

Assume that:
a) There will be a future AI powerful enough to torture people, even posthumously (I think this is quite speculative, but let's assume it for the sake of the argument).
b) This AI will have a value system based on some form of utilitarian ethics.
c) This AI will use an "acausal" decision theory (one that one-boxes in Newcomb's problem).

Under these premises it seems to me that Roko's argument is fundamentally correct.

As far as I can tell, belief in these premises was not only common on LessWrong at the time, but was essentially the officially endorsed position of Eliezer Yudkowsky and SIAI. Therefore, we can deduce that EY should have believed that Roko's argument was correct.

But EY claims that he didn't believe that Roko's argument was correct. So the question is: is EY lying?

His behavior was certainly consistent with his believing Roko's argument. If he wanted to prevent the spread of that argument, then even lying about its correctness would be consistent with that goal.

So, is he lying? If he is not lying, then why didn't he believe Roko's argument? As far as I know, he never provided a refutation.

Comment author: RobbBB 14 October 2015 04:41:38AM *  1 point [-]

This was addressed on the LessWrongWiki page; I didn't copy the full article here.

A few reasons Roko's argument doesn't work:

  • 1 - Logical decision theories are supposed to one-box on Newcomb's problem because it's globally optimal even though it's not optimal with respect to causally downstream events. A decision theory based on this idea could follow through on blackmail threats even when doing so isn't causally optimal, which appears to put past agents at risk of coercion by future agents. But such a decision theory also prescribes 'don't be the kind of agent that enters into trades that aren't globally optimal, even if the trade is optimal with respect to causally downstream events'. In other words, if you can bind yourself to precommitments to follow through on acausal blackmail, then it should also be possible to bind yourself to precommitments to ignore threats of blackmail.

The 'should' here is normative: there are probably some decision theories that let agents acausally blackmail each other, and others that perform well in Newcomb's problem and the smoking lesion problem but can't acausally blackmail each other; it hasn't been formally demonstrated which theories fall into which category. (A toy model of this point appears after this list.)

  • 2 - Assuming that for some reason you are following a decision theory that does put you at risk of acausal blackmail: since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it will torture them without actually doing so. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty. This means that we can't be blackmailed in practice.

  • 3 - A stronger version of 2 is that rational agents actually have an incentive to harshly punish attempts at blackmail in order to discourage it. So threatening blackmail can actually decrease an agent's probability of being created, all else being equal.

  • 4 - Insofar as it's "utilitarian" to horribly punish anyone who doesn't perfectly promote human flourishing, SIAI doesn't seem to have endorsed utilitarianism.

4 means that the argument lacks practical relevance. The idea of CEV doesn't build in very much moral philosophy, and it doesn't build in predictions about the specific dilemmas future agents might end up in.
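To make point 1 concrete, here is a minimal toy sketch (entirely my own construction; the policies and payoff numbers are illustrative assumptions, not anything formally demonstrated): a blackmailer that accurately predicts the target's policy only issues threats it expects to profit from, so an agent precommitted to ignoring threats never receives one.

```python
# Toy sketch (illustrative assumptions): a blackmailer predicts the
# target's policy and only threatens when it expects the threat to pay.

def blackmailer_threatens(target_policy):
    # Issuing a threat costs the blackmailer something, so it only
    # threatens targets it predicts will give in.
    return target_policy == "give_in"

def target_payoff(target_policy):
    if blackmailer_threatens(target_policy):
        return -10  # gives in and pays the blackmailer's demand
    return 0        # no threat is ever issued

for policy in ("give_in", "ignore"):
    print(policy, target_payoff(policy))
# give_in -10
# ignore 0
# The agent bound to ignore threats is never threatened at all: that
# is the sense in which precommitment defuses acausal blackmail.
```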

Comment author: mwengler 09 October 2015 01:49:39PM 0 points [-]

I'd go so far as to say that anyone who advocates cooperating in a one-shot prisoners' dilemma simply doesn't understand the setting. By definition, defecting gives you a better outcome than cooperating. Anyone who claims otherwise is changing the definition of the prisoners' dilemma.

I think this is correct. I think the reason to cooperate is not to get the best personal outcome, but because you care about the other person. We have evolved to cooperate, or perhaps more precisely we have evolved to want to cooperate, to value cooperating. Our values come from our genes and our memes, and both are subject to evolution, to natural selection. But we want to cooperate.

So if I am in a prisoner's dilemma against another human whom I perceive as "one of us," I will choose cooperation. Essentially, I care about their outcome. But in a one-shot PD, defecting is the "better" strategy. The problem is that with genetic and/or memetic evolution of cooperation, we are not playing in a one-shot PD. We are playing with a set of values that developed over many shots.

Of course we don't always cooperate. But when we do cooperate in one-shot PDs, it is because, in some sense, there are so darn many one-shot PDs, especially in the universe of hypotheticals, that we effectively know there is no such thing as a one-shot PD. This should not be too hard to accept around here, where people semi-routinely accept simulations of themselves or clones of themselves as somehow just as important as their actual selves. I.e., we don't even accept the "one-shottedness" of ourselves.

Comment author: RobbBB 09 October 2015 07:58:34PM 1 point [-]

I think the reason to cooperate is not to get the best personal outcome, but because you care about the other person.

If you have 100% identical consequentialist values to all other humans, then that means 'cooperation' and 'defection' are both impossible for humans (because they can't be put in PDs). Yet it will still be correct to defect (given that your decision and the other player's decision don't strongly depend on each other) if you ever run into an agent that doesn't share all your values. See The True Prisoner's Dilemma.

This shows that the iterated dilemma and the dilemma-with-common-knowledge-of-rationality allow cooperation (i.e., giving up on your goal to enable someone else to achieve a goal you genuinely don't want them to achieve), whereas loving compassion and shared values merely change goal-content. To properly visualize the PD, you need an actual value conflict -- e.g., imagine you're playing against a serial killer in a hostage negotiation. 'Cooperating' is just an English-language label; the important thing is the game-theoretic structure, which allows that sometimes 'cooperating' looks like letting people die in order to appease a killer's antisocial goals.
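To make that game-theoretic structure explicit, here's a minimal sketch (the payoff numbers are my own illustrative choices): with no dependence between the two players' decisions, defection is the better reply to either move, yet mutual defection is worse for both than mutual cooperation.

```python
# One-shot PD payoffs to the row player (illustrative numbers).
payoff = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

# With independent decisions, defection strictly dominates:
for their_move in ("C", "D"):
    best = max(("C", "D"), key=lambda my: payoff[(my, their_move)])
    print(f"If they play {their_move}, my best reply is {best}")
# Both lines print D -- yet (D, D) pays each player 1 while (C, C)
# would pay each 3. That gap, not the English label 'cooperate',
# is what makes it a dilemma.
```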

Comment author: Houshalter 08 October 2015 02:20:28PM -1 points [-]

There are lots of good reasons Eliezer shouldn't have banned R̶o̶k̶o̶ discussion of the basilisk, but I don't think this is one of them. If the basilisk was a real concern, that would imply that talking about it put people at risk of torture, so this is an obvious example of a topic you initially discuss in private channels and not on public websites. At the same time, if the basilisk wasn't risky to publicly discuss, then that also implies that it was a transparently bad argument and therefore not important to discuss. (Though it might be fine to discuss it for fun.)

As I understand Roko's motivation, it was to convince people that we should not build an AI that would do basilisks. Not to spread infohazards for no reason. That is definitely worthy of public discussion. If he really believed in the basilisk, then it was rational for him to do everything in his power to stop such an AI from being built and to convince other people of the danger.

Roko's original argument, though, could have been stated in one sentence: 'Utilitarianism implies you'll be willing to commit atrocities for the greater good; CEV is utilitarian; therefore CEV is immoral and dangerous.'

My understanding is that the issue is with Timeless Decision Theory, and AIs that can do acausal trade. An AI programmed with classical decision theory would have no issues. And most rejections of the basilisk I have read amount to "acausal trade seems wrong or weird", so they effectively agree with Roko.

Comment author: RobbBB 08 October 2015 07:09:24PM *  1 point [-]

My understanding is that the issue is with Timeless Decision Theory, and AIs that can do acausal trade.

Roko wasn't arguing against TDT. Roko's post was about acausal trade, but the conclusion he was trying to argue for was just 'utilitarian AI is evil because it causes suffering for the sake of the greater good'. But if that's your concern, you can just post about some variant on the trolley problem. If utilitarianism is risky because a utilitarian might employ blackmail and blackmail is evil, then there should be innumerable other evil things a utilitarian would also do that require less theoretical apparatus.

As I understand Roko's motivation, it was to convince people that we should not build an AI that would do basilisks. Not to spread infohazards for no reason.

On Roko's view, if no one finds out about basilisks, the basilisk can't blackmail anyone. So publicizing the idea doesn't make sense, unless Roko didn't take his own argument all that seriously. (Maybe Roko was trying to protect himself from personal blackmail risk at others' expense, but this seems odd if he also increased his own blackmail risk in the process.)

Possibly Roko was thinking: 'If I don't prevent utilitarian AI from being built, it will cause a bunch of atrocities in general. But LessWrong users are used to dismissing anti-utilitarian arguments, so I need to think of one with extra shock value to get them to do some original seeing. This blackmail argument should work -- publishing it puts people at risk of blackmail, but it serves the greater good of protecting us from other evil utilitarian tradeoffs.'

(... Irony unintended.)

Still, if that's right, I'm inclined to think Roko should have tried to post other arguments against utilitarianism that don't (in his view) put anyone at risk of torture. I'm not aware of him having done that.

Comment author: anon85 08 October 2015 02:16:27AM 1 point [-]

Defecting gives you a better outcome than cooperating if your decision is uncorrelated with the other players'. Different humans' decisions aren't 100% correlated, but they also aren't 0% correlated, so the rationality of cooperating in the one-shot PD varies situationally for humans.

You're confusing correlation with causation. Different players' decisions may be correlated, but they sure as hell aren't causative of each other (unless they literally see each other's code, maybe).

But part of the reason for cooperation is probably also that we've evolved to do a very weak and probabilistic version of 'source code sharing': we've evolved to (sometimes) involuntarily display veridical evidence of our emotions, personality, etc. -- as opposed to being in complete control of the information we give others about our dispositions.

Calling this source code sharing, instead of just "signaling for the purposes of a repeated game", seems counter-productive. Yes, I agree that in a repeated game, the situation is trickier and involves a lot of signaling. The one-shot game is much easier: just always defect. By definition, that's the best strategy.

Comment author: RobbBB 08 October 2015 09:35:17AM *  1 point [-]

You're confusing correlation with causation. Different players' decisions may be correlated, but they sure as hell aren't causative of each other (unless they literally see each other's code, maybe).

Causation isn't necessary. You're right that correlation isn't quite sufficient, though!

What's needed for rational cooperation in the prisoner's dilemma is a two-way dependency between A's and B's decision-making. That can be because A is causally impacting B, or because B is causally impacting A; but it can also occur when there's a common cause and neither is causing the other, like when my sister and I have similar genomes even though my sister didn't create my genome and I didn't create her genome. Or our decision-making processes can depend on each other because we inhabit the same laws of physics, or because we're both bound by the same logical/mathematical laws -- even if we're on opposite sides of the universe.

(Dependence can also happen by coincidence, though if it's completely random I'm not sure how you'd find out about it in order to act on it!)

The most obvious example of cooperating due to acausal dependence is making two atom-by-atom-identical copies of an agent and putting them in a one-shot prisoner's dilemma against each other. But two agents whose decision-making is 90% similar instead of 100% identical can cooperate on those grounds too, provided the utility of mutual cooperation is sufficiently large.

For the same reason, a very large utility difference can rationally mandate cooperation even if cooperating only changes the probability of the other agent's behavior from '100% probability of defection' to '99% probability of defection'.
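A sketch of the expected-utility arithmetic behind the last two paragraphs (the payoff numbers and the 1% dependence figure are my own illustrative assumptions):

```python
# Payoffs to me in a one-shot PD, chosen so that mutual cooperation
# is enormously valuable (illustrative numbers).
payoff = {
    ("C", "C"): 1_000_000,  # mutual cooperation
    ("C", "D"): 0,          # I'm the sucker
    ("D", "C"): 1_000_001,  # temptation
    ("D", "D"): 1,          # mutual defection
}

def expected_utility(my_move, p_they_cooperate_given):
    """Expected payoff when their move depends (perhaps only
    slightly) on mine."""
    p = p_they_cooperate_given[my_move]
    return p * payoff[(my_move, "C")] + (1 - p) * payoff[(my_move, "D")]

# No dependence: defection wins, as it should.
independent = {"C": 0.5, "D": 0.5}
print(expected_utility("C", independent))  # 500000.0
print(expected_utility("D", independent))  # 500001.0

# A mere 1% dependence -- cooperating shifts them from certain
# defection to a 1% chance of cooperating -- flips the answer
# when the stakes are large enough:
slight_dependence = {"C": 0.01, "D": 0.0}
print(expected_utility("C", slight_dependence))  # 10000.0
print(expected_utility("D", slight_dependence))  # 1.0
```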

Calling this source code sharing, instead of just "signaling for the purposes of a repeated game", seems counter-productive.

I disagree! "Code-sharing" risks confusing someone into thinking there's something magical and privileged about looking at source code. It's true this is an unusually rich and direct source of information (assuming you understand the code's implications and are sure what you're seeing is the real deal), but the difference between that and inferring someone's embarrassment from a blush is quantitative, not qualitative.

Some sources of information are more reliable and more revealing than others; but the same underlying idea is involved whenever something is evidence about an agent's future decisions. See: Newcomblike Problems are the Norm

Yes, I agree that in a repeated game, the situation is trickier and involves a lot of signaling. The one-shot game is much easier: just always defect. By definition, that's the best strategy.

If you and the other player have common knowledge that you reason the same way, then the correct move is to cooperate in the one-shot game. The correct move is to defect when those conditions don't hold strongly enough, though.

Comment author: anon85 07 October 2015 05:35:06PM 1 point [-]

I find TDT to be basically bullshit except possibly when it is applied to entities which literally see each other's code, in which case I'm not sure (I'm not even sure the concept of "decision" makes sense in that case).

I'd go so far as to say that anyone who advocates cooperating in a one-shot prisoners' dilemma simply doesn't understand the setting. By definition, defecting gives you a better outcome than cooperating. Anyone who claims otherwise is changing the definition of the prisoners' dilemma.

Comment author: RobbBB 07 October 2015 06:12:29PM *  1 point [-]

Defecting gives you a better outcome than cooperating if your decision is uncorrelated with the other players'. Different humans' decisions aren't 100% correlated, but they also aren't 0% correlated, so the rationality of cooperating in the one-shot PD varies situationally for humans.

Part of the reason why humans often cooperate in PD-like scenarios in the real world is probably that there's uncertainty about how iterated the PD is (and our environment of evolutionary adaptedness had a lot more iterated encounters than once-off encounters). But part of the reason for cooperation is probably also that we've evolved to do a very weak and probabilistic version of 'source code sharing': we've evolved to (sometimes) involuntarily display veridical evidence of our emotions, personality, etc. -- as opposed to being in complete control of the information we give others about our dispositions.

Because they're at least partly involuntary and at least partly veridical, 'tells' give humans a way to trust each other even when there are no bad consequences to betrayal -- which means at least some people can trust each other at least some of the time to uphold contracts in the absence of external enforcement mechanisms. See also Newcomblike Problems Are The Norm.
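As a rough sketch of why a partly-involuntary tell can carry real evidential weight (all numbers here are my own illustrative assumptions), a simple Bayesian update:

```python
# Illustrative Bayesian update on an involuntary 'tell'.
p_trustworthy = 0.5             # prior: half of partners honor deals
p_tell_given_trustworthy = 0.8  # honest dispositions usually leak
p_tell_given_not = 0.2          # the tell is hard, not impossible, to fake

p_tell = (p_tell_given_trustworthy * p_trustworthy
          + p_tell_given_not * (1 - p_trustworthy))
posterior = p_tell_given_trustworthy * p_trustworthy / p_tell
print(posterior)  # 0.8 -- seeing the tell raises trust from 50% to 80%,
                  # with no external enforcement mechanism anywhere.
```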

Comment author: anon85 07 October 2015 05:23:17PM 1 point [-]

Probably not a quick fix, but I would definitely say Eliezer assigns a significant probability (say, 10%) to there being some viable version of the Basilisk, which is why he actively avoids thinking about it.

If Eliezer was just angry at Roko, he would have yelled at Roko or banned him; instead, he banned all discussion of the subject. That doesn't even make sense as a "lashing out" reaction against Roko.

Comment author: RobbBB 07 October 2015 05:41:52PM 0 points [-]

It sounds like you have a different model of Eliezer (and of how well-targeted 'lashing out' usually is) than I do. But, like I said to V_V above:

According to Eliezer, he had three separate reasons for the original ban: (1) he didn't want any additional people (beyond the one Roko cited) to obsess over the idea and get nightmares; (2) he was worried there might be some variant on Roko's argument that worked, and he wanted more formal assurances that this wasn't the case; and (3) he was just outraged at Roko. (Including outraged at him for doing something Roko thought would put people at risk of torture.)

The point I was making wasn't that (2) had zero influence. It was that (2) probably had less influence than (3), and its influence was probably of the 'small probability of large costs' variety.

Comment author: shminux 07 October 2015 07:09:25AM 1 point [-]

Ah, OK, I agree then. Sorry I took the original quote out of context.

Comment author: RobbBB 07 October 2015 07:19:33AM 1 point [-]

Sure, not a problem.

Comment author: anon85 07 October 2015 12:44:15AM 0 points [-]

That even seems to be false in Eliezer's case, and Eliezer definitely isn't 'LessWrong'.

It seems we disagree on this factual issue. Eliezer does think there is a risk of acausal blackmail, or else he wouldn't have banned discussion of it.

Comment author: RobbBB 07 October 2015 07:17:53AM *  2 points [-]

Sorry, I'll be more concrete; "there's a serious risk" is really vague wording. What would surprise me greatly is if I heard that Eliezer assigned even a 5% probability to there being a realistic quick fix to Roko's argument that makes it work on humans. I think a larger reason for the ban was just that Eliezer was angry with Roko for trying to spread what Roko thought was an information hazard, and angry people lash out (even when it doesn't make a ton of strategic sense).

Comment author: shminux 07 October 2015 06:46:56AM -1 points [-]

What if your devastating take-down of string theory is intended for consumption by people who have never heard of 'string theory' before?

That's par for the course here. Philosophy, frequentism, and non-MWI QM all get this treatment in the (original) sequences.

Comment author: RobbBB 07 October 2015 06:55:32AM *  3 points [-]

The full thing I said was:

What if your devastating take-down of string theory is intended for consumption by people who have never heard of 'string theory' before? Even if you're sure string theory is hogwash, then, you should be wary of giving the impression that the only people discussing string theory are the commenters on a recreational physics forum.

I wasn't saying that there's anything wrong with trying to convince random laypeople that specific academic ideas (including string theory and non-causal decision theories) are hogwash. That can be great; it depends on execution. My point was that it's bad to mislead people about how much mainstream academic acceptance an idea has, whether or not you're attacking the idea.

Comment author: anon85 06 October 2015 05:16:49PM *  0 points [-]

I'm not sure what your point is here. Would you mind re-phrasing? (I'm pretty sure I understand the history of Roko's Basilisk, so your explanation can start with that assumption.)

For example, someone who thinks LWers are overly panicky about AI and overly fixated on decision theory should still reject Auerbach's assumption that LWers are irrationally panicky about Newcomb's Problem or acausal blackmail; the one doesn't follow from the other.

My point was that LWers are irrationally panicky about acausal blackmail: they think Basilisks are plausible enough that they ban all discussion of them!

(Not all LWers, of course.)

Comment author: RobbBB 06 October 2015 11:29:20PM *  0 points [-]

If you're saying 'LessWrongers think there's a serious risk they'll be acausally blackmailed by a rogue AI', then that seems to be false. That even seems to be false in Eliezer's case, and Eliezer definitely isn't 'LessWrong'. If you're saying 'LessWrongers think acausal trade in general is possible,' then that seems true but I don't see why that's ridiculous.

Is there something about acausal trade in general that you're objecting to, beyond the specific problems with Roko's argument?
