anon85 comments on A few misconceptions surrounding Roko's basilisk - Less Wrong Discussion
Comments (125)
I'm not sure what your point is here. Would you mind re-phrasing? (I'm pretty sure I understand the history of Roko's Basilisk, so your explanation can start with that assumption.)
My point was that LWers are irrationally panicky about acausal blackmail: they think Basilisks are plausible enough that they ban all discussion of them!
(Not all LWers, of course.)
If you're saying 'LessWrongers think there's a serious risk they'll be acausally blackmailed by a rogue AI', then that seems to be false. That even seems to be false in Eliezer's case, and Eliezer definitely isn't 'LessWrong'. If you're saying 'LessWrongers think acausal trade in general is possible,' then that seems true but I don't see why that's ridiculous.
Is there something about acausal trade in general that you're objecting to, beyond the specific problems with Roko's argument?
Is it?
Assume that:
a) There will be a future AI powerful enough to torture people, even posthumously (I think this is quite speculative, but let's assume it for the sake of the argument).
b) This AI will have a value system based on some form of utilitarian ethics.
c) This AI will use an "acausal" decision theory (one that one-boxes in Newcomb's problem).
Under these premises it seems to me that Roko's argument is fundamentally correct.
As far as I can tell, belief in these premises was not only common in LessWrong at that time, but it was essentially the officially endorsed position of Eliezer Yudkowsky and SIAI. Therefore, we can deduce that EY should have believed that Roko's argument was correct.
But EY claims that he didn't believe that Roko's argument was correct. So the question is: is EY lying?
His behavior was certainly consistent with him believing Roko's argument. If he wanted to prevent the diffusion of that argument, then even lying about its correctness seems consistent.
So, is he lying? If he is not lying, then why didn't he believe Roko's argument? As far as I know, he never provided a refutation.
This was addressed on the LessWrongWiki page; I didn't copy the full article here.
A few reasons Roko's argument doesn't work:
1 - The 'should' here is normative: there are probably some decision theories that let agents acausally blackmail each other, but others that perform well in Newcomb's problem and the smoking lesion problem but can't acausally blackmail each other; it hasn't been formally demonstrated which theories fall into which category.
2 - Assuming you for some reason are following a decision theory that does put you at risk of acausal blackmail: Since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it's going to torture people without actually torturing them. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty. This means that we can't be blackmailed in practice.
3 - A stronger version of 2 is that rational agents actually have an incentive to harshly punish attempts at blackmail in order to discourage it. So threatening blackmail can actually decrease an agent's probability of being created, all else being equal.
4 - Insofar as it's "utilitarian" to horribly punish anyone who doesn't perfectly promote human flourishing, SIAI doesn't seem to have endorsed utilitarianism.
4 means that the argument lacks practical relevance. The idea of CEV doesn't build in very much moral philosophy, and it doesn't build in predictions about the specific dilemmas future agents might end up in.
"I precommit to shop at the store with the lowest price within some large distance, even if the cost of the gas and car depreciation to get to a farther store is greater than the savings I get from its lower price. If I do that, stores will have to compete with distant stores based on price, and thus it is more likely that nearby stores will have lower prices. However, this precommitment would only work if I am actually willing to go to the farther store when it has the lowest price even if I lose money".
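The quoted reasoning can be put as a toy calculation. This is only a sketch under assumptions that are not in the quote: one nearby store, one distant store, a fixed travel cost, and a shopper who stays local on ties. The function name and the numbers are hypothetical.

```python
def max_nearby_price(far_price: float, travel_cost: float, precommitted: bool) -> float:
    """Highest price the nearby store can charge and still keep the shopper.

    Assumption: the shopper stays local when the two options tie.
    """
    if precommitted:
        # A precommitted shopper compares sticker prices only,
        # so the nearby store has to match the distant store's price.
        return far_price
    # An uncommitted shopper compares total cost (price plus travel),
    # so the nearby store can add the travel cost as a markup.
    return far_price + travel_cost


if __name__ == "__main__":
    # Hypothetical numbers: distant store charges 100, the trip costs 8.
    print(max_nearby_price(100, 8, precommitted=False))
    print(max_nearby_price(100, 8, precommitted=True))
```

In this toy model the precommitment is worth up to the travel cost on every purchase, but only if the store believes the shopper really would make the money-losing trip.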
Miraculously, people do reliably act this way.
I doubt it. Reference?
Mostly because they don't actually notice the cost of gas and car depreciation at the time...
You've described the mechanism by which the precommitment happened, not actually disputed whether it happens.
Many "irrational" actions by human beings can be analyzed as precommitment; for instance, wanting to take revenge on people who have hurt you even if the revenge doesn't get you anything.
Humans don't follow any decision theory consistently. They sometimes give in to blackmail, and at other times resist blackmail. If you convinced a bunch of people to take acausal blackmail seriously, presumably some subset would give in and some subset would resist, since that's what we see in ordinary blackmail situations. What would be interesting is if (a) there were some applicable reasoning norm that forced us to give in to acausal blackmail on pain of irrationality, or (b) there were some known human irrationality that made us inevitably susceptible to acausal blackmail. But I don't think Roko gave a good argument for either of those claims.
From my last comment: "there are probably some decision theories that let agents acausally blackmail each other". But if humans frequently make use of heuristics like 'punish blackmailers' and 'never give in to blackmailers', and if normative decision theory says they're right to do so, there's less practical import to 'blackmailable agents are possible'.
No it doesn't. If you model Newcomb's problem as a Prisoner's Dilemma, then one-boxing maps on to cooperating and two-boxing maps on to defecting. For Omega, cooperating means 'I put money in both boxes' and defecting means 'I put money in just one box'. TDT recognizes that the only two options are mutual cooperation or mutual defection, so TDT cooperates.
Blackmail works analogously. Perhaps the blackmailer has five demands. For the blackmailee, full cooperation means 'giving in to all five demands'; full defection means 'rejecting all five demands'; and there are also intermediary levels (e.g., giving in to two demands while rejecting the other three), with the blackmailee preferring to do as little as possible.
For the blackmailer, full cooperation means 'expending resources to punish the blackmailee in proportion to how many of my demands were met'. Full defection means 'expending no resources to punish the blackmailee even if some demands aren't met'. In other words, since harming past agents is costly, a blackmailer's favorite scenario is always 'the blackmailee, fearing punishment, gives in to most or all of my demands; but I don't bother punishing them regardless of how many of my demands they ignored'. We could say that full defection doesn't even bother to check how many of the demands were met, except insofar as this is useful for other goals.
The blackmailer wants to look as scary as possible (to get the blackmailee to cooperate) and then defect at the last moment anyway (by not following through on the threat), if at all possible. In terms of Newcomb's problem, this is the same as preferring to trick Omega into thinking you'll one-box, and then two-boxing anyway. We usually construct Newcomb's problem in such a way that this is impossible; therefore TDT cooperates. But in the real world mutual cooperation of this sort is difficult to engineer, which makes fully credible acausal blackmail at least as difficult.
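A rough sketch of the blackmailer's side of this payoff structure, in Python. Every number and name below is a made-up assumption for illustration (the gain per demand, the cost of punishing, the cost of checking compliance); nothing here comes from Roko's post or from a formal statement of TDT. Punishment is modeled as scaling with the number of unmet demands.

```python
# Hypothetical payoffs, chosen only to illustrate the structure.
N_DEMANDS = 5
GAIN_PER_DEMAND = 10        # value to the blackmailer of each demand met
PUNISH_COST_PER_UNMET = 3   # resources spent punishing each unmet demand
MONITORING_COST = 1         # cost of checking how many demands were met


def blackmailer_payoff(demands_met: int, follows_through: bool) -> int:
    """Blackmailer's payoff once the blackmailee has already chosen."""
    gain = GAIN_PER_DEMAND * demands_met
    if not follows_through:
        # Full defection: no checking, no punishing.
        return gain
    unmet = N_DEMANDS - demands_met
    return gain - MONITORING_COST - PUNISH_COST_PER_UNMET * unmet


def favorite_outcome() -> tuple[int, bool]:
    """The (demands_met, follows_through) pair the blackmailer likes best."""
    outcomes = [(met, ft) for met in range(N_DEMANDS + 1) for ft in (True, False)]
    return max(outcomes, key=lambda o: blackmailer_payoff(*o))


if __name__ == "__main__":
    for met in range(N_DEMANDS + 1):
        print(met, blackmailer_payoff(met, True), blackmailer_payoff(met, False))
    print(favorite_outcome())
```

With these assumed numbers, not following through beats following through at every level of compliance, and the blackmailer's favorite outcome is full compliance with no punishment. That is the (D,C) preference described above; it says nothing about whether a given decision theory lets the blackmailer bind itself to the threat.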
I think you misunderstood point 3. 3 is a follow-up to 2: humans and AI systems alike have incentives to discourage blackmail, which increases the likelihood that blackmail is a self-defeating strategy.
Eliezer has endorsed the claim "two independent occurrences of a harm (not to the same person, not interacting with each other) are exactly twice as bad as one". This doesn't tell us how bad the act of blackmail itself is, it doesn't tell us how faithfully we should implement that idea in autonomous AI systems, and it doesn't tell us how likely it is that a superintelligent AI would find itself forced into this particular moral dilemma.
Since Eliezer asserts a CEV-based agent wouldn't blackmail humans, the next step in shoring up Roko's argument would be to do more to connect the dots from "two independent occurrences of a harm (not to the same person, not interacting with each other) are exactly twice as bad as one" to a real-world worry about AI systems actually blackmailing people conditional on claims (a) and (c). 'I find it scary to think a superintelligent AI might follow the kind of reasoning that can ever privilege torture over dust specks' is not the same thing as 'I'm scared a superintelligent AI will actually torture people because this will in fact be the best way to prevent a superastronomically large number of dust specks from ending up in people's eyes', so Roko's particular argument has a high evidential burden.
Um, your conclusion "since we're aware of this, we know any threat of blackmail would be empty" contradicts your premise that the AI by virtue of being super-intelligent is capable of fooling people into thinking it'll torture them.
One way of putting this is that the AI, once it exists, can convincingly trick people into thinking it will cooperate in Prisoner's Dilemmas; but since we know it has this property and we know it prefers (D,C) over (C,C), we know it will defect. This is consistent because we're assuming the actual AI is powerful enough to trick people once it exists; this doesn't require the assumption that my low-fidelity mental model of the AI is powerful enough to trick me in the real world.
For acausal blackmail to work, the blackmailer needs a mechanism for convincing the blackmailee that it will follow through on its threat. 'I'm a TDT agent' isn't a sufficient mechanism, because a TDT agent's favorite option is still to trick other agents into cooperating in Prisoner's Dilemmas while they defect.
Except it needs to convince the people who are around before it exists.
He's almost certainly lying about what he believed back then. I have no idea if he's lying about his current beliefs.
Lying is consistent with a lot of behavior. The fact that it is, is no basis to accuse people of lying.
I'm not accusing, I'm asking the question.
My point is that, to my knowledge, given the evidence I have about his beliefs at that time and his actions, and assuming I'm not misunderstanding them or Roko's argument, there seems to be a significant probability that EY lied about not believing that Roko's argument was correct.
It seems we disagree on this factual issue. Eliezer does think there is a risk of acausal blackmail, or else he wouldn't have banned discussion of it.
Sorry, I'll be more concrete; "there's a serious risk" is really vague wording. What would surprise me greatly is if I heard that Eliezer assigned even a 5% probability to there being a realistic quick fix to Roko's argument that makes it work on humans. I think a larger reason for the ban was just that Eliezer was angry with Roko for trying to spread what Roko thought was an information hazard, and angry people lash out (even when it doesn't make a ton of strategic sense).
Probably not a quick fix, but I would definitely say Eliezer gives significant chances (say, 10%) to there being some viable version of the Basilisk, which is why he actively avoids thinking about it.
If Eliezer was just angry at Roko, he would have yelled at or banned Roko; instead, he banned all discussion of the subject. That doesn't even make sense as a "lashing out" reaction against Roko.
It sounds like you have a different model of Eliezer (and of how well-targeted 'lashing out' usually is) than I do. But, like I said to V_V above:
The point I was making wasn't that (2) had zero influence. It was that (2) probably had less influence than (3), and its influence was probably of the 'small probability of large costs' variety.
I don't know enough about this to tell if (2) had more influence than (3) initially. I'm glad you agree that (2) had some influence, at least. That was the main part of my point.
How long did discussion of the Basilisk stay banned? Wasn't it many years? How do you explain that, unless the influence of (2) was significant?
I believe he thinks that sufficiently clever idiots competing to shoot off their own feet will find some way to do so.
It seems unlikely that they would, if their gun is some philosophical decision theory stuff about blackmail from their future. I don't expect that gun to ever fire, no matter how many times you click the trigger.
That is not what I said, and I'm also guessing you did not have a grandfather who taught you gun safety.