VoiceOfRa comments on A few misconceptions surrounding Roko's basilisk - Less Wrong

39 points. Post author: RobbBB, 05 October 2015 09:23PM


Comment author: V_V, 10 October 2015 03:12:56PM, 0 points

If you're saying 'LessWrongers think there's a serious risk they'll be acausally blackmailed by a rogue AI', then that seems to be false. That even seems to be false in Eliezer's case,

Is it?

Assume that:
a) There will be a future AI powerful enough to torture people, even posthumously (I think this is quite speculative, but let's assume it for the sake of the argument).
b) This AI will have a value system based on some form of utilitarian ethics.
c) This AI will use an "acausal" decision theory (one that one-boxes in Newcomb's problem).

Under these premises it seems to me that Roko's argument is fundamentally correct.

As far as I can tell, belief in these premises was not only common on LessWrong at the time, but was essentially the officially endorsed position of Eliezer Yudkowsky and SIAI. Therefore, we can deduce that EY should have believed that Roko's argument was correct.

But EY claims that he didn't believe that Roko's argument was correct. So the question is: is EY lying?

His behavior was certainly consistent with his believing Roko's argument. If he wanted to prevent the argument from spreading, then even lying about its correctness would be consistent with that goal.

So, is he lying? If he is not lying, then why didn't he believe Roko's argument? As far as I know, he never provided a refutation.

Comment author: RobbBB, 14 October 2015 04:41:38AM, 1 point

This was addressed on the LessWrongWiki page; I didn't copy the full article here.

A few reasons Roko's argument doesn't work:

  • 1 - Logical decision theories are supposed to one-box on Newcomb's problem because one-boxing is globally optimal, even though it isn't optimal with respect to causally downstream events. A decision theory built on this idea could follow through on blackmail threats even when doing so isn't causally optimal, which appears to put past agents at risk of coercion by future agents. But such a decision theory also prescribes 'don't be the kind of agent that enters into trades that aren't globally optimal, even if the trade is optimal with respect to causally downstream events'. In other words, if you can bind yourself to precommitments to follow through on acausal blackmail, then it should also be possible to bind yourself to precommitments to ignore threats of blackmail. (The toy payoff calculation after this list makes the 'globally optimal' point concrete.)

The 'should' here is normative: there are probably some decision theories that let agents acausally blackmail each other, and others that perform well in Newcomb's problem and the smoking lesion problem yet can't acausally blackmail each other; it hasn't been formally demonstrated which theories fall into which category.

  • 2 - Assuming you for some reason are following a decision theory that does put you at risk of acausal blackmail: Since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it's going to torture people without actually torturing them. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty. This means that we can't be blackmailed in practice.

  • 3 - A stronger version of 2 is that rational agents actually have an incentive to harshly punish attempts at blackmail in order to discourage it. So threatening blackmail can actually decrease an agent's probability of being created, all else being equal.

  • 4 - Insofar as it's "utilitarian" to horribly punish anyone who doesn't perfectly promote human flourishing, SIAI doesn't seem to have endorsed utilitarianism.

Reason 4 means that the argument lacks practical relevance in any case: the idea of CEV doesn't build in very much moral philosophy, and it doesn't build in predictions about the specific dilemmas future agents might end up in.
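To make the 'globally optimal' point from reason 1 concrete, here is a minimal Python sketch of the standard Newcomb payoffs. The dollar amounts and the 0.99 predictor accuracy are illustrative assumptions, not figures from the wiki page or the original comments:

```python
# Toy Newcomb payoff comparison (illustrative numbers): a near-perfect
# predictor fills the opaque box with $1,000,000 only if it predicts you
# will one-box; the transparent box always holds $1,000.

def expected_payoff(one_box: bool, predictor_accuracy: float = 0.99) -> float:
    big, small = 1_000_000, 1_000
    if one_box:
        # The opaque box is full iff the predictor correctly foresaw one-boxing.
        return predictor_accuracy * big
    else:
        # Two-boxing: you get the small box, plus the big box only if the
        # predictor wrongly expected you to one-box.
        return small + (1 - predictor_accuracy) * big

print(expected_payoff(one_box=True))   # 990000.0 -> one-boxing wins globally
print(expected_payoff(one_box=False))  # 11000.0  -> despite causal dominance
```

The same structure is why, on this view, being transparently the kind of agent that ignores blackmail is the better disposition to have: a would-be blackmailer that predicts its threat will be ignored has no reason to make it.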

Comment author: VoiceOfRa, 14 October 2015 09:00:18PM, 0 points

Assuming you for some reason are following a decision theory that does put you at risk of acausal blackmail: Since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it's going to torture people without actually torturing them. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty.

Um, your conclusion "since we're aware of this, we know any threat of blackmail would be empty" contradicts your premise that the AI, by virtue of being super-intelligent, is capable of fooling people into thinking it'll torture them.

Comment author: RobbBB, 14 October 2015 10:05:44PM, 0 points

One way of putting this is that the AI, once it exists, can convincingly trick people into thinking it will cooperate in Prisoner's Dilemmas; but since we know it has this property and we know it prefers (D,C) over (C,C), we know it will defect. This is consistent because we're assuming the actual AI is powerful enough to trick people once it exists; this doesn't require the assumption that my low-fidelity mental model of the AI is powerful enough to trick me in the real world.

For acausal blackmail to work, the blackmailer needs a mechanism for convincing the blackmailee that it will follow through on its threat. 'I'm a TDT agent' isn't a sufficient mechanism, because a TDT agent's favorite option is still to trick other agents into cooperating in Prisoner's Dilemmas while it defects.
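As a minimal sketch of the (D,C) versus (C,C) point, assuming ordinary Prisoner's Dilemma payoffs (the specific numbers below are illustrative; the comment only states the preference ordering):

```python
# Toy Prisoner's Dilemma payoffs from the AI's point of view. Assumed fact
# from the comment above: the AI can make the human believe whatever it wants
# about its future move, so its "promise" to cooperate carries no information.

AI_PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("D", "C"): 5,  # AI defects while the human cooperates -- AI's favorite
    ("C", "D"): 0,
    ("D", "D"): 1,
}

def ai_best_response(human_move: str) -> str:
    """Pick the AI move that maximizes its own payoff, given the human's move."""
    return max(["C", "D"], key=lambda m: AI_PAYOFF[(m, human_move)])

# Whatever the human does, the AI's best response is to defect, so a human who
# models the AI this way treats any assurance of cooperation as empty.
print(ai_best_response("C"), ai_best_response("D"))  # -> D D
```

Knowing this preference ordering is enough for someone reasoning about the AI in advance to discount its assurances, however persuasive the actual AI would be once it exists.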

Comment author: VoiceOfRa, 15 October 2015 08:30:38PM, 1 point

One way of putting this is that the AI, once it exists, can convincingly trick people into thinking it will cooperate in Prisoner's Dilemmas

Except it needs to convince the people who are around before it exists.