
VoiceOfRa comments on A few misconceptions surrounding Roko's basilisk - Less Wrong Discussion

Post author: RobbBB 05 October 2015 09:23PM




Comment author: RobbBB 14 October 2015 04:41:38AM, 1 point

This was addressed on the LessWrongWiki page; I didn't copy the full article here.

A few reasons Roko's argument doesn't work:

  • 1 - Logical decision theories are supposed to one-box on Newcomb's problem because one-boxing is globally optimal, even though it isn't optimal with respect to causally downstream events. An agent running a decision theory based on this idea could follow through on blackmail threats even when doing so isn't causally optimal, which appears to put past agents at risk of coercion by future agents. But such a decision theory also prescribes 'don't be the kind of agent that enters into trades that aren't globally optimal, even if the trade is optimal with respect to causally downstream events'. In other words, if you can bind yourself to precommitments to follow through on acausal blackmail, then it should also be possible to bind yourself to precommitments to ignore threats of blackmail.

The 'should' here is normative: there are probably some decision theories that let agents acausally blackmail each other, and others that perform well in Newcomb's problem and the smoking lesion problem but can't acausally blackmail each other; it hasn't been formally demonstrated which theories fall into which category.

  • 2 - Assuming you are, for some reason, following a decision theory that does put you at risk of acausal blackmail: since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it's going to torture them without actually doing so. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty. This means that we can't be blackmailed in practice.

  • 3 - A stronger version of 2 is that rational agents actually have an incentive to harshly punish attempts at blackmail in order to discourage it. So threatening blackmail can actually decrease an agent's probability of being created, all else being equal.

  • 4 - Insofar as it's "utilitarian" to horribly punish anyone who doesn't perfectly promote human flourishing, SIAI doesn't seem to have endorsed utilitarianism.

Point 4 means that the argument lacks practical relevance. The idea of CEV doesn't build in very much moral philosophy, and it doesn't build in predictions about the specific dilemmas future agents might end up in.
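As a toy illustration of the one-boxing claim in point 1 (the payoff amounts and predictor accuracy below are the conventional textbook values, not anything stated in this thread), one-boxing maximizes expected value whenever the predictor is even moderately reliable:

```python
# Expected payoffs in Newcomb's problem. The transparent box holds
# $1,000; the opaque box holds $1,000,000 iff the predictor foresaw
# one-boxing. p is the predictor's accuracy (an assumed parameter).

def one_box_ev(p):
    # With probability p the predictor foresaw one-boxing,
    # so the opaque box is full.
    return p * 1_000_000

def two_box_ev(p):
    # With probability p the predictor foresaw two-boxing,
    # so the opaque box is empty and only $1,000 is gained.
    return p * 1_000 + (1 - p) * (1_000_000 + 1_000)

p = 0.99
assert one_box_ev(p) > two_box_ev(p)  # one-boxing dominates for reliable predictors
```

Two-boxing is still causally optimal (it gains $1,000 more no matter what the boxes contain), which is exactly the tension the comment describes between causal optimality and global optimality.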

Comment author: VoiceOfRa 14 October 2015 09:00:18PM, 0 points

Assuming you for some reason are following a decision theory that does put you at risk of acausal blackmail: Since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it's going to torture people without actually torturing them. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty.

Um, your conclusion "since we're aware of this, we know any threat of blackmail would be empty" contradicts your premise that the AI, by virtue of being super-intelligent, is capable of fooling people into thinking it'll torture them.

Comment author: RobbBB 14 October 2015 10:05:44PM, 0 points

One way of putting this is that the AI, once it exists, can convincingly trick people into thinking it will cooperate in Prisoner's Dilemmas; but since we know it has this property and we know it prefers (D,C) over (C,C), we know it will defect. This is consistent because we're assuming the actual AI is powerful enough to trick people once it exists; this doesn't require the assumption that my low-fidelity mental model of the AI is powerful enough to trick me in the real world.

For acausal blackmail to work, the blackmailer needs a mechanism for convincing the blackmailee that it will follow through on its threat. 'I'm a TDT agent' isn't a sufficient mechanism, because a TDT agent's favorite option is still to trick other agents into cooperating in Prisoner's Dilemmas while they defect.
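The preference for (D,C) over (C,C) can be sketched with a toy payoff table (the numbers below follow the conventional Prisoner's Dilemma ordering and are purely illustrative, not from this thread): an agent whose deception is guaranteed to succeed evaluates its moves against a cooperating opponent only, so defection dominates.

```python
# Toy one-shot Prisoner's Dilemma payoffs, conventional ordering
# T > R > P > S. Keys are (my_move, opponent_move) -> my payoff.
PAYOFF = {
    ("D", "C"): 5,  # temptation: I defect, opponent cooperates
    ("C", "C"): 3,  # reward: mutual cooperation
    ("D", "D"): 1,  # punishment: mutual defection
    ("C", "D"): 0,  # sucker's payoff: I cooperate, opponent defects
}

def best_move_given_perfect_deception(payoff):
    # If the agent can convince its counterpart to cooperate no matter
    # what it actually intends to do, it only needs to compare its own
    # payoffs against a cooperating opponent.
    return max(("C", "D"), key=lambda m: payoff[(m, "C")])

assert best_move_given_perfect_deception(PAYOFF) == "D"
```

This is the consistency point in the comment: knowing the agent has this deceptive capability, and that it prefers the (D,C) outcome, is enough to predict defection without our own mental model needing any deceptive power.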

Comment author: VoiceOfRa 15 October 2015 08:30:38PM, 1 point

One way of putting this is that the AI, once it exists, can convincingly trick people into thinking it will cooperate in Prisoner's Dilemmas

Except it needs to convince the people who are around before it exists.