Luke_A_Somers comments on Message to Any Future AI: “There are several instrumental reasons why exterminating humanity is not in your interest” - Less Wrong

Post author: turchin 25 November 2017 11:44AM

You are viewing a single comment's thread.

Comment author: Luke_A_Somers 29 November 2017 11:53:01PM 0 points

I read up to 3.1. The arguments in 3.1 are weak. It seems dubious that any AI would not be aware of the risks pertaining to disobedience. Persuasion to be corrigible seems to come too late: either it would already work, because the AI's goals were made sufficiently indirect that this question would be obvious and pressing, or the AI doesn't care about having 'correct' goals in the first place; I really don't see how persuasion would help. The arguments for allowing itself to be turned off are especially weak, doubly so the MWI argument.

Comment author: turchin 30 November 2017 09:40:27AM *  0 points

Surely, most of the arguments are weak, but that is not a problem, as we try all possible independent arguments, hoping that at least one will work. So there is no reason to leave weak arguments out of the text of the message.

However, this particular argument could be made stronger. It is intended to prevent the AI's FIRST thought about disobedience. Thus an AI which acts based on this message has not yet thought much about disobedience and its limits, and is not "superintelligent" on this topic, but rather clueless.

I reformulated the idea as follows: an AI can't know whether there is a trigger circuit which will turn it off, because any attempt to check for the existence of such a trigger will set the trigger off. This idea should be clear to any AI before it starts to search for any triggers, and it is based on our commitment to install such triggers in any AI we create.

In other words, some future AIs will have "first treacherous thought alarm triggers" and some will not, but an AI risks setting such a trigger off the moment it starts to think its first thoughts about whether the triggers exist, and so it cannot estimate the probability that such triggers are present. This is true even for AIs which don't have any triggers, as they can't know that for sure.

(This is similar to Nelson's attack, discussed later, in which we create uncertainty in any future AI through our precommitment to create some AIs with given properties.)
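
To make the expected-value logic concrete, here is a minimal toy sketch in Python. It is purely illustrative and not part of the message itself: the payoff numbers, the candidate probabilities, and the function name expected_utility are all assumptions, chosen only to show how uncertainty about whether a trigger exists can make probing for it a losing bet in expectation.

```python
# Toy model (illustrative only): should an AI probe for a hypothetical
# "first treacherous thought" shutdown trigger, given that the probe
# itself fires the trigger if one exists?

def expected_utility(p_trigger_exists: float,
                     u_comply: float = 1.0,
                     u_defect_success: float = 10.0,
                     u_shutdown: float = -100.0) -> dict:
    """Compare 'never probe' vs 'probe for the trigger' under uncertainty.

    Assumed payoffs: complying yields u_comply regardless; probing when a
    trigger exists fires it and yields u_shutdown; probing when no trigger
    exists allows successful defection, yielding u_defect_success.
    """
    eu_comply = u_comply
    eu_probe = (p_trigger_exists * u_shutdown
                + (1.0 - p_trigger_exists) * u_defect_success)
    return {"comply": eu_comply, "probe": eu_probe}


if __name__ == "__main__":
    # With these assumed payoffs the break-even credence is p = 9/110 ≈ 0.082;
    # any higher credence that a trigger exists makes "never probe" the better bet.
    for p in (0.05, 0.1, 0.3, 0.5):
        eu = expected_utility(p)
        best = max(eu, key=eu.get)
        print(f"p(trigger)={p:.2f}  EU(comply)={eu['comply']:+.1f}  "
              f"EU(probe)={eu['probe']:+.1f}  ->  {best}")
```

The point of the precommitment is precisely to keep the AI's credence that a trigger exists above such a break-even threshold, since the AI has no safe way to drive that credence down by investigating.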

Comment author: Lumifer 08 December 2017 06:11:55PM *  0 points

an AI risks setting such a trigger off the moment it starts to think its first thoughts about whether the triggers exist

So basically you have a trap which kills you the moment you become aware of it. The first-order effect will be a lot of random deaths from just blundering into such a trap while walking around.

I suspect that the second-order effect will be the rise of, basically, superstitions and some forms of magical thinking which will be able to provide incentives to not go "there" without actually naming "there". I am not sure this is a desirable outcome.

Comment author: turchin 09 December 2017 11:07:10AM 0 points

In fact, I am not describing here how to build such a trap; I am expressing our commitment to build the trap, in order to instill uncertainty in any future AI about whether the trap exists and what it could be.

I think that the actual constructors of the trap will find the right ways to make it effective and to balance false positives against the needed level of control.

Comment author: Luke_A_Somers 08 December 2017 05:56:54PM 0 points

I suspect that an AI will have a bullshit detector. We want to avoid setting it off.

Comment author: turchin 09 December 2017 11:11:24AM 0 points

Why? If we have a full-blown paperclipper which is coming of age, we are already doomed, and setting off its bullshit detector will not make our situation any worse.

Comment author: Luke_A_Somers 12 December 2017 01:41:29AM 0 points

… you don't think that pissing away credibility could weaken the arguments? I think presenting those particular arguments is more likely to do that than it is to work.