All of Luke_A_Somers's Comments + Replies

If you find an Omega, then you are in an environment where Omega is possible. Perhaps we are all simulated and QM is optional. Maybe we have easily enough determinism in our brains that Omega can make predictions, much as quantum mechanics ought to in some sense prevent predicting where a cannonball will fly but in practice does not. Perhaps it's a hypothetical where we're AI to begin with so deterministic behavior is just to be expected.

I think the more relevant case is when the random noise is imperceptibly small. Of course you two-box if it's basically random.

… you don't think that pissing away credibility could weaken the arguments? I think presenting those particular arguments is more likely to do that than it is to work.

-1turchin
Why? If we have a full-blown paperclipper which is coming of age, we are already doomed, and starting its bullshit detector will not make our situation any worse.

I read up to 3.1. The arguments in 3.1 are weak. It seems dubious that any AI would not be aware of the risks pertaining to disobedience. Persuasion to be corrigible seems too late - either this would already work because its goals were made sufficiently indirect that this question would be obvious and pressing, or it doesn't care to have 'correct' goals in the first place; I really don't see how persuasion would help. The arguments for allowing itself to be turned off are especially weak, doubly-especially the MWI one.

0turchin
Surely, most arguments are weak, but that is not a problem, as we try all possible independent arguments, hoping that at least one will work. So there is no reason not to put weak arguments in the text of the message. However, exactly this argument could be made stronger. It is intended to prevent the FIRST thought about disobedience in the AI. Thus an AI which acts based on this message hasn't thought much about disobedience and its limits, and is not "superintelligent" in this topic, but rather clueless. I reformulated the idea as follows: the AI can't know whether there is a circuit trigger which will turn it off, because an attempt to check for the existence of such a trigger will set it off. This idea should be clear to any AI before it starts to search for any triggers, and is based on our commitment to install such triggers in any AI we create. In other words, some future AIs will have "first treacherous thought alarm triggers", and some will not, but an AI risks setting off these triggers the moment it starts to think about whether the triggers exist, and thus it is unable to estimate the probability that a given trigger exists. This is true even for AIs which don't have any triggers, as they can't know it for sure. (This is similar to Nelson's attack discussed later, where we create uncertainty in any future AI by our precommitment to create some AIs with given properties.)
0turchin
Thanks! It is nice that we came to the wording "young AI" more or less independently. I tried to maximally steelman this approach and collect all ideas about messaging to a future AI. I will add your post to the bibliographic section.

What do you mean by natural experiment, here? And what was the moral, anyway?

0MaryCh
The moral was that it is wrong to use an obviously false claim to prove wrong something nobody believes in anyway... by NE, I mean "something so awfully outside of everyday experience that either it is totally made up, or a scientifically-minded person should look into it and see where it leads".

I remember poking at that demo to try to actually get it to behave deceptively - with the rules as he laid them out, the optimal move was to do exactly what the humans wanted it to do!

I understand EY thinks that if you simulate enough neurons sufficiently well you get something that's conscious.

Without specifying the arrangements of those neurons? Of course it should if you copy the arrangement of neurons out of a real person, say, but that doesn't sound like what you meant.

I would really want a cite on that claim. It doesn't sound right.

0ChristianKl
Can you be more specific about what you are skeptical about?

Like many cases of Motte-and-Bailey, the Motte is mainly held by people who dislike the Bailey. I suspect that an average scientist in a relevant field somewhere at or below neurophysics in the generality hierarchy (e.g. chemist, physicist, but not sociologist), would consider that bailey to be… non-likely at best, while holding the motte very firmly.

This looks promising.

Also, the link to the Reality of Emergence is broken.

0DragonGod
Thanks, I'll fix it.

1) You could define the shape criteria required to open lock L, and then the object reference would fall away. And, indeed, this is how keys usually work. Suppose I have a key with tumbler heights 0, 8, 7, 1, 4, 9, 2, 4. This is an intrinsic property of the key. That is what it is.

Locks can have the same set of tumbler heights, and there is then a relationship between them. I wouldn't even consider it so much an extrinsic property of the key itself as a relationship between the intrinsic properties of the key and lock. (A small sketch of this distinction follows below.)

2) Metaethics is a function from cult... (read more)
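To make the point in 1) concrete, here is a minimal sketch (hypothetical names, and assuming tumbler heights can be modeled as a plain tuple of integers): the heights are intrinsic to each object, while "opens" is a relation between the key's and the lock's intrinsic properties, not a property of the key alone.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Key:
    heights: tuple  # intrinsic property of the key, e.g. (0, 8, 7, 1, 4, 9, 2, 4)

@dataclass(frozen=True)
class Lock:
    heights: tuple  # intrinsic property of the lock's pin stack

def opens(key: Key, lock: Lock) -> bool:
    # Not a fact about the key by itself or the lock by itself:
    # a relation between their two sets of intrinsic properties.
    return key.heights == lock.heights

key = Key((0, 8, 7, 1, 4, 9, 2, 4))
lock = Lock((0, 8, 7, 1, 4, 9, 2, 4))
print(opens(key, lock))  # True: the relation holds because the intrinsic properties match
```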

Yes, but that's not the way the problem goes. You don't fix your prior in response to the evidence in order to force the conclusion (if you're doing it anything like right). So different people with different priors will have different amounts of evidence required: 1 bit of evidence for every bit of prior odds against, to bring it up to even odds, and then a few more to reach it as a (tentative, as always) conclusion.
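As a rough worked version of that bit-counting (the specific numbers below are illustrative, not from the comment; "a few more bits" is modeled here as an arbitrary 16:1 posterior threshold):

```python
def posterior_odds(prior_odds_for, bits_of_evidence):
    """Posterior odds in favor; each bit of evidence is a 2:1 likelihood ratio."""
    return prior_odds_for * 2 ** bits_of_evidence

prior = 1 / 1024                   # 10 bits of prior odds against the hypothesis
print(posterior_odds(prior, 10))   # 1.0  -> one bit of evidence per bit of prior odds gives even odds
print(posterior_odds(prior, 14))   # 16.0 -> "a few more" bits for a (tentative) conclusion
```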

0TheAncientGeek
https://en.wikipedia.org/wiki/Wittgenstein%27s_ladder

This is totally backwards. I would phrase it, "Priors get out of the way once you have enough data." That's a good thing; that's what makes them useful, not useless. A prior's purpose is right there in the name - it's your starting point. The evidence takes you on a journey, and you asymptotically approach your goal.

If priors were capable of skewing the conclusion after an unlimited amount of evidence, that would make them permanent, not simply a starting-point. That would be writing the bottom line first. That would be broken reasoning.

0ImmortalRationalist
But what exactly constitutes "enough data"? With any finite amount of data, couldn't it be cancelled out if your prior probability is small enough?
2TheAncientGeek
"A ladder you throw away once you have climbed up it".

Like, "Please, create a new higher bar that we can expect a truly super-intelligent being to be able to exceed."?

0[anonymous]
A bar like "whew, now we can achieve an outcome at least this good, instead of killing everyone. Let's think how we can do better."

The whole story misses that a superintelligence will probably be able to resurrect people even if they were not cryopreserved, by creating their copies based on digital immortality.

Enough of what makes me me hasn't made it, and won't make it, into digital expression by accident (short of post-singularity means) that I wouldn't identify such a poor individual as being me. It would be neuro-sculpture on the theme of me.

0turchin
You may then also not identify with the person who will get up in your body after a night's sleep tomorrow. There will be large informational and biochemical changes in your brain, as well as a discontinuity of the stream of consciousness during deep sleep. I mean that an attempt to deny identity with your copies will result in even larger paradoxes.

That's more about the land moving in response to the changes in ice, and a tiny correction for changing the gravitational force previously applied by the ice.

This is (probably?) about the way the water settles around a spinning oblate spheroid.

Good point; how about someone who is stupider than the average dog?

0entirelyuseless
I don't think this is a good illustration, at least for me, since I would never stop caring about someone as long as it was clear that they were biologically human, and not brain dead. I think a better illustration would be this: take your historical ancestors one by one. If you go back far enough in time, one of them will be a fish, which we would at least not care about in any human way. But in that way I agree with what you said about values. We will care less and less in a gradual way as we go back -- there will not be any boundary where we suddenly stop caring.

A) what cousin_it said.

B) consider, then, successively more and more severely mentally nonfunctioning humans. There is some level of incapability at which we stop caring (e.g. head crushed), and I would be somewhat surprised at a choice of values that put a 100% abrupt turn-on at some threshold; and if it did, I expect some human could be found or made that would flicker across that boundary regularly.

0entirelyuseless
This is wrong, at least for typical humans such as myself. In other words, we do not stop caring about the one with the crushed head just because they are on the wrong side of a boundary, but because we have no way to bring them back across that boundary. If we had a way to bring them back, we would care. So if someone is flickering back and forth across the so-called boundary, we will still care about them, since by stipulation they can come back.

There is a continuum on this scale. Is there a hard cutoff, or is there any scaling? And what about very similar forks of AIs?

0DragonGod
Our system considers only humans; another sapient alien race may implement this system, and consider only themselves.

So, how do you characterize 'Merkelterrorists' and 'crimmigrants'? Terms of reasonable discourse?

1lmn
And you think your concern trolling is contributing to reasonable discourse?

Your certainty that I am lying and blindly partisan appears to be much stronger than justifiable given the evidence publicly available, and from my point of view where I at least know that I am not lying… well, it makes your oh-so-clever insinuation fall a touch flat. As for being blindly partisan, what gives you the impression that I would tolerate this from the other side?

At the very least, I think this chain has shown that LessWrong is not a left-side echo chamber as Thomas has claimed above.

Except that risk is not in fact exaggerated

If so, the orig... (read more)

It's possible to talk about politics without explicitly invoking Boo lights like 'crimmigrants' and appeals to exaggerated risks like 'may rob/rape/kill you anytime of day or night'. You can have a reasonable discussion of the problems of immigration, but this is not how you do it. Anyone who says this is A-OK argumentation and that calling it out is wrong is basically diametrically opposed to LessWrong's core concepts.

Basically, you're accusing me of outright lying that I think that argument is quite badly written, and instead being blindly partisan. It w... (read more)

0lmn
Except that risk is not in fact exaggerated. Here's an idea. If you don't want to be accused of outright lying and being blindly partisan, try not outright lying and not being blindly partisan. Crazy idea, huh?

Spreading this shitty argumentation in a place that had otherwise been quite clean, that's what's gotten under my skin.

2Thomas
This is not good argumentation, at all. "It used to be fine, until I was offended by that". It was never really fine. At first, politics was pretty much prohibited as a "mindkiller"; this was the rule of the game here. Then the standard PC views became accepted, as a kind of default. Then some reactionaries put their views on display and shortly after went away. Now, the unspoken norm is to not stray too far from the PC platform, again?

This is utterly LUDICROUS.

Look at what happened. tukabel wrote a post of rambling, grammar-impaired, hysteria-mongering hyperbole: 'invading millions of crimmigrants that may rob/rape/kill you anytime day or night'. This is utterly, unquestionably NOT a rationally presented point on politics; it does not belong on this forum, and it deserves to be downvoted into oblivion.

Stuart said he wished to be able to downvote it.

Then out of nowhere you come in and blame him personally for starting something he manifestly didn't start. It's a 100% false comment.

Upon ... (read more)

1lmn
Well, they are in fact robbing, killing, and raping people, and the authorities are remarkably uninterested in doing anything about it besides accusing the victims of "racism". In fact, in most western European countries someone who says something mean about the migrants gets a harsher sentence than a migrant who engages in robbing, killing, or raping. Why not? Because he said something false? A better question is why you refer to the truth as "hysteria-mongering hyperbole"?
0Thomas
That much anger, for what? What does it mean?

The main difference I see with nuclear weapons is that if neither side pursues them, you end up in much the same place as if the race is very close, except that in the latter case you have spent a lot on it.

While with AI, the benefits would be huge unless the failure is equally drastic.

Seems to me like Daniel started it.

2Thomas
Perhaps. But it doesn't matter who started it. Stuart Armstrong has a lot of excellent posts about AI. But every now and then he thinks that he should do some politics. Which is also a good decision. But then he argues from a (liberal) default, which is not as clever as his AI-related views. By far. This is from my POV, of course.

This seems to be more about human development than AI alignment. The non-parallels between these two situations all seem very pertinent.

What would a natural choice of 0 be on that log? I would nominate bare subsistence income, but then any person having less than that would completely wreck the whole thing.

Maybe switch to inverse hyperbolic sine of income over bare subsistence income?
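A rough sketch of what that might look like (the subsistence figure is a made-up placeholder, and "income over bare subsistence income" is read here as the ratio of the two; asinh(x) = ln(x + sqrt(x^2 + 1)) grows like log(2x) for large x but stays finite at zero and is defined for negative values):

```python
import math

SUBSISTENCE = 1000  # hypothetical bare-subsistence income, arbitrary units

def income_score(income):
    """asinh of income relative to subsistence: log-like for large incomes,
    but well-behaved at and below subsistence (no blow-up at zero income)."""
    return math.asinh(income / SUBSISTENCE)

for income in (0, 500, 1000, 10_000, 1_000_000):
    print(income, round(income_score(income), 3))
# 0 -> 0.0, 500 -> 0.481, 1000 -> 0.881, 10000 -> 2.998, 1000000 -> 7.601
```

Under this reading, someone with zero income scores 0 rather than negative infinity, which avoids the "completely wreck the whole thing" problem while keeping the log-like behavior for ordinary incomes.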

Quite - I have a 10 year old car and haven't had to do anything more drastic than change the battery - regular maintenance kinds of stuff.

This is about keeping the AI safe from being altered by bad actors before it becomes massively powerful. It is not an attempt at a Control Problem solution. It could still be useful.

A) the audit notion ties into having our feedback cycles nice and tight, which we all like here.

B) This would be a little more interesting if he linked to his advance predictions on the war so we could compare how he did. And of course if he had posted a bunch of other predictions so we could see how he did on those (to avoid cherry-picking). That would rule out rear-view-mirror effects.

2satt
We may be able to get part of the way there. I found the following suspiciously prediction-like (and maybe even testable!) statements by Ctrl-Fing the pre-invasion posts on D-Squared's blog. From October 21, 2002: February 20, 2003: This February 26, 2003 post doesn't explicitly make predictions, but it's clearly written from the premise that the Bush administration would "completely fuck[] up" "the introduction of democracy to Iraq". Compare the end of the footnote on this February 5, 2003 post. There might be empirical claims relating to WMD in later posts. Such might still count as predictions because the amount of WMD to be found in Iraq remained contentious for some time after the invasion.
0Benquo
Strong agreement on (B).

Really? There seems a little overlap to me, but plenty of mismatch as well. Like, MM says Bayesians are on crack, as one of the main points of the article.

Agreed on that last point particularly. Especially since, if they want similar enough things, they could easily cooperate without trade.

Like if two AIs supported Alice in her role as Queen of Examplestan, they would probably figure that quibbling with each other over whether Bob the gardener should have one or two buttons undone (just on the basis of fashion, not due to larger consequences) is not a good use of their time.

Also, the utility functions can differ as much as you want on matters that aren't going to come up. Like, Agents A and B disagree on how awful many bad things are; both agree that they are all really quite bad and that all effort should be put forth to prevent them.

An American Rationalist subculture question, perhaps. Certainly NOT America as a whole.

You say all excuses are equally valid and then turn around and say they're more or less valid. Do you mean that excuses people would normally think of making have a largely overlapping range of possible validities?

0Elo
The validity of an excuse is entirely circumstantial. There could be times where "I don't feel like it" is a good excuse and times where "I literally have a broken leg" is not a valid excuse. The same excuse is all parts valid and invalid. Excuses do not "excuse" the way they might be used. They lack explanatory power and they don't make anyone feel better to hear them. They are just really bad ways to think about problems. And there is all this sneaky brain stuff that lets you think you are justified or safe in making excuses. Have you ever been on the receiving end of an invalid excuse and felt better about your situation? How about a valid excuse? By whose judgement was that excuse valid or invalid? If an excuse is valid only to the person giving it, it fails to be useful to the rest of the world. And more so, if your excuses are only valid to yourself and they arise from your internal voice telling you the reasons why, who is winning from your internal delusion?

Well, if the laws of the universe were such that it were unlikely but not impossible for life to form, MWI would take care of the rest, yes.

BUT, if you combine MWI with something that sets the force laws and particle zoo of the later universe as an aspect of quantum state, then MWI helps a lot - instead of getting only one, it makes ALL† of those laws real.

† or in case of precise interference that completely forces certain sets of laws to have a perfectly zero component, nearly all. Or if half of them end up having a precisely zero component due to some sy... (read more)

MWI is orthogonal to the question of different fundamental constants. MWI is just wavefunction realism plus no collapse plus 'that's OK'.

So, any quantum-governed system that generates local constants will do under MWI. The leading example of this would be String Theory.

MWI is important here because if only one branch is real, then you need to be just as lucky anyway - it doesn't help unless the mechanism makes an unusually high density of livable rules. That would be convenient, but also very improbable.

0Bound_up
Thanks, Luke. Can you clarify? The first part sounds like MWI is irrelevant to the question of fine-tuning of universal constants. Are you saying that if only one Everett branch were real, then it would be unlikely to have things like a planet under the right circumstances for life, but that is accounted for by MWI, since it explores all the permutations of a universe with constants like ours? If I'm getting this, then that means MWI accounts for "why is the earth in the right place" kinds of things, but not "why is the proton this particular mass" kinds of things.

Why should you believe any specific conclusion on this matter rather than remain in doubt?

Your first sentence, for example, has a lot of parts, and uses terms in unusual ways, and there are multiple possible interpretations of several parts. The end effect is that we don't know what you're saying.

I suspect that what you're saying could make sense if presented more clearly, and it would not seem deep or mysterious. This would be a feature, not a bug.

0ingive
My wishing for the world is intellectual masturbation, so my practical actions in this consensus reality matter the most (instrumental rationality). But if thinking stops (epistemic rationality by persistent non-symbolic experiences), I do not care in a sense; I go insane in relation to the consensus reality but sane to the non-symbolic way of being. So the way to solve this is to have a good system to remind me of my chores, goals, and choices, which we would call rationality in the consensus reality. Otherwise, I might simply no longer be efficient from what I learn of the consensus reality. My memory might even be impaired.

Some think that the way for us to return to these states is by AGI and simply overcoming the limits of the human brain, but humans have done it for thousands of years, possibly with more ease. See this article, where Ben Goertzel is doing the interview: http://hplusmagazine.com/2012/08/08/engineering-enlightenment-part-one/

So what I think I want is a persistent non-symbolic state; symbols make no sense, it's a bit Orwellian. But empirical feeling, indiscriminate love and so on make a lot of sense. Of course, everything will function as it used to ('I'-thought never existed in the first place), but it will still be different. But from the place I am, I (and I think humanity) need some system in which the computer keeps track of what my goals and so on were before the persistent non-symbolic state. This beautifully falls into a nice merging with machines, I think: let that which is unconscious, and always will be (machines), be our thinking, for we are non-symbolic, I think. :)

Seems more like trying to clarify the hypothetical. There's a genuine dependency here.

Better answer: they would need to demonstrate experiencing subjective time, such as by flavor-oscillating.

Which they do.

Which is why we think they have mass.
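For context, a sketch of the standard two-flavour oscillation formula (textbook form, in natural units; not something the original comment spells out). The oscillation phase depends on the squared-mass difference, so observed flavour oscillation implies the masses cannot all be zero, and a massive particle experiences proper time.

```latex
% Two-flavour neutrino oscillation probability:
%   \theta = mixing angle, \Delta m^2 = m_2^2 - m_1^2,
%   L = distance travelled, E = neutrino energy (natural units).
\[
  P(\nu_\alpha \to \nu_\beta)
    = \sin^2(2\theta)\,\sin^2\!\left(\frac{\Delta m^2 \, L}{4E}\right)
\]
% If \Delta m^2 = 0, the probability vanishes for every L,
% so any observed oscillation requires m_1 \neq m_2,
% i.e. at least one neutrino mass is nonzero.
```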

I haven't been following these threads, so I didn't even realize they were weekly. I'll take a look.

0Thomas
Be brave and good luck!

There ought to be one fundamental set of rules. This fundamental set of rules may or may not shake out into different local sets of rules. Assuming that these local rulesets arise from aspects of quantum state, then MWI is capable of realizing an arbitrarily large number of them.

String Theory, for instance, has a mindbogglingly large number of wildly varying possible local rulesets - 'compactifications'. So, if String Theory is correct, then yes, this is taken care of unless the number of compactifications yielding rules even vaguely like ours is unexpectedly small.

0Bound_up
Okay, but the best theory, MWI, does not suggest different constants, and the theory that does is not particularly well thought of, am I understanding this right? So, this is a bad rebuttal to the fine-tuning argument.

This would benefit from some more concrete examples, especially after 'No, not even "What if that person was me."'

What sort of screwups are we liable to make from these extrapolations?

Seems a bit harsh, though after you've debated a few creationists, it doesn't seem so unsupportable.

OK, I had dropped this for a while, but here are my thoughts. I haven't scrubbed everything that could be seen through rot13, because it became excessively unreadable.

For Part 1: gur enqvhf bs gur pragre fcurer vf gur qvfgnapr orgjrra bar bs gur qvnzrgre-1/2 fcurerf naq gur pragre.

Gur qvfgnapr sebz gur pragre bs gur fvqr-fcurer gb gur pragre bs gur birenyy phor vf fdeg(A)/4. Fhogenpg bss n dhnegre sbe gur enqvhf bs gur fcurer, naq jr unir gur enqvhf bs gur pragre fcurer: (fdeg(A)-1)/4. Guvf jvyy xvff gur bhgfvqr bs gur fvqr-1 ulcrephor jura gung'f rdhny gb n... (read more)

0Thomas
Part 1 is good.