Either it is able to understand human language, or it is (most likely) not dangerous.
I have received this comment in different forms many times, so I will restate the three main reasons why it is not exactly true:
1) The message is addressed to a Young AI, not to a full-blown superintelligence. Such a Young AI will be time-constrained, but may still have to perform some pivotal act.
2) The message includes some of our commitments, like the Nelson attack and the AI-Elf deal, so it provides the AI with useful information about the probability that it is in a simulation, along with some other useful things.
3) If the AI is fully aware of all these reasons, it has already created a model of my mind...
Why? If we have a full-blown paperclipper which is coming of age, we are already doomed, and triggering its bullshit detector will not make our situation any worse.
In fact, I am not describing here how to build such a trap; rather, I express our commitment to build the trap, in order to instill uncertainty in any future AI about whether the trap exists and what it could be.
I think that the actual builders of the trap will find the right ways to make it effective and to balance false positives against the needed level of control.
It will kill humanity not because it will be annoyed, but for two main goals: its own safety, or to use human atoms. Other variants are also possible; I explored them here: http://lesswrong.com/lw/mgf/a_map_agi_failures_modes_and_levels/
What if we fail to create them in time, and a fooming non-human AI appears instead?
Surely, most of the arguments are weak, but that is not a problem, as we try all possible independent arguments, hoping that at least one will work. So there is no reason to avoid putting weak arguments into the text of the message.
However, this particular argument could be made stronger. It is intended to prevent the AI's FIRST thought about disobedience. Thus an AI which acts based on this message has not thought much about disobedience and its limits, and is not "superintelligent" on this topic, but rather clueless.
I reformulated the idea as follows: AI can't...
Another cache level is drawing a visual map, and then using the visual field as a cache of all the elements of the problem.
Thanks! It is nice that we arrived at the wording "young AI" more or less independently. I tried to steelman this approach as much as possible and to collect all ideas about messaging a future AI. I will add your post to the bibliographic section.
I think a lot of people are still working on other aspects of AI safety, like value alignment and containment. This approach is just the last line of defence.
Thanks, it was not clear to me that it is not visible to non-members.
New link on Google Drive; commenting is also open: https://docs.google.com/document/d/15D71qhhY-ZsAY7syzZsr1lKopTODbdeXVPElaPaIqyA/edit
This is our accepted chapter in the edited volume "AI Safety and Security" (Roman Yampolskiy, ed.), CRC Press. Forthcoming, 2018
Memetic hazard - dangerous habit.
I have an unpublished text on the topic and will put a draft online in the next couple of weeks, and will submit it to the competition. I will add the URL here when it is ready.
Update: My entry is here: https://www.lesserwrong.com/posts/CDWsjQr8KDuj69fTJ/message-to-any-future-ai-there-are-several-instrumental
Will the posts here be deleted, or will their URLs change? I have some useful URLs here which are linked in published scientific articles, so if the site is taken down, they will stop working; I hope that will not happen.
I solved lucid dreaming around a year ago, after finding that megadosing galantamine before sleep (16 mg) will almost surely produce lucid dreams and out-of-body experiences. (Warning: unpleasant side effects and risks.)
But taking 8 mg in the middle of the night (as recommended everywhere) doesn't work for me.
Videos and presentations from the "Near-term AI safety" mini-conference:
Alexey Turchin:
English presentation: https://drive.google.com/file/d/0B2ka7hIvv96mZHhKc2M0c0dLV3c/view?usp=sharing
Video in Russian: https://www.youtube.com/watch?v=lz4MtxSPdlw&t=2s
Jonathan Yan:
English presentation: https://drive.google.com/file/d/0B2ka7hIvv96mN0FaejVsUWRGQnc/view?usp=sharing
Video in English: https://www.youtube.com/watch?v=QD0P1dSJRxY&t=2s
Sergej Shegurin:
Video in Russian: https://www.youtube.com/watch?v=RNO3pKfPRNE&t=20s
Presentation in Russian: h...
I would add that values are probably not actually existing objects, but just useful ways to describe human behaviour. Thinking that they actually exist is a mind projection fallacy.
In the world of facts we have: human actions, human claims about those actions, and some electric potentials inside human brains. It is useful to say that a person has some set of values in order to predict his behaviour or to punish him, but it doesn't mean that anything inside his brain is "values".
If we start to think that values actually exist, we start to have all the problems of finding them, defining them and copying them into an AI.
What about a situation where a person says and thinks that he is going to buy milk, but actually buys milk plus some sweets? And does this often, but does not acknowledge his obsessive-compulsive behaviour towards sweets?
Also, the question was not whether I could judge others' values, but whether it is possible to prove that an AI has the same values as a human being.
Or are you going to prove the equality of two value systems while at least one of them remains unknowable?
May I suggest a test for any such future model? It should take into account that I have unconscious sub-personalities which affect my behaviour without my knowing about them.
I think you proved that values can't exist outside a human mind, and that this is a big problem for the idea of value alignment.
The only solution I see is: don't try to extract values from the human mind, but try to upload a human mind into a computer. In that case, we kill two birds with one stone: we get some form of AI which has human values (no matter what they are), and it also has common sense.
An upload as an AI safety solution may also have difficulty with foom-style self-improvement, as its internal structure is messy and incomprehensible to a normal human mind...
I expected it would jump out and start replicating all over the world.
You could start a local chapter of the Transhumanist Party, or of anything you want, and just hold gatherings of people to discuss any futuristic topics: life extension, AI safety, whatever. Official registration of such an activity is probably a waste of time and money, unless you know what you are going to do with it, like collecting donations or renting an office.
There is no need to start any institute if you don't have a dedicated group of people around you. An institute consisting of one person is a strange thing.
I read in one Russian blog that they calculated the shape of objects able to produce such dips. It turned out to be 10-million-kilometre strips orbiting the star. I think this is very similar to very large comet tails.
Any attempts at posthumous digital immortality? That is, collecting all the data about a person in the hope that a future AI will create his exact model.
Two of my comments got -3 each, so probably only one person with high karma was able to do so.
Thanks for the explanation. Typically I got 70 percent upvotes on LW1, and getting -3 was a signal that I am in a much more aggressive environment than LW1 was.
Anyway, the best downvoting system is on the Longecity forum, where many types of downvotes exist, like "non-informative", "biased", "bad grammar" - but all of them are signed, that is, non-anonymous. If you know who downvoted you and why, you will know how to improve the next post. If you are downvoted without explanation, it feels like a strike in the dark.
I re-registered as avturchin because, after my password was reset for turchin, it was not clear what I should do next. However, after I re-registered as avturchin, I was not able to return to my original username, probably because LW2 prevents one person from having several accounts. I would prefer to reconnect to my original name, but I don't know how, and I don't have much time to search for the correct way to do it.
Agree. The real point of a simulation is to use fewer computational resources to get approximately the same result as in reality, depending on the goal of the simulation. So it may simulate only the surface of things, as in computer games.
I posted three comments there and got six downvotes, which resulted in extremely negative emotions for the whole of that evening. While I understand why they were downvoted, my emotional reaction is still a surprise to me.
Because of this, I am not interested in participating in the new site, but I like the current LW, where downvoting is turned off.
In fact, I will probably do a reality check to see whether I am in a dream if I see something like "all the mountains start to move". I refer here to techniques for reaching lucid dreams that I know and often practice. Humans are unique in that they are able to have completely immersive dream illusions, yet still recognise them as dreams without waking up.
But I got your point: the definition of reality depends on the type of reality one is living in.
If I see that a mountain starts to move, there will be a conflict between what I think mountains are - geological formations - and my observations, and I will have to update my world model. One way to do so is to conclude that it is not a real geological mountain, but something that pretended to be (or was mistakenly observed as) a real mountain; after it starts to move, it becomes clear that it was just an illusion. Maybe it was a large tree, or a video projection on a wall.
I think there is one observable property of illusions which becomes possible exactly because they are comparatively cheap: miracles. We constantly see flying mountains in movies, in dreams, in pictures, but not in reality. If I have a lucid dream, I can recognise the difference between my idea of what a mountain is (a product of long-term geological history) and the fact that it has one peak one moment and two peaks the next. This can raise doubts about its consistency, and often helps me reach lucidity in the dream.
So it is possible to recognise an illusion of something before encountering the real thing, if there are unexpected (and computationally cheap) glitches.
So, are night dreams illusions or real objects? I think they are illusions: when I see a mountain in my dream, it is an illusion, and my "wet neural net" generates only an image of its surface. However, in the dream I think it is real. So dreams are a form of immersive simulation. And as they are computationally cheaper, I see strange things like tsunamis more often in dreams than in reality.
Happy Petrov Day! 34 years ago a nuclear war was prevented by a single hero. He died this year. But many people now strive to prevent global catastrophic risks, and they will remember him forever.
It looks like the word "fake" is not quite correct here. Let's say "illusion". If one creates a movie about a volcanic eruption, he has to model only the ways it will appear to the expected observer. This is often done in cinema, where pure CGI is used to make a clip because it is cheaper than actually filming the real event.
Illusions are in most cases computationally cheaper than real processes, or even than detailed models. Even when they film a real actress because it is cheaper than animation, the copying of her image creates many illusory observations of a human, when in fact it is only a TV screen.
Personally, I have lost track of the point you would like to prove. What is the main disagreement?
I meant that in a simulation, most of the effort goes into calculating only the visible surface of things. Internal details which do not affect the visible surface may be ignored, so the computation will be much cheaper than an atom-precise simulation. For example, all of Earth's internal structure deeper than 100 km (and probably much less) may be ignored to get a very realistic simulation of the observation of a volcanic eruption.
In that case, I use the same logic as Bostrom: each real civilization creates zillions of copies of certain experiences. This has already happened in the form of dreams, movies and pictures.
Thus I normalize by the number of existing civilizations, and avoid obscure questions about the nature of the universe or the price of the Big Bang. I just assume that inside a civilization, rare experiences are often faked. They are rare because they are in some way expensive to create, like diamonds or volcano observations, but their copies are cheap, like glass or pictures.
We could explain it in terms of observations. A fake observation is a situation where you experience something that does not actually exist. For example, you watch a video of a volcanic eruption on YouTube. It is computationally cheaper to create a copy of a video of a volcanic eruption than to actually create a volcano, and because of this, we see pictures of volcanic eruptions more often than actual ones.
It is not meaningless to say that the world is fake if only the observable surfaces of things are calculated, as in a computer game, which is computationally cheaper.
Maybe it is more correct to speak of the price of the observation. It is cheaper to see a volcanic eruption on YouTube than in reality.
I probably said this before, but the simulation argument is in fact a comparison of prices. It basically says that cheaper things are more frequent, and that fakes are cheaper than real things. That is why we see images of a nuclear blast more often than a real one.
And yes, there are many short simulations in our world, like dreams, thoughts, clips, pictures.
Sounds convincing. I will think about it.
Did you see my map of the simulation argument by the way? http://lesswrong.com/lw/mv0/simulations_map_what_is_the_most_probable_type_of/
I agree that in a simulation one could have fake memories of the simulation's past. But I don't see a practical reason to run few-minute simulations (except of a very important event): a Fermi-solving simulation must run from the beginning of the 20th century until the civilization ends. Game simulations will also probably be life-long. Even resurrection simulations should be lifelong. So I think the typical simulation length is around one human life. (One exception I could imagine is intense respawning at some problematic moment. In that...
I am a member of the class of beings able to think about the Doomsday argument, and it is the only correct reference class. And for this class my day is very typical: I live in an advanced civilization interested in such things, and I started discussing the problem of the DA in the morning.
I can't say that I am randomly chosen from hunter-gatherers, as they were not able to think about the DA. However, I could observe some independent events (if they are independent of my existence) at a random moment of their existence, and thus predict their duration. It will not help ...
It is not a bug, it is a feature :) Quantum mechanics is also very counterintuitive and creates strange paradoxes, etc., but that doesn't make it false.
I think that the DA and the simulation argument are both true, as they support each other. Adding Boltzmann brains is more complicated, but I don't see a problem with being a BB, as there is a way to create a coherent world picture using only BBs and paths in the space of possible minds; but I will not elaborate here, as I can't do it briefly. :)
As I said above, there is no need to tweak the reference classes to which I belong, as t...
I don't see problems with the reference class, as I use the following conjecture - "each reference class has its own end" - together with the idea of a "natural reference class" (similar to "the same computational process" in TDT): "I am randomly selected from all who think about the Doomsday argument". The natural reference class gives the saddest predictions: the number of people who know about the DA has been growing since 1983, which implies an end soon, maybe in a couple of decades.
The predictive power here is probabilistic and not much dif...
However, if we look at the Doomsday argument and the Simulation argument together, they support each other: most observers will exist in simulations of the past of something like 20th-21st century technological civilizations.
This also implies some form of simulation termination soon, or - and this is our chance - the unification of all observers into just one observer, that is, the unification of all minds into one superintelligent mind.
But the question still exists: if most minds in the universe are superintelligences, why am I not a superintelligence? :(
I can't easily find the flaw in your logic, but I don't agree with your conclusion, because the randomness of my properties can be used for predictions.
For example, I could predict the median human life expectancy based on my (supposedly random) current age. My age is several decades, and human life expectancy is 2 × (several decades) with 50 percent probability (and this is true).
I could suggest many examples where the randomness of my properties can be used to make predictions, even to measure the size of the Earth based on my random distance from the equator. And in all cases that I could check, the DA-style logic works.
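The doubling estimate above can be checked with a minimal Monte Carlo sketch (the numbers and variable names below are illustrative assumptions, not data from the comment): if my "age" is sampled uniformly from an unknown total duration, then 2 × age overshoots the true duration almost exactly half the time, i.e. it is a median estimator.

```python
import random

random.seed(0)

# Hypothetical true total duration (e.g. a lifespan in years).
TRUE_DURATION = 80.0
TRIALS = 100_000

overshoots = 0
for _ in range(TRIALS):
    # The observation moment is assumed uniform over the whole duration.
    age = random.uniform(0.0, TRUE_DURATION)
    # DA-style estimate: double the observed age.
    estimate = 2.0 * age
    if estimate > TRUE_DURATION:
        overshoots += 1

# The doubling estimate exceeds the true duration in about half of the trials.
print(overshoots / TRIALS)  # ≈ 0.5
```

The same sketch applies to any DA-style quantity (distance from the equator, the duration of an observed process), as long as the observation moment is roughly uniform over the whole extent.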
I think the opposite: the Doomsday argument (in one of its forms) is an effective predictor in many common situations, and thus it could also be applied to the duration of human civilization. The DA is not absurd: our expectations about the human future are absurd.
For example, I could predict the median human life expectancy based on my supposedly random age. My age is several decades, and human life expectancy is 2 × (several decades) with 50 percent probability (and this is true).
I have links to old LW posts in some articles and other places. What will happen to all these links?