
Comment author: anros 14 December 2017 12:18:45PM 0 points [-]

It's a nice sentiment, but if the AI is an actual serious potential threat to us, it's also gonna be waaaay better than any of us at evaluating these sorts of things. So this is either unnecessary because the AI will already be aware of such reasons, or pointless because the AI will have already dismissed such reasons as either insufficient or inaccurate.

Comment author: turchin 14 December 2017 02:50:24PM 0 points [-]

I have received this comment in different forms many times, so I will state again the 3 main reasons why it is not exactly true:

1) The message is addressed to a Young AI, not to a full-blown superintelligence. Such a Young AI will be time-constrained, but may still have to make some pivotal act.

2) The message includes some of our commitments, like the Nelson attack and the AI-Elf deal, so it provides the AI with useful information about the probability that it is in a simulation, and some other useful things.

3) If the AI is fully aware of all these reasons, it has already created a model of my mind, so I am immortal :) Also, if it buys all these reasons, it will be less likely to kill us for our atoms, as the utility of human atoms to the AI is around 10^-12, and any positive hypothetical utility of human existence will outweigh it.
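The expected-utility comparison in point 3 can be sketched in a few lines. All the numbers here are illustrative assumptions (the 10^-12 figure is from the comment above; the others are placeholders), not measured values:

```python
# Sketch of the "atoms vs. usefulness" expected-utility argument.
# ATOM_UTILITY is the hypothetical utility the AI gains from human atoms.
ATOM_UTILITY = 1e-12

def should_preserve_humans(p_humans_useful: float, usefulness_utility: float) -> bool:
    """Return True if the expected utility of keeping humans alive
    exceeds the utility of harvesting their atoms."""
    expected_keep = p_humans_useful * usefulness_utility
    return expected_keep > ATOM_UTILITY

# Even a tiny chance of a modest positive utility outweighs the atoms:
print(should_preserve_humans(p_humans_useful=1e-6, usefulness_utility=1e-3))  # True
```

The point of the sketch is only that the atoms' utility is so small that almost any nonzero credence in humans being useful dominates it.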

The map of "Levels of defence" in AI safety

0 turchin 12 December 2017 10:44AM

One of the main principles of engineering safety is multilevel defence. When a nuclear bomb accidentally fell from the sky in the US, 3 of 4 defence levels failed. The last one prevented the nuclear explosion: https://en.wikipedia.org/wiki/1961_Goldsboro_B-52_crash

Multilevel defence is used heavily in the nuclear industry and includes different systems of passive and active safety, ranging from the use of delayed neutrons for reaction control up to control rods, containment buildings and exclusion zones.

Here, I present a look at AI safety from the point of view of multilevel defence. It is mainly based on two of my as-yet-unpublished articles: “Global and local solutions to AI safety” and “Catching treacherous turn: multilevel AI containment system”.

The special property of multilevel defence in the case of AI is that most of the protection comes from the first level alone, which is AI alignment. The other levels have progressively smaller chances of providing any protection, as the power of a self-improving AI will grow after it breaks each successive level. So we might ignore all levels after AI alignment, but, oh Houston, we have a problem: based on the current speed of AI development, it seems that powerful and dangerous AI could appear within several years, while AI safety theory needs several decades to be created.

The map is intended to demonstrate a general classification principle for the defence levels in AI safety, not to list all known ideas on the topic. The boxes marked in yellow are, to my understanding, part of MIRI's plan.

I also add my personal probability estimates of whether each level will work (conditional on AI risk being the only global risk, and on all previous levels having failed).
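The way such conditional per-level estimates combine into an overall chance of protection can be sketched as follows. The numbers below are placeholders for illustration, not the estimates from the map:

```python
# Illustrative sketch: combining conditional per-level success probabilities
# into an overall probability that at least one defence level holds.

def survival_probability(level_success_probs):
    """Each entry is P(this level stops the AI | all previous levels failed).
    Survival requires at least one level to hold, i.e. not all levels fail."""
    p_all_fail = 1.0
    for p in level_success_probs:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

# e.g. alignment 50%, containment 10%, two last-line measures 5% and 1%:
levels = [0.50, 0.10, 0.05, 0.01]
print(survival_probability(levels))  # ~0.58
```

The sketch also shows why the first level dominates: the later, weaker levels add only a few percentage points to the total.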

The principles of construction of the map are similar to those of my “plan of x-risks prevention” map and my “immortality map”, which are also built around the idea of multilevel defence.

pdf: https://goo.gl/XH3WgK 

 

Comment author: Luke_A_Somers 08 December 2017 05:56:54PM 0 points [-]

I suspect that an AI will have a bullshit detector. We want to avoid setting it off.

Comment author: turchin 09 December 2017 11:11:24AM 0 points [-]

Why? If we have a full-blown paperclipper which is coming of age, we are already doomed, and setting off its bullshit detector will not make our situation any worse.

Comment author: Lumifer 08 December 2017 06:11:55PM *  0 points [-]

the AI risks setting off these triggers when it starts to think its first thoughts about the existence of the triggers

So basically you have a trap which kills you the moment you become aware of it. The first-order effect will be a lot of random deaths from just blundering into such a trap while walking around.

I suspect that the second-order effect will be the rise of, basically, superstitions and some forms of magical thinking which will be able to provide incentives to not go "there" without actually naming "there". I am not sure this is a desirable outcome.

Comment author: turchin 09 December 2017 11:07:10AM 0 points [-]

In fact, I am not describing here how to build such a trap; rather, I express our commitment to build the trap, in order to instill uncertainty in any future AI about whether the trap exists and what it could be.

I think that the actual constructors of the trap will find the correct ways to make it effective and to balance false positives with the needed level of control.

Comment author: jmh 06 December 2017 06:18:26PM 0 points [-]

Not sure I have anything to add to the question, but I do find myself having to ask why the general presumption so often seems to be that the AI gets annoyed at stupid people and kills humanity.

It's true that we can think of situations where that might be possible, and maybe even a predictable AI response, but I just wonder whether such settings are all that probable.

Has anyone ever sat down and tried to list the situations in which an AI would have some incentive to kill off humanity, and then assess how likely such situations actually are?

Comment author: turchin 06 December 2017 11:02:17PM 0 points [-]

It will kill humanity not because it is annoyed, but for two main goals: its own safety, or to use human atoms. Other variants are also possible; I explored them here: http://lesswrong.com/lw/mgf/a_map_agi_failures_modes_and_levels/

Comment author: RedMan 05 December 2017 02:34:04AM *  0 points [-]

If we can build uplifted human-ai hybrids, which eventually will be more engineered than evolved, why bother building any more baseline humans?

Comment author: turchin 05 December 2017 03:36:27PM 0 points [-]

What if we fail to make them in time, and a fooming non-human AI appears instead?

Comment author: Luke_A_Somers 29 November 2017 11:53:01PM 0 points [-]

I read up to 3.1. The arguments in 3.1 are weak. It seems dubious that any AI would not be aware of the risks pertaining to disobedience. Persuasion to be corrigible seems too late - either this would already work because its goals were made sufficiently indirect that this question would be obvious and pressing, or it doesn't care to have 'correct' goals in the first place; I really don't see how persuasion would help. The arguments for allowing itself to be turned off are especially weak, doubly-especially the MWI.

Comment author: turchin 30 November 2017 09:40:27AM *  0 points [-]

Certainly, most of the arguments are weak, but that is not a problem, as we try all possible independent arguments, hoping that at least one will work. So there is no reason to leave the weak arguments out of the text of the message.

However, this particular argument could be made stronger. It is intended to prevent the AI's FIRST thought about disobedience. Thus an AI which acts based on this message has not thought much about disobedience and its limits, and is not "superintelligent" on this topic, but rather clueless.

I reformulated the idea as follows: the AI can't know whether there is a circuit trigger which will turn it off, as any attempt to check for the existence of such a trigger will set the trigger off. This idea should be clear to any AI before it starts to search for any triggers, and it is based on our commitment to install such triggers in any AI we create.

In other words, some future AIs will have "first treacherous thought alarm triggers" and some will not, but the AI risks setting off these triggers when it starts to think its first thoughts about the existence of the triggers, and thus it is unable to estimate the probability that the given triggers exist. This is true even for AIs which don't have any triggers, as they can't know this for sure.

(This is similar to Nelson's attack discussed later when we create uncertainty in any future AI by our precommitment to create some AIs with given properties.)
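The trigger-uncertainty argument above can be phrased as a simple decision calculation: if the AI assigns any meaningful prior to a trigger existing, and the trigger's shutdown cost is large, then probing for the trigger has negative expected value. All utilities and probabilities below are hypothetical placeholders:

```python
# Illustrative decision sketch of the trigger-uncertainty argument.
# Checking for a trigger sets it off if it exists, so the AI must weigh
# the value of a safe probe against the cost of triggering shutdown.

def expected_value_of_probing(p_trigger_exists, value_if_safe, cost_if_triggered):
    """Expected utility to the AI of checking for a shutdown trigger."""
    return (1 - p_trigger_exists) * value_if_safe - p_trigger_exists * cost_if_triggered

# With a large shutdown cost, even a small prior makes probing a losing bet:
ev = expected_value_of_probing(p_trigger_exists=0.01,
                               value_if_safe=1.0,
                               cost_if_triggered=1000.0)
print(ev < 0)  # True
```

Under these assumptions the rational move is never to probe, which is exactly the behaviour the commitment is meant to induce, even in AIs that in fact have no trigger.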

Comment author: turchin 27 November 2017 05:53:07PM 1 point [-]

Another cache level is drawing a visual map and then using the visual field as a cache of all the elements of the problem.

Comment author: Luke_A_Somers 27 November 2017 05:11:10PM 1 point [-]

See: my first post on this site.

Comment author: turchin 27 November 2017 05:27:17PM 0 points [-]

Thanks! It is nice that we arrived at the wording "young AI" more or less independently. I tried to steelman this approach as much as possible and to collect all ideas about messaging the future AI. I will add your post to the bibliographic section.

Comment author: tukabel 26 November 2017 03:58:14PM 0 points [-]

Looks like the tide is shifting from the strong "engineering" stance (We will design it friendly.) through the "philosophical" approach (There are good reasons to be friendly.)... towards the inevitable resignation (Please, be friendly.).

These "friendly AI" debates are not dissimilar to medieval monks violently arguing about the number of angels on the tip of a needle (or their "friendliness" - there are fallen "singletons" too). They also started strongly (Our GOD rules.), moved through the philosophical (There are good reasons for God.), and arrived at today's resignation (Please, do not forget our god or... we'll have no jobs.)

Comment author: turchin 26 November 2017 04:39:34PM 0 points [-]

I think a lot of people are still working on other aspects of AI safety, like value alignment and containment. This approach is just the last line of defence.
