
The map of "Levels of defence" in AI safety

0 turchin 12 December 2017 10:44AM

One of the main principles of engineering safety is multilevel defence. When a nuclear bomb accidentally fell from the sky over the US, three of four defence levels failed; the last one prevented a nuclear explosion: https://en.wikipedia.org/wiki/1961_Goldsboro_B-52_crash

Multilevel defence is used extensively in the nuclear industry and includes different systems of passive and active safety, ranging from the use of delayed neutrons for controlling the chain reaction up to control rods, containment buildings and exclusion zones.

Here, I present a look at AI safety from the point of view of multilevel defence. This is mainly based on two of my as yet unpublished articles: “Global and local solutions to AI safety” and “Catching treacherous turn: multilevel AI containment system”.

The special property of multilevel defence in the case of AI is that most of the protection comes from the first level alone, which is AI alignment. Subsequent levels have progressively smaller chances of providing any protection, as the power of a self-improving AI will grow after it breaks through each successive level. So we might be tempted to ignore all levels after AI alignment, but, oh Houston, we have a problem: based on the current speed of AI development, it seems that powerful and dangerous AI could appear within several years, while AI safety theory needs several decades to be created.

The map is intended to demonstrate a general classification principle for the defence levels in AI safety, not to list all known ideas on the topic. I marked in yellow the boxes which, to my understanding, are part of MIRI's plan.

I also added my personal probability estimates of whether each level will work (conditional on AI risk being the only global risk and all previous levels having failed).
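To make the arithmetic behind such estimates explicit, here is a minimal sketch in Python. The per-level numbers are purely illustrative assumptions, not the estimates from the map; the point is only to show how conditional per-level probabilities combine into an overall chance that at least one defence level holds.

```python
# Hypothetical conditional probabilities that each defence level works,
# given that all previous levels have already failed.
# Illustrative numbers only - not the estimates from the map.
levels = {
    "AI alignment": 0.50,
    "AI boxing / containment": 0.10,
    "Catching the treacherous turn": 0.05,
    "Last-line measures": 0.01,
}

p_all_fail = 1.0
for name, p_works in levels.items():
    p_all_fail *= (1.0 - p_works)

print(f"P(every level fails)        = {p_all_fail:.3f}")
print(f"P(at least one level holds) = {1.0 - p_all_fail:.3f}")
```

With numbers of this shape, the first level dominates the total protection, which matches the point above that the levels after alignment add progressively less.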

The principles behind the construction of the map are similar to my “plan of x-risks prevention” map and my “immortality map”, which are also built around the idea of multilevel defence.

pdf: https://goo.gl/XH3WgK 

 

[Link] Message to Any Future AI: “There are several instrumental reasons why exterminating humanity is not in your interest”

0 turchin 25 November 2017 11:44AM

[Link] Military AI as a Convergent Goal of Self-Improving AI

0 turchin 13 November 2017 11:25AM

Beauty as a signal (map)

4 turchin 12 October 2017 10:02AM

This is my new map, in which female beauty is presented as a signal which moves from woman to man through different media and amplifiers. pdf

Mini-conference "Near-term AI safety"

4 turchin 11 October 2017 03:19PM

TL;DR: The event will be in Moscow, Russia, and near-term risks of AI will be discussed. The main language will be Russian, but Jonathan Yan will speak in English from Hong Kong. English presentations will be uploaded later to the FB page of the group "Near-term AI safety." Speakers: S. Shegurin, A. Turchin, Jonathan Yan. The event's FB page is here.

In the last five years, artificial intelligence has developed at a much faster pace due to the success of neural network technologies. If we extrapolate these trends, near-human-level AI may appear in the next five to ten years, and there is a significant probability that this will lead to a global catastrophe. At a one-day conference at the Kocherga rationalist club, we'll look at how recent advances in the field of neural networks are changing our estimates of the timing of the creation of AGI, and what global catastrophes are possible in connection with the emergence of increasingly strong AI. A special guest of the program, Jonathan Yan from Hong Kong, will present (in English, via Skype) the latest research on this topic.

Language of the conference: the first two talks will be in Russian; Yan's talk will be in English without translation, with the discussion after it also in English.

Registration: on the event page on Facebook.

Place: rationalist club "Kocherga", main hall, Bolshaya Dorogomilovskaya ul., 5, building 2.

Participation is charged at the anticafe's rate of 2.5 rubles per minute; coffee is free.

A video broadcast will be available on Facebook.

 

Program:

October 14, Saturday, 15.00 - start.

15.00 - Shegurin Sergey. "Is it possible to create a human level AI in the next 10 years?"

16.00 - Turchin Alexey. "The next 10 years: the global risks of AI before the creation of the superintelligence"

17.00 - Jonathan Yan. "Recent Developments Towards AGI & Why It's Nearer Than You Think (in English)"

17.40 - Discussion

 

 

Mini map of s-risks

2 turchin 08 July 2017 12:33PM
S-risks are risks of future global infinite suffering. The Foundational Research Institute has suggested that they are the most serious class of existential risks, even more serious than painless human extinction. So it is time to explore the types of s-risks and what to do about them.

Possible causes and types of s-risks:
"Normal Level" - some forms of extreme global suffering exist now, but we ignore them:
1. Aging, loss of loved ones, mortal illness, infinite suffering, dying, death and non-existence - for almost everyone, because humans are mortal
2. Nature as a place of suffering, where animals constantly eat each other. Evolution as a superintelligence which created suffering and uses it for its own advancement.

Colossal level:
1. Quantum immortality creates bad immortality - I survive as an old but always-dying person, because of weird observational selection.
2. AI goes wrong: 2.1 Roko's basilisk 2.2 Error in programming 2.3 Hacker's joke 2.4 Indexical blackmail.
3. Two AIs go to war with each other, and one of them is benevolent to humans, so the other AI tortures humans to gain a bargaining position in the future deal.
4. X-risks which include infinite suffering for everyone - a natural pandemic, a cancer epidemic, etc.
5. Possible worlds (in Lewis's terms) with infinite suffering qualia in them. For any human, a possible world with his infinite suffering exists. Modal realism makes such worlds real.

Ways to fight s-risks:
1. Ignore them by boxing personal identity inside today
2. A benevolent AI fights a "measure war" by creating infinitely more copies of happy beings, as well as trajectories in the space of possible minds leading from suffering to happiness

Types of the most intense suffering:

Qualia-based, listed from bad to worst:
1. Eternal, but bearable in each moment, suffering (anhedonia)
2. Unbearable suffering - suffering for which death is the preferable outcome (cancer, death by fire, death by hanging). However, as Marcus Aurelius said: “Unbearable pain kills. If it does not kill, it is bearable."
3. Infinite suffering - the qualia of infinite pain, such that duration doesn't matter (not known whether it exists)
4. Infinitely growing eternal suffering, created by constant upgrading of the suffering subject (a hypothetical type of suffering created by a malevolent superintelligence)

Value-based s-risks:
1. The most violent action against one's core values: like the "brutal murder of children”
2. Meaninglessness, acute existential terror or derealisation with depression (Nabokov's short story “Terror”) - an incurable and seemingly logically proven understanding of the meaninglessness of life
3. Death and non-existence as forms of counter-value suffering.

Time-based:
1. Infinite time without happiness.

Subjects who may suffer from s-risks:

1. Anyone as individual person
2. Currently living human population
3. Future generation of humans
4. Sapient beings
5. Animals
6. Computers, neural nets with reinforcement learning, robots and AIs.
7. Aliens
8. Unembodied sufferings in stones, Boltzmann brains, pure qualia etc.

My position

It is important to prevent s-risks, but not by increasing the probability of human extinction, as that would mean we have already fallen victim to blackmail by non-existent things.

Also, s-risk is already the default outcome for everyone personally (so it is global), because of inevitable aging and death (and maybe bad quantum immortality).

People prefer the illusory certainty of non-existence to the hypothetical possibility of infinite suffering. But nothing is certain after death.

In the same way, overestimating animal suffering results in underestimating human suffering and the risks of human extinction. But animals suffer more in the wild than on farms, where they are fed every day, get basic healthcare, and face no predators that would eat them alive.

The hope that we will prevent future infinite suffering by stopping progress or committing suicide at the personal or civilisational level is misplaced. It will not help animals. It will not help with suffering in possible worlds. It will not even prevent suffering after death, if quantum immortality in some form is true.

Moreover, the fear of infinite suffering makes us vulnerable to any type of “acausal" blackmail. The only way to fight suffering in possible worlds is to create an infinitely larger possible world with happiness.


[Link] Verifier Theory and Unverifiability

1 turchin 08 February 2017 10:40AM

The map of agents which may create x-risks

2 turchin 13 October 2016 11:17AM

Recently Phil Torres wrote an article in which he raises a new topic in existential risk research: the question of who the possible agents in the creation of a global catastrophe could be. He identifies five main types of agents and two main reasons why they might create a catastrophe (error and terror).

He discusses the following types of agents: 

 

(1) Superintelligence. 

(2) Idiosyncratic actors.  

(3) Ecoterrorists.  

(4) Religious terrorists.  

(5) Rogue states.  

 

Inspired by his work I decided to create a map of all possible agents as well as their possible reasons for creating x-risks. During this work some new ideas appeared.  

I think that significant additions to the list of agents should be superpowers, as they are known to have created most global risks in the 20th century; corporations, as they are now on the front line of AGI creation; and pseudo-rational agents who could create a Doomsday weapon in the future to use for global blackmail (perhaps with positive values), or who could risk civilization’s fate for their own benefit (dangerous experiments).

The x-risks prevention community could also be an agent of risk if it fails to prevent obvious risks, if it uses smaller catastrophes to prevent larger risks, or if it creates new dangerous ideas about possible risks which could inspire potential terrorists.

The more technology progresses, the more types of agents will have access to dangerous technologies, even including teenagers (e.g. "Why This 14-Year-Old Kid Built a Nuclear Reactor”).

In this situation only the number of agents with access to risky tech will matter, not the exact motivations of each one. But if we are unable to control the tech, we could at least try to control the potential agents or their “average" mood, as the rough sketch below illustrates.
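A back-of-the-envelope model of this point might look like the following Python sketch (hypothetical numbers; each agent is assumed to trigger a catastrophe independently with some small annual probability):

```python
# Hypothetical: each agent with access to dangerous tech independently
# causes a catastrophe in a given year with probability p_per_agent.
def annual_catastrophe_probability(n_agents: int, p_per_agent: float) -> float:
    """P(at least one of n independent agents triggers a catastrophe)."""
    return 1.0 - (1.0 - p_per_agent) ** n_agents

for n_agents in (10, 1_000, 1_000_000):
    p = annual_catastrophe_probability(n_agents, p_per_agent=1e-6)
    print(f"{n_agents:>9} agents -> annual risk ~ {p:.4f}")
```

Even with a tiny per-agent probability, the total risk is driven almost entirely by the number of agents, which is why reducing that number (or improving the agents' average disposition) becomes the relevant lever once the technology itself cannot be contained.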

The map shows various types of agents, starting from non-agents and ending with types of agential behaviour which could result in catastrophic consequences (error, terror, risk, etc.). It also shows the types of risks that are most probable for each type of agent. I think that my explanation in each case should be self-evident.

We can also see that the set of x-risk agents will change with the pace of technological progress. In the beginning there are no agents; later there are superpowers, and then smaller and smaller agents, until there are millions of people with biotech labs at home. In the end there will be only one agent - a SuperAI.

So, lessening the number of agents and increasing their ”morality” and intelligence seem to be the most plausible directions for lowering risks. Special organizations or social networks could be created to monitor the riskiest types of agents. Different agents probably need different types of control. Some ideas for this agent-specific control are listed in the map, but a real control system would have to be much more complex and specific.

The map shows many agents: some of them are real and exist now (but don’t yet have dangerous capabilities), and some are possible only in a moral or technical sense.

 

So there are 4 types of agents, and I show them in the map in different colours:

 

1) Existing and dangerous, i.e. already possessing the technology to destroy humanity: superpowers, arrogant scientists - Red

2) Existing and willing to end the world, but lacking the needed technologies (ISIS, VHEMT) - Yellow

3) Morally possible, but not yet existing: we can imagine logically consistent value systems which could result in human extinction, such as Doomsday blackmail - Green

4) Agents which will pose a risk only after supertechnologies appear, like AI hackers or child biohackers - Blue

 

Many agent types do not fit this classification, so I left them white in the map.

 

The pdf of the map is here: http://immortality-roadmap.com/agentrisk11.pdf

 

 

 

 


The map of organizations, sites and people involved in x-risks prevention

6 turchin 07 October 2016 12:04PM

Three known attempts to map the field of x-risks prevention exist:

1. The first is the list from the Global Catastrophic Risk Institute from 2012-2013; many of its links are already not working.

2. The second was done by S. Armstrong in 2014

3. And the most beautiful and useful map was created by Andrew Critch. But his ecosystem map ignores organizations which have a different view of the nature of global risks (that is, they share the value of x-risks prevention, but have another world view).

In my map I have tried to add all currently active organizations which share the value of global risks prevention.

It also treats some active independent people as organizations if they have an important blog or field of research, but not all such people are mentioned in the map. If you think that you (or someone else) should be in it, please write to me at alexei.turchin@gmail.com

I used only open sources and public statements to learn about people and organizations, so I can’t provide information on the underlying net of relations.

I tried to give each organization a short description based on its public statements, along with my opinion of its activity.

In general it seems that all small organizations are focused on their collaboration with the larger ones, that is, MIRI and FHI, and small organizations tend to ignore each other; this is easily explained by social signaling theory. Another explanation is that larger organizations have a greater ability to make contacts.

It also appears that there are several organizations with similar goal statements. 

It looks like the most cooperation exists in the field of AI safety, but most of the structure of this cooperation is not visible to an external viewer, in contrast to Wikipedia, where the contributions of all individuals are visible.

It seems that the community in general lacks three things: a united internet forum for public discussion, an x-risks wikipedia and an x-risks related scientific journal.

Ideally, a forum should be used to brainstorm ideas, a scientific journal to publish the best ideas, peer review them and present them to the outer scientific community, and a wiki to collect results.

Currently it seems more as if each organization is interested in producing its own research and hoping that someone will read it. Each small organization seems to want to be the only one presenting solutions to global problems and to gain the full attention of the UN and governments. This raises the problem of noise and rivalry, and also the problem of possibly incompatible solutions, especially in AI safety.

The pdf is here: http://immortality-roadmap.com/riskorg5.pdf

Fermi paradox of human past, and corresponding x-risks

6 turchin 01 October 2016 05:01PM

Based on known archaeological data, we are the first technological and symbol-using civilisation on Earth (but not the first tool-using species). 
This leads to a question analogous to Fermi’s paradox: why are we the first civilisation on Earth? (For comparison, flight was invented by evolution independently several times.)
We could imagine that many civilisations appeared on our planet and also became extinct, and based on the mediocrity principle, we should be somewhere in the middle. For example, if 10 civilisations appeared, we would have only a 10 per cent chance of being the first one.

The fact that we are the first such civilisation has strong predictive power about our expected future: it lowers the probability that there will be any other civilisations on Earth, including non-human ones or even a restarting of human civilisation from scratch. This is because, if there were going to be many civilisations, we should not expect to find ourselves to be the first one (this is a form of the Doomsday argument; the same logic is used in Bostrom's article “Adam and Eve”).
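The reasoning above can be made concrete with a toy Bayesian calculation (a sketch only; the uniform prior over the total number of civilisations is my assumption, not part of the argument): observing that we are the first civilisation shifts probability mass toward small totals.

```python
# Toy Bayesian update: we observe "we are the first civilisation on Earth"
# and update beliefs about N, the total number of civilisations that will
# ever appear here. Assumption: uniform prior over N = 1..10, P(first|N) = 1/N.
N_values = range(1, 11)
prior = {n: 1.0 / len(N_values) for n in N_values}
likelihood = {n: 1.0 / n for n in N_values}

unnormalised = {n: prior[n] * likelihood[n] for n in N_values}
evidence = sum(unnormalised.values())
posterior = {n: p / evidence for n, p in unnormalised.items()}

for n in N_values:
    print(f"N = {n:2d}: prior {prior[n]:.2f} -> posterior {posterior[n]:.3f}")
```

Under these toy numbers roughly a third of the posterior mass lands on "we are the only civilisation", which is the sense in which being first lowers the expected number of future civilisations on Earth.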

If we are the only civilisation to exist in the history of the Earth, then we will probably become extinct not in a mild way, but rather in a way which will prevent any other civilisation from appearing. There is a higher probability of future (man-made) catastrophes which will not only end human civilisation, but also prevent the existence of any other civilisation on Earth.

Such catastrophes would kill most multicellular life. Nuclear war or a pandemic is not that type of catastrophe. The catastrophe must be really huge: irreversible global warming, grey goo, or a black hole in a collider.

Now I will list possible explanations of this Fermi paradox of the human past and the corresponding x-risk implications:

 

1. We are the first civilisation on Earth, because we will prevent the existence of any future civilisations.

If our existence prevents other civilisations from appearing in the future, how could this happen? Either we will become extinct in a very catastrophic way, killing all earthly life, or we will become a super-civilisation which prevents other species from becoming sapient. So, if we are really the first, it means that "mild extinctions" are not typical for human-style civilisations. Thus, pandemics, nuclear wars, devolution and everything reversible are ruled out as the main possible methods of human extinction.

If we become a super-civilisation, we will not be interested in preserving the biosphere, as it would be able to create new sapient species. Alternatively, it may be that we care about the biosphere so strongly that we will hide very well from newly appearing sapient species, like a cosmic zoo. This would mean that past civilisations on Earth may have existed but decided to hide all traces of their existence from us, as this would help us to develop independently. So, the fact that we are the first raises the probability of a very large scale catastrophe in the future, like UFAI or dangerous physical experiments, and reduces the chances of mild x-risks such as pandemics or nuclear war. Another explanation is that any first civilisation exhausts all the resources which are needed for a technological civilisation to restart, such as oil, ores, etc. But in several million years most such resources would be replenished or replaced by new ones through tectonic movement.

 

2. We are not the first civilisation.

2.1. We haven't found any traces of a previous technological civilisation, and based on what we know, there are very strong constraints on their possible existence. For example, every civilisation leaves genetic marks, because it moves animals from one continent to another, just as humans brought dingos to Australia. It would also exhaust several important ores, create artefacts, and create new isotopes. We can be fairly sure that we are the first tech civilisation on Earth in the last 10 million years.

But can we be sure for the past 100 million years? Maybe a civilisation existed a very long time ago, like 60 million years ago (and killed the dinosaurs). Carl Sagan argued that it could not have happened, because we should find traces, mostly in the form of exhausted oil reserves. The main counter-argument here is that cephalisation, that is, the evolutionary development of brains, was not advanced enough 60 million years ago to support general intelligence: dinosaur brains were very small. But birds' brains are more mass-efficient than mammals'. All these arguments are presented in detail in the excellent article by Brian Trent, “Was there ever a dinosaurian civilisation?”

The main x-risks here are that we might find dangerous artefacts from a previous civilisation, such as weapons, nanobots, viruses, or AIs. Also, if previous civilisations went extinct, it increases the chance that extinction is typical for civilisations. It also means that there was some reason why the extinction occurred, that this killing force may still be active, and that we could excavate it. If they existed recently, they were probably hominids, and if they were killed by a virus, it may also affect humans.

2.2. We killed them. The Maya civilisation created writing independently, but the Spaniards destroyed their civilisation. The same is true for Neanderthals and Homo floresiensis.

2.3. Myths about gods may be traces of such a previous civilisation. Highly improbable.

2.4. They are still here, but they try not to intervene in human history. So, it is similar to the Zoo solution of Fermi’s paradox.

2.5. They were a non-tech civilisation, and that is why we can’t find their remnants.

2.6 They may still be here, like dolphins and ants, but their intelligence is non-human and they don’t create tech.

2.7 Some groups of humans created advanced tech long before now, but prefer to hide it. Highly improbable, as most tech requires large-scale manufacturing and markets.

2.8 A previous humanoid civilisation was killed by a virus or prion, and our archaeological research could bring it back. One hypothesis of Neanderthal extinction is prion infection caused by cannibalism. The fact is that several hominid species have gone extinct in the last several million years.

 

3. Civilisations are rare

Millions of species have existed on Earth, but only one was able to create technology. So, it is a rare event. Consequence: cyclic civilisations on Earth are improbable, so the chance that we will be resurrected by another civilisation on Earth is small.

The chances that we would be able to reconstruct civilisation after a large-scale catastrophe are also small (as such catastrophes appear atypical for civilisations, which instead quickly proceed to total annihilation or singularity).

It also means that technological intelligence is a difficult step in the evolutionary process, so it could be one of the solutions to the main Fermi paradox.

The safety of the remains of previous civilisations (if any exist) depends on two things: our time distance from them and their level of intelligence. The greater the distance, the safer they are (as the biggest part of any dangerous technology will have been destroyed by time or will not be dangerous to humans, like species-specific viruses).

The risks also depend on the level of intelligence they reached: the higher the intelligence, the riskier. If anything like their remnants is ever found, strong caution is recommended.

For example, the most dangerous scenario for us would be one similar to the beginning of Vernor Vinge's book “A Fire Upon the Deep”: we could find the remnants of a very old but very sophisticated civilisation, which could include an unfriendly AI or its description, or hostile nanobots.

The most likely place for such artefacts to be preserved is on the Moon, in some cavities near the pole. It is the most stable and radiation shielded place near Earth.

I think that, based on the (lack of) evidence, the estimated probability of a past tech civilisation should be less than 1 per cent. While this is enough to conclude that they most likely did not exist, it is not enough to completely ignore the risk of their artefacts, which in any case is less than 0.1 per cent.

Meta: the main idea for this post came to me in a night dream, several years ago.
