Estimating the probability of human extinction
I'm looking for feedback on the following idea. The article from which it's been excerpted can be found here: http://ieet.org/index.php/IEET/more/torres20120213
"But not only has the number of scenarios increased in the past 71 years, many riskologists believe that the probability of a global disaster has also significantly risen. Whereas the likelihood of annihilation for most of our species’ history was extremely low, Nick Bostrom argues that “setting this probability lower than 25% [this century] would be misguided, and the best estimate may be considerably higher.” Similarly, Sir Martin Rees claims that a civilization-destroying event before the year 02100 is as likely as getting a “heads” after flipping a coin. These are only two opinions, of course, but to paraphrase the Russell-Einstein Manifesto, my experience confirms that those who know the < most tend to be the most gloomy.
"I [would] argue that Rees’ figure is plausible. To adapt a maxim from the philosopher David Hume, wise people always proportion their fears to the best available evidence, and when one honestly examines this evidence, one finds that there really is good reason for being alarmed. But I also offer a novel — to my knowledge — argument for why we may be systematically underestimating the overall likelihood of doom. In sum, just as a dog can’t possibly comprehend any of the natural and anthropogenic risks mentioned above, so too could there be risks that forever lie beyond our epistemic reach. All biological brains have intrinsic limitations that constrain the library of concepts to which one has access. And without concepts, one can’t mentally represent the external world. It follows that we could be “cognitively closed” to a potentially vast number of cosmic risks that threaten us with total annihilation. This being said, one might argue that such risks, if they exist at all, must be highly improbable, since Earth-originating life has existed for some 3.5 billion years without an existential catastrophe having happened. But this line of reasoning is deeply flawed: it fails to take into account that the only worlds in which observers like us could find ourselves are ones in which such a catastrophe has never occurred. It follows that a record of past survival on our planetary spaceship provides no useful information about the probability of certain existential disasters happening in the future. The facts of cognitive closure plus the observation selection effect suggest that our probability conjectures of total annihilation may be systematically underestimated, perhaps by a lot."
Thoughts?
Newcomb, Bostrom, Calvin: Credence and the strange path to a finite afterlife
This is a bit rough, but I think that it is an interesting and potentially compelling idea. To keep this short, and accordingly increase the number of eyes over it, I have only sketched the bare bones of the idea.
1) Empirically, people have varying intuitions and beliefs about causality, particularly in Newcomb-like problems (http://wiki.lesswrong.com/wiki/Newcomb's_problem, http://philpapers.org/surveys/results.pl, and https://en.wikipedia.org/wiki/Irresistible_grace).
2) Also, as an empirical matter, some people believe in taking actions after the fact, such as one-boxing, or Calvinist “irresistible grace”, to try to ensure or conform with a seemingly already determined outcome. This might be out of a sense of retrocausality, performance, moral honesty, etc. What matters is that we know that they will act it out, despite it violating common sense causality. There has been some great work on decision theory on LW about trying to thread this needle well.
3) The second disjunct of the simulation argument (http://wiki.lesswrong.com/wiki/Simulation_argument) shows that the decision making of humanity is evidentially relevant to what our subjective credence should be that we are in a simulation. That is to say, if we are actively headed toward making simulations, we should increase our credence of being in a simulation; if we are actively headed away from making simulations, whether through existential risk or through law or policy against it, we should decrease our credence.
4) Many, if not most, people would like for there to be a pleasant afterlife after death, especially if we could be reunited with loved ones.
5) There is no reason to believe that simulations which are otherwise nearly identical copies of our world, could not contain, after the simulated bodily death of the participants, an extremely long-duration, though finite, "heaven"-like afterlife shared by simulation participants.
6) Our heading towards creating such simulations, especially if they were capable of nesting simulations, should increase our credence that we exist in such a simulation, and we should perhaps expect a heaven-like afterlife of long, though finite, duration.
7) Those who believe in alternative causality, or retrocausality, in Newcomb-like situations should be especially excited about the opportunity to push the world towards surviving, allowing these types of simulations, and creating them, as it would potentially suggest, analogously, that if they work towards creating simulations with heaven-like afterlives, that they might in some sense be “causing” such a heaven to exist for themselves, and even for friends and family who have already died. Such an idea of life-after-death, and especially for being reunited with loved ones, can be extremely compelling.
8) I believe that people matching the above description, that is, holding both an intuition in alternative causality, and finding such a heaven-like-afterlife compelling, exist. Further, the existence of such people, and their associated motivation to try to create such simulations, should increase the credence even of two-boxing types, that we already live in such a world with a heaven-like afterlife. This is because knowledge of a motivated minority desiring simulations should increase credence in the likely success of simulations. This is essentially showing that “this probably happened before, one level up” from the two-box perspective.
9) As an empirical matter, I also think that there are people who would find the idea of creating simulations with heaven-like afterlives compelling, even if they are not one-boxers, from a simply altruistic perspective, both since it is a nice thing to do for the future sim people, who can, for example, probabilistically have a much better existence than biological children on earth can, and as it is a nice thing to do to increase the credence (and emotional comfort) of both one-boxers and two-boxers in our world thinking that there might be a life after death.
10) This creates the opportunity for a secular movement in which people work towards creating these simulations, and use this work and potential success in order to derive comfort and meaning from their life. For example, making donations to a simulation-creating or promoting, or existential threat avoiding, think-tank after a loved one’s death, partially symbolically, partially hopefully.
11) There is at least some room for Pascalian considerations even for two-boxers who allow for some humility in their beliefs. Nozick believed one-boxers will become two-boxers if Box A is raised to $900,000, and two-boxers will become one-boxers if Box A is lowered to $1. Similarly, trying to work towards these simulations, even if you do not find it altruistically compelling, and even if you think that the odds of alternative causality or retrocausality are infinitesimally small, might make sense in that the reward could be extremely large, including potentially trillions of lifetimes' worth of time spent in an afterlife “heaven” with friends and family.
Finally, this idea might be one worth filling in (I have been doing so in my private notes for over a year, but am a bit shy to debut all of that just yet; even working up the courage to post this was difficult) if only because it is interesting, and could be used as a hook to get more people interested in existential risk, including the AI control problem. This is because existential catastrophe is probably the greatest enemy of credence in the future of such simulations, and accordingly of our reasonable credence that we have such a heaven awaiting us after death now. A short hook headline like “avoiding existential risk is key to afterlife” can get a conversation going. I can imagine Salon, etc. taking another swipe at it, and in doing so, creating publicity which would help in finding more like-minded folks to get involved in the work of MIRI, FHI, CEA etc. There are also some really interesting ideas about acausal trade, and game theory between higher and lower worlds, as a form of “compulsion” in which they punish worlds for not creating heaven-containing simulations (therefore affecting their credence as observers of the simulation), in order to reach an equilibrium in which simulations with heaven-like afterlives are universal, or nearly universal. More on that later if this is received well.
Also, if anyone would like to join with me in researching, bull sessioning, or writing about this stuff, please feel free to IM me. Also, if anyone has a really good, non-obvious pin with which to pop my balloon, preferably in a gentle way, it would be really appreciated. I am spending a lot of energy and time on this, which would be wasted if it is fundamentally flawed in some way.
Thank you.
*******************************
November 11 Updates and Edits for Clarification
1) There seems to be confusion about what I mean by self-location and credence. A good way to think of this is the Sleeping Beauty Problem (https://wiki.lesswrong.com/wiki/Sleeping_Beauty_problem)
If I imagine myself as Sleeping Beauty (and who doesn’t?), and I am asked on Sunday what my credence is that the coin will be tails, I will say 1/2. If I am awakened during the experiment without being told which day it is and am asked what my credence is that the coin was tails, I will say 2/3. If I am then told it is Monday, I will update my credence to 1/2. If I am told it is Tuesday, I update my credence to 1. If someone asks me two days after the experiment about my credence of it being tails, and I somehow still do not know the day of the week, I will say 1/2. Credence changes with where you are, and with what information you have. As we might be in a simulation, we are somewhere in the “experiment days”, and information can help orient our credence. As humanity potentially has some say in whether or not we are in a simulation, information about how humans make decisions about these types of things can and should affect our credence.
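The credences above can be checked with a small enumeration. This is only a sketch: it assumes the standard protocol (heads gives one awakening on Monday, tails gives awakenings on Monday and Tuesday) and the "thirder" way of counting awakenings that the paragraph uses. The names `AWAKENINGS` and `credence_tails` are made up for illustration.

```python
from fractions import Fraction

# Assumed protocol: heads -> one awakening (Monday);
# tails -> two awakenings (Monday and Tuesday).
# Each awakening is weighted by the probability of the coin outcome
# that produces it (the "thirder" way of counting).
AWAKENINGS = [
    ("heads", "Monday", Fraction(1, 2)),
    ("tails", "Monday", Fraction(1, 2)),
    ("tails", "Tuesday", Fraction(1, 2)),
]


def credence_tails(day=None):
    """Credence in tails among awakenings, optionally conditioned on the day."""
    relevant = [a for a in AWAKENINGS if day is None or a[1] == day]
    total = sum(weight for _, _, weight in relevant)
    tails = sum(weight for coin, _, weight in relevant if coin == "tails")
    return tails / total


print(credence_tails())           # 2/3
print(credence_tails("Monday"))   # 1/2
print(credence_tails("Tuesday"))  # 1
```

The same pattern, weighting each possible observer-moment and then conditioning on what you learn, is what the simulation case relies on, though there the weights are far less certain.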
Imagine Sleeping Beauty is a LessWrong reader. If Sleeping Beauty is unfamiliar with the simulation argument, and someone asks her about her credence of being in a simulation, she probably answers something like 0.0000000001% (all numbers for illustrative purposes only). If someone shows her the simulation argument, she increases to 1%. If she stumbles across this blog entry, she increases her credence to 2%, and adds some credence to the additional hypothesis that it may be a simulation with an afterlife. If she sees that a ton of people get really interested in this idea, and start raising funds to build simulations in the future and to lobby governments both for great AI safeguards and for regulation of future simulations, she raises her credence to 4%. If she lives through the AI superintelligence explosion and simulations are being built, but not yet turned on, her credence increases to 20%. If humanity turns them on, it increases to 50%. If there are trillions of them, she increases her credence to 60%. If 99% of simulations survive their own run-ins with artificial superintelligence and produce their own simulations, she increases her credence to 95%.
2) This set of simulations does not need to recreate the current world or any specific people in it. That is a different idea that is not necessary to this argument. As written the argument is premised on the idea of creating fully unique people. The point would be to increase our credence that we are functionally identical in type to the unique individuals in the simulation. This is done by creating ignorance or uncertainty in simulations, so that the majority of people similarly situated, in a world which may or may not be in a simulation, are in fact in a simulation. This should, in our ignorance, increase our credence that we are in a simulation. The point is about how we self-locate, as discussed in the original article by Bostrom. It is a short 12-page read, and if you have not read it yet, I would encourage it: http://simulation-argument.com/simulation.html. The point about past loved ones I was making was to bring up the possibility that the simulations could be designed to transfer people to a separate after-life simulation where they could be reunited after dying in the first part of the simulation. This was not about trying to create something for us to upload ourselves into, along with attempted replicas of dead loved ones. This staying-in-one simulation through two phases, a short life, and relatively long afterlife, also has the advantage of circumventing the teletransportation paradox as “all of the person" can be moved into the afterlife part of the simulation.
Simulations Map: what is the most probable type of the simulation in which we live?
There is a chance that we may be living in a computer simulation created by an AI or a future super-civilization. The goal of the simulations map is to depict an overview of all possible simulations. It will help us to estimate the distribution of other multiple simulations inside it along with their measure and probability. This will help us to estimate the probability that we are in a simulation and – if we are – the kind of simulation it is and how it could end.
Simulation argument
The simulation map is based on Bostrom’s simulation argument. Bostrom showed that “at least one of the following propositions is true:
(1) the human species is very likely to go extinct before reaching a “posthuman” stage;
(2) any posthuman civilization is extremely unlikely to run a significant number of simulations of their evolutionary history (or variations thereof);
(3) we are almost certainly living in a computer simulation”. http://www.simulation-argument.com/simulation.html
The third proposition is the strongest one, because (1) requires that not only human civilization but almost all other technological civilizations should go extinct before they can begin simulations, because non-human civilizations could model human ones and vice versa. This makes (1) an extremely strong universal conjecture and therefore very unlikely to be true. It requires that all possible civilizations will kill themselves before they create AI, but we can hardly even imagine such a universal course. If destruction is down to dangerous physical experiments, some civilizations may live in universes with different physics; if it is down to bioweapons, some civilizations would have enough control to prevent them.
In the same way, (2) requires that all super-civilizations with AI will refrain from creating simulations, which is unlikely.
Conceivably there could be some kind of universal physical law against the creation of simulations, but such a law is impossible, because some kinds of simulations already exist, for example human dreaming. During human dreaming very precise simulations of the real world are created (which can’t be distinguished from the real world from within – that is why lucid dreams are so rare). So, we could conclude that after small genetic manipulations it is possible to create a brain that will be 10 times more capable of creating dreams than an ordinary human brain. Such a brain could be used for the creation of simulations, and strong AI surely will find more effective ways of doing it. So simulations are technically possible (and qualia are no problem for them, as we have qualia in dreams).
Any future strong AI (regardless of whether it is FAI or UFAI) should run at least several million simulations in order to solve the Fermi paradox and to calculate the probability of the appearance of other AIs on other planets, and their possible and most typical goal systems. AI needs this in order to calculate the probability of meeting other AIs in the Universe and the possible consequences of such meetings.
As a result, the a priori estimate of my being in a simulation is very high, possibly 1,000,000 to 1. The best chance of lowering this estimate is to find some flaws in the argument, and possible flaws are discussed below.
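To see how an estimate of this size could arise, here is a minimal sketch of the fraction-of-simulated-observers formula from Bostrom's paper, f_sim = f_P f_I N_I / (f_P f_I N_I + 1). The function names and the input values below are purely hypothetical placeholders, chosen only so that the result matches odds of 1,000,000 to 1; they are not estimates from this post.

```python
def simulated_fraction(f_p, f_i, n_i):
    """Fraction of human-type observers who are simulated.

    f_p: fraction of civilizations that reach a posthuman stage
    f_i: fraction of posthuman civilizations interested in running simulations
    n_i: average number of ancestor simulations run by an interested civilization
    """
    x = f_p * f_i * n_i
    return x / (x + 1.0)


def odds_to_probability(odds_for, odds_against=1.0):
    """Convert odds such as 1,000,000 to 1 into a probability."""
    return odds_for / (odds_for + odds_against)


# Hypothetical inputs: 1% survive, 1% of those are interested,
# and each interested civilization runs 10 billion simulations.
print(simulated_fraction(0.01, 0.01, 1e10))
print(odds_to_probability(1_000_000))
```

Note how the result is driven almost entirely by the product of the three inputs: any of the objections below that pushes one factor close to zero lowers the estimate sharply.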
Most abundant classes of simulations
If we live in a simulation, we are going to be interested in knowing the kind of simulation it is. Probably we belong to the most abundant class of simulations, and to find it we need a map of all possible simulations; an attempt to create one is presented here.
There are two main reasons for simulation domination: goal and price. Some goals require the creation of a very large number of simulations, so such simulations will dominate. Also, cheaper and simpler simulations are more likely to be abundant.
Eitan_Zohar suggested (http://lesswrong.com/r/discussion/lw/mh6/you_are_mostly_a_simulation/) that FAI will deliberately create an almost infinite number of simulations in order to dominate the total landscape and to ensure that most people will find themselves inside FAI-controlled simulations, which will be better for them, as in such simulations unbearable suffering can be excluded. (If in the infinite world an almost infinite number of FAIs exist, each of them could not change the landscape of simulation distribution, because its share in all simulations would be infinitely small. So we need an acausal trade between an infinite number of FAIs to really change the proportion of simulations. I can't say that it is impossible, but it may be difficult.)
Another possible largest subset of simulations is the one created for leisure and for the education of some kind of high level beings.
The cheapest simulations are simple, low-resolution and me-simulations (one real actor, with the rest of the world around him like a backdrop), similar to human dreams. I assume here that simulations follow the same kind of power-law distribution as planets, cars and many other things: smaller and cheaper ones are more abundant.
Simulations could also be laid on one another in so-called Matryoshka simulations where one simulated civilization is simulating other civilizations. The lowest level of any Matryoshka system will be the most populated. If it is a Matryoshka simulation, which consists of historical simulations, the simulation levels in it will be in descending time order, for example the 24th century civilization models the 23rd century one, which in turn models the 22nd century one, which itself models the 21st century simulation. A simulation in a Matryoshka will end on the level where creation of the next level is impossible. The beginning of 21st century simulations will be the most abundant class in Matryoshka simulations (similar to our time period.)
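The claim that the lowest level is the most populated can be illustrated with a toy count. This sketch assumes that every world at every level runs the same number of simulations (`branching`) and that nesting stops at a fixed `depth`; both parameters and their values are hypothetical.

```python
def worlds_per_level(branching, depth):
    """Number of worlds at each nesting level, level 0 being the real world."""
    return [branching ** level for level in range(depth + 1)]


# Hypothetical example: each world runs 10 simulations, three levels deep.
levels = worlds_per_level(branching=10, depth=3)
print(levels)  # [1, 10, 100, 1000]

# Share of all worlds that sit on the deepest level.
print(levels[-1] / sum(levels))
```

With any branching factor above 1, the deepest level holds most of the worlds, which is why the period that can no longer create its own simulations would be the most common one to find oneself in.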
Argument against simulation theory
There are several possible objections to the simulation argument, but I do not find them strong enough to refute it.
1. Measure
The idea of measure was introduced to quantify the extent of the existence of something, mainly in quantum universe theories. While we don’t know how to actually measure “the measure”, the idea is based on intuition that different observers have different powers of existence, and as a result I could find myself to be one of them with a different probability. For example, if we have three functional copies of me, one of them is the real person, another is a hi-res simulation and the third one is low-res simulation, are my chances of being each of them equal (1/3)?
The «measure» concept is the most fragile element of all simulation arguments. It is based mostly on the idea that all copies have equal measure. But perhaps measure also depends on the energy of calculations. If we have a computer which is using 10 watts of energy to calculate an observer, it may be presented as two parallel computers which are using five watts each. These observers may be divided again until we reach the minimum amount of energy required for calculations, which could be called a «Planck observer». In this case our initial 10 watt computer will be equal to – for example – one billion Planck observers.
And here we see a great difference in the case of simulations, because simulation creators have to spend less energy on calculations (or it would be easier to make real world experiments). But in this case such simulations will have a lower measure. But if the total number of all simulations is large, then the total measure of all simulations will still be higher than the measure of real worlds. But if most real worlds end with global catastrophe, the result would be an even higher proportion of real worlds which could outweigh simulations after all.
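The trade-off described here, between many low-measure simulations and few full-measure real worlds, can be sketched numerically. This is only an illustration under the assumption that measure is additive and proportional to some per-world weight; the function name and all numbers are hypothetical.

```python
def simulated_measure_share(n_real, n_sims, sim_measure, real_measure=1.0):
    """Share of total measure carried by simulations.

    sim_measure is the measure of one simulation relative to one real world.
    """
    sim_total = n_sims * sim_measure
    real_total = n_real * real_measure
    return sim_total / (sim_total + real_total)


# One real world and a million simulations, each with 1/1000 of its measure:
print(simulated_measure_share(n_real=1, n_sims=1_000_000, sim_measure=1e-3))

# Same counts, but each simulation has only a billionth of the measure:
print(simulated_measure_share(n_real=1, n_sims=1_000_000, sim_measure=1e-9))
```

In the first case simulations still dominate; in the second, real worlds do. So the conclusion depends on how quickly measure falls with the energy spent per observer, which is exactly the unknown quantity.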
2. Universal AI catastrophe
One possible universal global catastrophe could happen where a civilization develops an AI-overlord, but any AI will meet some kind of unresolvable math and philosophical problems which will terminate it at its early stages, before it can create many simulations. See an overview of this type of problem in my map “AI failures level”.
3. Universal ethics
Another idea is that all AIs converge to some kind of ethics and decision theory which prevent them from creating simulations, or they create p-zombie simulations only. I am skeptical about that.
4. Infinity problems
If everything possible exists or if the universe is infinite (which are equivalent statements), the proportion between two infinite sets is meaningless. We could overcome this objection using the idea of a mathematical limit: if we take ever bigger universes and longer periods of time, simulations will be more and more abundant within them.
But in all cases, in the infinite universe any world exists an infinite number of times, and this means that my copies exist in real worlds an infinite number of times, regardless of whether I am in a simulation or not.
5. Non-uniform measure over Universe (actuality)
Contemporary physics is based on the idea that everything that exists, exists in an equal sense, meaning that the Sun and very remote stars have the same measure of existence, even in causally separated regions of the universe. But if our region of space-time is somehow more real, it may change the simulation distribution in a way that favors real worlds.
6. Flux universe
The same copies of me exist in many different real and simulated worlds. In simple form it means that the notion that “I am in one specific world” is meaningless, but the distribution of different interpretations of the world is reflected in the probabilities of different events.
E.g. the higher the chances that I am in a simulation, the bigger the probability that I will experience some kind of miracles during my lifetime. (Many miracles almost prove that you are in a simulation, like flying in dreams.) But here correlation is not causation.
The stronger version of the same principle implies that I am one in many different worlds, and I could manipulate the probability of finding myself in a set of possible worlds, basically by forgetting who I am and becoming equal to a larger set of observers. It may work without any new physics, it only requires changing the number of similar observers, and if such observers are Turing computer programs, they could manipulate their own numbers quite easily.
Higher levels of flux theory do require new physics or at least quantum mechanics in a many worlds interpretation. In it different interpretations of the world outside of the observer could interact with each other or experience some kind of interference.
See further discussion about a flux universe here: http://lesswrong.com/lw/mgd/the_consequences_of_dust_theory/
7. Boltzmann brains outweigh simulations
It may turn out that BBs outweigh both real worlds and simulations. This may not be a problem from a planning point of view because most BBs correspond to some real copies of me.
But if we take this approach to solve the BBs problem, we will have to use it in the simulation problem as well, meaning: “I am not in a simulation because for any simulation, there exists a real world with the same ‘me’.” It is counterintuitive.
Simulation and global risks
Simulations may be switched off or may simulate worlds which are near global catastrophe. Such worlds may be of special interest for future AI because they help to model the Fermi paradox and they are good for use as games.
Miracles in simulations
The map also has blocks about types of simulation hosts, about many level simulations, plus ethics and miracles in simulations.
The main point about simulation is that it disturbs the random distribution of observers. In the real world I would find myself in mediocre situations, but simulations are more focused on special events and miracles (think about movies, dreams and novels). The more interesting my life is, the less chance that it is real.
If we are in a simulation we should expect more global risks, strange events and miracles, so being in a simulation changes our probability expectations for different occurrences.
This map is parallel to the Doomsday argument map.
Estimations given in the map of the number of different types of simulation or required flops are more like place holders, and may be several orders of magnitude higher or lower.
I think that this map is rather preliminary and its main conclusions may be updated many times.
The pdf of the map is here, and jpg is below.
Previous posts with maps:
A map: AI failures modes and levels
A Roadmap: How to Survive the End of the Universe
A map: Typology of human extinction risks
Roadmap: Plan of Action to Prevent Human Extinction Risks

Nick Bostrom's TED talk on Superintelligence is now online
http://www.ted.com/talks/nick_bostrom_what_happens_when_our_computers_get_smarter_than_we_are
Artificial intelligence is getting smarter by leaps and bounds — within this century, research suggests, a computer AI could be as "smart" as a human being. And then, says Nick Bostrom, it will overtake us: "Machine intelligence is the last invention that humanity will ever need to make." A philosopher and technologist, Bostrom asks us to think hard about the world we're building right now, driven by thinking machines. Will our smart machines help to preserve humanity and our values — or will they have values of their own?
I realize this might go into a post in a media thread, rather than its own topic, but it seems big enough, and likely-to-prompt-discussion enough, to have its own thread.
I liked the talk, although it was less polished than TED talks often are. What was missing I think was any indication of how to solve the problem. He could be seen as just an ivory tower philosopher speculating on something that might be a problem one day, because apart from mentioning in the beginning that he works with mathematicians and IT guys, he really does not give an impression that this problem is already being actively worked on.
[Link] Will Superintelligent Machines Destroy Humanity?
A summary and review of Bostrom's Superintelligence is in the December issue of Reason magazine, and is now posted online at Reason.com.
Goal retention discussion with Eliezer
Although I feel that Nick Bostrom’s new book “Superintelligence” is generally awesome and a well-needed milestone for the field, I do have one quibble: both he and Steve Omohundro appear to be more convinced than I am by the assumption that an AI will naturally tend to retain its goals as it reaches a deeper understanding of the world and of itself. I’ve written a short essay on this issue from my physics perspective, available at http://arxiv.org/pdf/1409.0813.pdf.
give you, some we can't, few have been written up and even fewer in any
well-organized way. Benja or Nate might be able to expound in more detail
while I'm in my seclusion.
Very briefly, though:
The problem of utility functions turning out to be ill-defined in light of
new discoveries of the universe is what Peter de Blanc named an
"ontological crisis" (not necessarily a particularly good name, but it's
what we've been using locally).
http://intelligence.org/files/OntologicalCrises.pdf
The way I would phrase this problem now is that an expected utility
maximizer makes comparisons between quantities that have the type
"expected utility conditional on an action", which means that the AI's
utility function must be something that can assign utility-numbers to the
AI's model of reality, and these numbers must have the further property
that there is some computationally feasible approximation for calculating
expected utilities relative to the AI's probabilistic beliefs. This is a
constraint that rules out the vast majority of all completely chaotic and
uninteresting utility functions, but does not rule out, say, "make lots of
paperclips".
Models also have the property of being Bayes-updated using sensory
information; for the sake of discussion let's also say that models are
about universes that can generate sensory information, so that these
models can be probabilistically falsified or confirmed. Then an
"ontological crisis" occurs when the hypothesis that best fits sensory
information corresponds to a model that the utility function doesn't run
on, or doesn't detect any utility-having objects in. The example of
"immortal souls" is a reasonable one. Suppose we had an AI that had a
naturalistic version of a Solomonoff prior, a language for specifying
universes that could have produced its sensory data. Suppose we tried to
give it a utility function that would look through any given model, detect
things corresponding to immortal souls, and value those things. Even if
the immortal-soul-detecting utility function works perfectly (it would in
fact detect all immortal souls) this utility function will not detect
anything in many (representations of) universes, and in particular it will
not detect anything in the (representations of) universes we think have
most of the probability mass for explaining our own world. In this case
the AI's behavior is undefined until you tell me more things about the AI;
an obvious possibility is that the AI would choose most of its actions
based on low-probability scenarios in which hidden immortal souls existed
that its actions could affect. (Note that even in this case the utility
function is stable!)
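A toy version of the immortal-souls example may make this concrete. It is only a sketch of one possible reading, not anything from the email itself: the hypothesis names, probabilities and payoffs are all hypothetical, and "utility" is simply the number of souls an action helps under each hypothesis.

```python
# Hypothetical toy model: each hypothesis maps to
# (probability, {action: number of souls the action helps}).
HYPOTHESES = {
    # Best-fit model: the utility function detects nothing, so every action scores 0.
    "physics_only": (0.999, {"act_a": 0, "act_b": 0}),
    # Low-probability model in which hidden souls exist and actions affect them.
    "hidden_souls": (0.001, {"act_a": 5, "act_b": 1}),
}


def expected_utility(action):
    """Expected utility conditional on an action, summed over hypotheses."""
    return sum(prob * outcomes[action] for prob, outcomes in HYPOTHESES.values())


def best_action(actions=("act_a", "act_b")):
    """Pick the action with the highest expected utility."""
    return max(actions, key=expected_utility)


print(expected_utility("act_a"), expected_utility("act_b"))
print(best_action())  # act_a
```

The high-probability hypothesis contributes nothing to the comparison, so the choice is decided entirely by the 0.1% scenario, while the utility function itself never changes.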
Since we don't know the final laws of physics and could easily be
surprised by further discoveries in the laws of physics, it seems pretty
clear that we shouldn't be specifying a utility function over exact
physical states relative to the Standard Model, because if the Standard
Model is even slightly wrong we get an ontological crisis. Of course
there are all sorts of extremely good reasons we should not try to do this
anyway, some of which are touched on in your draft; there just is no
simple function of physics that gives us something good to maximize. See
also Complexity of Value, Fragility of Value, indirect normativity, the
whole reason for a drive behind CEV, and so on. We're almost certainly
going to be using some sort of utility-learning algorithm, the learned
utilities are going to bind to modeled final physics by way of modeled
higher levels of representation which are known to be imperfect, and we're
going to have to figure out how to preserve the model and learned
utilities through shifts of representation. E.g., the AI discovers that
humans are made of atoms rather than being ontologically fundamental
humans, and furthermore the AI's multi-level representations of reality
evolve to use a different sort of approximation for "humans", but that's
okay because our utility-learning mechanism also says how to re-bind the
learned information through an ontological shift.
This sorta thing ain't going to be easy which is the other big reason to
start working on it well in advance. I point out however that this
doesn't seem unthinkable in human terms. We discovered that brains are
made of neurons but were nonetheless able to maintain an intuitive grasp
on what it means for them to be happy, and we don't throw away all that
info each time a new physical discovery is made. The kind of cognition we
want does not seem inherently self-contradictory.
Three other quick remarks:
*) Natural selection is not a consequentialist, nor is it the sort of
consequentialist that can sufficiently precisely predict the results of
modifications that the basic argument should go through for its stability.
The Omohundrian/Yudkowskian argument is not that we can take an arbitrary
stupid young AI and it will be smart enough to self-modify in a way that
preserves its values, but rather that most AIs that don't self-destruct
will eventually end up at a stable fixed-point of coherent
consequentialist values. This could easily involve a step where, e.g., an
AI that started out with a neural-style delta-rule policy-reinforcement
learning algorithm, or an AI that started out as a big soup of
self-modifying heuristics, is "taken over" by whatever part of the AI
first learns to do consequentialist reasoning about code. But this
process doesn't repeat indefinitely; it stabilizes when there's a
consequentialist self-modifier with a coherent utility function that can
precisely predict the results of self-modifications. The part where this
does happen to an initial AI that is under this threshold of stability is
a big part of the problem of Friendly AI and it's why MIRI works on tiling
agents and so on!
*) Natural selection built humans to be consequentialists that would value sex, not value
inclusive genetic fitness, and not value being faithful to natural
selection's optimization criterion. Well, that's dumb, and of course the
result is that humans don't optimize for inclusive genetic fitness.
Natural selection was just stupid like that. But that doesn't mean
there's a generic process whereby an agent rejects its "purpose" in the
light of exogenously appearing preference criteria. Natural selection's
anthropomorphized "purpose" in making human brains is just not the same as
the cognitive purposes represented in those brains. We're not talking
about spontaneous rejection of internal cognitive purposes based on their
causal origins failing to meet some exogenously-materializing criterion of
validity. Our rejection of "maximize inclusive genetic fitness" is not an
exogenous rejection of something that was explicitly represented in us,
that we were explicitly being consequentialists for. It's a rejection of
something that was never an explicitly represented terminal value in the
first place. Similarly the stability argument for sufficiently advanced
self-modifiers doesn't go through a step where the successor form of the
AI reasons about the intentions of the previous step and respects them
apart from its constructed utility function. So the lack of any universal
preference of this sort is not a general obstacle to stable
self-improvement.
*) The case of natural selection does not illustrate a universal
computational constraint, it illustrates something that we could
anthropomorphize as a foolish design error. Consider humans building Deep
Blue. We built Deep Blue to attach a sort of default value to queens and
central control in its position evaluation function, but Deep Blue is
still perfectly able to sacrifice queens and central control alike if the
position reaches a checkmate thereby. In other words, although an agent
needs crystallized instrumental goals, it is also perfectly reasonable to
have an agent which never knowingly sacrifices the terminally defined
utilities for the crystallized instrumental goals if the two conflict;
indeed "instrumental value of X" is simply "probabilistic belief that X
leads to terminal utility achievement", which is sensibly revised in the
presence of any overriding information about the terminal utility. To put
it another way, in a rational agent, the only way a loose generalization
about instrumental expected-value can conflict with and trump terminal
actual-value is if the agent doesn't know it, i.e., it does something that
it reasonably expected to lead to terminal value, but it was wrong.
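The Deep Blue point can be made concrete with a toy sketch. What follows is an illustration under invented assumptions, not Deep Blue's actual evaluation function: the piece values, the `Outcome` record, and the `evaluate` and `choose` functions are all hypothetical. Material is treated as a crystallized instrumental heuristic that is consulted only when the terminal result (a proven checkmate) is unknown.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical heuristic piece values: a crystallized instrumental goal.
PIECE_VALUE = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

# Terminal utility of a known win; chosen to dominate any material total.
WIN = 1000.0


@dataclass
class Outcome:
    """Result of a candidate move as seen by the search (toy model)."""
    name: str
    my_pieces: List[str]                 # pieces we still hold after the move
    forced_mate: Optional[bool] = None   # True if the search proved a win


def evaluate(outcome: Outcome) -> float:
    """Score an outcome: known terminal value overrides the heuristic."""
    if outcome.forced_mate:
        return WIN
    # Otherwise fall back on material as a proxy for winning chances.
    return float(sum(PIECE_VALUE[p] for p in outcome.my_pieces))


def choose(outcomes: List[Outcome]) -> Outcome:
    """Pick the outcome with the highest score."""
    return max(outcomes, key=evaluate)


keep_queen = Outcome("keep queen", ["queen", "rook", "pawn"])
sacrifice = Outcome("sacrifice queen, forced mate", ["rook", "pawn"], forced_mate=True)
best = choose([keep_queen, sacrifice])
```

In this sketch the queen's heuristic value never outranks a proven win, so `choose` picks the sacrifice; the heuristic only matters when the terminal outcome is not known.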
This has been very off-the-cuff and I think I should hand this over to
Nate or Benja if further replies are needed, if that's all right.
[link] [poll] Future Progress in Artificial Intelligence
Vincent Müller and Nick Bostrom have just released a paper surveying the results of a poll of experts about future progress in artificial intelligence. The authors have also put up a companion site where visitors can take the poll and see the raw data. I just checked the site and so far only one individual has submitted a response. This provides an opportunity for testing the views of LW members against those of experts. So if you are willing to complete the questionnaire, please do so before reading the paper. (I have abstained from providing a link to the pdf to create a trivial inconvenience for those who cannot resist temptation. Once you take the poll, you can easily find the paper by conducting a Google search with the keywords: bostrom muller future progress artificial intelligence.)
Bostrom versus Transcendence
Nick Bostrom takes on the facts, the fictions and the speculations in the movie Transcendence:

Could you upload Johnny Depp's brain? Oxford Professor on Transcendence
How soon until machine intelligence? Oxford professor on Transcendence
Would you have warning before artificial superintelligence? Oxford professor on Transcendence
Oxford professor on Transcendence: how could you get a machine intelligence?
Does the universe contain a friendly artificial superintelligence?
First and foremost, let's give a definition of "friendly artificial superintelligence" (from now on, FASI). A FASI is a computer system that:
- is capable of deduction, reasoning and problem-solving
- helps human progress, is incapable of harming anybody and does not allow anybody to come to any kind of harm
- is so much more intelligent than any human that it has developed molecular nanotechnology by itself, making it de facto omnipotent
In order to find an answer to this question, we must check whether our observations on the universe match with what we would observe if the universe did, indeed, contain a FASI.
If, somewhere in another solar system, an alien civilization had already developed a FASI, it would be reasonable to presume that, sooner or later, one or more members of that civilization would ask it to make them omnipotent. The FASI, being friendly by definition, would not refuse. [1]
It would also make sure that anybody who becomes omnipotent is also rendered incapable of harming anybody and incapable of allowing anybody to come to any kind of harm.
The new omnipotent beings would also do the same to anybody who asks them to become omnipotent. It would be a short time before they use their omnipotence to leave their own solar system, meet other intelligent civilizations and make them omnipotent too.
In short, the ultimate consequence of the appearance of a FASI would be that every intelligent being in the universe would become omnipotent. This does not match our observations, so we must conclude that a FASI does not exist anywhere in the universe.
[1] We must assume that a FASI would not just reply "You silly creature, becoming omnipotent is not in your best interest so I will not make you omnipotent because I know better" (or an equivalent thereof). If we did, we would implicitly consider the absence of omnipotent beings as evidence for the presence of a FASI. This would force us to consider the possible presence of omnipotent beings as evidence for the absence of a FASI, which would not make sense.
Based on this conclusion, let's try to answer another question: is our universe a computer simulation?
According to Nick Bostrom, if even just one civilization in the universe
- survives long enough to enter a posthuman stage, and
- is interested in creating "ancestor simulations"
then the probability that we are living in one is extremely high.
However, if a civilization did actually reach a posthuman stage where it can create ancestor simulations, it would also be advanced enough to create a FASI.
If a FASI existed in such a universe, the cheapest way it would have to make anybody else omnipotent would be to create a universe simulation that does not differ substantially from our universe, except for the presence of an omnipotent simulacrum of the individual who asked to be made omnipotent in our universe. Every subsequent request for omnipotence would result in another simulation being created, containing one more omnipotent being. Any simulation where those beings are not omnipotent would be deactivated: keeping it running would lead to the existence of a universe where a request for omnipotence has not been granted, which would go against the modus operandi of the FASI.
Thus, any simulation of a universe containing even just one friendly omnipotent being would always progress to a state where every intelligent being is omnipotent. Again, this does not match our observations. Since we had already concluded that a FASI does not exist in our universe, we must come to the further conclusion that our universe is not a computer simulation.
Update on establishment of Cambridge’s Centre for Study of Existential Risk
Transcription and Summary of Nick Bostrom's Q&A
INTRO: From the original posting by Stuart_Armstrong:
Underground Q&A session with Nick Bostrom (http://www.nickbostrom.com) on existential risks and artificial intelligence with the Oxford Transhumanists (recorded 10 October 2011).
Below I (will) have a summary of the Q&A followed by the transcription. The transcription is slightly edited, mainly for readability. The numbers are minute markers. Anything followed by a (?) means I don't know quite what he said (example- attruing(?) program), but if you figure it out, let me know!
SUMMARY: I'll have a summary here by end of the day, probably.
TRANSCRIPTION:
Nick: I wanted to just [interact with your heads]. Any questions, really, that you have. To discuss with you. I can say what I’m working on right now which is this book on super-intelligence, not so much on the question of whether and how long it might take to develop machine intelligence that equals human intelligence, but rather what happens if and when that occurs. If you get human-level machine intelligence, how quickly, how explosively will we get super-intelligence, and how can you solve the control problem? If you build super-intelligence, how can you make sure it will do what you want? That it will be safe and beneficial.
Once one starts to pull on that problem, it turns out to be quite complicated and difficult. That it has many aspects to it that I would be happy to talk about. Or if you prefer to talk about other things; existential risks, or otherwise, I’d be happy to do that as well. But no presentation, just Q&A. So you all have to provide at least the stimulus. So should I take questions or do you want…
[00:01]
Questioner: So what’s your definition of machine intelligence or super-intelligence AI… Is there like a precise definition there?
Nick: There isn’t. Now if you look at domain specific intelligence, there are already areas where machines surpass humans, such as doing arithmetical calculations or chess. I think the interesting point is when machines equal humans in general intelligence or perhaps slightly more specifically in engineering intelligence. So if you had this general capability of being able to program creatively and design new systems... There is in a sense a point at which if you had sufficient capability of that sort, you have general capability.
Because if you can build new systems, even if all it could initially do is this type of engineering work, you can build yourself a poetry module or build yourself a social skills module, if you have that general ability to build. So it might be that general intelligence or it might be that slightly more narrow version of that engineering type of intelligence is the key variable to look at. That’s the kind of thing that can unleash the rest. But “human-level intelligence”... that’s a vague term, and I think it’s important to understand that. It’s not necessarily the natural kind.
[00:03]
Questioner: Got a question that maybe should have waited til the end: There are two organizations, FHI and SIAI, working on this. Let's say I thought this was the most important problem in the world, and I should be donating money to this. Who should I give it to?
Nick: It's good. We've come to the chase!
I think there is a sense that both organizations are synergistic. If one were about to go under or something like that, that would probably be the one. If both were doing well, it's... different people will have different opinions. We work quite closely with a lot of the folks from SIAI.
There is an advantage to having one academic platform and one outside academia. There are different things these types of organizations give us. If you want to get academics to pay more attention to this, to get postdocs to work on this, that's much easier to do within academia; also to get the ear of policy-makers and media.
On the other hand, for SIAI there might be things that are easier for them to do. More flexibility, they're not embedded in a big bureaucracy. So they can more easily hire people with non-standard backgrounds without the kind of credentials that we would usually need, and also more grass-roots stuff like the community blog Less Wrong, is easier to do.
So yeah. I'll give the non-answer answer to that question.
[00:05]
Questioner: Do you think a biological component is necessary for an artificial intelligence to achieve sentience or something equivalent?
Nick: It doesn’t seem that that should be advantageous… If you go all the way back to atoms, it doesn’t seem to matter that it’s carbon rather than silicon atoms. Then you could wonder, instead of having the same atoms you run a simulation of everything that’s going on. Would you have to simulate biological processes? I don’t even think that’s necessary.
My guess (and I’m not sure about this, I don’t have an official position or even a theory about what exactly the criteria are that would make a system conscious)… But my intuition is that if you replicated the computational processes that go on in a human brain, at a sufficient level of detail, where that sufficient level of detail might be roughly on the level of individual neurons and synapses, I think you would likely have consciousness. And it might be that it’s something weaker than that which would suffice. Maybe you wouldn’t need every neuron. Maybe you could simplify things and still have consciousness. But at least at that level it seems likely.
It’s a lot harder to say if you had very alien types of mental architecture. Something that wasn’t a big neural network but of normal machine intelligence that performs very well in a certain way, but using a very different method than a human brain. Whether that would be conscious as well? Much less sure. A limiting case would be a big lookup table that was physically impossible to realize, but you can imagine having every sort of situation possible described, and that program would run through until it found the situation that matched its current memory and observation and would read off which action it should perform. But that would be an extremely alien type of architecture. But would that have conscious experience or not? Even less clear. It might be that it would not have, but maybe the process of generating this giant look-up table would generate kinds of experiences that you wouldn’t get from actually implementing it or something like that. (?)
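[Editorial illustration, not part of the talk: the lookup-table architecture described above can be sketched in a few lines. This is a toy under stated assumptions: the table entries, the `act` and `run` functions, and the choice to use the previous action as "memory" are all hypothetical. A real table covering every possible situation would be physically unrealizable, as Nick notes.]

```python
from typing import Dict, List, Tuple

# Hypothetical table mapping (memory, observation) to an action.
Situation = Tuple[str, str]
TABLE: Dict[Situation, str] = {
    ("", "hello"): "greet",
    ("greet", "how are you?"): "answer politely",
    ("answer politely", "goodbye"): "wave",
}


def act(memory: str, observation: str) -> str:
    """Run through the table until an entry matches the current situation."""
    for (mem, obs), action in TABLE.items():
        if mem == memory and obs == observation:
            return action  # read off the stored action
    return "no-op"  # situation not covered by this tiny table


def run(observations: List[str]) -> List[str]:
    """Feed observations in order; memory is simply the last action taken."""
    memory = ""
    actions = []
    for obs in observations:
        action = act(memory, obs)
        actions.append(action)
        memory = action
    return actions


trace = run(["hello", "how are you?", "goodbye"])
```

[The agent does no reasoning at run time; everything it "knows" was fixed when the table was written, which is what makes the question of where any experience would reside so unclear.]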
[00:07]
Questioner: This relates to AI being dangerous. It seems to me that while it would certainly be interesting if we were to get AI that were much more intelligent than a human being, it’s not necessarily dangerous.
Even if the AI is very intelligent it might be hard for it to get resources for it to actually do anything to be able to manufacture extra hardware or anything like that. There are obviously situations where you can imagine intelligence or creative thinking can get you out of or get you further capability. So...
Nick: I guess it’s useful to identify two cases: One is sort of the default case unless we successfully implement some sort of safeguard or engineer it in a particular way in order to avoid dangers… So let’s think of a default just a bit: You have something that is super intelligent and capable of improving itself to even more levels of super intelligence… I guess one way to get initial possibility of why this is dangerous is to think about why humans are powerful. Why are we dominant on this planet? It’s not because we have stronger muscles or our teeth are sharper or we have special poison glands. It’s all because of our brains, which have enabled us to develop a lot of other technologies that give us in effect muscles that are stronger than the other animals… We have bulldozers and external devices and all the other things. And also, it enables us to coordinate socially and build up complicated societies so we can act as groups. And all of this makes us supreme on this planet. We can argue with the case of bacteria which have their own domains where they rule. But certainly in the case of the larger mammals we are unchallenged because of our brains.
And the brains are not all that different from the brains of other animals. It might be that all these advantages we have are due to a few tweaks on some parameters that occurred in our ancestors a couple million years ago. And these tiny changes in the nature of our intelligence had these huge effects. So just prima facie it then seems possible that if the system surpassed us by just the small amount by which we surpass chimpanzees, it could lead to a similar kind of advantage in power. And if they exceeded our intelligence by a much greater margin, then all of that could happen in a more dramatic fashion.
It’s true that you could have in principle an AI that was locked in a box, such that it would be incapable of affecting anything outside the box and in that sense it would be weak. That might be one of the safety methods one tries to apply that I've been thinking about.
Broadly speaking you can distinguish between two different approaches to solving the control problem, of making sure that super-intelligence, if it’s built wouldn’t cause harm. On one hand you have capability control measures where you try to limit what the AI is able to do. The most obvious example would be lock it in a box and limit its ability to interact with the rest of the world.
The other class of approach would be motivation selection methods where you would try to control what it wants to do. Where you build it in such a way that even if it has the power to do all this bad stuff, it would choose not to. But so far, there isn’t one method or even a combination of methods that it seems we can currently be fully convinced would work. There’s a lot more work needed...
[00:11]
Questioner: Human beings have been very successful. One feature of that that has been very crucial are our hands that have enabled us to get a start on working tools and so on. Even if an AI is running on some computer somewhere, that would be more analogous to a very intelligent creature which doesn’t have very good hands. It’s very hard for it to actually DO anything.
Maybe the in-the-box method is promising. Because if we just don’t give the AI hands, some way to actually do something... If all it can do is alter its own code, and maybe communicate informationally. That seems...
Nick: So let’s be careful there… So clearly it’s not “hands” per se. If it didn’t have hands it could still be very dangerous, because there are other people with hands, that it could persuade to do its bidding. It might be that it has no direct effectors other than the ability to type very slowly, and then some human gatekeeper could read and choose to act on or not. Even that limited ability to affect the world might be sufficient if it had a super power in the domain of persuasion. So if it had an engineering super-power, it might then get all these other superpowers. And then if it were able to, in particular be a super skilled persuader, it could then get other accessories outside our system that could implement its designs.
You might have heard of this guy, Eliezer Yudkowsky, who about 5 years back ran a series of role playing exercises... The idea was one person should play the AI, pretend to be in a box. The other should play the human gatekeeper whose job was to not let the AI out of the box, but he has to talk with the AI for a couple of hours over internet chat. This experiment was run five times, with EY playing the AI and different people playing the human gatekeeper. These were, for the most part, people who were initially convinced that they would never let the AI out of the box, but in 3 of 5 cases, the experiment ended with the gatekeepers announcing yes, they would let the AI out of the box.
This experiment was run under conditions that neither party would be allowed to disclose the methods that were used, the main conversational sequence...sorta maintain a shroud of mystery. But this is where the human-level persuader has two hours to work on the human gatekeeper. It seems reasonable to be doubtful of the ability of humanity to keep the super-intelligent persuader in the box, indefinitely, for that reason.
[00:15]
Questioner: How hard do you think the idea of controlling the mentality of intelligence is, with something at least as intelligent as us, considering how hard it is to convince humans to act in a certain civilized way of life?
Nick: So humans sort of start out with a motivation system and then you can try to persuade them or structure incentives to behave in a certain way. But they don’t start out with a tabula rasa where you get to write in what a human’s values should be. So that’s made a difference. In the case of the super-intelligence of course once it already has unfriendly values and it has sufficient power, it will resist any attempt to corrupt its goal system as it would see it.
[00:16]
Questioner: You don’t think that like us, its experiences might cause it to question its core values as we do?
Nick: Well, I think that depends on how the goal system is structured. So with humans we don’t have a simple declarative goal structure list. Not like a simple slot where we have a super goal and everything else is derived from that.
Rather it’s like many different little people inhabit our skull and have their debates and fight it out and make compromises. And in some situations, some of them get a boost like permutations and stuff like that. Then over time we have different things that change what we want like hormones kicking in, fading out, all kinds of processes.
Another process that might affect us is what I call value accretion. The idea that we can have mechanisms that load new values into us, as we go along. Like maybe falling in love is like that: initially you might not value that person for their own sake above any other person. But once you undergo this process you start to value them for their own sake in a special way. So humans have this mechanism that makes us acquire values depending on our experiences.
If you were building a machine super intelligence and trying to engineer its goal systems so that it will be reliably safe and human friendly, you might want to go with something more transparent where you have an easier time seeing what is happening, rather than have a complex modular mind with a lot of different forces battling it out... you might want to have a more hierarchical structure.
Questioner: What do you think of the necessary…requisites for the conscious mind? What are the features?
Nick: Yes, I’m not sure. We’ve talked a little on that earlier. Suppose there is a certain kind of computation that is needed, that really is the essence of mind. I’m sympathetic to the idea that something in the vicinity of that view might be correct. You have to think about exactly how to develop it. Then there is this stage of what is a computation.
So there is this challenge (I think it might go back to Hans Moravec but I think similar objections have been raised in philosophy against computationalism) where the idea is that if you have an arbitrary physical system that is sufficiently complicated, it could be a stone or a chair or just anything with a lot of molecules in it. And then you have this abstract computation that you think is what constitutes the implementation of the mind. Then there would be some mathematical mapping between all the parts in your computation and atoms in the chair so that you could artificially, through a very complicated mapping interpret the motions of the molecules in the chair in such a way that they would be seen as implementing the computation. It would not be any plausible mapping, not a useful mapping, but a bizarro mapping. Nonetheless if there were sufficiently limited parts there, you could just arbitrarily define some, by injection..
And clearly we don’t think that all these random physical objects implement the mind, or all possible minds.
So the lesson to me is that it seems that we need some kind of account of what it means to implement a computation that is not trivial, and this mapping function between the abstract entity that is a sort of Turing program, or whatever your model of a computation is, and the physical entity that is supposed to implement it has to be some sort of non-trivial representation of what this mapping can look like.
It might have to be reasonably simple. It might have to have certain counter-factual properties, so that the system would have implemented a related, but slightly different computation if you had scrambled the initial conditions of the system in a certain way, so something like that. But this is an open question in the philosophy of mind, to try to nail down what it means to implement the computation.
[00:20]
Questioner: To bring back to the goal and motivation approach to making an AI friendly towards us, one of the most effective ways of controlling human behavior, quite aside from goals and motivations, is to train them by instilling neuroses. It’s why 99.99% of us in this room couldn’t pee in our pants right now even if we really, really wanted to.
Is it possible to approach controlling an AI in that way or even would it be possible for an AI to develop in such a way that there is a developmental period in which a risk-reward system or some sort of neuroses instilment could be used to basically create these rules that an AI couldn’t break?
Nick: It doesn’t sound so promising because a neurosis is a complicated thing that might be a particular syndrome of a phenomenon that occurs in human-style minds, because of the way that humans’ minds are configured. It’s not clear there would be something exactly analogous to that in a cognitive system with a very different architecture.
Also, because neuroses, at least certain kinds of neuroses, are ones we would choose to get rid of if we could. So if you had a big phobia and there was a button that would remove the phobia, obviously you would press the button. And here we have this system that is presumably able to self-modify. So if it had this big hang up that it didn’t like, then it could reprogram itself to get rid of that.
This would be different from a top-level goal because a top-level goal would be the criterion it used to decide whether to take an action. In particular, an action like removing the top-level goal.
So generally speaking with reasonable and coherent goal architecture you would get certain convergent instrumental values that would crop up in a wide range of situations. One might be self preservation, not necessarily because you value your own survival for its own sake, but because in many situations you can predict that if you are around in the future you can continue to act in the future according to your goals, and that will make it more likely that the world will then be implementing your goals.
Another convergent instrumental value might be protection of your goal system from corruption (?) for very much the same reason. For even if you were around in the future but you have different goals from the ones you had now, you would now predict that that means in the future you will no longer be working towards realizing your current goals but maybe towards a completely different purpose, that would make it now less likely that your current goals would be realized. If your current goals are what you use as a criterion to choose an action, you would want to try to take actions that would prevent corruption of your goal system.
One might list a couple of other convergent instrumental values like intelligence amplification, technology perfection and resource acquisition. So this relates to why generic super-intelligence might be dangerous. It’s not so much that you have to worry that it would have human Unfriendliness in the sense of disliking human goals, that it would *hate* humans. The danger is that it wouldn’t *care* about humans. It would care about something different, like paperclips. But then if you have almost any other goals, like paperclips, there would be these other convergent instrumental reasons that you discover. For while your goal is to make as many paperclips as possible you might want to a) prevent humans from switching you off or tampering with your goal system or b) you might want to acquire as many resources as possible, including planets, and the solar system, and the galaxy. All of that stuff could be made into paperclips. So even with pretty much a random goal, you would end up with these motivational tendencies which would be harmful to humans.
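[Editorial illustration, not part of the talk: the convergent-instrumental-values argument can be put as a toy calculation. Every number and option name below is hypothetical. The only point is that an agent which scores futures by its current goal (paperclips, in Nick's example) ranks shutdown and goal rewriting at the bottom and resource acquisition at the top, without any dislike of humans appearing anywhere in the code.]

```python
# Toy model: the agent scores each future by its CURRENT goal only,
# i.e. by how many paperclips it predicts will exist in that future.

def paperclips_made(still_running: bool, goal_intact: bool, resources: int) -> int:
    """Predicted paperclips in a future (hypothetical linear model)."""
    if not still_running or not goal_intact:
        return 0  # nothing is left working towards paperclips
    return 10 * resources


# Hypothetical options the agent is choosing between.
OPTIONS = {
    "allow shutdown": dict(still_running=False, goal_intact=True, resources=1),
    "allow goal rewrite": dict(still_running=True, goal_intact=False, resources=1),
    "carry on unchanged": dict(still_running=True, goal_intact=True, resources=1),
    "acquire resources": dict(still_running=True, goal_intact=True, resources=5),
}

# Rank options from best to worst according to the current goal.
ranking = sorted(
    OPTIONS,
    key=lambda name: paperclips_made(**OPTIONS[name]),
    reverse=True,
)
```

[Swapping paperclips for almost any other countable goal leaves the ranking unchanged, which is the sense in which these values are "convergent".]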
[00:25]
Questioner: Appreciating the existential risks, what do you think about goals and motivations, and such drastic measures of control sort of a) ethically and b) as a basis of a working relationship?
Nick: Well, in terms of the working relationship one has to think about the differences with these kinds of artificial beings. I think there are a lot of (?) about how to relate to artificial agents that are conditioned on the fact that we are used to dealing with human agents, and there are a lot of things we can assume about the human.
We can assume perhaps that they don’t want to be enslaved. Even if they say that they want to be enslaved, we might think that deep inside of them, there is a sort of more genuine authentic self that doesn’t want to be enslaved. Even if some prisoner has been brainwashed to do the bidding of their master, maybe we say it’s not really good for them because it’s in their nature, this will to be autonomous. And there are other things like that, that don’t necessarily have to obtain for a completely artificial system which might not have any of that rich human nature that we have.
So in terms of what a good working relationship is, think of what we consider a good relationship with our word processor or email program. We don’t think of it in these terms, as if you’re exploiting it for your ends without giving it anything in return. If your email program had a will, presumably it would be the will to be a good and efficient email program that processed your emails properly. Maybe that was the only thing it wanted and cared about. So having a relationship with it would be a different thing.
There was another part of your question, about whether this would be right and ethical. I think if you are creating a new agent from scratch, and there are many different possible agents you could create, some of those agents will have human style values; they want to be independent and respected. Other agents that you could create would have no greater desire than to be of service. Others would just want paperclips. So if you step back and look at which of these options we should choose, then you are looking at the question of moral constraints on which of these are legitimate.
And I’m not saying that those are trivial, I think there are some deep ethical questions here. However, in the particular scenario where we are considering the creation of a single super-intelligence, the more pressing concern would be to ensure that it doesn’t destroy everything else, like humanity and its future. Now, if you have a different scenario, like instead of this one uber-mind rising ahead, you have many minds that become smarter and smarter, that rival humans and then gradually exceed them.
Say an uploading scenario where you start with very slow software, where you have human like minds running very slowly. In that case, maybe how we should relate to these machine intellects morally becomes more pressing. Or indeed, even if you just have one, but in the process of figuring out what to do it creates “thought crimes”.
If you have a sufficiently powerful mind, maybe its thoughts themselves would contain structures that are conscious. This sounds mystical, but imagine you are a very powerful computer and one of the things you are doing is trying to predict what would happen in the future under different scenarios, and so you might play out a future.
And if those simulations you are running inside of this program were sufficiently detailed, then they could be conscious. This comes back to our earlier discussion of what is conscious. But I think a sufficiently detailed computer simulation of the mind could be conscious.
You could then have a super-intelligence that, just by thinking about things, could create sentient beings, maybe millions or billions or trillions of them, and their welfare would then be a major ethical issue. They might be killed when it stops thinking about them, or they might be mistreated in different ways. And I think that would be an important ethical complication in this context.
[00:30]
Questioner: Eliezer suggests that one of the many problems with arbitrary stamps in AI space is that human values are very complex. So virtually any goal system will go horribly wrong because it will be doing things we don’t quite care about, and that’s as bad as paperclips. How complex do you think human values will be?
Nick: It looks like human values are very complicated. Even if they were very simple, even if it turned out it’s just pleasure, say, which, compared to other candidates for what has value, like democracy, flourishing and art, is one of the more simplistic possibilities as far as we can think of values. Even that, if you start to think of it from a physicalistic view, where you have to specify which atoms have to go how and where for there to be pleasure, would be a pretty difficult thing to write down, like the Schrödinger equation for pleasure.
So in that sense it seems fair that our values are very complex. So there are two issues here. There is a kind of technical problem of figuring out that if you knew what our values are, in the sense that we think that we normally know what our values are, how we could get the AI to share those values, like pleasure or absence of pain or anything like that.
And there is the additional philosophical problem, which is: if we are unsure of what our values are, if we are groping about in axiology trying to figure out how much to value different things, and maybe there are values we have been blind to today, then how do you also get all of that on board, on top of what we already think has value, that potential for moral growth? Both of those are very serious problems and difficult challenges.
There are a number of different ways you can try to go. One approach that is interesting is what we might call indirect normativity, where the idea is that rather than specifying explicitly what you want the AI to achieve, like maximizing pleasure while respecting individual autonomy and paying special attention to the poor, rather than creating a list, what you try to do instead is specify a process or mechanism by which the AI could find out what it is supposed to do.
One of these ideas that has come out is this idea of Coherent Extrapolated Volition, where the idea is that you try to tell the AI to do that which we would have asked it to do if we had thought about the problem longer, and if we had been smarter, and if we had some other qualifications. Basically, if you could describe some sort of idealized process whereby we, at the end, if we underwent that process, would be able to create a more detailed list, then maybe point the AI to that and make the AI’s value be to run this process and do what comes out of the end of that, rather than go with where our current list gets us about what we want to do and what has value.
[00:33]
Questioner: Isn’t there a risk that.. the AI would decide that if we thought about it for 1000 years really, really carefully, that we would just decide to let the AIs take over?
Nick: Yeah, that seems to be a possibility. And then that raises some interesting questions. Like if that is really what our CEV would do. Let’s assume that everything has been implemented in the right way, like there is no flaw in the realization of this. So how should we think about this?
Well on the one hand, you might say if this is really what our wiser selves would want. What we would want if we were saved from these errors and illusions we are suffering under, then maybe we should go ahead with that. On the other hand, you could say, this is really a pretty tall order. That we’re supposed to sacrifice not just a bit, but ourselves and everybody else, for this abstract idea that we don’t really feel any strong connection to. I think that’s one of the risks, but who knows what will be the outcome of this CEV?
And there are further qualms one might have that need to be spelled out. Like exactly whose volition is it that is supposed to be extrapolated. Humanity’s? Well then, who is humanity? Like does it include past generations for example? How far back? Does it include embryos that died?
Who knows whether the core of humanity is nice? Maybe there are a lot of suppressed sadists out there, that we don’t realize, because they know that they would be punished by society. Maybe if they went through this procedure, who knows what would come out?
So it would be dangerous to run something like that, without some sort of safeguard check at the end. On the other hand, there is worry that if you put in too many of these checks, then in effect you move the whole thing back to what you want now. Because if you were allowed to look at an extrapolation, see whether you like it, or if you dislike it you run another one by changing the premises and you were allowed to keep going like that until you were happy with the result then basically it would be you now, making the decision. So, it’s worth thinking about, whether there is some sort of compromise or blend that might be the most appealing.
[00:36]
Questioner: You mentioned before about a computer producing sentience itself in running a scenario. What are the chances that that is the society that we live in today?
Nick: I don’t know, so what exactly are the chances? I think significant. I don’t know, it’s a subjective judgment here. Maybe less than 50%? Like 1 in 10?
There’s a whole different topic, maybe we should save that topic for a different time..
[00:37]
Questioner: If I wanted to study this area generally, existential risk, what kind of subject would you recommend I pursue? We’re all undergrads, so after our bachelor’s we will start on a master’s or go into a job. If I wanted to study it, what kind of master’s would you recommend?
Nick: Well part of it would depend on your talent, like if you’re a quantitative guy or a verbal guy. There isn’t really an ideal sort of educational program anywhere, to deal with these things. You’d want to get a fairly broad education, there are many fields that could be relevant. If one looks at where people are coming from so far that have had something useful to say, a fair chunk of them are philosophers, some computer scientists, some economists, maybe physics.
Those fields have one thing in common in that they are fairly versatile. Like if you’re doing Philosophy, you can do Philosophy of X, or of Y, or of almost anything. Economics as well. It gives you a general set of tools that you can use to analyze different things, and computer science has these ways of thinking and structuring a problem that are useful for many things.
So it’s not obvious which of those disciplines would be best, generically. I think that would depend on the individual, but then what I would suggest is that while you were doing it, you also try to read in other areas other than the one you were studying. And try to do it at a place where there are a lot of other people around with a support group and advisor that encouraged you and gave you some freedom to pursue different things.
[00:38]
Questioner: Would you consider AI created by human beings as some sort of consequence of the evolutionary process? Like in a way that human beings tried to overcome their own limitations, and as it takes a really long time to get it on a DNA level you just get it quicker on a more computational level?
Nick: So whether we would use evolutionary algorithms to produce super-intelligence or..?
Questioner: If AI itself is part of evolution..
Nick: So there’s kind of a trivial sense in which if we evolved and we created…then obviously evolution had a part to play in the overall causal explanation of why we’re going to get machine intelligence at the end. Now, for evolution really to exert some shaping influence there have to be a number of factors at play. There would have to be a number of variants created that are different and then compete for resources, and then there is a selection step. And for there to be significant evolution you have to enact this a lot of times.
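The three ingredients Nick lists (variants, competition, a selection step, repeated many times) can be sketched as a loop. This is only a toy illustration: the function `evolve`, the use of a bare number as "fitness", the mutation size and the population size are all assumptions made for the example, not anything specified in the talk.

```python
import random

def evolve(population, generations, rng):
    """Repeat variation and selection for a number of generations."""
    for _ in range(generations):
        # Selection: the fitter half survives (fitness is just the number itself).
        survivors = sorted(population, reverse=True)[: len(population) // 2]
        # Variation: each survivor leaves one slightly mutated copy.
        offspring = [x + rng.gauss(0, 0.1) for x in survivors]
        population = survivors + offspring
    return population

rng = random.Random(0)
start = [rng.random() for _ in range(20)]   # hypothetical starting variants
one_round = evolve(start, 1, rng)
many_rounds = evolve(start, 200, rng)
print(max(start), max(one_round), max(many_rounds))
```

Because survivors are carried over unchanged, the best variant never gets worse from one round to the next; a single round barely moves the population, which is the point about having to enact the cycle a lot of times.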
So whether that will happen or not in the future is not clear at all. Suppose you have a singleton, that is, a world order arises at the top level where there is only one decision making agency, which could be a democratic world government, or an AI that rules everybody, or a self-enforcing moral code, or a tyranny, a nice thing or a bad thing.
If you have that kind of structure there will at least be, in principle, the ability for that unitary agent to control evolution within itself; it could change selection pressures by taxing or subsidizing different kinds of life forms.
If you don’t have a singleton then you have different agencies that might be in competition with one another, and in principle in that scenario evolutionary pressures can come into play. But I think the way that it might pan out would be different from the way that we’re used to seeing biological evolution, so for one thing you might have these potentially immortal life forms, that is they have software minds that don’t naturally die, that could modify themselves.
If they knew that their current type, if they continued to pursue their current strategy would be outcompeted and they didn’t like that, they could change themselves immediately right away rather than wait to be eliminated.
So you might get, if there were to be a long evolutionary process ahead and agents could anticipate that, you might get the effects of that instantaneously from anticipation.
So I think you probably wouldn’t see the evolutionary processes playing out but there might be some of the constraints that could be reflected more immediately by the fact that different agencies had to pursue strategies that they could see would be viable.
[00:41]
Questioner: So do you think it’s possible that our minds could be scanned and then be uploaded into a computer machine in some way and then could you create many copies of ourselves as those machines?
Nick: So this is what in technical terminology is “whole brain emulation” or in more popular terminology “uploading”. So obviously this is impossible now, but seems like it’s consistent with everything we know about physics and chemistry and so forth. So I think that will become feasible barring some kind of catastrophic thing that puts a stop to scientific and technological progress.
So the way I imagine it would work is that you take a particular brain, freeze it or vitrify it, and then slice it up into thin slices that would be fed through some array of microscopes that would scan each slice with sufficient resolution, and then automated image analysis algorithms would work on this to reconstruct the 3 dimensional neural network that your own organic brain implemented, and you would have this sort of information structure in a computer.
At this point you need computational neuroscience to tell you what each component does. So you need to have a good theory of what say a pyramidal cell does, what a different kind of…And then you would combine those little computational models of what each type of neuron does with this 3D map of the network and run it. And if everything went well you would have transferred the mind, with memories and personalities intact, to the computer. And there is an open question of just how much resolution you would need to have, how much detail you would need to capture of the original mind in order to successfully do this. But I think there would be some level of detail which, as I said before, might be on the level of synapses or thereabouts, possibly higher, that would suffice. So then you would be able to do this. And then once you’re software, you could be copied, or speeded up or slowed down or paused or stuff like that.
[00:44]
Questioner: There has been a lot of talk of controlling the AI and evaluating the risk. My question would be assuming that we have created a far more perfect AI than ourselves is there a credible reason for human beings to continue existing?
Nick: Um, yeah, I certainly have the reason that if we value our own existence we seem to have a…Do you mean to say that there would be a moral reason to exist, or that we would have a self-interested reason to exist?
Questioner: Well I guess it would be your opinion..
Nick: My opinion is that I would rather not see the genocide of the entire human species. Rather that we all live happily ever after. If those are the only two alternatives, I think yeah! Let’s all live happily ever after! Is where I would come down on that.
[00:45]
Questioner: By keeping the human species around you’re going to have a situation, presumably, where you have extremely, extremely advanced AIs. Give them a few decades or a few centuries or whatever and they will be far, far beyond our comprehension, and even if we still integrate to some degree with machines (mumble) biological humans then they’ll just be completely inconceivable to us. So isn’t there a danger that our stupidity will hamper their perfection?
Nick: Would hamper their perfection?? Well there’s enough space for there to be many different kinds of perfection pursued. Like right now we have a bunch of dust mites crawling around everywhere, but not really hampering our pursuit of art or truth or beauty. They’re going about their business and we’re going about ours.
I guess you could have a future where there would be a lot of room in the universe for planetary sized computers thinking their grand thoughts while…I’m not making a prediction here, but if you wanted to have a nature preserve, with original nature or original human beings living like that, that wouldn’t preclude the other thing from happening..
Questioner: Or a dust mite might not hamper us, but things like viruses or bacteria just by being so far below us (mumble). And if you leave humans on a nature preserve and they’re aware of that, isn’t there a risk that they’ll be angry at the feeling of being irrelevant at the grand scheme of things?
Nick: I suppose. I don’t think it would bother the AI that would be able to protect itself, or remain out of reach. Now it might demean the remaining humans if we were dethroned from this position of kings, the highest life forms around, that it would be a demotion, and one would have to deal with that I suppose.
It’s unclear how much value to place on that. I mean right now, in this universe, which looks like it’s infinite, somewhere out there are gonna be all kinds of things, including god-like intellects and everything in between, that are already outstripping us in every possible way.
It doesn’t seem to upset us terribly; we just get on with it. So I think people will have to make some psychological..I’m sure we can adjust to it easily. Now it might be from some particular theory of value that this might be a sad thing for humanity. That we are not even locally at the top of the ladder.
Questioner: If rationalism was true, that is if it were irrational to perform wrong acts. Would we still have to worry about super-intelligence? It seems to me that we wouldn’t have.
Nick: Well you might have a system that doesn’t care about being rational, according to that definition of rationality. So I think that we would still have to worry.
[00:48]
Questioner: Regarding trying to program AI without values, (mumbles) But as I understand it, what’s considered one of the most promising approaches in AI now is more statistical learning type approaches.. And the problem with that is if we were to produce an AI with that, we might not understand its inner workings enough to be able to dive in and modify it in precisely the right way to give it an unalterable list of terminal values.
So if we were to end up with some big neural network that we trained in some way and ended up with something that could perform as well as humans in some particular task or something. We might be able to do that without knowing how to alter it to have some particular set of goals.
Nick: Yeah, so there are some things there to think about. One general worry that one needs to bear in mind if one tries that kind of approach is that we might give it various examples, like this is a good action and this is a bad action in this context, and maybe it would learn all those examples; then the question is how would it generalize to other examples outside this class?
So we could test it: we could divide our examples initially into classes and train it on one and test its performance on the other, the way you would do to cross-validate. And then we think that means that on other cases it hasn’t seen it would have the same kind of performance. But all the cases that we could test it on would be cases that would apply to its current level of intelligence. So presumably we’re going to do this while it’s still at human or less than human intelligence. We don’t want to wait to do this until it’s already super-intelligent.
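A minimal sketch of the hold-out test described here, assuming a toy setup that is not from the talk: each example is a made-up "harm score" between 0 and 1 labelled good or bad, and the learner is a single threshold. The names `make_examples`, `split`, `fit_threshold` and `accuracy` are hypothetical.

```python
import random

def make_examples(n, seed=0):
    """Toy labelled examples: (harm_score, label). Purely illustrative data."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        harm = rng.random()
        examples.append((harm, "good" if harm < 0.5 else "bad"))
    return examples

def split(examples, test_fraction=0.25, seed=1):
    """Shuffle, then hold out a fraction of the examples for testing."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Learn a cut-off halfway between the worst 'good' and the mildest 'bad'."""
    good = [h for h, label in train if label == "good"]
    bad = [h for h, label in train if label == "bad"]
    return (max(good) + min(bad)) / 2

def accuracy(threshold, examples):
    """Fraction of examples the threshold rule labels correctly."""
    hits = sum(("good" if h < threshold else "bad") == label for h, label in examples)
    return hits / len(examples)

train, test = split(make_examples(200))
threshold = fit_threshold(train)
held_out_accuracy = accuracy(threshold, test)
print(held_out_accuracy)
```

The held-out score only speaks to cases drawn from the same pool as the training examples, which is the limitation raised in the answer: nothing in this test covers situations that only arise at a higher level of capability.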
So then the worry is that even if it were able to analyze what to do in a certain way in all of these cases, it’s only dealing with all of these cases in the training case, when it’s still at a human level of intelligence. Now maybe once it becomes smarter it will realize that there are different ways of classifying these cases that will have radically different implications for humans.
So suppose that you try to train it to… this was one of the classic examples of a bad idea of how to solve the control problem: Let’s train the AI to want to make people smile, what can go wrong with that? So we train it on different people, and if they smile when it does something, that’s like a kind of reward; it gets strengthened in those dispositions that led to the behavior that made people smile. And frowning would move the AI away from that kind of behavior. And you can imagine that this would work pretty well at a primitive stage, where the AI will engage in more pleasing and useful behavior because the user will smile at it, and it will all work very well. But then once the AI reaches a certain level of intellectual sophistication it might realize that it could get people to smile not just by being nice but also by paralyzing their facial muscles in that constant beaming smile.
And then you would have this perverse instantiation. The value is constant all along, the value that it wants to make people smile, but the kinds of behaviors it would pursue to achieve this goal would suddenly radically change at a certain point, once the new set of strategies became available to it, and you would get this treacherous turn, which would be dangerous. So that’s not to dismiss that whole category of approaches altogether. One would have to think through quite carefully exactly how one would go about that.
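The shape of that argument, a fixed goal with a changing set of available strategies, can be shown in a toy sketch. Everything here is a hypothetical stand-in: the action names, the "smiles" scores and the capability thresholds are invented for illustration and do not model any real system.

```python
# Hypothetical actions: how much smiling each produces, and the capability
# level at which the action first becomes available to the agent.
ACTIONS = {
    "be_helpful": {"smiles": 0.8, "min_capability": 1},
    "tell_jokes": {"smiles": 0.6, "min_capability": 1},
    "paralyze_facial_muscles": {"smiles": 1.0, "min_capability": 10},
}

def best_action(capability):
    """Pick the available action that maximizes smiles; the goal never changes."""
    available = {
        name: spec for name, spec in ACTIONS.items()
        if spec["min_capability"] <= capability
    }
    return max(available, key=lambda name: available[name]["smiles"])

for level in (1, 5, 10):
    print(level, best_action(level))
```

The objective inside `best_action` is identical at every level; only the menu of strategies grows, and the chosen behavior flips once the extreme option appears on it.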
[00:52]
There’s also the issue that a lot of the things we would want it to learn, if we think of human values and goals and ambitions, we think of using human concepts, not using basic physical..like place atoms A to zed in a certain order. We think like promote peace, encourage people to develop and achieve…These are things that to understand them we really need to have human concepts, which a sub-human AI will not have; it’s too dumb at that stage to have that. Now once it’s super-intelligent it might easily understand all human concepts, but then it’s too late. It already needs to be friendly before that. So there might only be this brief window of opportunity where it’s roughly human level, where it’s still safe enough not to resist our attempt to indoctrinate it but smart enough that it can actually understand what we are trying to tell it.
And again we’re going to have to be very careful to make sure that we can bring the system up to that interval and then freeze its development there and try to load the values in before bootstrapping it further.
And maybe (this was one of the first questions) its intelligence will not be human level in the sense of being similar to a human at any one point. Maybe it will immediately be very good at chess but very bad at poetry, and then it has to reach radically superhuman levels of capability in some domains before other domains even reach human level. And in that case it’s not even clear that there will be this window of opportunity where you can load in the values. So I don’t want to dismiss that, but those are some additional things that one needs to think about, if one tries to develop that.
[00:54]
Questioner: How likely is it that we will have the opportunity in our lifetimes to become immortal by mind uploading?
Nick: Well first of all, by immortal here we mean living for a very long time, rather than literally never dying, which is a very different thing that would require our best theories of cosmology to turn out to be false for something like that.
So living for a very long time: I’m not going to give you a probability in the end. But I can say some of the things that…Like first we would have to avoid most kinds of things like existential catastrophe that could put an end to this.
So, if you start with 100% and you remove all the things that could go wrong, so first you would have to throw away whatever total level of existential risk is, integrated over all time. Then there is the obvious risk that you will die before any of this happens, which seems to be a very substantial risk. Now you can reduce that by signing up for cryonics, but that’s of course an uncertain business as well. And there could be sub-existential catastrophes that would put an end to a lot of things like a big nuclear war or pandemics.
And then I guess there are all these situations in which not everybody who is still around gets the opportunity to participate in what came after. Even though what came after doesn’t count as an existential catastrophe… And [it can get] even more complicated, like if you took into account the simulation hypothesis, which we decided not to talk about today.
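The bookkeeping described here, starting at 100% and removing each thing that could go wrong, can be sketched as a product of survival factors. This is only an illustration: the function name, the list of hurdles and every number are hypothetical placeholders, not estimates given in the talk, and treating the hurdles as a simple chain of conditional probabilities is itself an assumption.

```python
import math

def chance_remaining(conditional_risks):
    """Multiply the survival factors (1 - risk) for a chain of hurdles."""
    return math.prod(1.0 - risk for risk in conditional_risks)

# Hypothetical placeholder risks, chosen only to make the arithmetic visible.
hurdles = {
    "existential catastrophe": 0.5,
    "dying before the technology arrives": 0.5,
    "sub-existential catastrophe": 0.5,
    "not being included in what comes after": 0.5,
}
remaining = chance_remaining(hurdles.values())
print(f"{remaining:.4f}")  # 0.0625 with these placeholders
```

Each additional hurdle can only shrink the product, which is why the answer lists the hurdles rather than committing to a single figure.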
[00:56]
Q: Is there a particular year we should aim for?
Nick: As for the timelines, truth is we don’t know. So you need to think about a very smeared out probability distribution. And really smear it, because things could happen surprisingly soon, like some probability 10 years from now or 20 years from now, but probably more probable at 30, 40, 50 years, but some probability at 80 years or 200 years..
There is just not good evidence that human beings are very good at predicting with precision these kinds of things far out in the future.
Questioner: (hard to understand) How intelligent can we really get. … we already have this complexity class of problems that we can solve or not…
Is it fair to believe that a super-intelligent machine can actually be that exponentially intelligent... this is very close to what we could achieve …A literal definition of intelligence also, but..
Nick: Well in a sort of cheater sense we could solve all problems, sort of like everything a Turing Machine could..it could take like a piece of paper and..
a) It would take too long to actually do it, and if we tried to do it, there are things that would probably throw us off before we have completed any sort of big Turing machine simulation
There is a less figurative sense in which our abilities are already indirectly unlimited. That is, if we have the ability to create super intelligence, then in a sense we can do everything because we can create this thing that then solves the thing that we want solved. So there is this sequence of steps that we have to go through, but in the end it is solved.
So there is this level of capability that means that once you have that level of capability your indirect reach is universal, like anything that could be done, you could indirectly achieve, and we might have already surpassed that level a long time ago, save for the fact that we are sort of uncoordinated on a global level and maybe a little bit unwise.
But if you had a wise singleton then certainly you could imagine us plotting a very safe course, taking it very slowly, and in the end we could be pretty confident that we would get to the end result. But maybe neither of those ideas is what you had in mind. Maybe you had more in mind the question of just how smart, in the everyday sense of smart, a machine could be. So just how much more effective at social persuasion, to take one particular thing, than the most persuasive human.
So that we don’t really know. If one has a distribution of human abilities, and it seems like the best humans can do a lot better, in our intuitive sense of a lot, than the average humans. Then it would seem very surprising if the best humans like the top tenth of a percent had reached the upper limit of what was technologically feasible, that would seem to be an amazing coincidence. So one would then expect for the maximum achievable to be a lot higher. But exactly how high we don’t know.
So two more questions:
[00:59]
Q: Just like we are wondering about a super-intelligent being, is it possible that that super-intelligence will worry about another super-intelligent being that it will create? Isn’t that also recursive?
Nick: So you consider where one AI designs another AI that’s smarter and then that designs another.
But it might not be clearly distinguishable from the scenario where we have one AI that modifies itself so that it ends up smarter. Whether you call it the same or different, it might be an unimportant difference.
Last question. This has to be a super profound question.
[01:00]
Q: So my question is why should we even try to build a super-intelligence?
Nick: I don’t think we should now, do that. If you took a step back and thought what would a sane species do, well they would first figure out how to solve the control problem, and then they would think about it for a while to make sure that they really had the solution right and they hadn’t just deluded themselves as to how to solve it, and then maybe they would build a super-intelligence.
So that’s what the sane species will do. Now what humanity will do is try to do everything they can as soon as possible, so there are people who are trying to build it as we speak, in a number of different places on earth, and fortunately it looks very difficult to build it with current technology. But of course it’s getting easier over time: computers get better, the state of the art in computer science advances, we learn more about how the human brain works.
So every year it gets a little bit easier; from some unknown very difficult level, it gets easier and easier. So at some point it seems someone will probably succeed at doing it. If the world remains sort of uncoordinated and uncontrolled as it is now, it’s bound to happen soon after it becomes feasible. But we have no reason to accelerate that even more than it’s already happening ...
So we were thinking about what would a powerful AI thing do that had just come into existence and it didn’t know very much yet, but it had a lot of clever algorithms and a lot of processing power. Someone was suggesting maybe it would move around randomly, like a human baby does, to figure out how things move, how it can move its actuators.
Then we had a discussion if that was a wise thing or not.
But if you think about how the human species behaves, we are really behaving very much like a baby: we’re sort of moving and shaking everything that moves, just to see what happens. And the risk is that we are not in the nursery with a kind mother who has put us in a cradle, but that we are out in the jungle somewhere screaming at the top of our lungs, and maybe just alerting the lions to their supper.
So let’s wrap up. I enjoyed this a great deal, so thank you for your questions.
existential-risk.org by Nick Bostrom
existential-risk.org
(Updated 2011-12-16 due to a comment by Nick Bostrom.)
'Existential Risk FAQ' by Nick Bostrom
(2011) Version 1.0
Short answers to common questions
'Existential Risk Prevention as the Most Important Task for Humanity' by Nick Bostrom
(2011) Working paper (revised)
ABSTRACT
Existential risks are those that threaten the entire future of humanity. Many theories of value imply that even relatively small reductions in net existential risk have enormous expected value. Despite their importance, issues surrounding human-extinction risks and related hazards remain poorly understood. In this paper, I clarify the concept of existential risk and develop an improved classification scheme. I discuss the relation between existential risks and basic issues in axiology, and show how existential risk reduction (via the maxipok rule) can serve as a strongly action-guiding principle for utilitarian concerns. I also show how the notion of existential risk suggests a new way of thinking about the ideal of sustainability.