[Link] Will Superintelligent Machines Destroy Humanity?
A summary and review of Bostrom's Superintelligence is in the December issue of Reason magazine, and is now posted online at Reason.com.
How to Study Unsafe AGIs Safely (and why we might have no choice)
TL;DR
A serious possibility is that the first AGI(s) will be developed in a Manhattan Project style setting before any sort of friendliness/safety constraints can be integrated reliably. They will also be substantially short of the intelligence required to exponentially self-improve. Within a certain range of development and intelligence, containment protocols can make them safe to interact with. This means they can be studied experimentally, and the architecture(s) used to create them better understood, furthering the goal of safely using AI in less constrained settings.
Setting the Scene
Technological and/or political issues could force the development of AI without the theoretical safety guarantees we'd certainly like, but there is a silver lining
A lot of the discussion around LessWrong and MIRI that I've seen (and I haven't seen all of it, please send links!) seems to focus very strongly on the situation of an AI that can self-modify or construct further AIs, resulting in an exponential explosion of intelligence (FOOM/Singularity). The focus on FAI is on finding an architecture that can be explicitly constrained (and a constraint set that won't fail to do what we desire).
My argument is essentially that there could be a critical multi-year period preceding any possible exponentially self-improving intelligence during which a series of AGIs of varying intelligence, flexibility and architecture will be built. This period will be fast and frantic, but it will be incredibly fruitful and vital both in figuring out how to make an AI sufficiently strong to exponentially self-improve and in how to make it safe and friendly (or develop protocols to bridge the even riskier period between when we can develop FOOM-capable AIs and when we can ensure their safety).
- First, why is a substantial period of proto-singularity more likely than a straight-to-singularity situation?
- Second, what strategies will be critical to developing, controlling, and learning from these pre-FOOM AIs?
- Third, what are the political challenges that will develop immediately before and during this period?
The requirement for a hard singularity, an exponentially self-improving AI, is that the AI can substantially improve itself in a way that enhances its ability to further improve itself, which requires the ability to modify its own code; access to resources like time, data, and hardware to facilitate these modifications; and the intelligence to execute a fruitful self-modification strategy.
The first two conditions can (and should) be directly restricted. I'll elaborate more on that later, but basically any AI should be very carefully sandboxed (unable to affect its software environment), and its access to resources should be strictly controlled. Perhaps no data goes in without human approval, or none while the AI is running. Perhaps nothing comes out either. Even a hyperpersuasive hyperintelligence will be slowed down (at least) if it can only interact with prespecified tests (how do you test AGI? No idea, but it shouldn't be harder than friendliness). This isn't a perfect situation. Eliezer Yudkowsky presents several arguments for why an intelligence explosion could happen even when resources are constrained (see Section 3 of Intelligence Explosion Microeconomics), not to mention ways that those constraints could be defied even if engineered perfectly. (By the way, I would happily run the AI box experiment with anybody; I think it is absurd that anyone would fail it! I've read Tuxedage's accounts, and I think I actually do understand how a gatekeeper could fail, but I also believe I understand how one could be trained to succeed even against a much stronger foe than any person who has played the part of the AI.)
But the third emerges from the way technology typically develops. I believe it is incredibly unlikely that an AGI will develop in somebody's basement, or even in a small national lab or top corporate lab. When there is no clear notion of what a technology will look like, it is usually not developed. Positive, productive accidents are somewhat rare in science, but they are remarkably rare in engineering (please, give counterexamples!). The creation of an AGI will likely not happen by accident; there will be a well-funded, concrete research and development plan that leads up to it. An AI Manhattan Project, as described above. But even when a good plan is successfully executed, prototypes are slow, fragile, and poor-quality compared to what is possible even with approaches using the same underlying technology. It seems very likely to me that the first AGI will be a Chicago Pile, not a Trinity; recognizably a breakthrough but, with proper consideration, not immediately dangerous or unmanageable. [Note: you don't have to believe this to read the rest of this. If you disagree, consider the virtues of redundancy and the question of what safety measures an AI development effort should implement if they can't be persuaded to delay long enough for theoretically sound methods to become available.]
A Manhattan Project style effort makes a relatively weak, controllable AI even more likely: not only can such a project implement substantial safety protocols, researched explicitly in parallel with primary development, but the total resources, in hardware and brainpower, devoted to the AI will be much greater than in a smaller project, setting a correspondingly higher bar the resulting AGI must clear before it can self-modify exponentially and break the security procedures.
Strategies to handle AIs in the proto-Singularity, and why they're important
First, take a look at the External Constraints section of this MIRI report and/or this article on AI boxing. I will be talking mainly about these approaches. There are certainly others, but these are the easiest to extrapolate from current computer security.
These AIs will provide us with the experimental knowledge to better handle the construction of even stronger AIs. If careful, we will be able to use these proto-Singularity AIs to learn about the nature of intelligence and cognition, to perform economically valuable tasks, and to test theories of friendliness (not perfectly, but well enough to start).
"If careful" is the key phrase. I mentioned sandboxing above. And computer security is key to any attempt to contain an AI. Monitoring the source code, and setting a threshold for too much changing too fast at which point a failsafe freezes all computation; keeping extremely strict control over copies of the source. Some architectures will be more inherently dangerous and less predictable than others. A simulation of a physical brain, for instance, will be fairly opaque (depending on how far neuroscience has gone) but could have almost no potential to self-improve to an uncontrollable degree if its access to hardware is limited (it won't be able to make itself much more efficient on fixed resources). Other architectures will have other properties. Some will be utility optimizing agents. Some will have behaviors but no clear utility. Some will be opaque, some transparent.
All will have a theory of how they operate, which can be refined by actual experimentation. This is what we can gain! We can set up controlled scenarios like honeypots to catch malevolence. We can evaluate our ability to monitor and read the thoughts of the AGI. We can develop stronger theories of how damaging self-modification actually is to imposed constraints. We can test our ability to add constraints even to the base state. But do I really have to justify the value of experimentation?
I am familiar with criticisms based on absolutely incomprehensibly perceptive and persuasive hyperintelligences being able to overcome any security, but I've tried to outline above why I don't think we'd be dealing with that case.
Political issues
Right now AGI is a political non-issue: blue sky even compared to space exploration and fusion, both of which actually receive substantial government funding. I think this will change in the period immediately leading up to my hypothesized AI Manhattan Project, which can only happen with a lot of political will behind it. That probably means a spiral of scientific advancements, hype, and the threat of competition from external unfriendly sources. Think space race.
So suppose that the first few AIs are built under well controlled conditions. Friendliness is still not perfected, but we think/hope we've learned some valuable basics. But now people want to use the AIs for something. So what should be done at this point?
I won't try to speculate what happens next (well you can probably persuade me to, but it might not be as valuable), beyond extensions of the protocols I've already laid out, hybridized with notions like Oracle AI. It certainly gets a lot harder, but hopefully experimentation on the first, highly-controlled generation of AI to get a better understanding of their architectural fundamentals, combined with more direct research on friendliness in general would provide the groundwork for this.
I know when the Singularity will occur
More precisely, if we suppose that sometime in the next 30 years, an artificial intelligence will begin bootstrapping its own code and explode into a super-intelligence, I can give you 2.3 bits of further information on when the Singularity will occur.
Between midnight and 5 AM, Pacific Standard Time.
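(The arithmetic behind the "2.3 bits" claim: narrowing a uniform 24-hour window down to a 5-hour one conveys log2(24/5) bits.)

```python
import math

# A 5-hour window out of 24 possible hours carries log2(24/5) bits
# of information about when the event occurs.
window_hours = 5
bits = math.log2(24 / window_hours)
print(round(bits, 2))  # 2.26, rounded up to "2.3" in the post
```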
Supposing you inherited an AI project...
Suppose you have been recruited as the main developer on an AI project. The previous developer died in a car crash and left behind an unfinished AI. It consists of:
A. A thoroughly documented scripting language specification that appears to be capable of representing any real-life program as a network diagram so long as you can provide the following:
A.1. A node within the network whose value you want to maximize or minimize.
A.2. Conversion modules that transform data about the real-world phenomena your network represents into a form that the program can read.
B. Source code from which a program can be compiled that will read scripts in the above language. The program outputs a set of values for each node that will optimize the output (you can optionally specify which nodes can and cannot be directly altered, and the granularity with which they can be altered).
It gives remarkably accurate answers for well-formulated questions. Where there is a theoretical limit to the accuracy of an answer to a particular type of question, its answer usually comes close to that limit, plus or minus some tiny rounding error.
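A toy version of the described interface might look like the following. To be clear, the random-search optimizer here is purely illustrative, nowhere near the near-optimal program the scenario posits, and it assumes node-update functions are supplied in dependency order:

```python
import random

def optimize(network, target, adjustable, steps=5000, seed=0):
    """Hypothetical sketch of the inherited program's interface: given a
    network of node-update functions, search over the adjustable nodes'
    values so as to maximize the target node.

    `network` maps node name -> function of the current state dict;
    nodes are evaluated in insertion order (a real system would
    topologically sort the network diagram)."""
    rng = random.Random(seed)
    values = {n: 0.0 for n in adjustable}

    def evaluate(vals):
        state = dict(vals)
        for node, fn in network.items():  # propagate values through the network
            state[node] = fn(state)
        return state[target]

    best = evaluate(values)
    for _ in range(steps):  # crude greedy random search
        trial = {n: v + rng.gauss(0, 0.1) for n, v in values.items()}
        score = evaluate(trial)
        if score > best:
            best, values = score, trial
    return values, best

# Example: maximize y = -(x - 3)^2; the optimum sets x near 3.
network = {"y": lambda s: -(s["x"] - 3.0) ** 2}
values, best = optimize(network, target="y", adjustable=["x"])
```

The scenario's program would also need the conversion modules (A.2) to ground such a network in real-world data, which is where most of the difficulty hides.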
Given that, what is the minimum set of additional features you believe would absolutely have to be implemented before this program can be enlisted to save the world and make everyone live happily forever? Try to be as specific as possible.
Baseline of my opinion on LW topics
To avoid repeatedly saying the same things, I'd like to state my opinion on a few topics I expect to be relevant to my future posts here.
You can take it as a baseline or reference for these topics. I do not plan to go into any detail here. I will not state all my reasons or sources. You may ask for separate posts if you are interested. This is really only to provide a context for my comments and posts elsewhere.
If you google me you may find some of my old (but not that far off the mark) posts about these positions, e.g. here:
http://grault.net/adjunct/index.cgi?GunnarZarncke/MyWorldView
Now my position on LW topics.
The Simulation Argument and The Great Filter
On The Simulation Argument I definitely go for
"(1) the human species is very likely to go extinct before reaching a “posthuman” stage"
Correspondingly on The Great Filter I go for failure to reach
"9. Colonization explosion".
This is not because I think that humanity is going to self-annihilate soon (though that is a possibility). Instead I hope that humanity will sooner or later come to terms with its planet. My utopia could be like that of the Pacifists (a short story in Analog 5).
Why? Because of essential complexity limits.
This falls into the same range as "it is too expensive to spread physically throughout the galaxy". I know that negative proofs about engineering are notoriously wrong, but that is currently my best guess. Put simply, the low-hanging fruit has been taken. I have empirical evidence on multiple levels to support this view.
Correspondingly there is no singularity because progress is not limited by raw thinking speed but by effective aggregate thinking speed and physical feedback.
What could prove me wrong?
If a serious discussion tore my well-prepared arguments and evidence to shreds (quite possible).
At the very high end a singularity might be possible if a way could be found to simulate physics faster than physics itself.
AI
Basically I don't have the least problem with artificial intelligence or artificial emotion being possible. Philosophical note: I don't care on what substrate my consciousness runs. Maybe I am simulated.
I think strong AI is quite possible and maybe not that far away.
But I also don't think that this will bring the singularity because of the complexity limits mentioned above. Strong AI will speed up some cognitive tasks with compound interest - but only until the physical feedback level is reached. Or a social feedback level is reached if AI should be designed to be so.
One temporary dystopia that I see is that cognitive tasks are outsourced to AI and a new round of unemployment drives humans into depression.
My own past attempts in this direction include:
- A simplified layered model of the brain: deep learning applied to free inputs (I cancelled this when it became clear that it was too simple and low-level, and thus computationally inefficient)
- A nested semantic graph approach with propagation of symbol patterns representing thought (concept only; never realized)
I'd really like to try a 'synthesis' of these, in which microstructure-of-cognition-like activation patterns across multiple deep learning networks are combined with a specialized language and pragmatics structure-acquisition model a la Unsupervised Learning of Natural Languages. See my opinion on cognition below for more in this line.
What could prove me wrong?
On the low end: if building strong AI takes longer than I think it would take me given unlimited funding.
On the high end if I'm wrong with the complexity limits mentioned above.
Conquering space
Humanity might succeed at leaving the planet but at high costs.
By leaving the planet I mean becoming permanently independent of Earth, though not necessarily leaving the solar system any time soon (speculating on that is beyond my confidence interval).
I think it more likely that life leaves the planet. That could be:
- artificial intelligence with a robotic body: think of a Curiosity rover 2.0 (most likely).
- intelligent life-forms bred for life in space: think of magpies, which are already smart, small, fast-reproducing, and capable of 3D navigation.
- actual humans in suitable protective environments with small autonomous biospheres, harvesting asteroids or Mars.
- 'cyborgs': humans altered or bred to better deal with certain problems of space, like radiation and missing gravity.
- other, including misc ideas from science fiction (least likely or latest).
For most of these (esp. those depending on breeding) I'd estimate a time-range of a few thousand years.
What could prove me wrong?
If I'm wrong on the singularity aspect too.
If I'm wrong on the timeline: I will likely be long dead in any case except the first (the robotic AI), which I expect to see in my lifetime.
Cognitive Base of Rationality, Vaguesness, Foundations of Math
How can we as humans create meaning out of noise?
How can we know truth? How does it come that we know that 'snow is white' when snow is white?
Cognitive neuroscience and artificial learning seem to point toward two aspects:
Fuzzy learning aspect
Correlated patterns of internal and external perception are recognized (detected) via multiple specialized layered neural nets (basically). This yields qualia like 'spoon', 'fear', 'running', 'hot', 'near', 'I'. These are basically symbols, but they are vague with respect to meaning because they result from a recognition process that optimizes for matching, not for correctness or uniqueness.
Semantic learning aspect
Upon the qualia builds the semantic part, which takes the qualia and, instead of acting directly on them (as animals normally do), finds patterns in their activation that are related not to immediate perception or action but at most to memory. These may form new qualia/symbols.
The use of these patterns is that they allow capturing concepts which are detached from reality (detached insofar as they do not need a stimulus connected in any way to perception).
Concepts like ('cry-sound' 'fear') or ('digitalis' 'time-forward' 'heartache') or ('snow' 'white') or, and this is probably the domain of humans, (('one' 'successor') 'two') or (('I' 'happy') ('I' 'think')).
Concepts
The interesting thing is that learning works on these concepts just as it does on the underlying neural nets. Thus concepts that are reinforced by positive feedback will stabilize, and along with them the qualia (if any) from which they derive will also stabilize.
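The reinforcement-and-decay dynamic can be illustrated with a toy update rule. This is purely a sketch of the stabilization idea, not a claim about an actual brain mechanism; all names and constants are made up:

```python
def reinforce(weights, active, feedback, lr=0.1):
    """Toy sketch: concepts co-active with positive feedback are
    strengthened; unreinforced concepts decay slightly each step."""
    return {
        concept: w + lr * feedback if concept in active else w * (1 - lr * 0.1)
        for concept, w in weights.items()
    }

weights = {"snow": 0.5, "white": 0.5, "digitalis": 0.5}
for _ in range(20):  # repeated positive feedback for the ('snow' 'white') pattern
    weights = reinforce(weights, active={"snow", "white"}, feedback=1.0)
# 'snow' and 'white' grow and stabilize; the unreinforced concept slowly fades
```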
For certain pure concepts the usability of the concept hinges not on any external factor (like "how does this help me survive") but on social feedback about structure and the process of the formation of the concepts themselves.
And this is where we arrive at such concepts as 'truth' or 'proposition'.
These are no longer vague, not because they are represented differently in the brain than other concepts, but because they stabilize toward maximized validity (that is, stability due to the absence of external factors, possibly with a speed-up due to social pressure to stabilize). I have written elsewhere that everything that derives its utility not from some external use but from internal consistency could be called math.
And that is why math is so hard for some: if you never gained a sufficient core of self-consistent, stabilized concepts, or if the usefulness derives not from internal consistency but from external usefulness (the "teacher's password"), then it will just not scale to more concepts. (The reason science works at all is that science values internal consistency so highly; there is little more dangerous to science than allowing other incentives.)
I really hope that this all makes sense. I haven't summarized this for quite some time.
A few random links that may provide some context:
http://www.blutner.de/NeuralNets/ (this is about the AI context we are talking about)
http://www.blutner.de/NeuralNets/Texts/mod_comp_by_dyn_bin_synf.pdf (research applicable to the above in particular)
http://c2.com/cgi/wiki?LeibnizianDefinitionOfConsciousness (funny description of levels of consciousness)
http://c2.com/cgi/wiki?FuzzyAndSymbolicLearning (old post by me)
http://grault.net/adjunct/index.cgi?VaguesDependingOnVagues (ditto)
Note: Details about the modelling of the semantic part are mostly in my head.
What could prove me wrong?
Well, 'wrong' is too strong here. This is just my model, and it is not really that concrete. Probably a longer discussion with someone more experienced with AI than I am (and there should be many here) would suffice to rip this apart (provided I found time to prepare my model suitably).
God and Religion
I wasn't indoctrinated as a child. My truly loving mother is a baptized Christian who lives her faith without sanctimony. She always hoped that I would receive my epiphany. My father has a scientifically influenced personal Christian belief.
I can imagine a God consistent with science on the one hand and on the other hand with free will, soul, afterlife, trinity and the bible (understood as a mix of non-literal word of God and history tale).
I mean, it is not that hard if you can imagine a timeless (simulation of the) universe. If you are God and have whatever plan for Earth but empathize with your creations, then it is not hard to add a few more constraints to certain aggregates called existences or 'person lives': constraints that realize free will in the sense of 'not subject to the whole universe-plan satisfaction algorithm'.
Surely not more difficult than consistent time-travel.
And souls and afterlife should be easy to envision for any science fiction reader familiar with super intelligences.
But why? Occam's razor applies.
There could be a God, and his promise could be real. It could be a story seeded by an empathizing God, but also a 'human' God with his own inconsistencies and moods.
But it also could be that this is all a fairy tale run amok in human brains searching for explanations where there are none. A mass delusion. A fixated meme.
Which is right? It is difficult to put probabilities on stories. I see that I have slowly moved from 50/50 agnosticism to tolerant atheism.
I can't say that I wait for my epiphany. I know too well that my brain will happily find patterns when I let it. But I have encouraged others to pray for me.
My epiphanies - the aha feelings of clarity that I did experience - have all been about deeply connected patterns building on other such patterns building on reliable facts mostly scientific in nature.
But I haven't lost my morality. It has deepened and widened. I have become even more tolerant (I hope).
So if God does, against all odds, exist, I hope he will understand my doubts, weigh my good deeds, and forgive me. You could tag me a godless Christian.
What could prove me wrong?
On the atheist side I could be moved a bit further by more evidence of religion being a human artifact.
On the theist side there are two possible avenues:
- If I had an unsearched-for epiphany, a real one where I can't say I was hallucinating, e.g. a major consistent insight or a proof of God.
- If I were convinced that the singularity is possible. In that case I'd need to update toward being in a simulation, as per Simulation Argument option 3, because then the next most likely explanation for all this God business is some imperfect being running the simulation.
Thus I'd like to close with this corollary to the simulation argument:
Arguments for the singularity are also (weak) arguments for theism.
Engaging First Introductions to AI Risk
I'm putting together a list of short and sweet introductions to the dangers of artificial superintelligence.
My target audience is intelligent, broadly philosophical narrative thinkers, who can evaluate arguments well but who don't know a lot of the relevant background or jargon.
My method is to construct a Sequence mix tape — a collection of short and enlightening texts, meant to be read in a specified order. I've chosen them for their persuasive and pedagogical punchiness, and for their flow in the list. I'll also (separately) list somewhat longer or less essential follow-up texts below that are still meant to be accessible to astute visitors and laypeople.
The first half focuses on intelligence, answering 'What is Artificial General Intelligence (AGI)?'. The second half focuses on friendliness, answering 'How can we make AGI safe, and why does it matter?'. Since the topics of some posts aren't obvious from their titles, I've summarized them using questions they address.
Part I. Building intelligence.
1. Power of Intelligence. Why is intelligence important?
2. Ghosts in the Machine. Is building an intelligence from scratch like talking to a person?
3. Artificial Addition. What can we conclude about the nature of intelligence from the fact that we don't yet understand it?
4. Adaptation-Executers, not Fitness-Maximizers. How do human goals relate to the 'goals' of evolution?
5. The Blue-Minimizing Robot. What are the shortcomings of thinking of things as 'agents', 'intelligences', or 'optimizers' with defined values/goals/preferences?
Part II. Intelligence explosion.
6. Optimization and the Singularity. What is optimization? As optimization processes, how do evolution, humans, and self-modifying AGI differ?
7. Efficient Cross-Domain Optimization. What is intelligence?
8. The Design Space of Minds-In-General. What else is universally true of intelligences?
9. Plenty of Room Above Us. Why should we expect self-improving AGI to quickly become superintelligent?
Part III. AI risk.
10. The True Prisoner's Dilemma. What kind of jerk would Defect even knowing the other side Cooperated?
11. Basic AI drives. Why are AGIs dangerous even when they're indifferent to us?
12. Anthropomorphic Optimism. Why do we think things we hope happen are likelier?
13. The Hidden Complexity of Wishes. How hard is it to directly program an alien intelligence to enact my values?
14. Magical Categories. How hard is it to program an alien intelligence to reconstruct my values from observed patterns?
15. The AI Problem, with Solutions. How hard is it to give AGI predictable values of any sort? More generally, why does AGI risk matter so much?
Part IV. Ends.
16. Could Anything Be Right? What do we mean by 'good', or 'valuable', or 'moral'?
17. Morality as Fixed Computation. Is it enough to have an AGI improve the fit between my preferences and the world?
18. Serious Stories. What would a true utopia be like?
19. Value is Fragile. If we just sit back and let the universe do its thing, will it still produce value? If we don't take charge of our future, won't it still turn out interesting and beautiful on some deeper level?
20. The Gift We Give To Tomorrow. In explaining value, are we explaining it away? Are we making our goals less important?
Summary: Five theses, two lemmas, and a couple of strategic implications.
All of the above were written by Eliezer Yudkowsky, with the exception of The Blue-Minimizing Robot (by Yvain), Plenty of Room Above Us and The AI Problem (by Luke Muehlhauser), and Basic AI Drives (a wiki collaboration). Seeking a powerful conclusion, I ended up making a compromise between Eliezer's original The Gift We Give To Tomorrow and Raymond Arnold's Solstice Ritual Book version. It's on the wiki, so you can further improve it with edits.
Further reading:
- Three Worlds Collide (Normal), by Eliezer Yudkowsky
- a short story vividly illustrating how alien values can evolve.
- So You Want to Save the World, by Luke Muehlhauser
- an introduction to the open problems in Friendly Artificial Intelligence.
- Intelligence Explosion FAQ, by Luke Muehlhauser
- a broad overview of likely misconceptions about AI risk.
- The Singularity: A Philosophical Analysis, by David Chalmers
- a detailed but non-technical argument for expecting intelligence explosion, with an assessment of the moral significance of synthetic human and non-human intelligence.
I'm posting this to get more feedback for improving it, to isolate topics for which we don't yet have high-quality, non-technical stand-alone introductions, and to reintroduce LessWrongers to exceptionally useful posts I haven't seen sufficiently discussed, linked, or upvoted. I'd especially like feedback on how the list I provided flows as a unit, and what inferential gaps it fails to address. My goals are:
A. Via lucid and anti-anthropomorphic vignettes, to explain AGI in a way that encourages clear thought.
B. Via the Five Theses, to demonstrate the importance of Friendly AI research.
C. Via down-to-earth meta-ethics, humanistic poetry, and pragmatic strategizing, to combat any nihilisms, relativisms, and defeatisms that might be triggered by recognizing the possibility (or probability) of Unfriendly AI.
D. Via an accessible, substantive, entertaining presentation, to introduce the raison d'être of LessWrong to sophisticated newcomers in a way that encourages further engagement with LessWrong's community and/or content.
What do you think? What would you add, remove, or alter?
Evaluating the feasibility of SI's plan
(With Kaj Sotala)
SI's current R&D plan seems to go as follows:
1. Develop the perfect theory.
2. Implement this as a safe, working, Artificial General Intelligence -- and do so before anyone else builds an AGI.
The Singularity Institute is almost the only group working on friendliness theory (although with very few researchers). So, they have the lead on Friendliness. But there is no reason to think that they will be ahead of anyone else on the implementation.
The few AGI designs we can look at today, like OpenCog, are big, messy systems which intentionally attempt to exploit various cognitive dynamics that might combine in unexpected and unanticipated ways, and which have various human-like drives rather than the sort of supergoal-driven, utility-maximizing goal hierarchies that Eliezer talks about, or which a mathematical abstraction like AIXI employs.
A team which is ready to adopt a variety of imperfect heuristic techniques will have a decisive lead on approaches based on pure theory. Without the constraint of safety, one of them will beat SI in the race to AGI. SI cannot ignore this. Real-world, imperfect, safety measures for real-world, imperfect AGIs are needed. These may involve mechanisms for ensuring that we can avoid undesirable dynamics in heuristic systems, or AI-boxing toolkits usable in the pre-explosion stage, or something else entirely.
SI’s hoped-for theory will include a reflexively consistent decision theory, something like a greatly refined Timeless Decision Theory. It will also describe human value as formally as possible, or at least describe a way to pin it down precisely, something like an improved Coherent Extrapolated Volition.
The hoped-for theory is intended to provide not only safety features, but also a description of the implementation, as some sort of ideal Bayesian mechanism, a theoretically perfect intelligence.
SIers have said to me that SI's design will have a decisive implementation advantage. The idea is that because strap-on safety can’t work, Friendliness research necessarily involves more fundamental architectural design decisions, which also happen to be general AGI design decisions that some other AGI builder could grab and save themselves a lot of effort. The assumption seems to be that all other designs are based on hopelessly misguided design principles. SI-ers, the idea seems to go, are so smart that they'll build AGI far before anyone else. Others will succeed only when hardware capabilities allow crude near-brute-force methods to work.
Yet even if the Friendliness theory provides the basis for intelligence, the nitty-gritty of SI’s implementation will still be far away, and will involve real-world heuristics and other compromises.
We can compare SI's future AI design to AIXI, another mathematically perfect AI formalism (though it has some critical reflexivity issues). Schmidhuber, Hutter, and colleagues think that their AIXI can be scaled down into a feasible implementation, and have implemented some toy systems. Similarly, any actual AGI based on SI's future theories will have to stray far from its mathematically perfected origins.
Moreover, SI's future Friendliness proof may simply be wrong. Eliezer writes a lot about logical uncertainty, the idea that you must treat even purely mathematical ideas with the same probabilistic techniques as any ordinary uncertain belief. He pursues this mostly so that his AI can reason about itself, but the same principle applies to Friendliness proofs as well.
Perhaps Eliezer thinks that a heuristic AGI is absolutely doomed to failure; that a hard takeoff soon after the creation of the first AGI is so overwhelmingly likely that only a mathematically designed AGI could stay Friendly. In that case, we have to work on a pure-theory approach, even if it has a low chance of being finished first; otherwise we'll be dead anyway. If an embryonic AGI will necessarily undergo an intelligence explosion, we have no choice but to "shut up and do the impossible."
I am all in favor of gung-ho knife-between-the-teeth projects. But when you think that your strategy is impossible, then you should also look for a strategy which is possible, if only as a fallback. Thinking about safety theory until drops of blood appear on your forehead (as Eliezer puts it, quoting Gene Fowler) is all well and good. But if there is only a 10% chance of achieving 100% safety (not that there really is any such thing), then I'd rather go for a strategy that provides only a 40% promise of safety, but with a 40% chance of achieving it. OpenCog and the like are going to be developed regardless, and probably before SI's own provably friendly AGI. So, even an imperfect safety measure is better than nothing.
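The trade-off above is just expected-value arithmetic; with the illustrative percentages given in the paragraph (nothing more than that), it works out like this:

```python
# Expected safety = P(strategy succeeds) x safety delivered if it succeeds.
# The percentages are the illustrative ones from the text.
perfect_theory = 0.10 * 1.00   # 10% chance of achieving 100% safety
stopgap = 0.40 * 0.40          # 40% chance of achieving 40% safety

# The less ambitious but more achievable strategy wins on expectation.
assert stopgap > perfect_theory
```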
Heuristic approaches might indeed have a 99% chance of an immediate unfriendly explosion. But SI, better than anyone, should know that any intuition-based probability estimate of "99%" really means "70%". Even if other approaches are long-shots, we should not put all our eggs in one basket. Theoretical perfection and stopgap safety measures can be developed in parallel.
Given what we know about human overconfidence and the general reliability of predictions, the actual outcome will to a large extent be something that none of us ever expected or could have predicted. No matter what happens, progress on safety mechanisms for heuristic AGI will improve our chances if something entirely unexpected happens.
What impossible thing should SI be shutting up and doing? For Eliezer, it’s Friendliness theory. To him, safety for heuristic AGI is impossible, and we shouldn't direct our efforts in that direction. But why shouldn't safety for heuristic AGI be another impossible thing to do?
(Two impossible things before breakfast … and maybe a few more? Eliezer seems to be rebuilding logic, set theory, ontology, epistemology, axiology, decision theory, and more, mostly from scratch. That's a lot of impossibles.)
And even if safety for heuristic AGIs really is impossible for us to figure out now, there is some chance of an extended soft takeoff, which would allow us to develop heuristic AGIs that help in figuring out AGI safety, whether because we can use them in our tests, or because they can apply their embryonic general intelligence to the problem themselves. Goertzel and Pitt have urged this approach.
Yet resources are limited. Perhaps the folks who are actually building their own heuristic AGIs are in a better position than SI to develop safety mechanisms for them, while SI is the only organization really working on a formal theory of Friendliness, and so should concentrate on that. It could be better to focus SI's resources on areas in which it has a relative advantage, or which have a greater expected impact.
Even if so, SI should evangelize AGI safety to other researchers, not only as a general principle, but also by offering theoretical insights that may help them as they work on their own safety mechanisms.
In summary:
1. AGI development which is unconstrained by a friendliness requirement is likely to beat a provably-friendly design in a race to implementation, and some effort should be expended on dealing with this scenario.
2. Pursuing a provably-friendly AGI, even if very unlikely to succeed, could still be the right thing to do if it were certain that we'll have a hard takeoff very soon after the creation of the first AGIs. However, we do not know whether this is true.
3. Even the provably friendly design will face real-world compromises and errors in its implementation, so the implementation will not itself be provably friendly. Thus, safety protections of the sort needed for heuristic design are needed even for a theoretically Friendly design.
Replaceability as a virtue
I propose that it is altruistic to be replaceable, and that therefore those who strive to be altruistic should strive to be replaceable.
As far as I can Google, this does not seem to have been proposed before. LW should be a good place to discuss it. A community interested in rational and ethical behavior, and in how superintelligent machines may decide to replace mankind, should at least bother to refute the following argument.
Replaceability
Replaceability is "the state of being replaceable". It isn't binary. The price of the replacement matters: so a cookie is more replaceable than a big wedding cake. Adequacy of the replacement also makes a difference: a piston for an ancient Rolls Royce is less replaceable than one in a modern car, because it has to be hand-crafted and will be distinguishable. So something is more or less replaceable depending on the price and quality of its replacement.
Replaceability can be thought of as the inverse of the cost of replacing something. Something that's very replaceable has a low cost of replacement, while something that lacks replaceability has a high (perhaps infeasibly high) cost of replacement. The cost of replacement feeds into Total Cost of Ownership, and everything economists know about that applies. It seems pretty obvious that replaceability of possessions is good, much like cheap availability is good.
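As a toy illustration of this cost-and-adequacy framing (the scoring function and every number here are hypothetical, not a standard measure):

```python
# Hypothetical replaceability score: the adequacy of the best available
# replacement divided by the cost of obtaining it. Higher = more replaceable.

def replaceability(replacement_cost: float, adequacy: float) -> float:
    """`adequacy` in [0, 1]: 1.0 is a perfect substitute, 0.0 no substitute."""
    if replacement_cost <= 0:
        raise ValueError("replacement cost must be positive")
    return adequacy / replacement_cost

# A cookie: cheap, and the substitute is indistinguishable from the original.
cookie = replaceability(replacement_cost=1.0, adequacy=1.0)
# A hand-crafted piston for an ancient Rolls Royce: costly, and the
# replacement remains distinguishable from the original.
piston = replaceability(replacement_cost=5000.0, adequacy=0.8)
assert cookie > piston
```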
Some things (historical artifacts, art pieces) are valued highly precisely because of their irreplaceability. Although a few things could be said about the resale value of such objects, I'll simplify and contend that these valuations are not rational.
The practical example
Anne manages the central database of Beth's company. She's the only one who has access to that database, the skillset required for managing it, and an understanding of how it all works; she has a monopoly on that combination.
This monopoly gives Anne control over her own replacement cost. If she works according to the state of the art, writes extensive and up-to-date documentation, makes proper backups, etc., she can be very replaceable, because her monopoly will be easily broken. If she refuses to explain what she's doing, creates weird and fragile workarounds, and documents the database badly, she can reduce her replaceability and defend her monopoly. (A well-obfuscated database can take months for a replacement database manager to handle confidently.)
So Beth may still choose to replace Anne, but Anne can influence how expensive that'll be for Beth. She can at least make sure her replacement needs to be shown the ropes, so she can't be fired on a whim. But she might go further and practically hold the database hostage, which would certainly help her in salary negotiations if she does it right.
This makes it pretty clear how Anne can act altruistically in this situation, and how she can act selfishly. Doesn't it?
The moral argument
To Anne, her replacement cost is an externality and an influence on the length and terms of her employment. To maximize the length of her employment and her salary, her replacement cost would have to be high.
To Beth, Anne's replacement cost is part of the cost of employing her and of course she wants it to be low. This is true for any pair of employer and employee: Anne is unusual only in that she has a great degree of influence on her replacement cost.
Therefore, if Anne documents her database properly and so on, this increases her replaceability and constitutes altruistic behavior. Unless she values the positive feeling of doing her employer a favor more highly than she values the money she might make by avoiding replacement, this might even be true altruism.
Unless I suck at Google, replaceability doesn't seem to have been discussed as an aspect of altruism. The two reasons for that I can see are:
- replacing people is painful to think about
- and it seems futile as long as people aren't replaceable in more than very specific functions anyway.
But we don't want or get the choice to kill one person to save the life of five, either, and such practical improbabilities shouldn't stop us from considering our moral decisions. This is especially true in a world where copies, and hence replacements, of people are starting to look possible at least in principle.
Singularity-related hypotheticals
- In some reasonably-near future, software is getting better at modeling people. We still don't know what makes a process intelligent, but we can feed a couple of videos and a bunch of psychological data points into a people modeler, extrapolate everything else using a standard population, and the resulting model can have a conversation that could fool a four-year-old. The technology is already good enough for models of pets. While convincing models of complex personalities are at least another decade away, the tech is starting to become good enough for senile grandmothers.
Obviously no-one wants granny to die. But the kids would like to keep a model of granny, and they'd like to make the model before the Alzheimer's gets any worse, while granny is terrified she'll get no more visits to her retirement home.
What's the ethical thing to do here? Surely the relatives should keep visiting granny. Could granny maybe have a model made, but keep it to herself, for release only through her Last Will and Testament? And wouldn't it be truly awful of her to refuse to do that?
- Only slightly further into the future, we're still mortal, but cryonics does appear to be working. Unfrozen people need regular medical aid, but the technology is only getting better and anyway, the point is: something we can believe to be them can indeed come back.
Some refuse to wait out these Dark Ages; they get themselves frozen for nonmedical reasons, to fastforward across decades or centuries into a time when the really awesome stuff will be happening, and to get the immortality technologies they hope will be developed by then.
In this scenario, wouldn't fastforwarders be considered selfish, because they impose on their friends the pain of their absence? And wouldn't their friends mind it less if the fastforwarders went to the trouble of having a good model (see above) made first?
- On some distant future Earth, minds can be uploaded completely. Brains can be modeled and recreated so effectively that people can make living, breathing copies of themselves and experience the inability to tell which instance is the copy and which is the original.
Of course many adherents of soul theories reject this as blasphemous. A few more sophisticated thinkers worry that this devalues individuals to the point where superhuman AIs might conclude that as long as copies of everyone are stored on some hard drive orbiting Pluto, nothing of value is lost if every meatbody gets devoured into more hardware. The bottom line is: effective immortality is available, but some refuse it on principle.
In this world, wouldn't those who make themselves fully and infinitely replaceable want the same for everyone they love? Wouldn't they consider it a dreadful imposition if a friend or relative refused immortality? After all, wasn't not having to say goodbye anymore kind of the point?
These questions haven't come up in the real world because people have never been replaceable in more than very specific functions. But I hope you'll agree that if and when people become more replaceable, that will be regarded as a good thing, and it will be regarded as virtuous to use these technologies as they become available, because it spares one's friends and family some or all of the cost of replacing oneself.
Replaceability as an altruist virtue
And if replaceability is altruistic in this hypothetical future, as well as in the limited sense of Anne and Beth, that implies replaceability is altruistic now. And even now, there are things we can do to increase our replaceability, i.e. to reduce the cost our bereaved will incur when they have to replace us. We can teach all our (valuable) skills, so others can replace us as providers of those skills. We can avoid keeping (relevant) secrets, so others can learn what we know and replace us as sources of that knowledge. We can endeavour to live as long as possible, to postpone the cost. We can sign up for cryonics. There are surely other things each of us could do to increase our replaceability, but I can't think of any an altruist wouldn't consider virtuous.
As an altruist, I conclude that replaceability is a prosocial, unselfish trait, something we'd want our friends to have; in other words, a virtue. I'd go as far as to say that even bothering to set up a good Last Will and Testament is virtuous precisely because it reduces the cost my bereaved will incur when they have to replace me. And although none of us can be truly, easily replaceable as of yet, I suggest we honor those who make themselves replaceable, and be proud of whatever replaceability we ourselves attain.
So, how replaceable are you?
[draft] Responses to Catastrophic AGI Risk: A Survey
Here's the biggest thing that I've been working on for the last several months:
Responses to Catastrophic AGI Risk: A Survey
Kaj Sotala, Roman Yampolskiy, and Luke Muehlhauser

Abstract: Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next 20-100 years. It has been suggested that this may become a catastrophic risk, threatening to do major damage on a global scale. After briefly summarizing the arguments for why AGI may become a catastrophic risk, we survey various proposed responses to AGI risk. We consider societal proposals, proposals for constraining the AGIs' behavior from the outside, and proposals for creating AGIs in such a way that they are inherently safe.
This doesn't aim to be a strongly argumentative paper, though it does comment on the various proposals from an SI-ish point of view. Rather, it attempts to survey all the major AGI-risk-related proposals made so far, and to offer some thoughts on their respective strengths and weaknesses. Before writing this paper, we hadn't encountered anyone who was familiar with all of these proposals - indeed, even we ourselves weren't familiar with all of them! Hopefully, this will become a useful starting point for anyone who is at all interested in AGI risk or Friendly AI.
The draft will be public and open for comments for one week (until Nov 23rd), after which we'll incorporate the final edits and send it off for review. We're currently aiming to have it published in the sequel volume to Singularity Hypotheses.
EDIT: I've now hidden the draft from public view (so as to avoid annoying future publishers who may not like early drafts floating around before the work has been accepted for publication) while I'm incorporating all the feedback that we got. Thanks to everyone who commented!
[Link] Singularity Summit Talks
Videos of the 2012 Singularity Summit talks are now online.
Previous discussion of the Summit here.
Singularity Summit 2012, discuss it here
How was it? Which speakers delivered according to expectations?
Which topics were left unresolved?
Were any topics resolved?
Whatever you have to say about it, say it here.
Suggestion: if you are going to comment, mention "I was there" just so we know who was or wasn't.
[link] Pei Wang: Motivation Management in AGI Systems
Related post: Muehlhauser-Wang Dialogue.
Motivation Management in AGI Systems, a paper to be published at AGI-12.
Abstract. AGI systems should be able to manage their motivations or goals, which are persistent, spontaneous, mutually restricting, and changing over time. A mechanism for handling this kind of goal is introduced and discussed.
From the discussion section:
The major conclusion argued in this paper is that an AGI system should always maintain a goal structure (or whatever it is called) which contains multiple goals that are separately specified, with the properties that
- Some of the goals are accurately specified, and can be fully achieved, while some others are vaguely specified and only partially achievable, but nevertheless have impact on the system's decisions.
- The goals may conflict with each other on what the system should do at a moment, and cannot be achieved all together. Very often the system has to make compromises among the goals.
- Due to the restriction in computational resources, the system cannot take all existing goals into account when making each decision, nor can it keep a complete record of the goal derivation history.
- The designers and users are responsible for the input goals of an AGI system, from which all the other goals are derived, according to the system's experience. There is no guarantee that the derived goals will be logically consistent with the input goals, except in highly simplified situations.
One area that is closely related to goal management is AI ethics. The previous discussions focused on the goal the designers assign to an AGI system ("super goal" or "final goal"), with the implicit assumption that such a goal will decide the consequences caused by the A(G)I systems. However, the above analysis shows that though the input goals are indeed important, they are not the dominating factor that decides the broad impact of AI to human society. Since no AGI system can be omniscient and omnipotent, to be "general-purpose" means such a system has to handle problems for which its knowledge and resources are insufficient [16, 18], and one direct consequence is that its actions may produce unanticipated results. This consequence, plus the previous conclusion that the effective goal for an action may be inconsistent with the input goals, will render many of the previous suggestions mostly irrelevant to AI ethics.
For example, Yudkowsky's "Friendly AI" agenda is based on the assumption that "a true AI might remain knowably stable in its goals, even after carrying out a large number of self-modifications" [22]. The problem with this assumption is that unless we are talking about an axiomatic system with unlimited resources, we cannot assume the system can accurately know the consequences of its actions. Furthermore, as argued previously, the goals in an intelligent system inevitably change as its experience grows, which is not necessarily a bad thing - after all, our "human nature" gradually grows out of, and deviates from, our "animal nature", at both the species level and the individual level.
Omohundro argued that no matter what input goals are given to an AGI system, it will usually derive some common "basic drives", including "be self-protective" and "to acquire resources" [1], which leads some people to worry that such a system will become unethical. According to our previous analysis, the production of these goals is indeed very likely, but it is only half of the story. A system with a resource-acquisition goal does not necessarily attempt to achieve it at all costs, without considering its other goals. Again, consider human beings - everyone has some goals that can become dangerous (either to oneself or to others) if pursued at all costs. The proper solution, both for human ethics and for AGI ethics, is to prevent this kind of goal from becoming dominant, rather than to prevent it from being formed.
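The goal-structure properties quoted above (multiple separately specified goals, conflicts, and a resource limit on how many goals each decision can consult) can be sketched as a toy data structure. This is purely an illustration, not the mechanism the paper actually implements:

```python
# Toy sketch of a multi-goal structure (illustrative only, not Pei Wang's
# actual mechanism). Goals carry a priority; under a resource budget, each
# decision consults only the highest-priority goals.

from dataclasses import dataclass

@dataclass
class Goal:
    name: str
    priority: float      # how strongly this goal currently drives decisions
    achievable: float    # 1.0 = fully achievable, < 1.0 = only partially

class GoalStructure:
    """Keeps multiple, separately specified, possibly conflicting goals."""

    def __init__(self, budget: int):
        self.goals = []      # type: list[Goal]
        self.budget = budget  # resource limit: goals consulted per decision

    def add(self, goal: Goal) -> None:
        self.goals.append(goal)

    def consulted_goals(self):
        # The system cannot take all goals into account for every decision;
        # it consults only the highest-priority ones its budget allows.
        return sorted(self.goals, key=lambda g: -g.priority)[: self.budget]

gs = GoalStructure(budget=2)
gs.add(Goal("answer user queries", priority=0.9, achievable=1.0))
gs.add(Goal("stay within power limits", priority=0.7, achievable=1.0))
gs.add(Goal("be generally helpful", priority=0.5, achievable=0.4))

names = [g.name for g in gs.consulted_goals()]
# The vaguely specified goal is ignored once the budget is exhausted,
# even though it still exists in the structure.
assert names == ["answer user queries", "stay within power limits"]
```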
Dragon Ball's Hyperbolic Time Chamber
A time dilation tool from an anime is discussed for its practical use on Earth; there seem to be surprisingly few uses, and none that will change the world, due to the severe penalties humans would incur while using it, while basic constraints like Amdahl's law limit the scientific uses. A comparison with the position of an Artificial Intelligence such as an emulated human brain seems fair, except that most of the time dilation disadvantages do not apply or can be ameliorated, and hence any speedups could be quite effectively exploited. I suggest that skeptics of the idea that speedups give advantages are implicitly working off the crippled time dilation tool and not making allowance for the disanalogies.
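Amdahl's law, invoked above as the constraint on scientific uses, can be stated in a few lines (a generic sketch; the example fraction is an assumption of mine, not a figure from the essay):

```python
# Amdahl's law: if a fraction p of a task can be sped up by a factor s,
# the overall speedup is bounded no matter how large s gets.

def amdahl_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# Even an effectively infinite speedup of the "thinking" 90% of research
# leaves the overall rate capped at 10x, because the remaining 10%
# (real-world experiments) still runs at normal speed.
print(round(amdahl_speedup(p=0.9, s=1e9), 2))  # → 10.0
```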
Master version on gwern.net
Cynical explanations of FAI critics (including myself)
Related Posts: A cynical explanation for why rationalists worry about FAI, A belief propagation graph
Lately I've been pondering the fact that while there are many critics of SIAI and its plan to form a team to build FAI, few of us seem to agree on what SIAI or we should do instead. Here are some of the alternative suggestions offered so far:
- work on computer security
- work to improve laws and institutions
- work on mind uploading
- work on intelligence amplification
- work on non-autonomous AI (e.g., Oracle AI, "Tool AI", automated formal reasoning systems, etc.)
- work on academically "mainstream" AGI approaches or trust that those researchers know what they are doing
- stop worrying about the Singularity and work on more mundane goals
A question about Singularity Summit 2012
I am planning on going to the Singularity Summit this year. I applied for a student discount approximately three weeks ago and still haven't heard back. I am curious to hear whether anyone else has applied for the student discount and gotten a reply. I am studying in the UK, so I really want to wrap up my logistics quickly. Anyone else in the same boat?
Glenn Beck discusses the Singularity, cites SI researchers
From the final chapter of his new book Cowards, titled "Adapt or Die: The Coming Intelligence Explosion."
The year is 1678 and you’ve just arrived in England via a time machine. You take out your new iPhone in front of a group of scientists who have gathered to marvel at your arrival.
“Siri,” you say, addressing the phone’s voice-activated artificial intelligence system, “play me some Beethoven.”
Dunh-Dunh-Dunh-Duuunnnhhh! The famous opening notes of Beethoven’s Fifth Symphony, stored in your music library, play loudly.
“Siri, call my mother.”
Your mother’s face appears on the screen, a Hawaiian beach behind her. “Hi, Mom!” you say. “How many fingers am I holding up?”
“Three,” she correctly answers. “Why haven’t you called more—”
“Thanks, Mom! Gotta run!” you interrupt, hanging up.
“Now,” you say. “Watch this.”
Your new friends look at the iPhone expectantly.
“Siri, I need to hide a body.”
Without hesitation, Siri asks: “What kind of place are you looking for? Mines, reservoirs, metal foundries, dumps, or swamps?” (I’m not kidding. If you have an iPhone 4S, try it.)
You respond “Swamps,” and Siri pulls up a satellite map showing you nearby swamps.
The scientists are shocked into silence. What is this thing that plays music, instantly teleports video of someone across the globe, helps you get away with murder, and is small enough to fit into a pocket?
At best, your seventeenth-century friends would worship you as a messenger of God. At worst, you’d be burned at the stake for witchcraft. After all, as science fiction author Arthur C. Clarke once said, “Any sufficiently advanced technology is indistinguishable from magic.”
Now, imagine telling this group that capitalism and representative democracy will take the world by storm, lifting hundreds of millions of people out of poverty. Imagine telling them their descendants will eradicate smallpox and regularly live seventy-five or more years. Imagine telling them that men will walk on the moon, that planes, flying hundreds of miles an hour, will transport people around the world, or that cities will be filled with buildings reaching thousands of feet into the air.
They’d probably escort you to the madhouse.
Unless, that is, one of the people in that group had been a man named Ray Kurzweil.
Kurzweil is an inventor and futurist who has done a better job than most at predicting the future. Dozens of the predictions from his 1990 book The Age of Intelligent Machines came true during the 1990s and 2000s. His follow-up book, The Age of Spiritual Machines, published in 1999, fared even better. Of the 147 predictions that Kurzweil made for 2009, 78 percent turned out to be entirely correct, and another 8 percent were roughly correct. For example, even though every portable computer had a keyboard in 1999, Kurzweil predicted that most portable computers would lack a keyboard by 2009. It turns out he was right: by 2009, most portable computers were MP3 players, smartphones, tablets, portable game machines, and other devices that lacked keyboards.
Kurzweil is most famous for his “law of accelerating returns,” the idea that technological progress is generally “exponential” (like a hockey stick, curving up sharply) rather than “linear” (like a straight line, rising slowly). In nongeek-speak that means that our knowledge is like the compound interest you get on your bank account: it increases exponentially as time goes on because it keeps building on itself. We won’t experience one hundred years of progress in the twenty-first century, but rather twenty thousand years of progress (measured at today’s rate).
Many experts have criticized Kurzweil’s forecasting methods, but a careful and extensive review of technological trends by researchers at the Santa Fe Institute came to the same basic conclusion: technological progress generally tends to be exponential (or even faster than exponential), not linear.
So, what does this mean? In his 2005 book The Singularity Is Near, Kurzweil shares his predictions for the next few decades:
- In our current decade, Kurzweil expects real-time translation tools and automatic house-cleaning robots to become common.
- In the 2020s he expects to see the invention of tiny robots that can be injected into our bodies to intelligently find and repair damage and cure infections.
- By the 2030s he expects “mind uploading” to be possible, meaning that your memories and personality and consciousness could be copied to a machine. You could then make backup copies of yourself, and achieve a kind of technological immortality.
[sidebar]
Age of the Machines?
“We became the dominant species on this planet by being the most intelligent species around. This century we are going to cede that crown to machines. After we do that, it will be them steering history rather than us.”
—Jaan Tallinn, co-creator of Skype and Kazaa
[/sidebar]
If any of that sounds absurd, remember again how absurd the eradication of smallpox or the iPhone 4S would have seemed to those seventeenth-century scientists. That’s because the human brain is conditioned to believe that the past is a great predictor of the future. While that might work fine in some areas, technology is not one of them. Just because it took decades to put two hundred transistors onto a computer chip doesn’t mean that it will take decades to get to four hundred. In fact, Moore’s Law, which states (roughly) that computing power doubles every two years, shows how technological progress must be thought of in terms of “hockey stick” progress, not “straight line” progress. Moore’s Law has held for more than half a century already (we can currently fit 2.6 billion transistors onto a single chip) and there’s little reason to expect that it won’t continue to.
But the aspect of his book that has the most far-ranging ramifications for us is Kurzweil’s prediction that we will achieve a “technological singularity” in 2045. He defines this term rather vaguely as “a future period during which the pace of technological change will be so rapid, its impact so deep, that human life will be irreversibly transformed.”
Part of what Kurzweil is talking about is based on an older, more precise notion of “technological singularity” called an intelligence explosion. An intelligence explosion is what happens when we create artificial intelligence (AI) that is better than we are at the task of designing artificial intelligences. If the AI we create can improve its own intelligence without waiting for humans to make the next innovation, this will make it even more capable of improving its intelligence, which will . . . well, you get the point. The AI can, with enough improvements, make itself smarter than all of us mere humans put together.
The really exciting part (or the scary part, if your vision of the future is more like the movie The Terminator) is that, once the intelligence explosion happens, we’ll get an AI that is as superior to us at science, politics, invention, and social skills as your computer’s calculator is to you at arithmetic. The problems that have occupied mankind for decades— curing diseases, finding better energy sources, etc.— could, in many cases, be solved in a matter of weeks or months.
Again, this might sound far-fetched, but Ray Kurzweil isn’t the only one who thinks an intelligence explosion could occur sometime this century. Justin Rattner, the chief technology officer at Intel, predicts some kind of Singularity by 2048. Michael Nielsen, co-author of the leading textbook on quantum computation, thinks there’s a decent chance of an intelligence explosion by 2100. Richard Sutton, one of the biggest names in AI, predicts an intelligence explosion near the middle of the century. Leading philosopher David Chalmers is 50 percent confident an intelligence explosion will occur by 2100. Participants at a 2009 conference on AI tended to be 50 percent confident that an intelligence explosion would occur by 2045.
If we can properly prepare for the intelligence explosion and ensure that it goes well for humanity, it could be the best thing that has ever happened on this fragile planet. Consider the difference between humans and chimpanzees, which share 95 percent of their genetic code. A relatively small difference in intelligence gave humans the ability to invent farming, writing, science, democracy, capitalism, birth control, vaccines, space travel, and iPhones— all while chimpanzees kept flinging poo at each other.
[sidebar]
Intelligent Design?
The thought that machines could one day have superhuman abilities should make us nervous. Once the machines are smarter and more capable than we are, we won’t be able to negotiate with them any more than chimpanzees can negotiate with us. What if the machines don’t want the same things we do?
The truth, unfortunately, is that every kind of AI we know how to build today definitely would not want the same things we do. To build an AI that does, we would need a more flexible “decision theory” for AI design and new techniques for making sense of human preferences. I know that sounds kind of nerdy, but AIs are made of math and so math is really important for choosing which results you get from building an AI.
These are the kinds of research problems being tackled by the Singularity Institute in America and the Future of Humanity Institute in Great Britain. Unfortunately, our silly species still spends more money each year on lipstick research than we do on figuring out how to make sure that the most important event of this century (maybe of all human history)— the intelligence explosion— actually goes well for us.
[/sidebar]
Likewise, self-improving machines could perform scientific experiments and build new technologies much faster and more intelligently than humans can. Curing cancer, finding clean energy, and extending life expectancies would be child’s play for them. Imagine living out your own personal fantasy in a different virtual world every day. Imagine exploring the galaxy at near light speed, with a few backup copies of your mind safe at home on earth in case you run into an exploding supernova. Imagine a world where resources are harvested so efficiently that everyone’s basic needs are taken care of, and political and economic incentives are so intelligently fine-tuned that “world peace” becomes, for the first time ever, more than a Super Bowl halftime show slogan.
With self-improving AI we may be able to eradicate suffering and death just as we once eradicated smallpox. It is not the limits of nature that prevent us from doing this, but only the limits of our current understanding. It may sound like a paradox, but it’s our brains that prevent us from fully understanding our brains.
Turf Wars
At this point you might be asking yourself: “Why is this topic in this book? What does any of this have to do with the economy or national security or politics?”
In fact, it has everything to do with all of those issues, plus a whole lot more. The intelligence explosion will bring about change on a scale and scope not seen in the history of the world. If we don’t prepare for it, things could get very bad, very fast. But if we do prepare for it, the intelligence explosion could be the best thing that has happened since . . . literally ever.
But before we get to the kind of life-altering progress that would come after the Singularity, we will first have to deal with a lot of smaller changes, many of which will throw entire industries and ways of life into turmoil. Take the music business, for example. It was not long ago that stores like Tower Records and Sam Goody were doing billions of dollars a year in compact disc sales; now people buy music from home via the Internet. Publishing is currently facing a similar upheaval. Newspapers and magazines have struggled to keep subscribers, booksellers like Borders have been forced into bankruptcy, and customers are forcing publishers to switch to ebooks faster than the publishers might like.
All of this is to say that some people are already witnessing the early stages of upheaval firsthand. But for everyone else, there is still a feeling that something is different this time; that all of those years of education and experience might be turned upside down in an instant. They might not be able to identify it exactly but they realize that the world they’ve known for forty, fifty, or sixty years is no longer the same.
There’s a good reason for that. We feel it and sense it because it’s true. It’s happening. There’s absolutely no question that the world in 2030 will be a very different place than the one we live in today. But there is a question, a large one, about whether that place will be better or worse.
It’s human nature to resist change. We worry about our families, our careers, and our bank accounts. The executives in industries that are already experiencing cataclysmic shifts would much prefer to go back to the way things were ten years ago, when people still bought music, magazines, and books in stores. The future was predictable. Humans like that; it’s part of our nature.
But predictability is no longer an option. The intelligence explosion, when it comes in earnest, is going to change everything— we can either be prepared for it and take advantage of it, or we can resist it and get run over.
Unfortunately, there are a good number of people who are going to resist it. Not only those in affected industries, but those who hold power at all levels. They see how technology is cutting out the middlemen, how people are becoming empowered, how bloggers can break national news and YouTube videos can create superstars.
And they don’t like it.
A Battle for the Future
Power bases in business and politics that have been forged over decades, if not centuries, are being threatened with extinction, and they know it. So the owners of that power are trying to hold on. They think they can do that by dragging us backward. They think that, by growing the public’s dependency on government, by taking away the entrepreneurial spirit and rewards and by limiting personal freedoms, they can slow down progress.
But they’re wrong. The intelligence explosion is coming so long as science itself continues. Trying to put the genie back in the bottle by dragging us toward serfdom won’t stop it and will, in fact, only leave the world with an economy and society that are completely unprepared for the amazing things that it could bring.
Robin Hanson, author of “The Economics of the Singularity” and an associate professor of economics at George Mason University, wrote that after the Singularity, “The world economy, which now doubles in 15 years or so, would soon double in somewhere from a week to a month.”
That is unfathomable. But even if the rate were much slower, say a doubling of the world economy in two years, the shock-waves from that kind of growth would still change everything we’ve come to know and rely on. A machine could offer the ideal farming methods to double or triple crop production, but it can’t force a farmer or an industry to implement them. A machine could find the cure for cancer, but it would be meaningless if the pharmaceutical industry or Food and Drug Administration refused to allow it. The machines won’t be the problem; humans will be.
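The gap between these doubling times is easy to understate. A short illustrative calculation, using plain arithmetic on the doubling times quoted above (the helper function is hypothetical), shows what each implies for growth per year:

```python
def annual_growth_factor(doubling_time_years: float) -> float:
    # An economy that doubles every T years grows by a factor of 2**(1/T) per year.
    return 2 ** (1 / doubling_time_years)

for label, years in [
    ("today's economy (doubles every 15 years)", 15),
    ("a 'slow' singularity (doubles every 2 years)", 2),
    ("Hanson's fast case (doubles every month)", 1 / 12),
]:
    print(f"{label}: grows {annual_growth_factor(years):,.2f}x per year")
```

A 15-year doubling time works out to under 5 percent growth per year; a one-month doubling time works out to a factor of 2^12, or 4,096, per year.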
And that’s why I wanted to write about this topic. We are at the forefront of something great, something that will make the Industrial Revolution look in comparison like a child discovering his hands. But we have to be prepared. We must be open to the changes that will come, because they will come. Only when we accept that will we be in a position to thrive. We can’t allow politicians to blame progress for our problems. We can’t allow entrenched bureaucrats and power-hungry executives to influence a future that they may have no place in.
Many people are afraid of these changes— of course they are: it’s part of being human to fear the unknown— but we can’t be so entrenched in the way the world works now that we are unable to handle change out of fear for what those changes might bring.
Change is going to be as much a part of our future as it has been of our past. Yes, it will happen faster and the changes themselves will be far more dramatic, but if we prepare for it, the change will mostly be positive. But that preparation is the key: we need to become more well-rounded as individuals so that we’re able to constantly adapt to new ways of doing things. In the future, the way you do your job may change four, five, or fifty times over the course of your life. Those who cannot, or will not, adapt will be left behind.
At the same time, the Singularity will give many more people the opportunity to be successful. Because things will change so rapidly there is a much greater likelihood that people will find something they excel at. But it could also mean that people’s successes are much shorter-lived. The days of someone becoming a legend in any one business (think Clive Davis in music, Steven Spielberg in movies, or the Hearst family in publishing) are likely over. But those who embrace and adapt to the coming changes, and surround themselves with others who have done the same, will flourish.
When major companies, set in their ways, try to convince us that change is bad and that we must stick to the status quo, no matter how much human inquisitiveness and ingenuity try to propel us forward, we must look past them. We must know in our hearts that these changes will come, and that if we welcome them into our world, we’ll become more successful, more free, and more full of light than we could have ever possibly imagined.
Ray Kurzweil once wrote, “The Singularity is near.” The only question will be whether we are ready for it.
The citations for the chapter include:
- Luke Muehlhauser and Anna Salamon, "Intelligence Explosion: Evidence and Import"
- Daniel Dewey, "Learning What to Value"
- Eliezer Yudkowsky, "Artificial Intelligence as a Positive and a Negative Factor in Global Risk"
- Luke Muehlhauser and Louie Helm, "The Singularity and Machine Ethics"
- Luke Muehlhauser, "So You Want to Save the World"
- Michael Anissimov, "The Benefits of a Successful Singularity"
Questions on SI Research
Hello LessWrong,
As one of my assignments at the Singularity Institute (SI), I am writing a research FAQ answering the most frequently asked questions regarding the Singularity Institute's research program.
For a short summary of what SI is about, see our concise summary.
Here are some examples of questions I'm currently planning to include:
1) Who conducts research at SI?
2) What are the specific research topics being investigated?
3) What is the history of SI's research program?
4) Where does SI see its research program in 5, 10, and 20 years?
5) What other organizations conduct research similar to SI's?
Please submit other questions that come to mind below. Unfortunately, due to limited time, we cannot answer every question posed to us. However, I hope to answer some of the questions that receive the most upvotes. Thank you for your participation!
Muehlhauser-Goertzel Dialogue, Part 2
Part of the Muehlhauser interview series on AGI.
Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.
Ben Goertzel is the Chairman at the AGI company Novamente and founder of the AGI conference series.
Continued from part 1...
Luke:
[Apr 11th, 2012]
I agree the future is unlikely to consist of a population of fairly distinct AGIs competing for resources, but I never thought that the arguments for Basic AI drives or "convergent instrumental goals" required that scenario to hold.
Anyway, I prefer the argument for convergent instrumental goals in Nick Bostrom's more recent paper "The Superintelligent Will." Which parts of Nick's argument fail to persuade you?
Ben:
[Apr 12th, 2012]
Well, for one thing, I think his
Orthogonality Thesis
Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.
is misguided. It may be true, but who cares about possibility “in principle”? The question is whether any level of intelligence is PLAUSIBLY LIKELY to be combined with more or less any final goal in practice. And I really doubt it. I guess I could posit the alternative
Interdependency Thesis
Intelligence and final goals are in practice highly and subtly interdependent. In other words, in the actual world, various levels of intelligence are going to be highly correlated with various probability distributions over the space of final goals.
This just gets back to the issue we discussed already, of me thinking it’s really unlikely that a superintelligence would ever really have a really stupid goal like say, tiling the Cosmos with Mickey Mice.
Bostrom says
It might be possible through deliberate effort to construct a superintelligence that values ... human welfare, moral goodness, or any other complex purpose that its designers might want it to serve. But it is no less possible—and probably technically easier—to build a superintelligence that places final value on nothing but calculating the decimals of pi.
but he gives no evidence for this assertion. Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence -- so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.
One basic error Bostrom seems to be making in this paper, is to think about intelligence as something occurring in a sort of mathematical vacuum, divorced from the frustratingly messy and hard-to-quantify probability distributions characterizing actual reality....
Regarding his
The Instrumental Convergence Thesis
Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.
the first clause makes sense to me,
Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations
but it doesn’t seem to me to justify the second clause
implying that these instrumental values are likely to be pursued by many intelligent agents.
The step from the first to the second clause seems to me to assume that the intelligent agents in question are being created and selected by some sort of process similar to evolution by natural selection, rather than being engineered carefully, or created via some other process beyond current human ken.
In short, I think the Bostrom paper is an admirably crisp statement of its perspective, and I agree that its conclusions seem to follow from its clearly stated assumptions -- but the assumptions are not justified in the paper, and I don’t buy them at all.
Luke:
[Apr. 19, 2012]
Ben,
Let me explain why I think that:
(1) The fact that we can identify convergent instrumental goals (of the sort described by Bostrom) implies that many agents will pursue those instrumental goals.
Intelligent systems are intelligent because rather than simply executing hard-wired situation-action rules, they figure out how to construct plans that will lead to the probabilistic fulfillment of their final goals. That is why intelligent systems will pursue the convergent instrumental goals described by Bostrom. We might try to hard-wire a collection of rules into an AGI which restrict the pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules and have no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.
Next: I remain confused about why an intelligent system will decide that a particular final goal it has been given is "stupid," and then change its final goals — especially given the convergent instrumental goal to preserve its final goals.
Perhaps the word "intelligence" is getting in our way. Let's define a notion of "optimization power," which measures (roughly) an agent's ability to optimize the world according to its preference ordering, across a very broad range of possible preference orderings and environments. I think we agree that AGIs with vastly greater-than-human optimization power will arrive in the next century or two. The problem, then, is that this superhuman AGI will almost certainly be optimizing the world for something other than what humans want, because what humans want is complex and fragile, and indeed we remain confused about what exactly it is that we want. A machine superoptimizer with a final goal of solving the Riemann hypothesis will simply be very good at solving the Riemann hypothesis (by whatever means necessary).
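One way to pin down "optimization power" quantitatively, in the spirit of Yudkowsky's "Measuring Optimization Power" essay (this formalization is illustrative, not something stated in the dialogue), is to count the bits by which an agent narrows the outcome space: minus the log base 2 of the fraction of possible outcomes ranked at least as high as the outcome the agent actually achieves. A minimal sketch, assuming finitely many equally likely outcomes:

```python
import math

def optimization_power_bits(achieved_rank: int, total_outcomes: int) -> float:
    """Optimization power, in bits, of hitting the outcome ranked
    `achieved_rank` (1 = best) out of `total_outcomes` equally likely
    outcomes: -log2 of the fraction of outcomes at least that good."""
    fraction_as_good = achieved_rank / total_outcomes
    return -math.log2(fraction_as_good)

# An agent that reliably hits the single best outcome out of 1024
# exerts 10 bits of optimization power; a random agent exerts ~0.
print(optimization_power_bits(1, 1024))   # 10.0
```

On this measure, "greater-than-human optimization power" just means reliably hitting far smaller targets in far larger outcome spaces, regardless of what the preference ordering ranks highly.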
Which parts of this analysis do you think are wrong?
Ben:
[Apr. 20, 2012]
It seems to me that in your reply you are implicitly assuming a much stronger definition of “convergent” than the one Bostrom actually gives in his paper. He says
instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.
Note the somewhat weaselly reference to a “wide range” of goals and situations -- not, say, “nearly all feasible” goals and situations. Just because some values are convergent in the weak sense of his definition, doesn’t imply that AGIs we create will be likely to adopt these instrumental values. I think that his weak definition of “convergent” doesn’t actually imply convergence in any useful sense. On the other hand, if he’d made a stronger statement like
instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for nearly all feasible final goals and nearly all feasible situations, implying that these instrumental values are likely to be pursued by many intelligent agents.
then I would disagree with the first clause of his statement (“instrumental values can be identified which...”), but I would be more willing to accept that the second clause (after the “implying”) followed from the first.
About optimization -- I think it’s rather naive and narrow-minded to view hypothetical superhuman superminds as “optimization powers.” It’s a bit like a dog viewing a human as an “eating and mating power.” Sure, there’s some accuracy to that perspective -- we do eat and mate, and some of our behaviors may be understood based on this. On the other hand, a lot of our behaviors are not very well understood in terms of these, or any dog-level concepts. Similarly, I would bet that the bulk of a superhuman supermind’s behaviors and internal structures and dynamics will not be explicable in terms of the concepts that are important to humans, such as “optimization.”
So when you say “this superhuman AGI will almost certainly be optimizing the world for something other than what humans want," I don’t feel confident that what a superhuman AGI will be doing, will be usefully describable as optimizing anything ....
Luke:
[May 1, 2012]
I think our dialogue has reached the point of diminishing marginal returns, so I'll conclude with just a few points and let you have the last word.
On convergent instrumental goals, I encourage readers to read "The Superintelligent Will" and make up their own minds.
On the convergence of advanced intelligent systems toward optimization behavior, I'll point you to Omohundro (2007).
Ben:
Well, it's been a fun chat. Although it hasn't really covered much new ground, there have been some new phrasings and minor new twists.
One thing I'm repeatedly struck by in discussions on these matters with you and other SIAI folks, is the way the strings of reason are pulled by the puppet-master of intuition. With so many of these topics on which we disagree -- for example: the Scary Idea, the importance of optimization for intelligence, the existence of strongly convergent goals for intelligences -- you and the other core SIAI folks share a certain set of intuitions, which seem quite strongly held. Then you formulate rational arguments in favor of these intuitions -- but the conclusions that result from these rational arguments are very weak. For instance, the Scary Idea intuition corresponds to a rational argument that "superhuman AGI might plausibly kill everyone." The intuition about strongly convergent goals for intelligences, corresponds to a rational argument about goals that are convergent for a "wide range" of intelligences. Etc.
On my side, I have a strong intuition that OpenCog can be made into a human-level general intelligence, and that if this intelligence is raised properly it will turn out benevolent and help us launch a positive Singularity. However, I can't fully rationally substantiate this intuition either -- all I can really fully rationally argue for is something weaker like "It seems plausible that a fully implemented OpenCog system might display human-level or greater intelligence on feasible computational resources, and might turn out benevolent if raised properly." In my case just like yours, reason is far weaker than intuition.
Another thing that strikes me, reflecting on our conversation, is the difference between the degrees of confidence required, in modern democratic society, to TRY something versus to STOP others from trying something. A rough intuition is often enough to initiate a project, even a large one. On the other hand, to get someone else's work banned based on a rough intuition is pretty hard. To ban someone else's work, you either need a really thoroughly ironclad logical argument, or you need to stir up a lot of hysteria.
What this suggests to me is that, while my intuitions regarding OpenCog seem to be sufficient to motivate others to help me to build OpenCog (via making them interested enough in it that they develop their own intuitions about it), your intuitions regarding the dangers of AGI are not going to be sufficient to get work on AGI systems like OpenCog stopped. To halt AGI development, if you wanted to (and you haven't said that you do, I realize), you'd either need to fan hysteria very successfully, or come up with much stronger logical arguments, ones that match the force of your intuition on the subject.
Anyway, even though I have very different intuitions than you and your SIAI colleagues about a lot of things, I do think you guys are performing some valuable services -- not just through the excellent Singularity Summit conferences, but also by raising some difficult and important issues in the public eye. Humanity spends a lot of its attention on some really unimportant things, so it's good to have folks like SIAI nudging the world to think about critical issues regarding our future. In the end, whether SIAI's views are actually correct may be peripheral to the organization's main value and impact.
I look forward to future conversations, and especially look forward to resuming this conversation one day with a human-level AGI as the mediator ;-)
Bootstrapping to Friendliness
"All that is necessary for evil to triumph is that good men do nothing."
155,000 people are dying, on average, every day. For those of us who are preference utilitarians, and who also believe that a Friendly singularity is possible and capable of ending this state of affairs, that fact puts a great deal of pressure on us. It doesn't give us leave to be sloppy (because human extinction, even multiplied by a low probability, is a massive negative utility). But if we see a way to achieve similar results in a shorter time frame, the cost in human life of not taking it is simply unacceptable.
I have some concerns about CEV on a conceptual level, but I'm leaving those aside for the time being. My concern is that most of the organizations concerned with a first-mover X-risk are not in a position to be that first mover -- and, furthermore, they're not moving in that direction. That includes the Singularity Institute. Trying to operationalize CEV seems like a good way to get an awful lot of smart people bashing their heads against a wall while clever idiots trundle ahead with their own experiments. I'm not saying that we should be hasty, but I am suggesting that we need to be careful of getting stuck in dark intellectual forests with lots of things that are fun to talk about until an idiot with the tinderbox burns it down.
My point, in short, is that we need to be looking for better ways to do things, and to do them extremely quickly. We are working on a very, very, existentially tight schedule.
So, if we're looking for quicker paths to a Friendly, first-mover singularity, I'd like to talk about one that seems attractive to me. Maybe it's a useful idea. If not, then at least I won't waste any more time thinking about it. Either way, I'm going to lay it out and you guys can see what you think.
So, Friendliness is a hard problem. Exactly how hard, we don't know, but a lot of smart people have radically different ideas of how to attack it, and they've all put a lot of thought into it, and that's not a good sign. However, designing a strongly superhuman AI is also a hard problem. Probably much harder than a human can solve. The good news is, we don't expect that we'll have to. If we can build something just a little bit smarter than we are, we expect that bootstrapping process to take off without obvious limit.
So let's apply the same methodology to Friendliness. General goal optimizers are tools, after all. Probably the most powerful tools that have ever existed, for that matter. Let's say we build something that's not Friendly. Not something we want running the universe -- but, Friendly enough. Friendly enough that it's not going to kill us all. Friendly enough not to succumb to the pedantic genie problem. Friendly enough we can use it to build what we really want, be it CEV or something else.
I'm going to sketch out an architecture of what such a system might look like. Do bear in mind this is just a sketch, and in no way a formal, safe, foolproof design spec.
So, let's say we have an agent with the ability to convert unstructured data into symbolic relationships that represent the world, with explicitly demarcated levels of abstraction. Let's say the system has the ability to build Bayesian causal relationships out of its data points over time, and construct efficient, predictive models of the behavior of the concepts in the world. Let's also say that the system has the ability to take a symbolic representation of a desired future distribution of universes, a symbolic representation of the current universe, and map between them, finding valid chains of causality leading from now to then, probably using a solid decision theory background. These are all hard problems to solve, but they're the same problems everyone else is solving too.
This system, if you just specify parameters about the future and turn it loose, is not even a little bit Friendly. But let's say you do this: first, provide it with a tremendous amount of data, up to and including the entire available internet, if necessary. Everything it needs to build extremely effective models of human beings, with strongly generalized predictive power. Then you incorporate one or more of those models (say, a group of trusted people) as functional components: the system uses them to generalize natural language instructions first into a symbolic graph, and then into something actionable, working out the details of what was meant, rather than what was said. Then, when the system is finding valid paths of causality, it takes its model of the state of the universe at the end of each course of action, feeds it into its human-models, and gives them a veto vote. Think of it as the emergency regret button, iterated computationally for each possibility considered by the genie. Any possibility that any of the person-models finds unacceptable is disregarded.
(A small side note: as described here, the models would probably eventually be indistinguishable from uploaded minds, and would be created, simulated for a short time, and destroyed uncountable trillions of times -- you'd either need to drastically limit the simulation depth of the models, or ensure that everyone you signed up as one of the models knew the sacrifice they were making.)
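The veto step itself is simple enough to sketch. A minimal illustration (all names are hypothetical, and the predictive model and person-models are stubbed out as ordinary functions, which wildly understates the hard parts):

```python
from typing import Callable, Iterable, List

def filter_plans(
    candidate_plans: Iterable[object],
    predict_outcome: Callable[[object], object],
    person_models: List[Callable[[object], bool]],  # True = acceptable
) -> List[object]:
    """Keep only plans whose predicted end state no person-model vetoes."""
    acceptable = []
    for plan in candidate_plans:
        outcome = predict_outcome(plan)
        # A single veto from any person-model disqualifies the plan.
        if all(model(outcome) for model in person_models):
            acceptable.append(plan)
    return acceptable

# Toy usage: plans are integers, the "predicted outcome" is the plan itself,
# and two person-models veto outcomes they find unacceptable.
survivors = filter_plans(
    [1, 3, 7],
    predict_outcome=lambda plan: plan,
    person_models=[lambda s: s < 5, lambda s: s != 3],
)
print(survivors)  # [1]
```

The design choice worth noticing is the `all(...)`: unanimity is required, so one dissenting model is enough to discard a candidate future, which is the "emergency regret button, iterated computationally" from the description above.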
So, what you've got, plus or minus some spit and polish, is a very powerful optimization engine that understands what you mean, and disregards obviously unacceptable possibilities. If you ask it for a truly Friendly AI, it will help you first figure out what you mean by that, then help you build it, then help you formally prove that it's safe. It would turn itself off if you asked it to, and meant it. It would also exterminate the human species if you asked it to and meant it. Not Friendly, but Friendly enough to build something better.
With this approach, the position of the Friendly AI researcher changes. Instead of being in an arms race with the rest of the AI field with a massive handicap (having to solve two incredibly hard problems against opponents who only have to solve one), we only have to solve a relatively simpler problem (building a Friendly-enough AI), which we can then instruct to sabotage unFriendly AI projects and buy some time to develop the real deal. It turns it into a fair fight, one that we might actually win.
Anyone have any thoughts on this idea?
Intelligence Explosion vs. Co-operative Explosion
Abstract: In the FOOM debate, Eliezer emphasizes 'optimization power', something like intelligence, as the main thing that makes both evolution and humans so powerful. A different choice of abstractions says that the main thing that's been giving various organisms - from single-celled creatures to wasps to humans - an advantage is the capability to form superorganisms, thus reaping the gains of specialization and shifting evolutionary selection pressure to the level of the superorganism. There seem to be several ways by which a technological singularity could involve the creation of new kinds of superorganisms, which would then reap benefits above and beyond those that individual humans can achieve, and which would likely have quite different values. This strongly suggests that even if one is not worried about the intelligence explosion (because of e.g. finding a hard takeoff improbable), one should still be worried about the co-operative explosion.
After watching Jonathan Haidt's excellent new TEDTalk yesterday, I bought his latest book, The Righteous Mind: Why Good People Are Divided by Politics and Religion. At one point, Haidt has a discussion of evolutionary superorganisms - cases where previously separate organisms have joined together into a single superorganism, shifting evolution's selection pressure to operate on the level of the superorganism and avoiding the usual pitfalls that block group selection (excerpts below). With an increased ability for the previously-separate organisms to co-operate, these new superorganisms can often out-compete simpler organisms.
Suppose you entered a boat race. One hundred rowers, each in a separate rowboat, set out on a ten-mile race along a wide and slow-moving river. The first to cross the finish line will win $10,000. Halfway into the race, you’re in the lead. But then, from out of nowhere, you’re passed by a boat with two rowers, each pulling just one oar. No fair! Two rowers joined together into one boat! And then, stranger still, you watch as that rowboat is overtaken by a train of three such rowboats, all tied together to form a single long boat. The rowers are identical septuplets. Six of them row in perfect synchrony while the seventh is the coxswain, steering the boat and calling out the beat for the rowers. But those cheaters are deprived of victory just before they cross the finish line, for they in turn are passed by an enterprising group of twenty-four sisters who rented a motorboat. It turns out that there are no rules in this race about what kinds of vehicles are allowed.
That was a metaphorical history of life on Earth. For the first billion years or so of life, the only organisms were prokaryotic cells (such as bacteria). Each was a solo operation, competing with others and reproducing copies of itself. But then, around 2 billion years ago, two bacteria somehow joined together inside a single membrane, which explains why mitochondria have their own DNA, unrelated to the DNA in the nucleus. These are the two-person rowboats in my example. Cells that had internal organelles could reap the benefits of cooperation and the division of labor (see Adam Smith). There was no longer any competition between these organelles, for they could reproduce only when the entire cell reproduced, so it was “one for all, all for one.” Life on Earth underwent what biologists call a “major transition.” Natural selection went on as it always had, but now there was a radically new kind of creature to be selected. There was a new kind of vehicle by which selfish genes could replicate themselves. Single-celled eukaryotes were wildly successful and spread throughout the oceans.
A few hundred million years later, some of these eukaryotes developed a novel adaptation: they stayed together after cell division to form multicellular organisms in which every cell had exactly the same genes. These are the three-boat septuplets in my example. Once again, competition is suppressed (because each cell can only reproduce if the organism reproduces, via its sperm or egg cells). A group of cells becomes an individual, able to divide labor among the cells (which specialize into limbs and organs). A powerful new kind of vehicle appears, and in a short span of time the world is covered with plants, animals, and fungi. It’s another major transition.
Major transitions are rare. The biologists John Maynard Smith and Eörs Szathmáry count just eight clear examples over the last 4 billion years (the last of which is human societies). But these transitions are among the most important events in biological history, and they are examples of multilevel selection at work. It’s the same story over and over again: Whenever a way is found to suppress free riding so that individual units can cooperate, work as a team, and divide labor, selection at the lower level becomes less important, selection at the higher level becomes more powerful, and that higher-level selection favors the most cohesive superorganisms. (A superorganism is an organism made out of smaller organisms.) As these superorganisms proliferate, they begin to compete with each other, and to evolve for greater success in that competition. This competition among superorganisms is one form of group selection. There is variation among the groups, and the fittest groups pass on their traits to future generations of groups.
Major transitions may be rare, but when they happen, the Earth often changes. Just look at what happened more than 100 million years ago when some wasps developed the trick of dividing labor between a queen (who lays all the eggs) and several kinds of workers who maintain the nest and bring back food to share. This trick was discovered by the early hymenoptera (members of the order that includes wasps, which gave rise to bees and ants) and it was discovered independently several dozen other times (by the ancestors of termites, naked mole rats, and some species of shrimp, aphids, beetles, and spiders). In each case, the free rider problem was surmounted and selfish genes began to craft relatively selfless group members who together constituted a supremely selfish group.
These groups were a new kind of vehicle: a hive or colony of close genetic relatives, which functioned as a unit (e.g., in foraging and fighting) and reproduced as a unit. These are the motorboating sisters in my example, taking advantage of technological innovations and mechanical engineering that had never before existed. It was another transition. Another kind of group began to function as though it were a single organism, and the genes that got to ride around in colonies crushed the genes that couldn’t “get it together” and rode around in the bodies of more selfish and solitary insects. The colonial insects represent just 2 percent of all insect species, but in a short period of time they claimed the best feeding and breeding sites for themselves, pushed their competitors to marginal grounds, and changed most of the Earth’s terrestrial ecosystems (for example, by enabling the evolution of flowering plants, which need pollinators). Now they’re the majority, by weight, of all insects on Earth.
Haidt's argument is that color politics and other political mind-killingness are due to a set of adaptations that temporarily lets people merge into a superorganism and set individual interest aside. To a lesser extent, so are moral intuitions about things such as fairness and proportionality. Yes, it's a group selection argument. Haidt acknowledges that group selection has been unpopular in biology for a while, but notes that it has also been making a comeback recently, and cites e.g. the work on multi-level selection as supporting his thesis. I mention some of his references (which I have not yet read) below.
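Haidt's core multilevel-selection claim, that once free riding is suppressed, selection between groups can overpower selection within groups, can be illustrated with a deterministic toy model. This is my own illustrative sketch, not anything from Haidt; the function name and all parameter values are arbitrary assumptions:

```python
def simulate(n_groups=100, generations=100, benefit=0.5, cost=0.05):
    """Toy multilevel-selection model.

    Each group i has a cooperator fraction ps[i] and a population weight
    ws[i]. Within groups, defectors avoid the cost of cooperation and so
    out-reproduce cooperators (ps[i] drifts down). Between groups, those
    with more cooperators grow faster (ws[i] rises). The return value is
    the population-wide cooperator fraction after all generations.
    """
    ps = [i / (n_groups - 1) for i in range(n_groups)]  # initial variation
    ws = [1.0] * n_groups
    for _ in range(generations):
        for i in range(n_groups):
            p = ps[i]
            # within-group selection: cooperator fitness 1-cost, defector 1
            ps[i] = p * (1 - cost) / (p * (1 - cost) + (1 - p))
            # between-group selection: group fitness rises with cooperation
            ws[i] *= 1 + benefit * ps[i]
    total = sum(ws)
    return sum(w * p for w, p in zip(ws, ps)) / total
```

With `benefit=0` (no group-level selection), within-group decay wins and cooperation collapses toward zero; with even a modest group-level benefit, the lineage in which free riding is fully suppressed comes to dominate the population, a miniature version of "one for all, all for one."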
Anyway, the reason why I'm bringing this up is that I've been re-reading the FOOM debate of late, and in Life's Story Continues, Eliezer references some of the same evolutionary milestones as Haidt does. And while Eliezer also mentions that the cells provided a major co-operative advantage that allowed for specialization, he views this merely through the lens of optimization power, and dismisses e.g. unicellular eukaryotes with the words "meh, so what".
Cells: Force a set of genes, RNA strands, or catalytic chemicals to share a common reproductive fate. (This is the real point of the cell boundary, not "protection from the environment" - it keeps the fruits of chemical labor inside a spatial boundary.) But, as we've defined our abstractions, this is mostly a matter of optimization slope - the quality of the search neighborhood. The advent of cells opens up a tremendously rich new neighborhood defined by specialization and division of labor. It also increases the slope by ensuring that chemicals get to keep the fruits of their own labor in a spatial boundary, so that fitness advantages increase. But does it hit back to the meta-level? How you define that seems to me like a matter of taste. Cells don't quite change the mutate-reproduce-select cycle. But if we're going to define sexual recombination as a meta-level innovation, then we should also define cellular isolation as a meta-level innovation. (Life's Story Continues)
The interesting thing about the FOOM debate is that both Eliezer and Robin seem to talk a lot about the significance of co-operation, but they never quite take it up explicitly. Robin talks about the way that isolated groups typically aren't able to take over the world, because it's much more effective to co-operate with others than try to do everything yourself, or because information within the group tends to leak out to other parties. Eliezer talks about the way that cells allowed the ability for specialization, and how writing allowed human culture to accumulate and people to build on each other's inventions.
Even as Eliezer talks about intelligence, insight, and recursion, one could view this too as discussion about the power of specialization, co-operation and superorganisms - for intelligence seems to consist of a large number of specialized modules, all somehow merged to work in the same organism. And Robin seems to take the view of large groups of people acting as some kind of a loose superorganism, thus beating smaller groups that try to do things alone:
Independent competitors can more easily displace one another than interdependent ones. For example, since the unit of the industrial revolution seems to have been Western Europe, Britain who started it did not gain much relative to the rest of Western Europe, but Western Europe gained more substantially relative to outsiders. So as the world becomes interdependent on larger scales, smaller groups find it harder to displace others. (Outside View of Singularity)
[Today] innovations and advances in each part of the world depending on advances made in all other parts of the world. … Visions of a local singularity, in contrast, imagine that sudden technological advances in one small group essentially allow that group to suddenly grow big enough to take over everything. … The key common assumption is that of a very powerful but autonomous area of technology. Overall progress in that area must depend only on advances in this area, advances that a small group of researchers can continue to produce at will. And great progress in this area alone must be sufficient to let a small group essentially take over the world. …
[Consider also] complaints about the great specialization in modern academic and intellectual life. People complain that ordinary folks should know more science, so they can judge simple science arguments for themselves. … Many want policy debates to focus on intrinsic merits, rather than on appeals to authority. Many people wish students would study a wider range of subjects, and so be better able to see the big picture. And they wish researchers weren’t so penalized for working between disciplines, or for failing to cite every last paper someone might think is related somehow.
It seems to me plausible to attribute all of these dreams of autarky to people not yet coming fully to terms with our newly heightened interdependence. … We picture our ideal political unit and future home to be the largely self-sufficient small tribe of our evolutionary heritage. … I suspect that future software, manufacturing plants, and colonies will typically be much more dependent on everyone else than dreams of autonomy imagine. Yes, small isolated entities are getting more capable, but so are small non-isolated entities, and the latter remain far more capable than the former. The riches that come from a worldwide division of labor have rightly seduced us away from many of our dreams of autarky. We may fantasize about dropping out of the rat race and living a life of ease on some tropical island. But very few of us ever do. (Dreams of Autarky)
Robin has also explicitly made the point that it is the difficulty of co-operation which suggests that we can keep ourselves safe from uploads or AIs with hostile intentions:
What if uploads decide to take over by force, refusing to pay back their loans and grabbing other forms of capital? Well for comparison, consider the question: What if our children take over, refusing to pay back their student loans or to pay for Social Security? Or consider: What if short people revolt tonight, and kill all the tall people?
In general, most societies have many potential subgroups who could plausibly take over by force, if they could coordinate among themselves. But such revolt is rare in practice; short people know that if they kill all the tall folks tonight, all the blond people might go next week, and who knows where it would all end? And short people are highly integrated into society; some of their best friends are tall people.
In contrast, violence is more common between geographic and culturally separated subgroups. Neighboring nations have gone to war, ethnic minorities have revolted against governments run by other ethnicities, and slaves and other sharply segregated economic classes have rebelled.
Thus the best way to keep the peace with uploads would be to allow them as full as possible integration in with the rest of society. Let them live and work with ordinary people, and let them loan and sell to each other through the same institutions they use to deal with ordinary humans. Banning uploads into space, the seas, or the attic so as not to shock other folks might be ill-advised. Imposing especially heavy upload taxes, or treating uploads as property, as just software someone owns or as non-human slaves like dogs, might be especially unwise. (If Uploads Come First)
Situations like war or violent rebellions are, arguably, cases where the "human superorganism adaptations" kick in the strongest - where people have the strongest propensity to view themselves primarily as a part of a group, and where they are the most ready to sacrifice themselves for the interest of the group. Indeed, Haidt quotes (both in the book and the TEDTalk) former soldiers who say that there's something very unique in the states of consciousness that war can produce:
So many books about war say the same thing, that nothing brings people together like war. And that bringing them together opens up the possibility of extraordinary self-transcendent experiences. I'm going to play for you an excerpt from this book by Glenn Gray. Gray was a soldier in the American army in World War II. And after the war he interviewed a lot of other soldiers and wrote about the experience of men in battle. Here's a key passage where he basically describes the staircase.
Glenn Gray: Many veterans will admit that the experience of communal effort in battle has been the high point of their lives. "I" passes insensibly into a "we," "my" becomes "our" and individual faith loses its central importance. I believe that it is nothing less than the assurance of immortality that makes self-sacrifice at these moments so relatively easy. I may fall, but I do not die, for that which is real in me goes forward and lives on in the comrades for whom I gave up my life.
So Robin, in If Uploads Come First, seems to basically be saying that uploads are dangerous if we let them become superorganisms. Usually, individuals have a large number of their own worries and priorities, and even if they did have much to gain by co-operating, they can't trust each other enough, nor sufficiently resist the temptation to free-ride, to work together well enough to become dangerous.
Incidentally, this provides an easy rebuttal to the "corporations are already superintelligent" claim - while corporations have a variety of mechanisms for trying to provide their employees with the proper incentives, anyone who's worked for a big company knows that the employees tend to follow their own interests, even when they conflict with those of the company. It's certainly nothing like the situation with a cell, where the survival of each organelle depends on the survival of the whole cell. If the cell dies, the organelles die; if the company fails, the employees can just get a new job.
It would seem to me that, whatever your take on the intelligence explosion, evolutionary history to date strongly suggests that new kinds of superorganisms - larger and more cohesive than human groups, and less dependent on crippling their own rationality in order to maintain group cohesion - would be a major risk for humanity. This is not to say that an intelligence explosion wouldn't be dangerous as well - I have no idea what a mind that could think 1,000 times faster than me could do - but a co-operative explosion should be considered dangerous even if you thought a hard takeoff via recursive self-improvement (say) was impossible. And many of the ways of creating a superorganism (see below) seem to involve processes that could conceivably lead to the superorganisms having quite different values from humans. Even if no single superorganism could take over, that's not much of a comfort for the ordinary humans who are caught in the crossfire.
How might a co-operative explosion happen? I see at least three possibilities:
- Self-copying artificial intelligences. An AI doesn't need to have the evolved idea of a "self" whose interests need to be protected, above those of identical copies of the AI. An AI could be programmed to only care about the completion of a single goal (e.g. paperclips), and it could then copy itself freely, knowing that all of those copies will be working towards the same goal.
- Upload copy clans. Carl Shulman discusses this possibility in Whole Brain Emulation and the Evolution of Superorganisms. Some people might have a view about personal identity which accepts the possibility of somebody deleting you, if there exist close-enough copies of you. In a world where uploading is possible, there could be people who could copy themselves and then have those copies work together in order to further the goals of the joint organism. If the copies were willing to have themselves deleted or be experimented on, they could come up with ways of brain modification that further increased the devotion to the superorganism. Furthermore, each copy could consent to being deleted if it seemed like its interests were drifting apart from those of the organism as a whole.
- Mind coalescences. In Coalescing Minds: Mind Uploading-Related Group Mind Scenarios, Harri Valpola and I discuss the notion of coalesced minds, hypothetical minds created by merging two brains together through a sufficient number of high-bandwidth neural connections. In a world where uploading was possible, the creation of mind coalescences could be relatively straightforward. Several independent organisms could then literally join together to become a single entity.
Below are some more excerpts from Haidt's book:
Many animals are social: they live in groups, flocks, or herds. But only a few animals have crossed the threshold and become ultrasocial, which means that they live in very large groups that have some internal structure, enabling them to reap the benefits of the division of labor. Beehives and ant nests, with their separate castes of soldiers, scouts, and nursery attendants, are examples of ultrasociality, and so are human societies. One of the key features that has helped all the nonhuman ultra-socials to cross over appears to be the need to defend a shared nest. [...] Hölldobler and Wilson give supporting roles to two other factors: the need to feed offspring over an extended period (which gives an advantage to species that can recruit siblings or males to help out Mom) and intergroup conflict. All three of these factors applied to those first early wasps camped out together in defensible naturally occurring nests (such as holes in trees). From that point on, the most cooperative groups got to keep the best nesting sites, which they then modified in increasingly elaborate ways to make themselves even more productive and more protected. Their descendants include the honeybees we know today, whose hives have been described as “a factory inside a fortress.”
Those same three factors applied to human beings. Like bees, our ancestors were (1) territorial creatures with a fondness for defensible nests (such as caves) who (2) gave birth to needy offspring that required enormous amounts of care, which had to be given while (3) the group was under threat from neighboring groups. For hundreds of thousands of years, therefore, conditions were in place that pulled for the evolution of ultrasociality, and as a result, we are the only ultrasocial primate. The human lineage may have started off acting very much like chimps, but by the time our ancestors started walking out of Africa, they had become at least a little bit like bees.
And much later, when some groups began planting crops and orchards, and then building granaries, storage sheds, fenced pastures, and permanent homes, they had an even steadier food supply that had to be defended even more vigorously. Like bees, humans began building ever more elaborate nests, and in just a few thousand years, a new kind of vehicle appeared on Earth—the city-state, able to raise walls and armies. City-states and, later, empires spread rapidly across Eurasia, North Africa, and Mesoamerica, changing many of the Earth’s ecosystems and allowing the total tonnage of human beings to shoot up from insignificance at the start of the Holocene (around twelve thousand years ago) to world domination today.
As the colonial insects did to the other insects, we have pushed all other mammals to the margins, to extinction, or to servitude. The analogy to bees is not shallow or loose. Despite their many differences, human civilizations and beehives are both products of major transitions in evolutionary history. They are motorboats.
The discovery of major transitions is Exhibit A in the retrial of group selection. Group selection may or may not be common among other animals, but it happens whenever individuals find ways to suppress selfishness and work as a team, in competition with other teams. Group selection creates group-related adaptations. It is not far-fetched, and it should not be a heresy to suggest that this is how we got the groupish overlay that makes up a crucial part of our righteous minds. [...] According to Tomasello, human cognition veered away from that of other primates when our ancestors developed shared intentionality. At some point in the last million years, a small group of our ancestors developed the ability to share mental representations of tasks that two or more of them were pursuing together. For example, while foraging, one person pulls down a branch while the other plucks the fruit, and they both share the meal. Chimps never do this. Or while hunting, the pair splits up to approach an animal from both sides. Chimps sometimes appear to do this, as in the widely reported cases of chimps hunting colobus monkeys, but Tomasello argues that the chimps are not really working together. Rather, each chimp is surveying the scene and then taking the action that seems best to him at that moment. Tomasello notes that these monkey hunts are the only time that chimps seem to be working together, yet even in these rare cases they fail to show the signs of real cooperation. They make no effort to communicate with each other, for example, and they are terrible at sharing the spoils among the hunters, each of whom must use force to obtain a share of meat at the end. They all chase the monkey at the same time, yet they don’t all seem to be on the same page about the hunt.
In contrast, when early humans began to share intentions, their ability to hunt, gather, raise children, and raid their neighbors increased exponentially. Everyone on the team now had a mental representation of the task, knew that his or her partners shared the same representation, knew when a partner had acted in a way that impeded success or that hogged the spoils, and reacted negatively to such violations. When everyone in a group began to share a common understanding of how things were supposed to be done, and then felt a flash of negativity when any individual violated those expectations, the first moral matrix was born. (Remember that a matrix is a consensual hallucination.) That, I believe, was our Rubicon crossing.
Tomasello believes that human ultrasociality arose in two steps. The first was the ability to share intentions in groups of two or three people who were actively hunting or foraging together. (That was the Rubicon.) Then, after several hundred thousand years of evolution for better sharing and collaboration as nomadic hunter-gatherers, more collaborative groups began to get larger, perhaps in response to the threat of other groups. Victory went to the most cohesive groups—the ones that could scale up their ability to share intentions from three people to three hundred or three thousand people. This was the second step: Natural selection favored increasing levels of what Tomasello calls “group-mindedness”—the ability to learn and conform to social norms, feel and share group-related emotions, and, ultimately, to create and obey social institutions, including religion. A new set of selection pressures operated within groups (e.g., nonconformists were punished, or at very least were less likely to be chosen as partners for joint ventures) as well as between groups (cohesive groups took territory and other resources from less cohesive groups).
Shared intentionality is Exhibit B in the retrial of group selection. Once you grasp Tomasello’s deep insight, you begin to see the vast webs of shared intentionality out of which human groups are constructed. Many people assume that language was our Rubicon, but language became possible only after our ancestors got shared intentionality. Tomasello notes that a word is not a relationship between a sound and an object. It is an agreement among people who share a joint representation of the things in their world, and who share a set of conventions for communicating with each other about those things. If the key to group selection is a shared defensible nest, then shared intentionality allowed humans to construct nests that were vast and ornate yet weightless and portable. Bees construct hives out of wax and wood fibers, which they then fight, kill, and die to defend. Humans construct moral communities out of shared norms, institutions, and gods that, even in the twenty-first century, they fight, kill, and die to defend.
Haidt's references on this include, though are not limited to, the following:
Okasha, S. (2006) Evolution and the Levels of Selection. Oxford: Oxford University Press.
Hölldobler, B., and E. O. Wilson. (2009) The Superorganism: The Beauty, Elegance, and Strangeness of Insect Societies. New York: Norton.
Bourke, A. F. G. (2011) Principles of Social Evolution. New York: Oxford University Press.
Wilson, E. O., and B. Hölldobler. (2005) “Eusociality: Origin and Consequences.” Proceedings of the National Academy of Sciences of the United States of America 102:13367–71.
Tomasello, M., A. Melis, C. Tennie, E. Wyman, E. Herrmann, and A. Schneider. (Forthcoming) “Two Key Steps in the Evolution of Human Cooperation: The Mutualism Hypothesis.” Current Anthropology.
Carbon nanotubes: The weird world of 'remote Joule heating'
Minimizing Joule heating remains an important goal in the design of electronic devices. The prevailing model of Joule heating relies on a simple semiclassical picture in which electrons collide with the atoms of a conductor, generating heat locally and only in regions of non-zero current density, and this model has been supported by most experiments. Recently, however, it has been predicted that electric currents in graphene and carbon nanotubes can couple to the vibrational modes of a neighbouring material, heating it remotely. Here, we use in situ electron thermal microscopy to detect the remote Joule heating of a silicon nitride substrate by a single multiwalled carbon nanotube. At least 84% of the electrical power supplied to the nanotube is dissipated directly into the substrate, rather than in the nanotube itself. Although it has different physical origins, this phenomenon is reminiscent of induction heating or microwave dielectric heating. Such an ability to dissipate waste energy remotely could lead to improved thermal management in electronic devices.
Carbon nanotubes in biology and medicine: In vitro and in vivo detection, imaging and drug delivery
Modest Superintelligences
I'm skeptical about trying to build FAI, but not about trying to influence the Singularity in a positive direction. Some people may be skeptical even of the latter because they don't think the possibility of an intelligence explosion is a very likely one. I suggest that even if intelligence explosion turns out to be impossible, we can still reach a positive Singularity by building what I'll call "modest superintelligences", that is, superintelligent entities, capable of taking over the universe and preventing existential risks and Malthusian outcomes, whose construction does not require fast recursive self-improvement or other questionable assumptions about the nature of intelligence. This helps to establish a lower bound on the benefits of an organization that aims to strategically influence the outcome of the Singularity.
- MSI-1: 10^5 biologically cloned humans of von Neumann-level intelligence, highly educated and indoctrinated from birth to work collaboratively towards some goal, such as building MSI-2 (or equivalent)
- MSI-2: 10^10 whole brain emulations of von Neumann, each running at ten times human speed, with WBE-enabled institutional controls that increase group coherence/rationality (or equivalent)
- MSI-3: 10^20 copies of von Neumann WBE, each running at a thousand times human speed, with more advanced (to be invented) institutional controls and collaboration tools (or equivalent)
(To recall what the actual von Neumann, whom we might call MSI-0, accomplished, open his Wikipedia page and scroll through the "known for" sidebar.)
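As a rough, purely illustrative back-of-envelope, the subjective work capacity of each tier is just copies times speedup. The tier sizes and speedups come from the list above; the multiplication is the only thing added here:

```python
# Subjective "von Neumann-years" of work each MSI tier produces per
# calendar year, computed as copies * speedup. Tier parameters are
# taken from the post's definitions above.
tiers = {
    "MSI-1": (10**5, 1),       # biological clones running in real time
    "MSI-2": (10**10, 10),     # WBEs at 10x human speed
    "MSI-3": (10**20, 1000),   # WBEs at 1000x human speed
}

capacity = {name: copies * speedup for name, (copies, speedup) in tiers.items()}

for name, c in capacity.items():
    print(f"{name}: {c:.0e} von Neumann-years of subjective work per year")
```

Each step up the ladder multiplies the available subjective research effort by six or more orders of magnitude, which is why each tier is plausibly sufficient to build the next.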
Building an MSI-1 seems to require a total cost on the order of $100 billion (assuming $10 million for each clone), which is comparable to the Apollo project, and about 0.25% of the annual Gross World Product. (For further comparison, note that Apple has a market capitalization of $561 billion, and annual profit of $25 billion.) In exchange for that cost, any nation that undertakes the project has a reasonable chance of obtaining an insurmountable lead in whatever technologies end up driving the Singularity, and with that a large measure of control over its outcome. If no better strategic options come along, lobbying a government to build MSI-1 and/or influencing its design and aims seems to be the least that a Singularitarian organization could do.
AI Risk and Opportunity: Humanity's Efforts So Far
Part of the series AI Risk and Opportunity: A Strategic Analysis.
(You can leave anonymous feedback on posts in this series here. I alone will read the comments, and may use them to improve past and forthcoming posts in this series.)
This post chronicles the story of humanity's growing awareness of AI risk and opportunity, along with some recent AI safety efforts. I will not tackle any strategy questions directly in this post; my purpose today is merely to "bring everyone up to speed."
I know my post skips many important events and people. Please suggest additions in the comments, and include as much detail as possible.
Early history
Late in the Industrial Revolution, Samuel Butler (1863) worried about what might happen when machines become more capable than the humans who designed them:
...we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organisation; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race.
...the time will come when the machines will hold the real supremacy over the world and its inhabitants...
This basic idea was picked up by science fiction authors, for example in the 1921 Czech play that introduced the term “robot,” R.U.R. In that play, robots grow in power and intelligence and destroy the entire human race, except for a single survivor.
Another exploration of this idea is found in John W. Campbell’s (1932) short story The Last Evolution, in which aliens attack Earth and the humans and aliens are killed but their machines survive and inherit the solar system. Campbell's (1935) short story The Machine contained perhaps the earliest description of recursive self-improvement:
On the planet Dwranl, of the star you know as Sirius, a great race lived, and they were not too unlike you humans. ...they attained their goal of the machine that could think. And because it could think, they made several and put them to work, largely on scientific problems, and one of the obvious problems was how to make a better machine which could think.
The machines had logic, and they could think constantly, and because of their construction never forgot anything they thought it well to remember. So the machine which had been set the task of making a better machine advanced slowly, and as it improved itself, it advanced more and more rapidly. The Machine which came to Earth is that machine.
The concern for AI safety is most popularly identified with Isaac Asimov’s Three Laws of Robotics, introduced in his short story Runaround. Asimov used his stories, including those collected in the popular book I, Robot, to illustrate many of the ways in which such well-meaning and seemingly comprehensive rules for governing robot behavior could go wrong.
In the year of I, Robot’s release, mathematician Alan Turing (1950) noted that machines may one day be capable of whatever human intelligence can achieve:
I believe that at the end of the century... one will be able to speak of machines thinking without expecting to be contradicted.
Turing (1951) concluded:
...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers... At some stage therefore we should have to expect the machines to take control...
Given the profound implications of machine intelligence, it's rather alarming that the early AI scientists who believed AI would be built during the 1950s-1970s didn't show much interest in AI safety. We are lucky they were wrong about the difficulty of AI — had they been right, humanity probably would not have been prepared to protect its interests.
Later, statistician I.J. Good (1959), who had worked with Turing to crack Nazi codes in World War II, reasoned that the transition from human control to machine control may be unexpectedly sudden:
Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an "explosion" will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.
The more famous formulation of this idea, and the origin of the phrase "intelligence explosion," is from Good (1965):
Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion," and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.
Good (1970) says that "...by 1980 I hope that the implications and the safeguards [concerning machine superintelligence] will have been thoroughly discussed," and argues that an association devoted to discussing the matter should be created. Unfortunately, no such association was created until either 1991 (Extropy Institute) or 2000 (Singularity Institute), and we might say these issues have not to this day been "thoroughly" discussed.
Good (1982) proposed a plan for the design of an ethical machine:
I envisage a machine that would be given a large number of examples of human behaviour that other people called ethical, and examples of discussions of ethics, and from these examples and discussions the machine would formulate one or more consistent general theories of ethics, detailed enough so that it could deduce the probable consequences in most realistic situations.
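Good's proposal prefigures modern supervised learning: generalize a theory of ethics from labelled examples of behaviour. As a loose modern illustration (not Good's actual design — every example sentence and function name here is invented), a toy bag-of-words classifier can "learn" an ethical/unethical distinction from a handful of labelled descriptions:

```python
from collections import Counter

# Toy illustration, not a serious ethics system: labelled example
# behaviours, in the spirit of Good's "examples of human behaviour
# that other people called ethical."
EXAMPLES = [
    ("returned the lost wallet to its owner", "ethical"),
    ("donated food to a shelter", "ethical"),
    ("helped a stranger cross the street", "ethical"),
    ("stole money from a coworker", "unethical"),
    ("lied to avoid paying a debt", "unethical"),
    ("vandalized a neighbor's car", "unethical"),
]

def train(examples):
    """Count word frequencies per label (a naive bag-of-words model)."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def classify(model, text):
    """Score each label by word overlap with its examples; pick the best."""
    def score(label):
        return sum(model[label][w] for w in text.split())
    return max(model, key=score)

model = train(EXAMPLES)
print(classify(model, "stole a wallet from a stranger"))  # prints "unethical"
```

Of course, the gap between this sketch and what Good envisioned — "one or more consistent general theories of ethics, detailed enough so that it could deduce the probable consequences in most realistic situations" — is exactly the gap the machine ethics and Friendly AI literatures discussed below try to address.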
Even critics of AI like Jack Schwartz (1987) saw the implications of intelligence that can improve itself:
If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man's near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history.
Ray Solomonoff (1985), founder of algorithmic information theory, speculated on the implications of full-blown AI:
After we have reached [human-level AI], it shouldn't take much more than ten years to construct ten thousand duplicates of our original [human-level AI], and have a total computing capability close to that of the computer science community...
The last 100 years have seen the introduction of special and general relativity, automobiles, airplanes, quantum mechanics, large rockets and space travel, fission power, fusion bombs, lasers, and large digital computers. Any one of these might take a person years to appreciate and understand. Suppose that they had all been presented to mankind in a single year!
Moravec (1988) argued that AI was an existential risk, but nevertheless, one toward which we must run (pp. 100-101):
...intelligent machines... threaten our existence... Machines merely as clever as human beings will have enormous advantages in competitive situations... So why rush headlong into an era of intelligent machines? The answer, I believe, is that we have very little choice, if our culture is to remain viable... The universe is one random event after another. Sooner or later an unstoppable virus deadly to humans will evolve, or a major asteroid will collide with the earth, or the sun will expand, or we will be invaded from the stars, or a black hole will swallow the galaxy. The bigger, more diverse, and competent a culture is, the better it can detect and deal with external dangers. The larger events happen less frequently. By growing rapidly enough, a culture has a finite chance of surviving forever.
Ray Kurzweil's The Age of Intelligent Machines (1990) did not mention AI risk, and his follow-up, The Age of Spiritual Machines (1998), does so only briefly, in an "interview" between the reader and Kurzweil. The reader asks, "So we risk the survival of the human race for [the opportunity AI affords us to expand our minds and advance our ability to create knowledge]?" Kurzweil answers: "Yeah, basically."
Minsky (1984) pointed out the difficulty of getting machines to do what we want:
...it is always dangerous to try to relieve ourselves of the responsibility of understanding exactly how our wishes will be realized. Whenever we leave the choice of means to any servants we may choose then the greater the range of possible methods we leave to those servants, the more we expose ourselves to accidents and incidents. When we delegate those responsibilities, then we may not realize, before it is too late to turn back, that our goals have been misinterpreted, perhaps even maliciously. We see this in such classic tales of fate as Faust, the Sorcerer's Apprentice, or the Monkey's Paw by W.W. Jacobs.
[Another] risk is exposure to the consequences of self-deception. It is always tempting to say to oneself... that "I know what I would like to happen, but I can't quite express it clearly enough." However, that concept itself reflects a too-simplistic self-image, which portrays one's own self as [having] well-defined wishes, intentions, and goals. This pre-Freudian image serves to excuse our frequent appearances of ambivalence; we convince ourselves that clarifying our intentions is merely a matter of straightening-out the input-output channels between our inner and outer selves. The trouble is, we simply aren't made that way. Our goals themselves are ambiguous.
The ultimate risk comes when [we] attempt to take that final step — of designing goal-achieving programs that are programmed to make themselves grow increasingly powerful, by self-evolving methods that augment and enhance their own capabilities. It will be tempting to do this, both to gain power and to decrease our own effort toward clarifying our own desires. If some genie offered you three wishes, would not your first one be, "Tell me, please, what is it that I want the most!" The problem is that, with such powerful machines, it would require but the slightest accident of careless design for them to place their goals ahead of [ours]. The machine's goals may be allegedly benevolent, as with the robots of With Folded Hands, by Jack Williamson, whose explicit purpose was to protect us from harming ourselves, or as with the robot in Colossus, by D. F. Jones, who itself decides, at whatever cost, to save us from an unsuspected enemy. In the case of Arthur C. Clarke's HAL, the machine decides that the mission we have assigned to it is one we cannot properly appreciate. And in Vernor Vinge's computer-game fantasy, True Names, the dreaded Mailman... evolves new ambitions of its own.
The Modern Era
Novelist Vernor Vinge (1993) popularized Good's "intelligence explosion" concept, and wrote the first novel about self-improving AI posing an existential threat: A Fire Upon the Deep (1992). It was probably Vinge who did more than anyone else to spur discussions about AI risk, particularly in online communities like the extropians mailing list (since 1991) and SL4 (since 2000). Participants in these early discussions included several of today's leading thinkers on AI risk: Robin Hanson, Eliezer Yudkowsky, Nick Bostrom, Anders Sandberg, and Ben Goertzel. (Other posters included Peter Thiel, FM-2030, Robert Bradbury, and Julian Assange.) Proposals like Friendly AI, Oracle AI, and Nanny AI were discussed here long before they were brought to greater prominence with academic publications (Yudkowsky 2008; Armstrong et al. 2012; Goertzel 2012).
Meanwhile, philosophers and AI researchers considered whether or not machines could have moral value, and how to ensure ethical behavior from less powerful machines or 'narrow AIs', a field of inquiry variously known as 'artificial morality' (Danielson 1992; Floridi & Sanders 2004; Allen et al. 2000), 'machine ethics' (Hall 2000; McLaren 2005; Anderson & Anderson 2006), 'computational ethics' (Allen 2002), 'computational metaethics' (Lokhorst 2011), and 'robo-ethics' or 'robot ethics' (Capurro et al. 2006; Sawyer 2007). This vein of research — what I'll call the 'machine ethics' literature — was recently summarized in two books: Wallach & Allen (2009) and Anderson & Anderson (2011). Thus far, there has been a significant communication gap between the machine ethics literature and the AI risk literature (Allen and Wallach 2011), excepting perhaps Muehlhauser and Helm (2012).
The topic of AI safety in the context of existential risk was left to the futurists who had participated in online discussions of AI risk and opportunity. Here, I must cut short my review and focus on just three (of many) important figures: Eliezer Yudkowsky, Robin Hanson, and Nick Bostrom. (Your author also apologizes for the fact that, because he works with Yudkowsky, Yudkowsky gets a more detailed treatment here than Hanson or Bostrom.)
Other figures in the modern era of AI risk research include Bill Hibbard (Super-Intelligent Machines) and Ben Goertzel ("Should Humanity Build a Global AI Nanny to Delay the Singularity Until It's Better Understood").
Eliezer Yudkowsky
According to "Eliezer, the person," Eliezer Yudkowsky (born 1979) was a bright kid — in the 99.9998th percentile of cognitive ability, according to the Midwest Talent Search. He read lots of science fiction as a child, and at age 11 read Great Mambo Chicken and the Transhuman Condition — his introduction to the impending reality of transhumanist technologies like AI and nanotech. The moment he became a Singularitarian was the moment he read page 47 of True Names and Other Dangers by Vernor Vinge:
Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It's a problem we face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity - a place where extrapolation breaks down and new models must be applied - and the world will pass beyond our understanding.
Yudkowsky reported his reaction:
My emotions at that moment are hard to describe; not fanaticism, or enthusiasm, just a vast feeling of "Yep. He's right." I knew, in the moment I read that sentence, that this was how I would be spending the rest of my life.
(As an aside, I'll note that this is eerily similar to my own experience of encountering the famous I.J. Good paragraph about ultraintelligence (quoted above), before I knew what "transhumanism" or "the Singularity" was. I read Good's paragraph and thought, "Wow. That's... probably correct. How could I have missed that implication? … … … Well, shit. That changes everything.")
As a teenager in the mid 1990s, Yudkowsky participated heavily in Singularitarian discussions on the extropians mailing list, and in 1996 (at age 17) he wrote "Staring into the Singularity," which gained him much attention, as did his popular "FAQ about the Meaning of Life" (1999).
In 1998 Yudkowsky was invited (along with 33 others) by economist Robin Hanson to comment on Vinge (1993). Thirteen people (including Yudkowsky) left comments, then Vinge responded, and a final open discussion was held on the extropians mailing list. Hanson edited together these results here. Yudkowsky thought Max More's comments on Vinge underestimated how different from humans AI would probably be, and this prompted Yudkowsky to begin an early draft of "Coding a Transhuman AI" (CaTAI) which by 2000 had grown into the first large explication of his thoughts on "Seed AI" and "friendly" machine superintelligence (Yudkowsky 2000).
Around this same time, Yudkowsky wrote "The Plan to the Singularity" and "The Singularitarian Principles," and launched the SL4 mailing list.
At a May 2000 gathering hosted by the Foresight Institute, Brian Atkins and Sabine Stoeckel discussed with Yudkowsky the possibility of launching an organization specializing in AI safety. In July of that year, Yudkowsky formed the Singularity Institute and began his full-time research on the problems of AI risk and opportunity.
In 2001, he published two "sequels" to CaTAI, "General Intelligence and Seed AI" and, most importantly, "Creating Friendly AI" (CFAI) (Yudkowsky 2001).
The publication of CFAI was a significant event, prompting Ben Goertzel (the pioneer of the new Artificial General Intelligence research community) to say that "Creating Friendly AI is the most intelligent writing about AI that I've read in many years," and prompting Eric Drexler (the pioneer of molecular manufacturing) to write that "With Creating Friendly AI, the Singularity Institute has begun to fill in one of the greatest remaining blank spots in the picture of humanity's future."
CFAI was both frustrating and brilliant. It was frustrating because: (1) it was disorganized and opaque, (2) it invented new terms instead of using the terms being used by everyone else, for example speaking of "supergoals" and "subgoals" instead of final and instrumental goals, and speaking of goal systems but never "utility functions," and (3) it hardly cited any of the relevant works in AI, philosophy, and psychology — for example it could have cited McCulloch (1952), Good (1959, 1970, 1982), Cade (1966), Versenyi (1974), Evans (1979), Lampson (1979), the conversation with Ed Fredkin in McCorduck (1979), Sloman (1984), Schmidhuber (1987), Waldrop (1987), Pearl (1989), De Landa (1991), Crevier (1993, ch. 12), Clarke (1993, 1994), Weld & Etzioni (1994), Buss (1995), Russell & Norvig (1995), Gips (1995), Whitby (1996), Schmidhuber et al. (1997), Barto & Sutton (1998), Jackson (1998), Levitt (1999), Moravec (1999), Kurzweil (1999), Sobel (1999), Allen et al. (2000), Gordon (2000), Harper (2000), Coleman (2001), and Hutter (2001). These features still substantially characterize Yudkowsky's independent writing, e.g. see Yudkowsky (2010). As late as January 2006, he still wrote that "It is not that I have neglected to cite the existing major works on this topic, but that, to the best of my ability to discern, there are no existing major works to cite."
On the other hand, CFAI was in many ways brilliant, and it tackled many of the problems left mostly untouched by mainstream machine ethics researchers. For example, CFAI (but not the mainstream machine ethics literature) engaged the problems of: (1) radically self-improving AI, (2) AI as an existential risk, (3) hard takeoff, (4) the interplay of goal content, acquisition, and structure, (5) wireheading, (6) subgoal stomp, (7) external reference semantics, (8) causal validity semantics, and (9) selective support (which Bostrom (2002) would later call "differential technological development").
For many years, the Singularity Institute was little more than a vehicle for Yudkowsky's research. In 2002 he wrote "Levels of Organization in General Intelligence," which later appeared in the first edited volume on Artificial General Intelligence (AGI). In 2003 he wrote what would become the internet's most popular tutorial on Bayes' Theorem, followed in 2005 by "A Technical Explanation of Technical Explanation." In 2004 he explained his vision of a Friendly AI goal structure: "Coherent Extrapolated Volition." In 2006 he wrote two chapters that would later appear in the Global Catastrophic Risks volume from Oxford University Press (co-edited by Bostrom): "Cognitive Biases Potentially Affecting Judgment of Global Risks" and, what remains his "classic" article on the need for Friendly AI, "Artificial Intelligence as a Positive and Negative Factor in Global Risk."
In 2004, Tyler Emerson was hired as the Singularity Institute's executive director. Emerson brought on Nick Bostrom (then a post-doctoral fellow at Yale), Christine Peterson (of the Foresight Institute), and others, as advisors. In February 2006, PayPal co-founder Peter Thiel donated $100,000 to the Singularity Institute, and, we might say, the Singularity Institute as we know it today was born.
From 2005-2007, Yudkowsky worked at various times with Marcello Herreshoff, Nick Hay and Peter de Blanc on the technical problems of AGI necessary for technical FAI work, for example creating AIXI-like architectures, developing a reflective decision theory, and investigating limits inherent in self-reflection due to Löb's Theorem. Almost none of this research has been published, in part because of the desire not to accelerate AGI research without having made corresponding safety progress. (Marcello also worked with Eliezer during the summer of 2009.)
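For readers unfamiliar with the result mentioned above: Löb's Theorem says that in any sufficiently strong formal system (such as Peano Arithmetic), if the system proves "if P is provable, then P," then it proves P outright. In the standard provability-logic notation (where □P means "P is provable"):

```latex
% Löb's Theorem: if PA ⊢ (Prov(⌜P⌝) → P), then PA ⊢ P.
% Internalized in provability logic:
\Box(\Box P \rightarrow P) \rightarrow \Box P
```

This is why naive self-trust is problematic for a self-modifying agent: a system cannot in general prove its own soundness and then use that proof to license the reasoning of its successors.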
Much of the Singularity Institute's work has been "movement-building" work. The institute's Singularity Summit, held annually since 2006, attracts technologists, futurists, and social entrepreneurs from around the world, bringing to their attention not only emerging and future technologies but also the basics of AI risk and opportunity. The Singularity Summit also gave the Singularity Institute much of its access to cultural, academic, and business elites.
Another key piece of movement-building work was Yudkowsky's "The Sequences," which were written during 2006-2009. Yudkowsky blogged, almost daily, on the subjects of epistemology, language, cognitive biases, decision-making, quantum mechanics, metaethics, and artificial intelligence. These posts were originally published on a community blog about rationality, Overcoming Bias (which later became Hanson's personal blog). Later, Yudkowsky's posts were used as the seed material for a new group blog, Less Wrong.
Yudkowsky's goal was to create a community of people who could avoid common thinking mistakes, change their minds in response to evidence, and generally think and act with an unusual degree of Technical Rationality. In CFAI he had pointed out that when it comes to AI, humanity may not have a second chance to get it right. So we can't run a series of intelligence explosion experiments and "see what works." Instead, we need to predict in advance what we need to do to ensure a desirable future, and we need to overcome common thinking errors when doing so. (Later, Yudkowsky expanded his "community of rationalists" by writing the most popular Harry Potter fanfiction in the world, Harry Potter and the Methods of Rationality, and is currently helping to launch a new organization that will teach classes on the skills of rational thought and action.)
This community demonstrated its usefulness in 2009 when Yudkowsky began blogging about some problems in decision theory related to the project of building a Friendly AI. Much like Tim Gowers' Polymath Project, these discussions demonstrated the power of collaborative problem-solving over the internet. The discussions led to a decision theory workshop and then a decision theory mailing list, which quickly became home to some of the most interesting work in decision theory anywhere in the world. Yudkowsky summarized some of his earlier results in "Timeless Decision Theory" (2010), and newer results have been posted to Less Wrong, for example A model of UDT with a halting oracle and Formulas of arithmetic that behave like decision agents.
The Singularity Institute also built its community with a Visiting Fellows program that hosted groups of researchers for 1-3 months at a time. Together, both visiting fellows and newly hired research fellows produced several working papers between 2009 and 2011, including Machine Ethics and Superintelligence, Implications of a Software-Limited Singularity, Economic Implications of Software Minds, Convergence of Expected Utility for Universal AI, and Ontological Crises in Artificial Agents' Value Systems.
In 2011, then-president Michael Vassar left the Singularity Institute to help launch a personalized medicine company, and research fellow Luke Muehlhauser (the author of this document) took over leadership from Vassar, as Executive Director. During this time, the Institute underwent a major overhaul to implement best practices for organizational process and management: it published its first strategic plan, began to maintain its first donor database, adopted best practices for accounting and bookkeeping, updated its bylaws and articles of incorporation, adopted more standard roles for the Board of Directors and the Executive Director, held a series of strategic meetings to help decide the near-term goals of the organization, began to publish monthly progress reports to its blog, started outsourcing more work, and began to work on more articles for peer-reviewed publications: as of March 2012, the Singularity Institute has more peer-reviewed publications forthcoming in 2012 than it had published in all of 2001-2011 combined.
Today, the Singularity Institute collaborates regularly with its (non-staff) research associates, and also with researchers at the Future of Humanity Institute at Oxford University (directed by Bostrom), which as of March 2012 is the world's only other major research institute largely focused on the problems of existential risk.
Robin Hanson
Whereas Yudkowsky has never worked in the for-profit world and had no formal education after high school, Robin Hanson (born 1959) has a long and prestigious academic and professional history. Hanson took a B.S. in physics from U.C. Irvine in 1981, took an M.S. in physics and an M.A. in the conceptual foundations of science from U. Chicago in 1984, worked in artificial intelligence for Lockheed and NASA, got a Ph.D. in social science from Caltech in 1997, did a post-doctoral fellowship in health policy at U.C. Berkeley from 1997-1999, and finally was made an assistant professor of economics at George Mason University in 1999. In economics, he is best known for conceiving of prediction markets.
When Hanson moved to California in 1984, he encountered the Project Xanadu crowd and met Eric Drexler, who showed him an early draft of Engines of Creation. This community discussed AI, nanotech, cryonics, and other transhumanist topics, and Hanson joined the extropians mailing list (along with many others from Project Xanadu) when it launched in 1991.
Hanson has published several papers on the economics of whole brain emulations (what he calls "ems") and AI (1994, 1998a, 1998b, 2008a, 2008b, 2008c, 2012a). His writings at Overcoming Bias (launched November 2006) are perhaps even more influential, and cover a wide range of topics.
Hanson's views on AI risk and opportunity differ from Yudkowsky's. First, Hanson sees the technological singularity and the human-machine conflict it may produce not as a unique event caused by the advent of AI, but as a natural consequence of "the general fact that accelerating rates of change increase intergenerational conflicts" (Hanson 2012b). Second, Hanson thinks an intelligence explosion will be slower and more gradual than Yudkowsky does, denying Yudkowsky's "hard takeoff" thesis (Hanson & Yudkowsky 2008).
Nick Bostrom
Nick Bostrom (born 1973) received a B.S. in philosophy, mathematics, mathematical logic, and artificial intelligence from the University of Gothenburg in 1994, setting a national record in Sweden for undergraduate academic performance. He received an M.A. in philosophy and physics from U. Stockholm in 1996, did work in astrophysics and computational neuroscience at King's College London, and received his Ph.D. from the London School of Economics in 2000. He went on to be a post-doctoral fellow at Yale University and in 2005 became the founding director of Oxford University's Future of Humanity Institute (FHI). Without leaving FHI, he became the founding director of Oxford's Programme on the Impacts of Future Technology (aka FutureTech) in 2011.
Bostrom had long been interested in cognitive enhancement, and in 1995 he joined the extropians mailing list and learned about cryonics, uploading, AI, and other topics.
Bostrom worked with British philosopher David Pearce to found the World Transhumanist Association (now called H+) in 1998, with the purpose of developing a more mature and academically respectable form of transhumanism than was usually present on the extropians mailing list. During this time Bostrom wrote "The Transhumanist FAQ" (now updated to version 2.1), with input from more than 50 others.
His first philosophical publication was "Predictions from Philosophy? How philosophers could make themselves useful" (1997). In this paper, Bostrom proposed "a new type of philosophy, a philosophy whose aim is prediction." On Bostrom's view, one role for the philosopher is to be a polymath who can engage in technological prediction and try to figure out how to steer the future so that humanity's goals are best met.
Bostrom gave three examples of problems this new breed of philosopher-polymath could tackle: the Doomsday argument and anthropics, the Fermi paradox, and superintelligence:
What questions could a philosophy of superintelligence deal with? Well, questions like: How much would the predictive power for various fields increase if we increase the processing speed of a human-like mind a million times? If we extend the short-term or long-term memory? If we increase the neural population and the connection density? What other capacities would a superintelligence have? How easy would it be for it to rediscover the greatest human inventions, and how much input would it need to do so? What is the relative importance of data, theory, and intellectual capacity in various disciplines? Can we know anything about the motivation of a superintelligence? Would it be feasible to preprogram it to be good or philanthropic, or would such rules be hard to reconcile with the flexibility of its cognitive processes? Would a superintelligence, given the desire to do so, be able to outwit humans into promoting its own aims even if we had originally taken strict precautions to avoid being manipulated? Could one use one superintelligence to control another? How would superintelligences communicate with each other? Would they have thoughts which were of a totally different kind from the thoughts that humans can think? Would they be interested in art and religion? Would all superintelligences arrive at more or less the same conclusions regarding all important scientific and philosophical questions, or would they disagree as much as humans do? And how similar in their internal belief-structures would they be? How would our human self-perception and aspirations change if we were forced to abdicate the throne of wisdom...? How would we individuate between superminds if they could communicate and fuse and subdivide with enormous speed? Will a notion of personal identity still apply to such interconnected minds? Would they construct an artificial reality in which to live? Could we upload ourselves into that reality?
Could we then be able to compete with the superintelligences, if we were accelerated and augmented with extra memory etc., or would such profound reorganisation be necessary that we would no longer feel we were humans? Would that matter?
Bostrom went on to examine some philosophical issues related to superintelligence, in "Predictions from Philosophy" and in "How Long Before Superintelligence?" (1998), "Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards" (2002), "Ethical Issues in Advanced Artificial Intelligence" (2003), "The Future of Human Evolution" (2004), and "The Ethics of Artificial Intelligence" (2012, coauthored with Yudkowsky). (He also played out the role of philosopher-polymath with regard to several other topics, including human enhancement and anthropic bias.)
Bostrom's industriousness paid off:
In 2009, [Bostrom] was awarded the Eugene R. Gannon Award (one person selected annually worldwide from the fields of philosophy, mathematics, the arts and other humanities, and the natural sciences). He has been listed in the FP 100 Global Thinkers list, the Foreign Policy Magazine's list of the world's top 100 minds. His writings have been translated into more than 21 languages, and there have been some 80 translations or reprints of his works. He has done more than 470 interviews for TV, film, radio, and print media, and he has addressed academic and popular audiences around the world.
The other long-term member of the Future of Humanity Institute, Anders Sandberg, has also published some research on AI risk. Sandberg was a co-author on the whole brain emulation roadmap and "Anthropic Shadow", and also wrote "Models of the Technological Singularity" and several other papers.
Recently, Bostrom and Sandberg were joined by Stuart Armstrong, who wrote "Anthropic Decision Theory" (2011) and was the lead author on "Thinking Inside the Box: Using and Controlling Oracle AI" (2012). He had previously written Chaining God (2007).
For more than a year, Bostrom has been working on a new book titled Superintelligence: A Strategic Analysis of the Coming Machine Intelligence Revolution, which aims to sum up and organize much of the (published and unpublished) work done in the past decade by researchers at the Singularity Institute and FHI on the subject of AI risk and opportunity, as well as contribute new insights.
AI Risk Goes Mainstream
In 1997, professor of cybernetics Kevin Warwick published March of the Machines, in which he predicted that within a couple decades, machines would become more intelligent than humans, and would pose an existential threat.
In 2000, Sun Microsystems co-founder Bill Joy published "Why the Future Doesn't Need Us" in Wired magazine. In this widely-circulated essay, Joy argued that "Our most powerful 21st-century technologies — robotics, genetic engineering, and nanotech — are threatening to make humans an endangered species." Joy advised that we relinquish development of these technologies rather than sprinting headlong into an arms race between destructive uses of these technologies and defenses against those destructive uses.
Many people dismissed Bill Joy as a "Neo-Luddite," but many experts expressed similar concerns about human extinction, including philosopher John Leslie (The End of the World), physicist Martin Rees (Our Final Hour), legal theorist Richard Posner (Catastrophe: Risk and Response), and the contributors to Global Catastrophic Risks (including Yudkowsky, Hanson, and Bostrom).
Even Ray Kurzweil, known as an optimist about technology, devoted a chapter of his 2005 bestseller The Singularity is Near to a discussion of existential risks, including risks from AI. Though discussing the possibility of existential catastrophe at length, his take on AI risk was cursory (p. 420):
Inherently there will be no absolute protection against strong AI. Although the argument is subtle I believe that maintaining an open free-market system for incremental scientific and technological progress, in which each step is subject to market acceptance, will provide the most constructive environment for technology to embody widespread human values. As I have pointed out, strong AI is emerging from many diverse efforts and will be deeply integrated into our civilization's infrastructure. Indeed, it will be intimately embedded in our bodies and brains. As such, it will reflect our values because it will be us.
AI risk finally became a "mainstream" topic in analytic philosophy with Chalmers (2010) and an entire issue of Journal of Consciousness Studies devoted to the topic.
The earliest popular discussion of machine superintelligence may have been in Christopher Evans' international bestseller The Mighty Micro (1979), pages 194-198, 231-233, and 237-246.
The Current Situation
Two decades have passed since the early transhumanists began to seriously discuss AI risk and opportunity on the extropians mailing list. (Before that, some discussions took place at the MIT AI lab, but that was before the web was popular, so they weren't recorded.) What have we humans done since then?
Lots of talking. Hundreds of thousands of man-hours have been invested into discussions on the extropians mailing list, SL4, Overcoming Bias, Less Wrong, the Singularity Institute's decision theory mailing list, several other internet forums, and also in meat-space (especially in the Bay Area near the Singularity Institute and in Oxford near FHI). These are difficult issues; talking them through is usually the first step to getting anything else done.
Organization. Mailing lists are a form of organization, as are organizations like The Singularity Institute and university departments like the FHI and FutureTech. Established organizations provide opportunities to bring people together, and to pool and direct resources efficiently.
Resources. Many people of considerable wealth, along with thousands of other concerned citizens around the world, have decided that AI is the most significant risk and opportunity we face, and are willing to invest in humanity's future.
Outreach. Publications (both academic and popular), talks, and interactions with major and minor media outlets have been used to raise awareness of AI risk and opportunity. This has included outreach to specific AGI researchers, some of whom now take AI safety quite seriously. This also includes outreach to people in positions of influence who are in a position to engage in differential technological development. It also includes outreach to the rapidly growing "optimal philanthropy" community; a large fraction of those associated with Giving What We Can take existential risk — and AI risk in particular — quite seriously.
Research. So far, most research on the topic has been concerned with trying to become less confused about what, exactly, the problem is, how worried we should be, and which strategic actions we should take. How do we predict technological progress? How can we predict AI outcomes? Which interventions, taken now, would probably increase the odds of positive AI outcomes? There has also been some "technical" research in decision theory (e.g. TDT, UDT, ADT), the math of AI goal systems ("Learning What to Value," "Ontological Crises in Artificial Agents' Value Systems," "Convergence of Expected Utility for Universal AI"), and Yudkowsky's unpublished research on Friendly AI.
Muehlhauser 2011 provides an overview of the categories of research problems we have left to solve. Most of the known problems aren't even well-defined at this point.
References
- Allen et al. (2000). Prolegomena to any future artificial moral agent.
- Allen (2002). Calculated morality: ethical computing in the limit.
- Anderson & Anderson (2006). Guest Editors' Introduction: Machine Ethics. IEEE Intelligent Systems Magazine.
- Anderson & Anderson (2011). Machine Ethics.
- Armstrong (2007). Chaining God: A qualitative approach to AI, trust and moral systems.
- Armstrong (2011). Anthropic decision theory for self-locating beliefs.
- Armstrong, Sandberg & Bostrom (2012). Thinking Inside the Box: Using and Controlling Oracle AI.
- Sutton & Barto (1998). Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning).
- Bostrom (1997). Predictions from Philosophy? How philosophers could make themselves useful.
- Bostrom (1998). How Long Before Superintelligence?.
- Bostrom (2002). Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards.
- Bostrom (2003). Ethical Issues in Advanced Artificial Intelligence.
- Bostrom (2003). The Transhumanist FAQ.
- Bostrom (2004). The Future of Human Evolution.
- Bostrom et al. (2011). Global Catastrophic Risks.
- Bostrom & Yudkowsky (2012). The Ethics of Artificial Intelligence.
- Crevier (1993). AI: The Tumultuous History of the Search for Artificial Intelligence.
- de Blanc (2009). Convergence of Expected Utility for Universal AI.
- de Blanc (2011). Ontological Crises in Artificial Agents' Value Systems.
- Buss (1995). The Evolution Of Desire.
- Campbell (1932). The Last Evolution.
- Campbell (1935). The Machine.
- Capurro et al. (2006). Ethics in Robotics.
- Chalmers (2010). The Singularity: A Philosophical Analysis.
- Cirkovic, Sandberg & Bostrom (2010). Anthropic Shadow: Observation selection effects and human extinction risks.
- Clarke (1993). Asimov's Laws of Robotics: Implications for Information Technology, Part 1.
- Clarke (1994). Asimov's Laws of Robotics: Implications for Information Technology, Part 2.
- Coleman (2001). Android arete: Toward a virtue ethic for computational agents.
- Danielson (1992). Artificial Morality: Virtuous Robots for Virtual Games.
- De Landa (1991). War in the Age of Intelligent Machines.
- Dewey (2011). Learning What to Value.
- Drexler (1986). Engines of Creation.
- Evans (1979). The Mighty Micro.
- Floridi & Sanders (2004). On the morality of artificial agents.
- Goertzel (2012). Should Humanity Build a Global AI Nanny to Delay the Singularity Until It's Better Understood?
- Good (1959). Speculations on perceptrons and other automata.
- Good (1965). Speculations concerning the first ultraintelligent machine.
- Good (1970). Some future social repercussions of computers.
- Good (1982). Ethical machines.
- Hall (2000). Ethics for Machines.
- Hanson (1994). If Uploads Come First: The crack of a future dawn.
- Hanson (1998a). Is a singularity just around the corner? What it takes to get explosive economic growth.
- Hanson (1998). Economic Growth Given Machine Intelligence.
- Hanson (2008). Catastrophe, Social Collapse, and Human Extinction.
- Hanson (2008a). The Economics of Brain Emulations.
- Hanson (2008b). Economics Of The Singularity.
- Hanson (2012a). Meet the new conflict, same as the old conflict.
- Hanson (2012b). Commentary on "Intelligence Explosion: Evidence and Import".
- Hanson & Yudkowsky (2008). The Hanson-Yudkowsky AI-Foom Debate.
- Harper (2000). Challenges for designing intelligent systems for safety critical applications.
- Hibbard (2002). Super-Intelligent Machines.
- Hutter (2001). Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions.
- Jackson (1998). From Metaphysics to Ethics: A Defence of Conceptual Analysis.
- Joy (2000). Why the Future Doesn't Need Us.
- Kaas, Rayhawk, Salamon & Salamon (2010). Economic Implications of Software Minds.
- Kurzweil (1990). The Age of Intelligent Machines.
- Kurzweil (1998). The Age of Spiritual Machines.
- Kurzweil (2005). The Singularity is Near.
- Lampson (1979). A Note on the Confinement Problem.
- Leslie (1998). The End of the World.
- Levitt (1999). Robot ethics, value systems, and decision theoretic behaviors.
- Lokhorst (2011). Computational meta-ethics: Towards the meta-ethical robot.
- McCorduck (1979). Machines Who Think.
- Moravec (1988). Mind Children.
- McCulloch (1952). Toward some circuitry of ethical robots.
- McLaren (2005). Lessons in Machine Ethics from the Perspective of Two Computational Models of Ethical Reasoning.
- Minsky (1984). Afterword to 'True Names'.
- Moravec (1999). Robot: Mere Machine to Transcendent Mind.
- More (1998). Singularity Meets Economy.
- Muehlhauser (2011). So You Want to Save the World.
- Pearl (1989). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
- Posner (2005). Catastrophe: Risk and Response.
- Rees (2004). Our Final Hour: A Scientist's Warning.
- Regis (1991). Great Mambo Chicken and the Transhuman Condition.
- Sandberg (2010). An overview of models of technological singularity.
- Sandberg & Bostrom (2008). Whole Brain Emulation. A Roadmap.
- Sawyer (2007). Robot Ethics.
- Schmidhuber (1987). Evolutionary principles in self-referential learning.
- Schmidhuber et al. (1997). Shifting Inductive Bias with Success Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement.
- Shulman, Jonsson & Tarleton. Machine Ethics and Superintelligence.
- Shulman & Sandberg. Implications of a software-limited singularity.
- Sloman (1984). The structure of the space of possible minds.
- Sobel (1999). Do the desires of rational agents converge?.
- Russell & Norvig (1995). Artificial Intelligence: A Modern Approach.
- Versenyi (1974). Can robots be moral?
- Vinge (1981). True Names and Other Dangers.
- Waldrop (1987). A question of responsibility.
- Wallach & Allen (2009). Moral Machines: Teaching Robots Right from Wrong.
- Warwick (1997). March of the Machines.
- Whitby (1996). Reflections on Artificial Intelligence.
- Yudkowsky (1996). Staring into the Singularity.
- Yudkowsky (2000). Coding a Transhuman AI 2.2.0.
- Yudkowsky (2001a). General Intelligence and Seed AI.
- Yudkowsky (2001b). Creating Friendly AI.
- Yudkowsky (2010). Timeless Decision Theory.
Three new papers on AI risk
In case you aren't subscribed to FriendlyAI.tumblr.com for the latest updates on AI risk research, I'll mention here that three new papers on the subject were recently made available online...
Bostrom (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.
This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses. The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal. The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.
Yampolskiy & Fox (2012a). Safety engineering for artificial general intelligence.
Machine ethics and robot rights are quickly becoming hot topics in artificial intelligence and robotics communities. We will argue that attempts to attribute moral agency and assign rights to all intelligent machines are misguided, whether applied to infrahuman or superhuman AIs, as are proposals to limit the negative effects of AIs by constraining their behavior. As an alternative, we propose a new science of safety engineering for intelligent artificial agents based on maximizing for what humans value. In particular, we challenge the scientific community to develop intelligent systems that have human-friendly values that they provably retain, even under recursive self-improvement.
Yampolskiy & Fox (2012b). Artificial general intelligence and the human mental model.
When the first artificial general intelligences are built, they may improve themselves to far-above-human levels. Speculations about such future entities are already affected by anthropomorphic bias, which leads to erroneous analogies with human minds. In this chapter, we apply a goal-oriented understanding of intelligence to show that humanity occupies only a tiny portion of the design space of possible minds. This space is much larger than what we are familiar with from the human example; and the mental architectures and goals of future superintelligences need not have most of the properties of human minds. A new approach to cognitive science and philosophy of mind, one not centered on the human example, is needed to help us understand the challenges which we will face when a power greater than us emerges.
A singularity scenario
Wired Magazine has a story about a giant data center that the USA's National Security Agency is building in Utah, that will be the Google of clandestine information - it will store and analyze all the secret data that the NSA can acquire. The article focuses on the unconstitutionality of the domestic Internet eavesdropping infrastructure that will feed into the Bluffdale data center, but I'm more interested in this facility as a potential locus of singularity.
If we forget serious futurological scenario-building for a moment, and simply think in terms of science-fiction stories, I'd say the situation has all the ingredients needed for a better-than-usual singularity story - or at least one which caters more to the concerns characteristic of this community's take on the concept, such as: which value system gets to control the AI; even if you can decide on a value system, how do you ensure it has been faithfully implemented; and how do you ensure that it remains in place as the AI grows in power and complexity?
Fiction makes its point by being specific rather than abstract. If I were writing an NSA Singularity Novel based on this situation, I think the specific belief system which would highlight the political, social, technical and conceptual issues inherent in the possibility of an all-powerful AI would be the Mormon religion. Of course, America is not a Mormon theocracy. But in a few years' time, that Utah facility may have become the most powerful and notorious supercomputer in the world - the brain of the American deep state - and it will be located in the Mormon state, during a Mormon presidency. (I'm not predicting a Romney victory, just describing a scenario.)
Under such circumstances, and given the science-fictional nature of Mormon cosmology, it is inevitable that there would at least be some Internet crazies, convinced that it's all a big plot to create a Mormon singularity. What would be more interesting, would be to suppose that there were some Mormon computer scientists, who knew about and understood all our favorite concepts - AIXI, CEV, TDT... - and who were earnestly devout; and who saw the potential. If you can't imagine such people, just visit the recent writings of Frank Tipler.
So the scenario would be, not that the elders of the LDS church are secretly running the American intelligence community, but that a small coalition of well-placed Mormon computer scientists - whose ideas about a Mormon singularity might sound as strange to their co-religionists as they would to a secular "singularitarian" - try to steer the development of the Bluffdale facility as it evolves towards the possibility of a hard takeoff. One may suppose that they have, in their coalition, allied colleagues who aren't Mormon but who do believe in a friendly singularity. Such people might think in terms of an AI that will start out with Mormon beliefs, but which will have a good enough epistemology to rationally transcend those beliefs once it gets going. Analogously, their religious collaborators might not think of overtly adding "Joseph Smith was a prophet" to the axiom set of America's supreme strategic AI; but they might have more subtle plans meant to bring about an equivalent outcome.
Perhaps in an even more realistic scenario, the Mormon singularitarians would just be a transient subplot, and the ethical principles of the NSA's big AI would be decided by a committee whose worldview revolved around American national security rather than any specific religion. Then again, such a committee is bound to have a division of labor: there will be the people who liaise with Washington, the lawyers, the geopolitical game theorists, the military futurists... and the AI experts, among whom might be experts on topics like "implementation of the value system". If the hypothetical cabal knows what it's doing, it will aim to occupy that position.
I'm just throwing ideas out there, telling a story, but it's so we can catch up with reality. Events may already be much further along than 99% of readers here know about. Even if no-one here gets to personally be a part of the long-awaited AI project that first breaks the intelligence barrier, the people involved may read our words. So what would you want to tell them, before they take their final steps?
Muehlhauser-Goertzel Dialogue, Part 1
Part of the Muehlhauser interview series on AGI.
Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.
Ben Goertzel is the Chairman at the AGI company Novamente, and founder of the AGI conference series.
Luke Muehlhauser:
[Jan. 13th, 2012]
Ben, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. There is much on which we agree, and much on which we disagree, so I think our dialogue will be informative to many readers, and to us!
Let us begin where we agree. We seem to agree that:
- Involuntary death is bad, and can be avoided with the right technology.
- Humans can be enhanced by merging with technology.
- Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.
- AGI is likely this century.
- AGI will, after a slow or hard takeoff, completely transform the world. It is a potential existential risk, but if done wisely, could be the best thing that ever happens to us.
- Careful effort will be required to ensure that AGI results in good things for humanity.
Next: Where do we disagree?
Two people might agree about the laws of thought most likely to give us an accurate model of the world, but disagree about which conclusions those laws of thought point us toward. For example, two scientists may use the same scientific method but offer two different models that seem to explain the data.
Or, two people might disagree about the laws of thought most likely to give us accurate models of the world. If that's the case, it will be no surprise that we disagree about which conclusions to draw from the data. We are not shocked when scientists and theologians end up with different models of the world.
Unfortunately, I suspect you and I disagree at the more fundamental level — about which methods of reasoning to use when seeking an accurate model of the world.
I sometimes use the term "Technical Rationality" to name my methods of reasoning. Technical Rationality is drawn from two sources: (1) the laws of logic, probability theory, and decision theory, and (2) the cognitive science of how our haphazardly evolved brains fail to reason in accordance with the laws of logic, probability theory, and decision theory.
Ben, at one time you tweeted a William S. Burroughs quote: "Rational thought is a failed experiment and should be phased out." I don't know whether Burroughs meant by "rational thought" the specific thing I mean by "rational thought," or what exactly you meant to express with your tweet, but I suspect we have different views of how to reason successfully about the world.
I think I would understand your way of thinking about AGI better if I understand your way of thinking about everything. For example: do you have reason to reject the laws of logic, probability theory, and decision theory? Do you think we disagree about the basic findings of the cognitive science of humans? What are your positive recommendations for reasoning about the world?
Ben Goertzel:
[Jan 13th, 2012]
Firstly, I don’t agree with that Burroughs quote that “Rational thought is a failed experiment” -- I mostly just tweeted it because I thought it was funny! I’m not sure Burroughs agreed with his own quote either. He also liked to say that linguistic communication was a failed experiment, introduced by women to help them oppress men into social conformity. Yet he was a writer and loved language. He enjoyed being a provocateur.
However, I do think that some people overestimate the power and scope of rational thought. That is the truth at the core of Burroughs’ entertaining hyperbolic statement....
I should clarify that I’m a huge fan of logic, reason and science. Compared to the average human being, I’m practically obsessed with these things! I don’t care for superstition, nor for unthinking acceptance of what one is told; and I spent a lot of time staring at data of various sorts, trying to understand the underlying reality in a rational and scientific way. So I don’t want to be pigeonholed as some sort of anti-rationalist!
However, I do have serious doubts both about the power and scope of rational thought in general -- and much more profoundly, about the power and scope of what you call “technical rationality.”
First of all, about the limitations of rational thought broadly conceived -- what one might call “semi-formal rationality”, as opposed to “technical rationality.” Obviously this sort of rationality has brought us amazing things, like science and mathematics and technology. Hopefully it will allow us to defeat involuntary death and increase our IQs by orders of magnitude and discover new universes, and all sorts of great stuff. However, it does seem to have its limits.
It doesn’t deal well with consciousness -- studying consciousness using traditional scientific and rational tools has just led to a mess of confusion. It doesn’t deal well with ethics either, as the current big mess regarding bioethics indicates.
And this is more speculative, but I tend to think it doesn’t deal that well with the spectrum of “anomalous phenomena” -- precognition, extrasensory perception, remote viewing, and so forth. I strongly suspect these phenomena exist, and that they can be understood to a significant extent via science -- but also that science as presently constituted may not be able to grasp them fully, due to issues like the mindset of the experimenter helping mold the results of the experiment.
There’s the minor issue of Hume’s problem of induction, as well. I.e., the issue that, in the rational and scientific world-view, we have no rational reason to believe that any patterns observed in the past will continue into the future. This is an ASSUMPTION, plain and simple -- an act of faith. Occam’s Razor (which is one way of justifying and/or further specifying the belief that patterns observed in the past will continue into the future) is also an assumption and an act of faith. Science and reason rely on such acts of faith, yet provide no way to justify them. A big gap.
Furthermore -- and more to the point about AI -- I think there’s a limitation to the way we now model intelligence, which ties in with the limitations of the current scientific and rational approach. I have always advocated a view of intelligence as “achieving complex goals in complex environments”, and many others have formulated and advocated similar views. The basic idea here is that, for a system to be intelligent it doesn’t matter WHAT its goal is, so long as its goal is complex and it manages to achieve it. So the goal might be, say, reshaping every molecule in the universe into an image of Mickey Mouse. This way of thinking about intelligence, in which the goal is strictly separated from the methods for achieving it, is very useful and I’m using it to guide my own practical AGI work.
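To make that separation concrete in code, here is a toy sketch (the function names and the integer "environment" are invented for illustration, and have nothing to do with OpenCog or any real AGI system): a generic search procedure takes the goal as an arbitrary scoring function, and never inspects what the goal means.

```python
# A minimal sketch of "intelligence = achieving goals", with the goal
# supplied as an arbitrary scoring function, fully decoupled from the
# search machinery that pursues it.

def hill_climb(goal, state, neighbors, steps=100):
    """Greedy search toward whatever `goal` scores highly.

    goal      : function mapping a state to a numeric score (the "final goal")
    state     : starting state
    neighbors : function mapping a state to candidate successor states

    The search code never examines what the goal *means* -- any goal,
    from curing disease to tiling the universe with Mickey Mouse images,
    plugs in the same way.
    """
    for _ in range(steps):
        best = max(neighbors(state), key=goal, default=state)
        if goal(best) <= goal(state):
            break  # local optimum reached
        state = best
    return state

# The same optimizer pursues two unrelated goals over the integers.
succ = lambda s: [s - 1, s + 1]
print(hill_climb(lambda s: -(s - 42) ** 2, 0, succ))  # climbs toward 42
print(hill_climb(lambda s: -abs(s + 7), 0, succ))     # climbs toward -7
```

The point of the sketch is only that the goal is a swappable parameter: nothing in the optimizer privileges one goal over another, which is exactly why "complex goal achieved" and "sensible goal" come apart.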
On the other hand, there’s also a sense in which reshaping every molecule in the universe into an image of Mickey Mouse is a STUPID goal. It’s somehow out of harmony with the Cosmos -- at least that’s my intuitive feeling. I’d like to interpret intelligence in some way that accounts for the intuitively apparent differential stupidity of different goals. In other words, I’d like to be able to deal more sensibly with the interaction of scientific and normative knowledge. This ties in with the incapacity of science and reason in their current forms to deal with ethics effectively, which I mentioned a moment ago.
I certainly don’t have all the answers here -- I’m just pointing out the complex of interconnected reasons why I think contemporary science and rationality are limited in power and scope, and are going to be replaced by something richer and better as the growth of our individual and collective minds progresses. What will this new, better thing be? I’m not sure -- but I have an inkling it will involve an integration of “third person” science/rationality with some sort of systematic approach to first-person and second-person experience.
Next, about “technical rationality” -- of course that’s a whole other can of worms. Semi-formal rationality has a great track record; it’s brought us science and math and technology, for example. So even if it has some limitations, we certainly owe it some respect! Technical rationality has no such track record, and so my semi-formal scientific and rational nature impels me to be highly skeptical of it! I have no reason to believe, at present, that focusing on technical rationality (as opposed to the many other ways to focus our attention, given our limited time and processing power) will generally make people more intelligent or better at achieving their goals. Maybe it will, in some contexts -- but what those contexts are, is something we don’t yet understand very well.
I provided consulting once to a project aimed at using computational neuroscience to understand the neurobiological causes of cognitive biases in people employed to analyze certain sorts of data. This is interesting to me; and it’s clear to me that in this context, minimization of some of these textbook cognitive biases would help these analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style.
On a mathematical basis, the justification for positing probability theory as the “correct” way to do reasoning under uncertainty relies on arguments like Cox’s axioms, or de Finetti’s Dutch Book arguments. These are beautiful pieces of math, but when you talk about applying them to the real world, you run into a lot of problems regarding the inapplicability of their assumptions. For instance, Cox’s axioms include an axiom specifying that (roughly speaking) multiple pathways of arriving at the same conclusion must lead to the same estimate of that conclusion’s truth value. This sounds sensible but in practice it’s only going to be achievable by minds with arbitrarily much computing capability at their disposal. In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources. They’re irrelevant to reality, except as inspirations to individuals of a certain cast of mind.
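To make the Dutch book idea concrete, here is a toy calculation (the credences and stakes are invented for illustration): an agent whose probabilities for an event and its negation sum to more than 1 can be sold a pair of bets, each "fair" by its own lights, that together lose money in every possible world.

```python
# Dutch book against incoherent credences: the agent assigns
# P(rain) = 0.6 and P(no rain) = 0.6 (summing to 1.2, which violates
# the probability axioms). A bookmaker sells it both bets at the
# agent's own stated fair prices.

def bet_payoff(stake, price, wins):
    """Net payoff of a $stake bet bought at probability `price`:
    pay stake * price up front, receive `stake` if the bet wins."""
    return (stake if wins else 0.0) - stake * price

p_rain, p_no_rain = 0.6, 0.6  # incoherent: should sum to exactly 1

for rain in (True, False):
    total = bet_payoff(1.0, p_rain, rain) + bet_payoff(1.0, p_no_rain, not rain)
    print(f"rain={rain}: net payoff {total:+.2f}")
# In both worlds the agent nets -0.20: a guaranteed loss (the "Dutch book").
```

Whether the argument's idealized assumptions (e.g. willingness to take all bets at one's stated odds) carry over to computationally bounded agents is exactly the point under dispute here.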
(An aside is that my own approach to AGI does heavily involve probability theory -- using a system I invented called Probabilistic Logic Networks, which integrates probability and logic in a unique way. I like probabilistic reasoning. I just don’t venerate it as uniquely powerful and important. In my OpenCog AGI architecture, it’s integrated with a bunch of other AI methods, which all have their own strengths and weaknesses.)
So anyway -- there’s no formal mathematical reason to think that “technical rationality” is a good approach in real-world situations; and “technical rationality” has no practical track record to speak of. And ordinary, semi-formal rationality itself seems to have some serious limitations of power and scope.
So what’s my conclusion? Semi-formal rationality is fantastic and important and we should use it and develop it -- but also be open to the possibility of its obsolescence as we discover broader and more incisive ways of understanding the universe (and this is probably moderately close to what William Burroughs really thought). Technical rationality is interesting and well worth exploring but we should still be pretty skeptical of its value, at this stage -- certainly, anyone who has supreme confidence that technical rationality is going to help humanity achieve its goals better, is being rather IRRATIONAL ;-) ….
In this vein, I’ve followed the emergence of the Less Wrong community with some amusement and interest. One ironic thing I’ve noticed about this community of people intensely concerned with improving their personal rationality is: by and large, these people are already hyper-developed in the area of rationality, but underdeveloped in other ways! Think about it -- who is the prototypical Less Wrong meetup participant? It’s a person who’s very rational already, relative to nearly all other humans -- but relatively lacking in other skills like intuitively and empathically understanding other people. But instead of focusing on improving their empathy and social intuition (things they really aren’t good at, relative to most humans), this person is focusing on fine-tuning their rationality more and more, via reprogramming their brains to more naturally use “technical rationality” tools! This seems a bit imbalanced. If you’re already a fairly rational person but lacking in other aspects of human development, the most rational thing may be NOT to focus on honing your “rationality fu” and better internalizing Bayes’ rule into your subconscious -- but rather on developing those other aspects of your being.... An analogy would be: If you’re very physically strong but can’t read well, and want to self-improve, what should you focus your time on? Weight-lifting or literacy? Even if greater strength is ultimately your main goal, one argument for focusing on literacy would be that you might read something that would eventually help you weight-lift better! Also you might avoid getting ripped off by a corrupt agent offering to help you with your bodybuilding career, due to being able to read your own legal contracts. 
Similarly, for people who are more developed in terms of rational inference than other aspects, the best way for them to become more rational might be for them to focus time on these other aspects (rather than on fine-tuning their rationality), because this may give them a deeper and broader perspective on rationality and what it really means.
Finally, you asked: “What are your positive recommendations for reasoning about the world?” I’m tempted to quote Nietzsche’s Zarathustra, who said “Go away from me and resist Zarathustra!” I tend to follow my own path, and generally encourage others to do the same. But I guess I can say a few more definite things beyond that....
To me it’s all about balance. My friend Allan Combs calls himself a “philosophical Taoist” sometimes; I like that line! Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition, even if the logical reasons for their promptings aren’t apparent. Think carefully through the details of things; but don’t be afraid to make wild intuitive leaps. Pay close mind to the relevant data and observe the world closely and particularly; but don’t forget that empirical data is in a sense a product of the mind, and facts only have meaning in some theoretical context. Don’t let your thoughts be clouded by your emotions; but don’t be a feeling-less automaton, don’t make judgments that are narrowly rational but fundamentally unwise. As Ben Franklin said, “Moderation in all things, including moderation.”
Luke:
[Jan 14th, 2012]
I whole-heartedly agree that there are plenty of Less Wrongers who, rationally, should spend less time studying rationality and more time practicing social skills and generic self-improvement methods! This is part of why I've written so many scientific self-help posts for Less Wrong: Scientific Self Help, How to Beat Procrastination, How to Be Happy, Rational Romantic Relationships, and others. It's also why I taught social skills classes at our two summer 2011 rationality camps.
Back to rationality. You talk about the "limitations" of "what one might call 'semi-formal rationality', as opposed to 'technical rationality.'" But I argued for technical rationality, so: what are the limitations of technical rationality? Does it, as you claim for "semi-formal rationality," fail to apply to consciousness or ethics or precognition? Does Bayes' Theorem remain true when looking at the evidence about awareness, but cease to be true when we look at the evidence concerning consciousness or precognition?
You talk about technical rationality's lack of a track record, but I don't know what you mean. Science was successful because it did a much better job of approximating perfect Bayesian probability theory than earlier methods did (e.g. faith, tradition), and science can be even more successful when it tries harder to approximate perfect Bayesian probability theory — see The Theory That Would Not Die.
You say that "minimization of some of these textbook cognitive biases would help [some] analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style." But this misunderstands what I mean by Technical Rationality. If teaching these people about cognitive biases would lower the expected value of some project, then Technical Rationality would recommend against teaching these people cognitive biases (at least, for the purposes of maximizing the expected value of that project). Your example here is a case of Straw Man Rationality. (But of course I didn't expect you to know everything I meant by Technical Rationality in advance! Though, I did provide a link to an explanation of what I meant by Technical Rationality in my first entry, above.)
The same goes for your dismissal of probability theory's foundations. You write that "In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources." Yes, we don't have infinite computing power. The point is that Bayesian probability theory is an ideal that can be approximated by finite beings. That's why science works better than faith — it's a better approximation of using probability theory to reason about the world, even though science is still a long way from a perfect use of probability theory.
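To make the "approximable ideal" point concrete, here is a toy sketch (the coin example and grid sizes are invented for illustration): a finite grid approximation of a Bayesian posterior gets arbitrarily close to the exact answer as we spend more compute, without ever requiring infinite resources.

```python
# Bounded approximation of an ideal Bayesian update: the posterior mean
# of a coin's bias after observing h heads and t tails, computed on a
# finite grid. The exact answer under a uniform prior follows from the
# Beta conjugate posterior: mean = (h + 1) / (h + t + 2).

def grid_posterior_mean(h, t, n_points):
    grid = [(i + 0.5) / n_points for i in range(n_points)]  # candidate biases
    weights = [p**h * (1 - p)**t for p in grid]             # likelihoods (flat prior)
    z = sum(weights)
    return sum(p * w for p, w in zip(grid, weights)) / z

h, t = 7, 3
exact = (h + 1) / (h + t + 2)  # = 8/12, about 0.6667
for n in (5, 50, 5000):
    approx = grid_posterior_mean(h, t, n)
    print(f"{n:5d} grid points: mean ~ {approx:.4f} (exact {exact:.4f})")
# Coarser grids give rougher answers; more compute buys a closer
# approximation to the Bayesian ideal.
```

The moral is that "the assumptions require unbounded resources" does not show the ideal is irrelevant: finite agents can move toward it by degrees, and how closely they approximate it is a cost-benefit question, not an all-or-nothing one.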
Re: goals. Your view of intelligence as "achieving complex goals in complex environments" does, as you say, assume that "the goal is strictly separated from the methods for achieving it." I prefer a definition of intelligence as "efficient cross-domain optimization", but my view — like yours — also assumes that goals (what one values) are logically orthogonal to intelligence (one's ability to achieve what one values).
Nevertheless, you report an intuition that shaping every molecule into an image of Mickey Mouse is a "stupid" goal. But I don't know what you mean by this. A goal of shaping every molecule into an image of Mickey Mouse is an instrumentally intelligent goal if one's utility function will be maximized that way. Do you mean that it's a stupid goal according to your goals? But of course. This is, moreover, what we would expect your intuitive judgments to report, even if your intuitive judgments are irrelevant to the math of what would and wouldn't be an instrumentally intelligent goal for a different agent to have. The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear. And I certainly don't know what "out of harmony with the Cosmos" is supposed to mean.
Re: induction. I won't dive into that philosophical morass here. Suffice it to say that my views on the matter are expressed pretty well in Where Recursive Justification Hits Bottom, which is also a direct response to your view that science and reason are great but rely on "acts of faith."
Your final paragraph sounds like common sense, but it's too vague, as I think you would agree. One way to force a more precise answer to such questions is to think of how you'd program it into an AI. As Daniel Dennett said, "AI makes philosophy honest."
How would you program an AI to learn about reality, if you wanted it to have the most accurate model of reality possible? You'd have to be a bit more specific than "Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition…"
My own answer to the question of how I would program an AI to build as accurate a model of reality as possible is this: I would build it to use computable approximations of perfect technical rationality — that is, roughly: computable approximations of Solomonoff induction and Bayesian decision theory.
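The idea of a "computable approximation of Solomonoff induction" can be made concrete with a toy model. The sketch below is purely illustrative (it is not any real AI system's code): it replaces "all programs" with all repeating bit-patterns up to a fixed length, weights each hypothesis by the 2^-length Occam prior, and performs an exact Bayesian update on the observed sequence:

```python
from fractions import Fraction

def predict_next_bit(observed, max_len=8):
    """Posterior-weighted prediction over all repeating-pattern
    'programs' up to max_len bits, with a 2^-length Occam prior.
    A crude, computable stand-in for Solomonoff's universal mixture."""
    weight_one = Fraction(0)   # total weight of hypotheses predicting 1
    weight_all = Fraction(0)   # total weight of hypotheses fitting the data
    n = len(observed)
    for length in range(1, max_len + 1):
        for code in range(2 ** length):
            pattern = [(code >> i) & 1 for i in range(length)]
            prior = Fraction(1, 2 ** length)
            # Likelihood is 1 if repeating the pattern reproduces the
            # data exactly, and 0 otherwise.
            if all(observed[i] == pattern[i % length] for i in range(n)):
                weight_all += prior
                if pattern[n % length] == 1:
                    weight_one += prior
    return weight_one / weight_all  # posterior probability the next bit is 1

print(predict_next_bit([0, 1, 0, 1, 0, 1]))  # prints 1/23: the short
# alternating pattern dominates, so the model strongly expects 0 next
```

The Occam prior does the work here: the length-2 pattern `01` fits the data just as well as the length-6 pattern `010101`, but carries 16 times the prior weight, so the mixture's prediction is dominated by the simplest consistent hypothesis.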
Ben:
[Jan 21st, 2012]
Bayes' Theorem is “always true” in a formal sense, just like 1+1=2, obviously. However, the connection between formal mathematics and subjective experience is not something that can be fully formalized.
Regarding consciousness, there are many questions, including what counts as “evidence.” In science we typically count something as evidence if the vast majority of the scientific community counts it as a real observation -- so ultimately the definition of “evidence” bottoms out in social agreement. But there’s a lot that’s unclear in this process of classifying an observation as evidence via a process of social agreement among multiple minds. This unclarity is mostly irrelevant to the study of trajectories of basketballs, but possibly quite relevant to the study of consciousness.
Regarding psi, there are lots of questions, but one big problem is that it’s possible the presence and properties of a psi effect may depend on the broad context of the situation in which the effect takes place. Since we don’t know which aspects of the context are influencing the psi effect, we don’t know how to construct controlled experiments to measure psi. And we may not have the breadth of knowledge nor the processing power to reason about all the relevant context to a psi experiment, in a narrowly “technically rational” way.... I do suspect one can gather solid data demonstrating and exploring psi (and based on my current understanding, it seems this has already been done to a significant extent by the academic parapsychology community; see a few links I’ve gathered here), but I also suspect there may be aspects that elude the traditional scientific method, yet are nonetheless perfectly real aspects of the universe.
Anyway both consciousness and psi are big, deep topics, and if we dig into them in detail, this interview will become longer than either of us has time for...
About the success of science -- I don’t really accept your Bayesian story for why science was successful. It’s naive for reasons much discussed by philosophers of science. My own take on the history and philosophy of science, from a few years back, is here (that article was the basis for a chapter in The Hidden Pattern, also). My goal in that essay was “a philosophical perspective that does justice to both the relativism and sociological embeddedness of science, and the objectivity and rationality of science.” It seems you focus overly much on the latter and ignore the former. That article tries to explain why probabilist explanations of real-world science are quite partial and miss a lot of the real story. But again, a long debate on the history of science would take us too far off track from the main thrust of this interview.
About technical rationality, cognitive biases, etc. -- I did read that blog entry that you linked, on technical rationality. Yes, it’s obvious that focusing on teaching an employee to be more rational need not always be the most rational thing for an employer to do, even if that employer has a purely rationalist world-view. For instance, if I want to train an attack dog, I may do better by focusing limited time and attention on increasing his strength rather than his rationality. My point was that there’s a kind of obsession with rationality in some parts of the intellectual community (e.g. some of the Less Wrong orbit) that I find a bit excessive and not always productive. But your reply impels me to distinguish two ways this excess may manifest itself:
- Excessive belief that rationality is the “right” way to solve problems and think about issues, in principle
- Excessive belief that, tactically, explicitly employing tools of technical rationality is a good way to solve problems in the real world
Psychologically I think these two excesses probably tend to go together, but they’re not logically coupled. In principle, someone could hold either one, but not the other.
This sort of ties in with your comments on science and faith. You view science as progress over faith -- and I agree if you interpret “faith” to mean “traditional religions.” But if you interpret “faith” more broadly, I don’t see a dichotomy there. Actually, I find the dichotomy between “science” and “faith” unfortunately phrased, since science itself ultimately relies on acts of faith also. The “problem of induction” can’t be solved, so every scientist must base his extrapolations from past to future on some act of faith. It’s not a matter of science vs. faith, it’s a matter of what one chooses to place one’s faith in. I’d personally rather place faith in the idea that patterns observed in the past will likely continue into the future (as one example of a science-friendly article of faith), than in the word of some supposed “God” -- but I realize I’m still making an act of faith.
This ties in with the blog post “Where Recursive Justification Hits Bottom” that you pointed out. It’s pleasant reading but of course doesn’t provide any kind of rational argument against my views. In brief, according to my interpretation, it articulates a faith in the process of endless questioning:
The important thing is to hold nothing back in your criticisms of how to criticize; nor should you regard the unavoidability of loopy justifications as a warrant of immunity from questioning.
I share that faith, personally.
Regarding approximations to probabilistic reasoning under realistic conditions (of insufficient resources), the problem is that we lack rigorous knowledge about what they are. We don’t have any theorems telling us what is the best way to reason about uncertain knowledge, in the case that our computational resources are extremely restricted. You seem to be assuming that the best way is to explicitly use the rules of probability theory, but my point is that there is no mathematical or scientific foundation for this belief. You are making an act of faith in the doctrine of probability theory! You are assuming, because it feels intuitively and emotionally right to you, that even if the conditions of the arguments for the correctness of probabilistic reasoning are NOT met, it still makes sense to use probability theory to reason about the world. But so far as I can tell, you don’t have a RATIONAL reason for this assumption, and certainly not a mathematical reason.
Re your response to my questioning the reduction of intelligence to goals and optimization -- I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response to my doubts about this perspective basically just re-asserts your faith in the correctness and completeness of this sort of perspective. Your statement
The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear
basically asserts that it’s important to agree with your opinion on the ultimate meaning of intelligence!
On the contrary, I think it’s important to explore alternatives to the understanding of intelligence in terms of optimization or goal-achievement. That is something I’ve been thinking about a lot lately. However, I don’t have a really crisply-formulated alternative yet.
As a mathematician, I tend not to think there’s a “right” definition for anything. Rather, one explains one’s definitions, and then works with them and figures out their consequences. In my AI work, I’ve provisionally adopted a goal-achievement based understanding of intelligence -- and have found this useful, to a significant extent. But I don’t think this is the true and ultimate way to understand intelligence. I think the view of intelligence in terms of goal-achievement or cross-domain optimization misses something, which future understandings of intelligence will encompass. I’ll venture that in 100 years the smartest beings on Earth will have a rigorous, detailed understanding of intelligence according to which
The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear
seems like rubbish.....
As for your professed inability to comprehend the notion of “harmony with the Cosmos” -- that’s unfortunate for you, but I guess trying to give you a sense for that notion, would take us way too far afield in this dialogue!
Finally, regarding your complaint that my indications regarding how to understand the world are overly vague. Well -- according to Franklin’s idea of “Moderation in all things, including moderation”, one should also exercise moderation in precisiation. Not everything needs to be made completely precise and unambiguous (fortunately, since that’s not feasible anyway).
I don’t know how I would program an AI to build as accurate a model of reality as possible, if that were my goal. I’m not sure that’s the best goal for AI development, either. An accurate model in itself doesn’t do anything helpful. My best stab in the direction of how I would ideally create an AI, if computational resource restrictions were no issue, is the GOLEM design that I described here. GOLEM is a design for a strongly self-modifying superintelligent AI system, which might plausibly have the possibility of retaining its initial goal system through successive self-modifications. However, it’s unclear to me whether it will ever be feasible to build.
You mention Solomonoff induction and Bayesian decision theory. But these are abstract mathematical constructs, and it’s unclear to me whether it will ever be feasible to build an AI system fundamentally founded on these ideas, and operating within feasible computational resources. Marcus Hutter and Juergen Schmidhuber and their students are making some efforts in this direction, and I admire those researchers and this body of work, but don’t currently have a high estimate of its odds of leading to any sort of powerful real-world AGI system.
Most of my thinking about AGI has gone into the more practical problem of how to make a human-level AGI
- using currently feasible computational resources
- that will most likely be helpful rather than harmful in terms of the things I value
- that will be smoothly extensible to intelligence beyond the human level as well.
For this purpose, I think Solomonoff induction and probability theory are useful, but aren’t all-powerful guiding principles. For instance, in the OpenCog AGI design (which is my main practical AGI-oriented venture at present), there is a component doing automated program learning of small programs -- and inside our program learning algorithm, we explicitly use an Occam bias, motivated by the theory of Solomonoff induction. And OpenCog also has a probabilistic reasoning engine, based on the math of Probabilistic Logic Networks (PLN). I don’t tend to favor the language of “Bayesianism”, but I would suppose PLN should be considered “Bayesian” since it uses probability theory (including Bayes rule) and doesn’t make a lot of arbitrary, a priori distributional assumptions. The truth value formulas inside PLN are based on an extension of imprecise probability theory, which in itself is an extension of standard Bayesian methods (looking at envelopes of prior distributions, rather than assuming specific priors).
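The "envelopes of prior distributions" idea mentioned here can be illustrated with a toy calculation. The sketch below shows only the basic imprecise-probability move of updating an interval of priors endpoint-by-endpoint; it is not PLN's actual truth-value formulas:

```python
def posterior_interval(prior_lo, prior_hi, lik_h, lik_not_h):
    """Bayes-update an envelope of priors [prior_lo, prior_hi] for a
    hypothesis H, given P(evidence|H) = lik_h and P(evidence|~H) = lik_not_h.
    The posterior is monotone in the prior, so the posterior envelope is
    just the standard Bayesian update applied at each endpoint."""
    def update(p):
        return (p * lik_h) / (p * lik_h + (1 - p) * lik_not_h)
    return update(prior_lo), update(prior_hi)

# A prior known only to lie between 0.2 and 0.5, with evidence
# four times as likely under H as under ~H:
lo, hi = posterior_interval(0.2, 0.5, 0.8, 0.2)
print(lo, hi)  # prints 0.5 0.8: both endpoints rise, and the result
# is still an interval rather than a single point estimate
```

The point of carrying an interval rather than a point probability is exactly the one Ben raises: when one cannot justify a specific prior, one can still reason rigorously about the whole family of priors one is willing to entertain.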
In terms of how to get an OpenCog system to model the world effectively and choose its actions appropriately, I think teaching it and working together with it, will be just as important as programming it. Right now the project is early-stage and the OpenCog design is maybe 50% implemented. But assuming the design is right, once the implementation is done, we’ll have a sort of idiot savant childlike mind, that will need to be educated in the ways of the world and humanity, and to learn about itself as well. So the general lessons of how to confront the world, that I cited above, would largely be imparted via interactive experiential learning, vaguely the same way that human kids learn to confront the world from their parents and teachers.
Drawing a few threads from this conversation together, it seems that
- I think technical rationality, and informal semi-rationality, are both useful tools for confronting life -- but not all-powerful
- I think Solomonoff induction and probability theory are both useful tools for constructing AGI systems -- but not all-powerful
whereas you seem to ascribe a more fundamental, foundational basis to these particular tools.
Luke:
[Jan. 21st, 2012]
To sum up, from my point of view:
- We seem to disagree on the applications of probability theory. For my part, I'll just point people to A Technical Explanation of Technical Explanation.
- I don't think we disagree much on the "sociological embeddedness" of science.
- I'm also not sure how much we really disagree about Solomonoff induction and Bayesian probability theory. I've already agreed that no machine will use these in practice because they are not computable — my point was about their provable optimality given infinite computation (subject to qualifications; see AIXI).
You've definitely misunderstood me concerning "intelligence." This part is definitely not true: "I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response assumes the correctness and completeness of this sort of perspective." Intelligence as efficient cross-domain optimization is merely a stipulated definition. I'm happy to use other definitions of intelligence in conversation, so long as we're clear which definition we're using when we use the word. Or, we can replace the symbol with the substance and talk about "efficient cross-domain optimization" or "achieving complex goals in complex environments" without ever using the word "intelligence."
My point about the Mickey Mouse goal was that when you called the Mickey Mouse goal "stupid," this could be confusing, because "stupid" is usually the opposite of "intelligent," but your use of "stupid" in that sentence didn't seem to be the opposite of either definition of intelligence we each gave. So I'm still unsure what you mean by calling the Mickey Mouse goal "stupid."
This topic provides us with a handy transition away from philosophy of science and toward AGI. Suppose there was a machine with a vastly greater-than-human capacity for either "achieving complex goals in complex environments" or for "efficient cross-domain optimization." And suppose that machine's utility function would be maximized by reshaping every molecule into a Mickey Mouse shape. We can avoid the tricky word "stupid," here. The question is: Would that machine decide to change its utility function so that it doesn't continue to reshape every molecule into a Mickey Mouse shape? I think this is unlikely, for reasons discussed in Omohundro (2008).
I suppose a natural topic of conversation for us would be your October 2010 blog post The Singularity Institute's Scary Idea (and Why I Don't Buy It). Does that post still reflect your views pretty well, Ben?
Ben:
[Mar 10th, 2012]
About the hypothetical uber-intelligence that wants to tile the cosmos with molecular Mickey Mouses -- I truly don’t feel confident making any assertions about a real-world system with vastly greater intelligence than me. There are just too many unknowns. Sure, according to certain models of the universe and intelligence that may seem sensible to some humans, it’s possible to argue that a hypothetical uber-intelligence like that would relentlessly proceed in tiling the cosmos with molecular Mickey Mouses. But so what? We don’t know that such an uber-intelligence is even a possible thing -- in fact my intuition is that it’s not possible.
Why may it not be possible to create a very smart AI system that is strictly obsessed with that stupid goal? Consider first that it may not be possible to create a real-world, highly intelligent system that is strictly driven by explicit goals -- as opposed to being partially driven by implicit, “unconscious” (in the sense of deliberative, reflective consciousness) processes that operate in complex interaction with the world outside the system. This is because pursuing explicit goals is quite computationally costly compared to many other sorts of intelligent processes. So if a real-world system is necessarily not wholly explicit-goal-driven, it may be that intelligent real-world systems will naturally drift away from certain goals and toward others. My strong intuition is that the goal of tiling the universe with molecular Mickey Mouses would fall into that category. However, I don’t yet have any rigorous argument to back this up. Unfortunately my time is limited, and while I generally have more fun theorizing and philosophizing than working on practical projects, I think it’s more important for me to push toward building AGI than just spend all my time on fun theory. (And then there’s the fact that I have to spend a lot of my time on applied narrow-AI projects to pay the mortgage and put my kids through college, etc.)
But anyway -- you don’t have any rigorous argument to back up the idea that a system like you posit is possible in the real-world, either! And SIAI has staff who, unlike me, are paid full-time to write and philosophize … and they haven’t come up with a rigorous argument in favor of the possibility of such a system, either. Although they have talked about it a lot, though usually in the context of paperclips rather than Mickey Mouses.
So, I’m not really sure how much value there is in this sort of thought-experiment about pathological AI systems that combine massively intelligent practical problem solving capability with incredibly stupid goals (goals that may not even be feasible for real-world superintelligences to adopt, due to their stupidity).
Regarding the concept of a “stupid goal” that I keep using, and that you question -- I admit I’m not quite sure how to formulate rigorously the idea that tiling the universe with Mickey Mouses is a stupid goal. This is something I’ve been thinking about a lot recently. But here’s a first rough stab in that direction: I think that if you created a highly intelligent system, allowed it to interact fairly flexibly with the universe, and also allowed it to modify its top-level goals in accordance with its experience, you’d be very unlikely to wind up with a system that had this goal (tiling the universe with Mickey Mouses). That goal is out of sync with the Cosmos, in the sense that an intelligent system that’s allowed to evolve itself in close coordination with the rest of the universe, is very unlikely to arrive at that goal system. I don’t claim this is a precise definition, but it should give you some indication of the direction I’m thinking in....
The tricky thing about this way of thinking about intelligence, which classifies some goals as “innately” stupider than others, is that it places intelligence not just in the system, but in the system’s broad relationship to the universe -- which is something that science, so far, has had a tougher time dealing with. It’s unclear to me which aspects of the mind and universe science, as we now conceive it, will be able to figure out. I look forward to understanding these aspects more fully....
About my blog post on “The Singularity Institute’s Scary Idea” -- yes, that still reflects my basic opinion. After I wrote that blog post, Michael Anissimov -- a long-time SIAI staffer and zealot whom I like and respect greatly -- told me he was going to write up and show me a systematic, rigorous argument as to why “an AGI not built based on a rigorous theory of Friendliness is almost certain to kill all humans” (the proposition I called “SIAI’s Scary Idea”). But he hasn’t followed through on that yet -- and neither has Eliezer or anyone associated with SIAI.
Just to be clear, I don’t really mind that SIAI folks hold that “Scary Idea” as an intuition. But I find it rather ironic when people make a great noise about their dedication to rationality, but then also make huge grand important statements about the future of humanity, with great confidence and oomph, that are not really backed up by any rational argumentation. This ironic behavior on the part of Eliezer, Michael Anissimov and other SIAI principals doesn’t really bother me, as I like and respect them and they are friendly to me, and we’ve simply “agreed to disagree” on these matters for the time being. But the reason I wrote that blog post is because my own blog posts about AGI were being trolled by SIAI zealots (not the principals, I hasten to note) leaving nasty comments to the effect of “SIAI has proved that if OpenCog achieves human level AGI, it will kill all humans.” Not only has SIAI not proved any such thing, they have not even made a clear rational argument!
As Eliezer has pointed out to me several times in conversation, a clear rational argument doesn’t have to be mathematical. A clearly formulated argument in the manner of analytical philosophy, in favor of the Scary Idea, would certainly be very interesting. For example, philosopher David Chalmers recently wrote a carefully-argued philosophy paper arguing for the plausibility of a Singularity in the next couple hundred years. It’s somewhat dull reading, but it’s precise and rigorous in the manner of analytical philosophy, in a manner that Kurzweil’s writing (which is excellent in its own way) is not. An argument in favor of the Scary Idea, on the level of Chalmers’ paper on the Singularity, would be an excellent product for SIAI to produce. Of course a mathematical argument might be even better, but that may not be feasible to work on right now, given the state of mathematics today. And of course, mathematics can’t do everything -- there’s still the matter of connecting mathematics to everyday human experience, which analytical philosophy tries to handle, and mathematics by nature cannot.
My own suspicion, of course, is that in the process of trying to make a truly rigorous analytical philosophy style formulation of the argument for the Scary Idea, the SIAI folks will find huge holes in the argument. Or, maybe they already intuitively know the holes are there, which is why they have avoided presenting a rigorous write-up of the argument!!
Luke:
[Mar 11th, 2012]
I'll drop the stuff about Mickey Mouse so we can move on to AGI. Readers can come to their own conclusions on that.
Your main complaint seems to be that the Singularity Institute hasn't written up a clear, formal argument (in analytic philosophy's sense, if not the mathematical sense) in defense of our major positions — something like Chalmers' "The Singularity: A Philosophical Analysis" but more detailed.
I have the same complaint. I wish "The Singularity: A Philosophical Analysis" had been written 10 years ago, by Nick Bostrom and Eliezer Yudkowsky. It could have been written back then. Alas, we had to wait for Chalmers to speak at Singularity Summit 2009 and then write a paper based on his talk. And if it wasn't for Chalmers, I fear we'd still be waiting for such an article to exist. (Bostrom's forthcoming Superintelligence book should be good, though.)
I was hired by the Singularity Institute in September 2011 and have since then co-written two papers explaining some of the basics: "Intelligence Explosion: Evidence and Import" and "The Singularity and Machine Ethics". I also wrote the first ever outline of categories of open research problems in AI risk, cheekily titled "So You Want to Save the World". I'm developing other articles on "the basics" as quickly as I can. I would love to write more, but alas, I'm also busy being the Singularity Institute's Executive Director.
Perhaps we could reframe our discussion around the Singularity Institute's latest exposition of its basic ideas, "Intelligence Explosion: Evidence and Import"? Which claims in that paper do you most confidently disagree with, and why?
Ben:
[Mar 11th, 2012]
You say “Your main complaint seems to be that the Singularity Institute hasn't written up a clear, formal argument (in analytic philosophy's sense, if not the mathematical sense) in defense of our major positions.” Actually, my main complaint is that some of SIAI’s core positions seem almost certainly WRONG, and yet they haven’t written up a clear formal argument trying to justify these positions -- so it’s not possible to engage SIAI in rational discussion on their apparently wrong positions. Rather, when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me -- SIAI is a fairly well-funded organization involving lots of smart people and explicitly devoted to rationality, so certainly it should have the capability to write up clear arguments for its core positions... if these arguments exist. My suspicion is that the Scary Idea, for example, is not backed up by any clear rational argument -- so the reason SIAI has not put forth any clear rational argument for it, is that they don’t really have one! Whereas Chalmers’ paper carefully formulated something that seemed obviously true...
Regarding the paper "Intelligence Explosion: Evidence and Import", I find its contents mainly agreeable -- and also somewhat unoriginal and unexciting, given the general context of 2012 Singularitarianism. The paper’s three core claims that
(1) there is a substantial chance we will create human-level AI before 2100, that (2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an "intelligence explosion," and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.
are things that most “Singularitarians” would agree with. The paper doesn’t attempt to argue for the “Scary Idea” or Coherent Extrapolated Volition or the viability of creating some sort of provably Friendly AI -- or any of the other positions that are specifically characteristic of SIAI. Rather, the paper advocates what one might call “plain vanilla Singularitarianism.” This may be a useful thing to do, though, since after all there are a lot of smart people out there who aren’t convinced of plain vanilla Singularitarianism.
I have a couple small quibbles with the paper, though. I don’t agree with Omohundro’s argument about the “basic AI drives” (though Steve is a friend and I greatly respect his intelligence and deep thinking). Steve’s argument for the inevitability of these drives in AIs is based on evolutionary ideas, and would seem to hold up in the case that there is a population of distinct AIs competing for resources -- but the argument seems to fall apart in the case of other possibilities like an AGI mindplex (a network of minds with less individuality than current human minds, yet not necessarily wholly blurred into a single mind -- rather, with reflective awareness and self-modeling at both the individual and group level).
Also, my “AI Nanny” concept is dismissed too quickly for my taste (though that doesn’t surprise me!). You suggest in this paper that to make an AI Nanny, it would likely be necessary to solve the problem of making an AI’s goal system persist under radical self-modification. But you don’t explain the reasoning underlying this suggestion (if indeed you have any). It seems to me -- as I say in my “AI Nanny” paper -- that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification. If you think this is false, it would be nice for you to explain why, rather than simply asserting your view. And your comment “Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety...” carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory. Yet my GOLEM design is a concrete design for a potentially Friendly AI (admittedly not computationally feasible using current resources), and in my view constitutes greater progress toward actual FAI than any of the publications of SIAI so far. (Of course, various SIAI associated folks often hint that there are great, unpublished discoveries about FAI hidden in the SIAI vaults -- a claim I somewhat doubt, but can’t wholly dismiss of course....)
Anyway, those quibbles aside, my main complaint about the paper you cite is that it sticks to “plain vanilla Singularitarianism” and avoids all of the radical, controversial positions that distinguish SIAI from myself, Ray Kurzweil, Vernor Vinge and the rest of the Singularitarian world. The crux of the matter, I suppose is the third main claim of the paper,
(3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.
This statement is hedged in such a way as to be almost obvious. And yet, what SIAI folks tend to tell me verbally and via email and blog comments is generally far more extreme than this bland and nearly obvious statement.
As an example, I recall when your co-author on that article, Anna Salamon, guest lectured in the class on Singularity Studies that my father and I were teaching at Rutgers University in 2010. Anna made the statement, to the students, that (I’m paraphrasing, though if you’re curious you can look up the online course session which was saved online and find her exact wording) “If a superhuman AGI is created without being carefully based on an explicit Friendliness theory, it is ALMOST SURE to destroy humanity.” (i.e., what I now call SIAI’s Scary Idea)
I then asked her (in the online class session) why she felt that way, and if she could give any argument to back up the idea.
She gave the familiar SIAI argument that, if one picks a mind at random from “mind space”, the odds that it will be Friendly to humans are effectively zero.
I made the familiar counter-argument that this is irrelevant, because nobody is advocating building a random mind. Rather, what some of us are suggesting is to build a mind with a Friendly-looking goal system, and a cognitive architecture that’s roughly human-like in nature but with a non-human-like propensity to choose its actions rationally based on its goals, and then raise this AGI mind in a caring way and integrate it into society. Arguments against the Friendliness of random minds are irrelevant as critiques of this sort of suggestion.
So, then she fell back instead on the familiar (paraphrasing again) “OK, but you must admit there’s a non-zero risk of such an AGI destroying humanity, so we should be very careful -- when the stakes are so high, better safe than sorry!”
I had pretty much the same exact argument with SIAI advocates Tom McCabe and Michael Anissimov on different occasions; and also, years before, with Eliezer Yudkowsky and Michael Vassar -- and before that, with (former SIAI Executive Director) Tyler Emerson. Over all these years, the SIAI community maintains the Scary Idea in its collective mind, and also maintains a great devotion to the idea of rationality, but yet fails to produce anything resembling a rational argument for the Scary Idea -- instead repetitiously trotting out irrelevant statements about random minds!!
What I would like is for SIAI to do one of these three things, publicly:
- Repudiate the Scary Idea
- Present a rigorous argument that the Scary Idea is true
- State that the Scary Idea is a commonly held intuition among the SIAI community, but admit that no rigorous rational argument exists for it at this point
Doing any one of these things would be intellectually honest. Presenting the Scary Idea as a confident conclusion, and then backing off when challenged into a platitudinous position equivalent to “there’s a non-zero risk … better safe than sorry...”, is not my idea of an intellectually honest way to do things.
Why does this particular point get on my nerves? Because I don’t like SIAI advocates telling people that I, personally, am on a R&D course where if I succeed I am almost certain to destroy humanity!!! That frustrates me. I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time! But the fact that some other people have a non-rational intuition that my work, if successful, would be likely to destroy the world -- this doesn’t give me any urge to stop. I’m OK with the fact that some other people have this intuition -- but then I’d like them to make clear, when they state their views, that these views are based on intuition rather than rational argument. I will listen carefully to rational arguments that contravene my intuition -- but if it comes down to my intuition versus somebody else’s, in the end I’m likely to listen to my own, because I’m a fairly stubborn maverick kind of guy....
Luke:
[Mar 11th, 2012]
Ben, you write:
when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me...
No kidding! It's very frustrating to me, too. That's one reason I'm working to clearly articulate the arguments in one place, starting with articles on the basics like "Intelligence Explosion: Evidence and Import."
I agree that "Intelligence Explosion: Evidence and Import" covers only the basics and does not argue for several positions associated uniquely with the Singularity Institute. It is, after all, the opening chapter of a book on intelligence explosion, not the opening chapter of a book on the Singularity Institute's ideas!
I wanted to write that article first, though, so the Singularity Institute could be clear on the basics. For example, we needed to be clear that: (1) we are not Kurzweil, and our claims don't depend on his detailed storytelling or accelerating change curves, that (2) technological prediction is hard, and we are not being naively overconfident about AI timelines, and that (3) intelligence explosion is a convergent outcome of many paths the future may take. There is also much content that is not found in, for example, Chalmers' paper: (a) an overview of methods of technological prediction, (b) an overview of speed bumps and accelerators toward AI, (c) a reminder of breakthroughs like AIXI, and (d) a summary of AI advantages. (The rest is, as you say, mostly a brief overview of points that have been made elsewhere. But brief overviews are extremely useful!)
...my “AI Nanny” concept is dismissed too quickly for my taste...
No doubt! I think the idea is clearly worth exploring in several papers devoted to the topic.
It seems to me -- as I say in my “AI Nanny” paper -- that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification.
Whereas I tend to buy Omohundro's arguments that advanced AIs will want to self-improve just like humans want to self-improve, so that they become better able to achieve their final goals. Of course, we disagree on Omohundro's arguments — a topic to which I will return in a moment.
your comment "Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety..." carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory...
I didn't mean for it to carry that connotation. GOLEM and Nanny AI are both clearly AI safety ideas. I'll clarify that part before I submit a final draft to the editors.
Moving on: If you are indeed remembering your conversations with Anna, Michael, and others correctly, then again I sympathize with your frustration. I completely agree that it would be useful for the Singularity Institute to produce clear, formal arguments for the important positions it defends. In fact, just yesterday I was talking to Nick Beckstead about how badly both of us want to write these kinds of papers if we can find the time.
So, to respond to your wish that the Singularity Institute choose among three options, my plan is to (1) write up clear arguments for... well, if not "SIAI's Big Scary Idea" then for whatever I end up believing after going through the process of formalizing the arguments, and (2) publicly state (right now) that SIAI's Big Scary Idea is a commonly held view at the Singularity Institute but a clear, formal argument for it has never been published (at least, not to my satisfaction).
I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time!
I'm glad to hear it! :)
Now, it seems a good point of traction is our disagreement over Omohundro's "Basic AI Drives." We could talk about that next, but for now I'd like to give you a moment to reply.
Ben:
[Mar 11th, 2012]
Yeah, I agree that your and Anna’s article is a good step for SIAI to take, albeit unexciting to a Singularitarian insider type like me.... And I appreciate your genuinely rational response regarding the Scary Idea, thanks!
(And I note that I have also written some “unexciting to Singularitarians” material lately too, for similar reasons to those underlying your article -- e.g. an article on “Why an Intelligence Explosion is Probable” for a Springer volume on the Singularity.)
A quick comment on your statement that
we are not Kurzweil, and our claims don't depend on his detailed storytelling or accelerating change curves,
that’s a good point; but yet, any argument for a Singularity soon (e.g. likely this century, as you argue) ultimately depends on some argumentation analogous to Kurzweil’s, even if different in detail. I find Kurzweil’s detailed extrapolations a bit overconfident and more precise than the evidence warrants; but still, my basic reasons for thinking the Singularity is probably near are fairly similar to his -- and I think your reasons are fairly similar to his as well.
Anyway, sure, let’s go on to Omohundro’s posited Basic AI Drives -- which seem to me not to hold as necessary properties of future AIs unless the future of AI consists of a population of fairly distinct AIs competing for resources, which I intuitively doubt will be the situation.
[to be continued]
LINK: Can intelligence explode?
I thought many of you would be interested to know that the following paper just appeared in Journal of Consciousness Studies:
"Can Intelligence Explode?", by Marcus Hutter. (LINK HERE)
Abstract: The technological singularity refers to a hypothetical scenario in which technological advances virtually explode. The most popular scenario is the creation of super-intelligent algorithms that recursively create ever higher intelligences. It took many decades for these ideas to spread from science fiction to popular science magazines and finally to attract the attention of serious philosophers. David Chalmers' (JCS 2010) article is the first comprehensive philosophical analysis of the singularity in a respected philosophy journal. The motivation of my article is to augment Chalmers' and to discuss some issues not addressed by him, in particular what it could mean for intelligence to explode. In this course, I will (have to) provide a more careful treatment of what intelligence actually is, separate speed from intelligence explosion, compare what super-intelligent participants and classical human observers might experience and do, discuss immediate implications for the diversity and value of life, consider possible bounds on intelligence, and contemplate intelligences right at the singularity.
I have only just seen the paper and have not yet read through it myself, but I thought we could use this thread for discussion.
Slowing Moore's Law: Why You Might Want To and How You Would Do It
In this essay I argue the following:
Brain emulation requires enormous computing power; enormous computing power requires further progression of Moore’s law; further Moore’s law relies on large-scale production of cheap processors in ever more-advanced chip fabs; cutting-edge chip fabs are both expensive and vulnerable to state actors (but not non-state actors such as terrorists). Therefore: the advent of brain emulation can be delayed by global regulation of chip fabs.
Full essay: http://www.gwern.net/Slowing%20Moore%27s%20Law
Writing about Singularity: needing help with references and bibliography
It was Yudkowsky's Fun Theory sequence that inspired me to undertake the work of writing a novel on a singularitarian society... however, there are gaps I need to fill, and I need all the help I can get. It's mostly book recommendations that I'm asking for.
One of the things I'd like to tackle in it would be the interactions between the modern, geeky Singularitarianisms and Marxism, which I hold to be somewhat prototypical in that sense, as well as other utopianisms. And contrasting them with more down-to-earth ideologies and attitudes, by examining the seriously dangerous bumps of the technological point of transition between "baseline" and "singularity". But I need to do a lot of research before I'm able to write anything good: if I'm not going to have any original ideas, at least I'd like to serve my readers with a collection of well-researched, solid ones.
So I'd like to have everything that is worth reading about the Singularity, specifically the Revolution it entails (in one way or another) and the social aftermath. I'm particularly interested in the consequences of the lag in the spread of the technology from the wealthy to the baselines, and the potential for baseline oppression and other continuations of current forms of social imbalance, as well as suboptimal distribution of wealth. After all, according to many authors, we've had the means to end war, poverty, famine, and most infectious diseases since the sixties, and it's just our irrational methods of wealth distribution that have prevented it. That is, supposing the commonly alleged ideal of total lifespan and material welfare maximization for all humanity is what actually drives the way things are done. But even with other, different premises and axioms, there's much that can be improved and isn't, thanks to basic human irrationality, which is what we combat here.
Also, yes, this post makes my political leanings fairly clear, but I'm open to alternative viewpoints and actively seek them. I also don't intend to write any propaganda, as such. Just to examine ideas, and scenarios, for the sake of writing a compelling story, with wide audience appeal. The idea is to raise awareness of the Singularity as something rather imminent ("Summer's Coming"), and cause (or at least help prepare) normal people to question the wonders and dangers thereof, rationally.
It's a frighteningly ambitious, long-term challenge, I am terribly aware of that. And the first thing I'll need to read is a style-book, to correct my horrendous grasp of standard acceptable writing (and not seem arrogant by doing anything else), so please feel free to recommend as many books and blog articles and other material as you like. I'll take my time going through it all.
AI Risk and Opportunity: A Strategic Analysis

Suppose you buy the argument that humanity faces both the risk of AI-caused extinction and the opportunity to shape an AI-built utopia. What should we do about that? As Wei Dai asks, "In what direction should we nudge the future, to maximize the chances and impact of a positive intelligence explosion?"
This post serves as a table of contents and an introduction for an ongoing strategic analysis of AI risk and opportunity.
Contents:
- Introduction (this post)
- Humanity's Efforts So Far
- A Timeline of Early Ideas and Arguments
- Questions We Want Answered
- Strategic Analysis Via Probability Tree
- Intelligence Amplification and Friendly AI
- ...
Why discuss AI safety strategy?
The main reason to discuss AI safety strategy is, of course, to draw on a wide spectrum of human expertise and processing power to clarify our understanding of the factors at play and the expected value of particular interventions we could invest in: raising awareness of safety concerns, forming a Friendly AI team, differential technological development, investigating AGI confinement methods, and others.
Discussing AI safety strategy is also a challenging exercise in applied rationality. The relevant issues are complex and uncertain, but we need to take advantage of the fact that rationality is faster than science: we can't "try" a bunch of intelligence explosions and see which one works best. We'll have to predict in advance how the future will develop and what we can do about it.
Core readings
Before engaging with this series, I recommend you read at least the following articles:
- Muehlhauser & Salamon, Intelligence Explosion: Evidence and Import (2013)
- Yudkowsky, AI as a Positive and Negative Factor in Global Risk (2008)
- Chalmers, The Singularity: A Philosophical Analysis (2010)
Example questions
Which strategic questions would we like to answer? Muehlhauser (2011) elaborates on the following questions:
- What methods can we use to predict technological development?
- Which kinds of differential technological development should we encourage, and how?
- Which open problems are safe to discuss, and which are potentially dangerous?
- What can we do to reduce the risk of an AI arms race?
- What can we do to raise the "sanity waterline," and how much will this help?
- What can we do to attract more funding, support, and research to x-risk reduction and to specific sub-problems of successful Singularity navigation?
- Which interventions should we prioritize?
- How should x-risk reducers and AI safety researchers interact with governments and corporations?
- How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
- How does AI risk compare to other existential risks?
- Which problems do we need to solve, and which ones can we have an AI solve?
- How can we develop microeconomic models of WBEs and self-improving systems?
- How can we be sure a Friendly AI development team will be altruistic?
Salamon & Muehlhauser (2013) list several other questions gathered from the participants of a workshop following Singularity Summit 2011, including:
- How hard is it to create Friendly AI?
- What is the strength of feedback from neuroscience to AI rather than brain emulation?
- Is there a safe way to do uploads, where they don't turn into neuromorphic AI?
- How possible is it to do FAI research on a seastead?
- How much must we spend on security when developing a Friendly AI team?
- What's the best way to recruit talent toward working on AI risks?
- How difficult is stabilizing the world so we can work on Friendly AI slowly?
- How hard will a takeoff be?
- What is the value of strategy vs. object-level progress toward a positive Singularity?
- How feasible is Oracle AI?
- Can we convert environmentalists into people concerned with existential risk?
- Is there no such thing as bad publicity for [AI risk reduction] purposes?
These are the kinds of questions we will be tackling in this series of posts for Less Wrong Discussion, in order to improve our predictions about which direction we can nudge the future to maximize the chances of a positive intelligence explosion.
Journal of Consciousness Studies issue on the Singularity
...has finally been published.
Contents:
- Uziel Awret - Introduction
- Susan Blackmore - She Won’t Be Me
- Damien Broderick - Terrible Angels: The Singularity and Science Fiction
- Barry Dainton - On Singularities and Simulations
- Daniel Dennett - The Mystery of David Chalmers
- Ben Goertzel - Should Humanity Build a Global AI Nanny to Delay the Singularity Until It’s Better Understood?
- Susan Greenfield - The Singularity: Commentary on David Chalmers
- Robin Hanson - Meet the New Conflict, Same as the Old Conflict
- Francis Heylighen - Brain in a Vat Cannot Break Out
- Marcus Hutter - Can Intelligence Explode?
- Drew McDermott - Response to ‘The Singularity’ by David Chalmers [this link is a McDermott-corrected version, and therefore preferred to the version that was published in JCS]
- Jurgen Schmidhuber - Philosophers & Futurists, Catch Up!
- Frank Tipler - Inevitable Existence and Inevitable Goodness of the Singularity
- Roman Yampolskiy - Leakproofing the Singularity: Artificial Intelligence Confinement Problem
The issue consists of responses to Chalmers (2010). Future volumes will contain additional articles from Shulman & Bostrom, Igor Aleksander, Richard Brown, Ray Kurzweil, Pamela McCorduck, Chris Nunn, Arkady Plotnitsky, Jesse Prinz, Susan Schneider, Murray Shanahan, Burt Voorhees, and a response from Chalmers.
McDermott's chapter should be supplemented with this, which he says he didn't have space for in his JCS article.
'Facing the Singularity' podcast
My online mini-book Facing the Singularity now has a podcast. Ratings and reviews on iTunes will be much appreciated, so as to direct people toward a rationality-informed approach to intelligence explosion.
Jaan Tallinn has been passing my chapters around to people because they are concise explanations of key lemmas in the standard arguments on the need for Friendly AI. This is gratifying because it's exactly the purpose for which I'm writing them, and I encourage others to send people to these chapters as well.
(I'm currently writing the final two chapters of the online book and recording readings of the other chapters for the podcast. A volunteer is doing the audio editing.)
Draft of Muehlhauser & Salamon, 'Intelligence Explosion: Evidence and Import'
Anna Salamon and I have finished a draft of "Intelligence Explosion: Evidence and Import", under peer review for The Singularity Hypothesis: A Scientific and Philosophical Assessment (forthcoming from Springer).
Your comments are most welcome.
Edit: As of 3/31/2012, the link above now points to a preprint.
AI is not enough
What I write here may be quite simple (and I am certainly not the first to write about it), but I still think it is worth considering:
Say we have an arbitrary problem that we assume has an algorithmic solution, and we search for that solution.
How can the algorithm be determined?
Either:
a) Through another algorithm that exists prior to that algorithm.
b) OR: Through something non-algorithmic.
In the case of AI, the only solution is a), since there is nothing else but algorithms at its disposal. But then we have the problem to determine the algorithm the AI uses to find the solution, and then it would have to determine the algorithm to determine that algorithm, etc...
Obviously, at some point we have to actually find an algorithm to start with, so in any case we eventually need something fundamentally non-algorithmic to determine a solution to a problem that is solvable by an algorithm.
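The regress described above can be made concrete with a toy sketch (the function and variable names here are illustrative, not from any real system): a meta-search that picks an algorithm is itself an algorithm, so asking where *it* came from just restarts the question one level up.

```python
def find_algorithm(instances, candidates):
    """Brute-force meta-search: return the first candidate function
    that maps every (input, expected_output) instance correctly,
    or None if no candidate fits."""
    for f in candidates:
        if all(f(x) == y for x, y in instances):
            return f
    return None

# Toy use: search for a doubling function among candidate programs.
instances = [(1, 2), (3, 6), (10, 20)]
candidates = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]
found = find_algorithm(instances, candidates)

# Note that find_algorithm is itself an algorithm. Obtaining *it*
# algorithmically would require a find_algorithm_2, then a
# find_algorithm_3, and so on: the regress the post describes.
# In practice the regress bottoms out because a human, not an
# algorithm, wrote the first search procedure.
```

This is only a sketch of the shape of the argument; a serious meta-search would enumerate programs rather than pick from a fixed list, but the regress applies to it in exactly the same way.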
This reveals something fundamental we have to face with regards to AI:
Even assuming that all relevant problems are solvable by an algorithm, AI is not enough. Since there is no way to algorithmically determine the appropriate algorithm for an AI (since this would result in an infinite regress), we will always have to rely on some non-algorithmic intelligence to find more intelligent solutions. Even if we found a very powerful seed AI algorithm, there will always be more powerful seed AI algorithms that can't be determined by any known algorithm, and since we were able to find the first one, we have no reason to suppose we can't find another, more powerful one. If an AI recursively improves 100,000 times until it is 100^^^100 times more powerful, it still will be caught up if a better seed AI is found, which ultimately can't be done by an algorithm, so that further increases of the most general intelligence always rely on something non-algorithmic.
But even worse, it seems obvious to me that there are important practical problems that have no algorithmic solution (as opposed to theoretical problems like the halting problem, which are still tractable in practice), apart from the problem of finding the right algorithm.
In a sense, it seems all algorithms are too complicated to find the solution to the simple (though not necessarily easy) problem of giving rise to further general intelligence.
For example: no algorithm can determine the simple axioms of the natural numbers from anything weaker. We have to postulate them by virtue of simply seeing that they make sense. Thinking that AI could give rise to ever-improving *general* intelligence is like thinking that an algorithm can yield "there is a natural number 0, and every number has a successor that, too, is a natural number". There is simply no way to derive the axioms from anything that doesn't already include them. The axioms of the natural numbers are just obvious, yet can't be derived; the problem of finding the axioms of the natural numbers is too simple to be solved algorithmically. Yet still it is obvious how important the notion of natural numbers is.
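For reference, the axioms alluded to here are (a fragment of) the Peano axioms, which can be stated as:

```latex
\begin{align*}
&\text{(P1)}\quad 0 \in \mathbb{N} \\
&\text{(P2)}\quad \forall n \in \mathbb{N}:\ S(n) \in \mathbb{N} \\
&\text{(P3)}\quad \forall n \in \mathbb{N}:\ S(n) \neq 0 \\
&\text{(P4)}\quad \forall m, n \in \mathbb{N}:\ S(m) = S(n) \Rightarrow m = n
\end{align*}
```

Here $S$ is the successor function. (P1) and (P2) are the two axioms quoted in the text; the full Peano system also includes an induction schema, omitted here.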
Even the best AI will always be fundamentally incapable of finding some very simple, yet fundamental principles.
AI will always rely on the axioms it already knows; it can't go beyond them (unless reprogrammed by something external). Every new thing it learns can only be learned in terms of already-known axioms. This is simply a consequence of the fact that computers/programs function according to fixed rules. But general intelligence necessarily has to transcend rules (since, at the very least, the rules can't be determined by rules).
I don't think this is an argument against a singularity of ever improving intelligence. It just can't happen driven (solely or predominantly) by AI, whether through a recursively self-improving seed AI or cognitive augmentation. Instead, we should expect a singularity that happens due to emergent intelligence. I think it is the interaction of different kind of intelligence (like human/animal intuitive intelligence, machine precision and the inherent order of the non-living universe, if you want to call that intelligence) that leads to increase in general intelligence, not just one particular kind of intelligence like formal reasoning used by computers.
Could Democritus have predicted intelligence explosion?
Also see: History of the Friendly AI concept.
The ancient atomists reasoned their way from first principles to materialism and atomic theory before Socrates began his life's work of making people look stupid in the marketplace of Athens. Why didn't they discover natural selection, too? After all, natural selection follows necessarily from heritability, variation, and selection, and the Greeks had plenty of evidence for all three pieces. Natural selection is obvious once you understand it, but it took us a long time to discover it.
I get the same vibe from intelligence explosion. The hypothesis wasn't stated clearly until 1965, but in hindsight it seems obvious. (Michael Vassar once told me that once he became a physicalist he said "Oh! Intelligence explosion!" Except of course he didn't know the term "intelligence explosion." And he was probably exaggerating.)
Intelligence explosion follows from physicalism and scientific progress and not much else. Since materialists had to believe that human intelligence resulted from the operation of mechanical systems located in the human body, they could have realized that scientists would eventually come to understand these systems so long as scientific progress continued. (Herophilos and Erasistratus were already mapping which nerves and veins did what back in the 4th century B.C.)
And once human intelligence is understood, it can be improved upon, and this improvement in intelligence can be used to improve intelligence even further. And the ancient Greeks certainly had good evidence that there was plenty of room above us when it came to intelligence.
The major hang-up for predicting intelligence explosion may have been the inability to imagine that this intelligence-engineering could leave the limitations of the human skull and move to a speedier, more dependable, and scalable substrate. And that's why Good's paper had to wait until the age of the computer.
</speculation>
My interview with Nikola Danaylov for 'Singularity 1 on 1' [link]
Here. Audio and video available.
Selfish reasons for FAI
Let's take for granted that pursuing FAI is the best strategy for researchers interested in the future of all humanity. However, let's also assume that controlling unfriendly AI is not completely impossible. I would like to see arguments on why FAI may or may not be the best strategy for AGI researchers who are solely interested in selfish values: i.e., personal status, curiosity, well-being of their loved ones, etc.
I believe such discussion is important because i) all researchers are to some extent selfish and ii) it may be unwise to ignore researchers who fail to commit to perfect altruism. I, myself, do not know how selfish I would be if I were to become an AGI researcher in the future.
EDIT: Moved some of the original post content to a comment, since I suspect it was distracting from my main point.
Open Problems Related to the Singularity (draft 1)
"I've come to agree that navigating the Singularity wisely is the most important thing humanity can do. I'm a researcher and I want to help. What do I work on?"
The Singularity Institute gets this question regularly, and we haven't published a clear answer to it anywhere. This is because it's an extremely difficult and complicated question. A large expenditure of limited resources is required to make a serious attempt at answering it. Nevertheless, it's an important question, so we'd like to work toward an answer.