Less Wrong is a community blog devoted to refining the art of human rationality.
(With Kaj Sotala)
SI's current R&D plan seems to go as follows:
1. Develop the perfect theory.
2. Implement this as a safe, working, Artificial General Intelligence -- and do so before anyone else builds an AGI.
The Singularity Institute is almost the only group working on Friendliness theory (although with very few researchers). So, they have the lead on Friendliness. But there is no reason to think that they will be ahead of anyone else on the implementation.
The few AGI designs we can look at today, like OpenCog, are big, messy systems that intentionally try to exploit various cognitive dynamics, which might combine in unanticipated ways. They have various human-like drives rather than the sort of supergoal-driven, utility-maximizing goal hierarchies that Eliezer talks about, or that a mathematical abstraction like AIXI employs.
A team that is ready to adopt a variety of imperfect heuristic techniques will have a decisive lead over approaches based on pure theory. Without the constraint of safety, one such team will beat SI in the race to AGI. SI cannot ignore this. Real-world, imperfect safety measures for real-world, imperfect AGIs are needed. These may involve mechanisms for ensuring that we can avoid undesirable dynamics in heuristic systems, or AI-boxing toolkits usable in the pre-explosion stage, or something else entirely.
SI’s hoped-for theory will include a reflexively consistent decision theory, something like a greatly refined Timeless Decision Theory. It will also describe human value as formally as possible, or at least describe a way to pin it down precisely, something like an improved Coherent Extrapolated Volition.
The hoped-for theory is intended to provide not only safety features, but also a description of the implementation, as some sort of ideal Bayesian mechanism, a theoretically perfect intelligence.
SI-ers have said to me that SI's design will have a decisive implementation advantage. The idea is that because strap-on safety can’t work, Friendliness research necessarily involves more fundamental architectural design decisions, which also happen to be general AGI design decisions that some other AGI builder could grab and save themselves a lot of effort. The assumption seems to be that all other designs are based on hopelessly misguided design principles. SI-ers, the idea seems to go, are so smart that they'll build AGI long before anyone else. Others will succeed only when hardware capabilities allow crude near-brute-force methods to work.
Yet even if the Friendliness theory provides the basis for intelligence, the nitty-gritty of SI’s implementation will still be far away, and will involve real-world heuristics and other compromises.
We can compare SI’s future AI design to AIXI, another mathematically perfect AI formalism (though it has some critical reflexivity issues). Schmidhuber, Hutter, and colleagues think that their AIXI can be scaled down into a feasible implementation, and have implemented some toy systems. Similarly, any actual AGI based on SI's future theories will have to stray far from its mathematically perfected origins.
Moreover, SI's future Friendliness proof may simply be wrong. Eliezer writes a lot about logical uncertainty, the idea that you must treat even purely mathematical ideas with the same probabilistic techniques as any ordinary uncertain belief. He pursues this mostly so that his AI can reason about itself, but the same principle applies to Friendliness proofs as well.
Perhaps Eliezer thinks that a heuristic AGI is absolutely doomed to failure; that a hard takeoff soon after the creation of the first AGI is so overwhelmingly likely that a mathematically designed AGI is the only one that could stay Friendly. In that case, we have to work on a pure-theory approach, even if it has a low chance of being finished first. Otherwise we'll be dead anyway. If an embryonic AGI will necessarily undergo an intelligence explosion, we have no choice but to "shut up and do the impossible."
I am all in favor of gung-ho knife-between-the-teeth projects. But when you think that your strategy is impossible, then you should also look for a strategy which is possible, if only as a fallback. Thinking about safety theory until drops of blood appear on your forehead (as Eliezer puts it, quoting Gene Fowler) is all well and good. But if there is only a 10% chance of achieving 100% safety (not that there really is any such thing), then I'd rather go for a strategy that provides only a 40% promise of safety, but with a 40% chance of achieving it. OpenCog and the like are going to be developed regardless, and probably before SI's own provably friendly AGI. So, even an imperfect safety measure is better than nothing.
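The comparison in the paragraph above can be made explicit with a little arithmetic. The following is only a rough sketch: it assumes that the two percentages in each strategy can simply be multiplied into a single "expected safety" figure and that a failed strategy contributes no safety at all, neither of which the post itself states. The function name is hypothetical.

```python
# Rough sketch of the comparison between the two strategies.
# Assumption (not from the post): expected safety is the chance that a
# strategy is achieved times the safety it delivers if it is achieved,
# and a strategy that fails delivers no safety.

def expected_safety(p_achieved: float, safety_if_achieved: float) -> float:
    """Probability-weighted safety of a single strategy."""
    return p_achieved * safety_if_achieved

# 10% chance of achieving 100% safety (the pure-theory approach)
perfect_theory = expected_safety(0.10, 1.00)

# 40% chance of achieving a 40% promise of safety (the stopgap approach)
stopgap = expected_safety(0.40, 0.40)

print(f"perfect theory: {perfect_theory:.2f}")
print(f"stopgap:        {stopgap:.2f}")
```

Under these simplifying assumptions the stopgap strategy comes out ahead (about 0.16 versus 0.10), which is one way of reading the preference expressed above. Since the two approaches can be developed in parallel, the figures need not be treated as mutually exclusive choices.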
If heuristic approaches have a 99% chance of an immediate unfriendly explosion, then that conclusion might be wrong. But SI, better than anyone, should know that any intuition-based probability estimate of “99%” really means “70%”. Even if other approaches are long-shots, we should not put all our eggs in one basket. Theoretical perfection and stopgap safety measures can be developed in parallel.
Given what we know about human overconfidence and the general reliability of predictions, the actual outcome will to a large extent be something that none of us ever expected or could have predicted. No matter what happens, progress on safety mechanisms for heuristic AGI will improve our chances if something entirely unexpected happens.
What impossible thing should SI be shutting up and doing? For Eliezer, it’s Friendliness theory. To him, safety for heuristic AGI is impossible, and we shouldn't direct our efforts in that direction. But why shouldn't safety for heuristic AGI be another impossible thing to do?
(Two impossible things before breakfast … and maybe a few more? Eliezer seems to be rebuilding logic, set theory, ontology, epistemology, axiology, decision theory, and more, mostly from scratch. That's a lot of impossibles.)
And even if safety for heuristic AGIs is really impossible for us to figure out now, there is some chance of an extended soft takeoff that will allow us to develop heuristic AGIs which will help in figuring out AGI safety, whether because we can use them for our tests, or because they can help by applying their embryonic general intelligence to the problem. Goertzel and Pitt have urged this approach.
Yet resources are limited. Perhaps the folks who are actually building their own heuristic AGIs are in a better position than SI to develop safety mechanisms for them, while SI is the only organization which is really working on a formal theory of Friendliness, and so should concentrate on that. It could be better to focus SI's resources on areas in which it has a relative advantage, or which have a greater expected impact.
Even if so, SI should evangelize AGI safety to other researchers, not only as a general principle, but also by offering theoretical insights that may help them as they work on their own safety mechanisms.
1. AGI development which is unconstrained by a friendliness requirement is likely to beat a provably-friendly design in a race to implementation, and some effort should be expended on dealing with this scenario.
2. Pursuing a provably-friendly AGI, even if very unlikely to succeed, could still be the right thing to do if it were certain that we’ll have a hard takeoff very soon after the creation of the first AGIs. However, we do not know whether or not this is true.
3. Even the provably friendly design will face real-world compromises and errors in its implementation, so the implementation will not itself be provably friendly. Thus, safety protections of the sort needed for heuristic design are needed even for a theoretically Friendly design.
I propose it is altruistic to be replaceable and therefore, those who strive to be altruistic should strive to be replaceable.
As far as I can Google, this does not seem to have been proposed before. LW should be a good place to discuss it. A community interested in rational and ethical behavior, and in how superintelligent machines may decide to replace mankind, should at least bother to refute the following argument.
Replaceability is "the state of being replaceable". It isn't binary. The price of the replacement matters: so a cookie is more replaceable than a big wedding cake. Adequacy of the replacement also makes a difference: a piston for an ancient Rolls Royce is less replaceable than one in a modern car, because it has to be hand-crafted and will be distinguishable. So something is more or less replaceable depending on the price and quality of its replacement.
Replaceability could be thought of as the inverse of the cost of having to replace something. Something that's very replaceable has a low cost of replacement, while something that lacks replaceability has a high (up to unfeasible) cost of replacement. The cost of replacement plays into Total Cost of Ownership, and everything economists know about that applies. It seems pretty obvious that replaceability of possessions is good, much like cheap availability is good.
Some things (historical artifacts, art pieces) are valued highly precisely because of their irreplaceability. Although a few things could be said about the resale value of such objects, I'll simplify and contend these valuations are not rational.
The practical example
Anne manages the central database of Beth's company. She's the only one who has access to that database, the skillset required for managing it, and an understanding of how it all works; she has a monopoly on that combination.
This monopoly gives Anne control over her own replacement cost. If she works according to the state of the art, writes extensive and up-to-date documentation, makes proper backups, etc., she can be very replaceable, because her monopoly will be easily broken. If she refuses to explain what she's doing, creates weird and fragile workarounds, and documents the database badly, she can reduce her replaceability and defend her monopoly. (A well-obfuscated database can take months for a replacement database manager to handle confidently.)
So Beth may still choose to replace Anne, but Anne can influence how expensive that'll be for Beth. She can at least make sure her replacement needs to be shown the ropes, so she can't be fired on a whim. But she might go further and practically hold the database hostage, which would certainly help her in salary negotiations if she does it right.
This makes it pretty clear how Anne can act altruistically in this situation, and how she can act selfishly. Doesn't it?
The moral argument
To Anne, her replacement cost is an externality and an influence on the length and terms of her employment. To maximize the length of her employment and her salary, her replacement cost would have to be high.
To Beth, Anne's replacement cost is part of the cost of employing her and of course she wants it to be low. This is true for any pair of employer and employee: Anne is unusual only in that she has a great degree of influence on her replacement cost.
Therefore, if Anne documents her database properly, etc., this increases her replaceability and constitutes altruistic behavior. Unless she values the positive feeling of doing her employer a favor more highly than she values the money she might make by avoiding replacement, this might even be true altruism.
Unless I suck at Google, replaceability doesn't seem to have been discussed as an aspect of altruism. The two reasons for that I can see are:
- replacing people is painful to think about
- and it seems futile as long as people aren't replaceable in more than very specific functions anyway.
But we don't want or get the choice to kill one person to save the life of five, either, and such practical improbabilities shouldn't stop us from considering our moral decisions. This is especially true in a world where copies, and hence replacements, of people are starting to look possible at least in principle.
- In some reasonably-near future, software is getting better at modeling people. We still don't know what makes a process intelligent, but we can feed a couple of videos and a bunch of psychological data points into a people modeler, extrapolate everything else using a standard population and the resulting model can have a conversation that could fool a four-year-old. The technology is already good enough for models of pets. While convincing models of complex personalities are at least another decade away, the tech is starting to become good enough for senile grandmothers.
Obviously no-one wants granny to die. But the kids would like to keep a model of granny, and they'd like to make the model before the Alzheimer's gets any worse, while granny is terrified she'll get no more visits to her retirement home.
What's the ethical thing to do here? Surely the relatives should keep visiting granny. Could granny maybe have a model made, but keep it to herself, for release only through her Last Will and Testament? And wouldn't it be truly awful of her to refuse to do that?
- Only slightly further into the future, we're still mortal, but cryonics does appear to be working. Unfrozen people need regular medical aid, but the technology is only getting better and anyway, the point is: something we can believe to be them can indeed come back.
Some refuse to wait out these Dark Ages; they get themselves frozen for nonmedical reasons, to fastforward across decades or centuries into a time when the really awesome stuff will be happening, and to get the immortality technologies they hope will be developed by then.
In this scenario, wouldn't fastforwarders be considered selfish, because they impose on their friends the pain of their absence? And wouldn't their friends mind it less if the fastforwarders went to the trouble of having a good model (see above) made first?
- On some distant future Earth, minds can be uploaded completely. Brains can be modeled and recreated so effectively that people can make living, breathing copies of themselves and experience the inability to tell which instance is the copy and which is the original.
Of course many adherents of soul theories reject this as blasphemous. A couple of more sophisticated thinkers worry whether this devalues individuals to the point where superhuman AIs might conclude that as long as copies of everyone are stored on some hard drive orbiting Pluto, nothing of value is lost if every meatbody gets devoured into more hardware. Bottom line is: Effective immortality is available, but some refuse it out of principle.
In this world, wouldn't those who make themselves fully and infinitely replaceable want the same for everyone they love? Wouldn't they consider it a dreadful imposition if a friend or relative refused immortality? After all, wasn't not having to say goodbye anymore kind of the point?
These questions haven't come up in the real world because people have never been replaceable in more than very specific functions. But I hope you'll agree that if and when people become more replaceable, that will be regarded as a good thing, and it will be regarded as virtuous to use these technologies as they become available, because it spares one's friends and family some or all of the cost of replacing oneself.
Replaceability as an altruist virtue
And if replaceability is altruistic in this hypothetical future, as well as in the limited sense of Anne and Beth, that implies replaceability is altruistic now. And even now, there are things we can do to increase our replaceability, i.e. to reduce the cost our bereaved will incur when they have to replace us. We can teach all our (valuable) skills, so others can replace us as providers of these skills. We can not have (relevant) secrets, so others can learn what we know and replace us as sources of that knowledge. We can endeavour to live as long as possible, to postpone the cost. We can sign up for cryonics. There are surely other things each of us could do to increase our replaceability, but I can't think of any an altruist wouldn't consider virtuous.
As an altruist, I conclude that replaceability is a prosocial, unselfish trait, something we'd want our friends to have, in other words: a virtue. I'd go as far as to say that even bothering to set up a good Last Will and Testament is virtuous precisely because it reduces the cost my bereaved will incur when they have to replace me. And although none of us can be truly easily replaceable as of yet, I suggest we honor those who make themselves replaceable, and be proud of whatever replaceability we ourselves attain.
So, how replaceable are you?
Here's the biggest thing that I've been working on for the last several months:
Responses to Catastrophic AGI Risk: A Survey
Kaj Sotala, Roman Yampolskiy, and Luke Muehlhauser
Abstract: Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next 20-100 years. It has been suggested that this may become a catastrophic risk, threatening to do major damage on a global scale. After briefly summarizing the arguments for why AGI may become a catastrophic risk, we survey various proposed responses to AGI risk. We consider societal proposals, proposals for constraining the AGIs’ behavior from the outside, and for creating AGIs in such a way that they are inherently safe.
This doesn't aim to be a very strongly argumentative paper, though it does comment on the various proposals from an SI-ish point of view. Rather, it attempts to provide a survey of all the major AGI-risk related proposals that have been made so far, and to provide some thoughts on their respective strengths and weaknesses. Before writing this paper, we hadn't encountered anyone who'd have been familiar with all of these proposals - not to mention that even we ourselves weren't familiar with all of them! Hopefully, this should become a useful starting point for anyone who's at all interested in AGI risk or Friendly AI.
The draft will be public and open for comments for one week (until Nov 23rd), after which we'll incorporate the final edits and send it off for review. We're currently aiming to have it published in the sequel volume to Singularity Hypotheses.
EDIT: I've now hidden the draft from public view (so as to avoid annoying future publishers who may not like early drafts floating around before the work has been accepted for publication) while I'm incorporating all the feedback that we got. Thanks to everyone who commented!
How was it? Which speakers delivered according to expectations?
Which topics were left unresolved?
Were any topics resolved?
Whatever you have to say about it, say it here.
Suggestion: if you are going to comment, mention "I was there" just so we know who was or wasn't.
Related post: Muehlhauser-Wang Dialogue.
Abstract. An AGI system should be able to manage its motivations or goals, which are persistent, spontaneous, mutually restricting, and changing over time. A mechanism for handling this kind of goal is introduced and discussed.
From the discussion section:
The major conclusion argued in this paper is that an AGI system should always maintain a goal structure (or whatever it is called) which contains multiple goals that are separately specified, with the properties that
- Some of the goals are accurately specified, and can be fully achieved, while some others are vaguely specified and only partially achievable, but nevertheless have impact on the system's decisions.
- The goals may conflict with each other on what the system should do at a moment, and cannot be achieved all together. Very often the system has to make compromises among the goals.
- Due to the restriction in computational resources, the system cannot take all existing goals into account when making each decision, nor can it keep a complete record of the goal derivation history.
- The designers and users are responsible for the input goals of an AGI system, from which all the other goals are derived, according to the system's experience. There is no guarantee that the derived goals will be logically consistent with the input goals, except in highly simplified situations.
One area that is closely related to goal management is AI ethics. The previous discussions focused on the goal the designers assign to an AGI system ("super goal" or "final goal"), with the implicit assumption that such a goal will decide the consequences caused by the A(G)I systems. However, the above analysis shows that though the input goals are indeed important, they are not the dominating factor that decides the broad impact of AI on human society. Since no AGI system can be omniscient and omnipotent, to be "general-purpose" means such a system has to handle problems for which its knowledge and resources are insufficient [16, 18], and one direct consequence is that its actions may produce unanticipated results. This consequence, plus the previous conclusion that the effective goal for an action may be inconsistent with the input goals, will render many of the previous suggestions mostly irrelevant to AI ethics.
For example, Yudkowsky's "Friendly AI" agenda is based on the assumption that "a true AI might remain knowably stable in its goals, even after carrying out a large number of self-modifications". The problem with this assumption is that unless we are talking about an axiomatic system with unlimited resources, we cannot assume the system can accurately know the consequence of its actions. Furthermore, as argued previously, the goals in an intelligent system inevitably change as its experience grows, which is not necessarily a bad thing - after all, our "human nature" gradually grows out of, and deviates from, our "animal nature", at both the species level and the individual level.
Omohundro argued that no matter what input goals are given to an AGI system, it usually will derive some common "basic drives", including "be self-protective" and "to acquire resources", which leads some people to worry that such a system will become unethical. According to our previous analysis, the production of these goals is indeed very likely, but it is only half of the story. A system with a resource-acquisition goal does not necessarily attempt to achieve it at all costs, without considering its other goals. Again, consider human beings - everyone has some goals that can become dangerous (either to oneself or to others) if pursued at all costs. The proper solution, both to human ethics and to AGI ethics, is to prevent this kind of goal from becoming dominant, rather than from being formed.
A time dilation tool from an anime is discussed for its practical use on Earth; there seem surprisingly few uses and none that will change the world, due to the severe penalties humans would incur while using it, and basic constraints like Amdahl's law limit the scientific uses. A comparison with the position of an Artificial Intelligence such as an emulated human brain seems fair, except most of the time dilation disadvantages do not apply or can be ameliorated and hence any speedups could be quite effectively exploited. I suggest that skeptics of the idea that speedups give advantages are implicitly working off the crippled time dilation tool and not making allowance for the disanalogies.
Master version on gwern.net
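The Amdahl's law constraint mentioned in the summary above can be sketched numerically. This is only an illustration of the general formula, not a calculation taken from the essay: the fraction of work that can be sped up and the speedup factor are hypothetical values, and `amdahl_speedup` is a name introduced here.

```python
# Minimal sketch of Amdahl's law: if only part of a task can be sped up,
# the part that cannot be sped up bounds the overall gain.
# All numeric inputs below are hypothetical illustrations.

def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the work runs `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)

# Suppose half of a research project is thinking (which time dilation speeds up)
# and half is waiting on experiments in the outside world (which it does not).
for factor in (10, 1000, 1_000_000):
    print(factor, round(amdahl_speedup(0.5, factor), 4))
```

With half of the work outside the accelerated part, the overall speedup stays below 2 however large the factor becomes. An agent that can also shrink the non-accelerated share would face a much weaker bound, which is in the spirit of the disanalogy the summary draws.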
Lately I've been pondering the fact that while there are many critics of SIAI and its plan to form a team to build FAI, few of us seem to agree on what SIAI or we should do instead. Here are some of the alternative suggestions offered so far:
- work on computer security
- work to improve laws and institutions
- work on mind uploading
- work on intelligence amplification
- work on non-autonomous AI (e.g., Oracle AI, "Tool AI", automated formal reasoning systems, etc.)
- work on academically "mainstream" AGI approaches or trust that those researchers know what they are doing
- stop worrying about the Singularity and work on more mundane goals
I am planning on going to the Singularity Summit this year. I applied for a student discount approximately three weeks ago and still haven't heard back. I am curious to hear whether anyone else has applied for a student discount and got a reply. I am studying in the UK, so I really want to wrap up my logistics quickly! Is anyone else in the same boat?
From the final chapter of Glenn Beck's new book Cowards, titled "Adapt or Die: The Coming Intelligence Explosion."
The year is 1678 and you’ve just arrived in England via a time machine. You take out your new iPhone in front of a group of scientists who have gathered to marvel at your arrival.
“Siri,” you say, addressing the phone’s voice-activated artificial intelligence system, “play me some Beethoven.”
Dunh-Dunh-Dunh-Duuunnnhhh! The famous opening notes of Beethoven’s Fifth Symphony, stored in your music library, play loudly.
“Siri, call my mother.”
Your mother’s face appears on the screen, a Hawaiian beach behind her. “Hi, Mom!” you say. “How many fingers am I holding up?”
“Three,” she correctly answers. “Why haven’t you called more—”
“Thanks, Mom! Gotta run!” you interrupt, hanging up.
“Now,” you say. “Watch this.”
Your new friends look at the iPhone expectantly.
“Siri, I need to hide a body.”
Without hesitation, Siri asks: “What kind of place are you looking for? Mines, reservoirs, metal foundries, dumps, or swamps?” (I’m not kidding. If you have an iPhone 4S, try it.)
You respond “Swamps,” and Siri pulls up a satellite map showing you nearby swamps.
The scientists are shocked into silence. What is this thing that plays music, instantly teleports video of someone across the globe, helps you get away with murder, and is small enough to fit into a pocket?
At best, your seventeenth-century friends would worship you as a messenger of God. At worst, you’d be burned at the stake for witchcraft. After all, as science fiction author Arthur C. Clarke once said, “Any sufficiently advanced technology is indistinguishable from magic.”
Now, imagine telling this group that capitalism and representative democracy will take the world by storm, lifting hundreds of millions of people out of poverty. Imagine telling them their descendants will eradicate smallpox and regularly live seventy-five or more years. Imagine telling them that men will walk on the moon, that planes, flying hundreds of miles an hour, will transport people around the world, or that cities will be filled with buildings reaching thousands of feet into the air.
They’d probably escort you to the madhouse.
Unless, that is, one of the people in that group had been a man named Ray Kurzweil.
Kurzweil is an inventor and futurist who has done a better job than most at predicting the future. Dozens of the predictions from his 1990 book The Age of Intelligent Machines came true during the 1990s and 2000s. His follow-up book, The Age of Spiritual Machines, published in 1999, fared even better. Of the 147 predictions that Kurzweil made for 2009, 78 percent turned out to be entirely correct, and another 8 percent were roughly correct. For example, even though every portable computer had a keyboard in 1999, Kurzweil predicted that most portable computers would lack a keyboard by 2009. It turns out he was right: by 2009, most portable computers were MP3 players, smartphones, tablets, portable game machines, and other devices that lacked keyboards.
Kurzweil is most famous for his “law of accelerating returns,” the idea that technological progress is generally “exponential” (like a hockey stick, curving up sharply) rather than “linear” (like a straight line, rising slowly). In nongeek-speak that means that our knowledge is like the compound interest you get on your bank account: it increases exponentially as time goes on because it keeps building on itself. We won’t experience one hundred years of progress in the twenty-first century, but rather twenty thousand years of progress (measured at today’s rate).
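The gap between "linear" and "exponential" progress described above can be sketched with a short calculation. This is a hedged illustration, not Kurzweil's own method: it assumes, purely for the sake of example, that the rate of progress doubles every 10 years, and the function name is hypothetical.

```python
import math

# Sketch: how many "years of progress at today's rate" fit into a span of
# calendar years if the rate of progress keeps doubling.
# Assumption (not from the text): the rate doubles every `doubling_years`.

def equivalent_years(span_years: float, doubling_years: float) -> float:
    """Integrate rate(t) = 2 ** (t / doubling_years) from 0 to span_years."""
    k = math.log(2) / doubling_years
    return (math.exp(k * span_years) - 1) / k

linear = 100.0                            # one century at a constant rate
exponential = equivalent_years(100, 10)   # one century with the assumed doubling

print(f"linear:      {linear:,.0f} years of progress")
print(f"exponential: {exponential:,.0f} years of progress")
```

With the assumed 10-year doubling, the century works out to roughly 15,000 years of progress at the starting rate, the same order of magnitude as the figure quoted above. The exact number depends heavily on the doubling period and on how the growth is modeled.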
Many experts have criticized Kurzweil’s forecasting methods, but a careful and extensive review of technological trends by researchers at the Santa Fe Institute came to the same basic conclusion: technological progress generally tends to be exponential (or even faster than exponential), not linear.
So, what does this mean? In his 2005 book The Singularity Is Near, Kurzweil shares his predictions for the next few decades:
- In our current decade, Kurzweil expects real-time translation tools and automatic house-cleaning robots to become common.
- In the 2020s he expects to see the invention of tiny robots that can be injected into our bodies to intelligently find and repair damage and cure infections.
- By the 2030s he expects “mind uploading” to be possible, meaning that your memories and personality and consciousness could be copied to a machine. You could then make backup copies of yourself, and achieve a kind of technological immortality.
Age of the Machines?
“We became the dominant species on this planet by being the most intelligent species around. This century we are going to cede that crown to machines. After we do that, it will be them steering history rather than us.”
—Jaan Tallinn, co-creator of Skype and Kazaa
If any of that sounds absurd, remember again how absurd the eradication of smallpox or the iPhone 4S would have seemed to those seventeenth-century scientists. That’s because the human brain is conditioned to believe that the past is a great predictor of the future. While that might work fine in some areas, technology is not one of them. Just because it took decades to put two hundred transistors onto a computer chip doesn’t mean that it will take decades to get to four hundred. In fact, Moore’s Law, which states (roughly) that computing power doubles every two years, shows how technological progress must be thought of in terms of “hockey stick” progress, not “straight line” progress. Moore’s Law has held for more than half a century already (we can currently fit 2.6 billion transistors onto a single chip) and there’s little reason to expect that it won’t continue to.
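The doubling arithmetic in the paragraph above can be checked with a few lines. This is a rough sketch that assumes the quoted two-year doubling applies directly to transistor counts and has held perfectly steadily; the function name is hypothetical.

```python
import math

# Sketch of the doubling arithmetic behind the Moore's Law passage.
# Assumption: transistor counts double every two years, the rough figure
# quoted in the text, with no variation over time.
DOUBLING_YEARS = 2.0

def years_to_reach(start_count: float, target_count: float,
                   doubling_years: float = DOUBLING_YEARS) -> float:
    """Years needed to grow from start_count to target_count at a fixed doubling period."""
    doublings = math.log2(target_count / start_count)
    return doublings * doubling_years

# Going from 200 to 400 transistors is a single doubling.
print(years_to_reach(200, 400))

# Going from 200 to 2.6 billion transistors takes about 24 doublings.
print(round(years_to_reach(200, 2.6e9), 1))
```

Under this idealized assumption, the second step from 200 to 400 transistors takes two years rather than decades, and the whole climb to 2.6 billion takes a bit under five decades.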
But the aspect of his book that has the most far-ranging ramifications for us is Kurzweil’s prediction that we will achieve a “technological singularity” in 2045. He defines this term rather vaguely as “a future period during which the pace of technological change will be so rapid, its impact so deep, that human life will be irreversibly transformed.”
Part of what Kurzweil is talking about is based on an older, more precise notion of “technological singularity” called an intelligence explosion. An intelligence explosion is what happens when we create artificial intelligence (AI) that is better than we are at the task of designing artificial intelligences. If the AI we create can improve its own intelligence without waiting for humans to make the next innovation, this will make it even more capable of improving its intelligence, which will . . . well, you get the point. The AI can, with enough improvements, make itself smarter than all of us mere humans put together.
The really exciting part (or the scary part, if your vision of the future is more like the movie The Terminator) is that, once the intelligence explosion happens, we’ll get an AI that is as superior to us at science, politics, invention, and social skills as your computer’s calculator is to you at arithmetic. The problems that have occupied mankind for decades— curing diseases, finding better energy sources, etc.— could, in many cases, be solved in a matter of weeks or months.
Again, this might sound far-fetched, but Ray Kurzweil isn’t the only one who thinks an intelligence explosion could occur sometime this century. Justin Rattner, the chief technology officer at Intel, predicts some kind of Singularity by 2048. Michael Nielsen, co-author of the leading textbook on quantum computation, thinks there’s a decent chance of an intelligence explosion by 2100. Richard Sutton, one of the biggest names in AI, predicts an intelligence explosion near the middle of the century. Leading philosopher David Chalmers is 50 percent confident an intelligence explosion will occur by 2100. Participants at a 2009 conference on AI tended to be 50 percent confident that an intelligence explosion would occur by 2045.
If we can properly prepare for the intelligence explosion and ensure that it goes well for humanity, it could be the best thing that has ever happened on this fragile planet. Consider the difference between humans and chimpanzees, which share 95 percent of their genetic code. A relatively small difference in intelligence gave humans the ability to invent farming, writing, science, democracy, capitalism, birth control, vaccines, space travel, and iPhones— all while chimpanzees kept flinging poo at each other.
The thought that machines could one day have superhuman abilities should make us nervous. Once the machines are smarter and more capable than we are, we won’t be able to negotiate with them any more than chimpanzees can negotiate with us. What if the machines don’t want the same things we do?
The truth, unfortunately, is that every kind of AI we know how to build today definitely would not want the same things we do. To build an AI that does, we would need a more flexible “decision theory” for AI design and new techniques for making sense of human preferences. I know that sounds kind of nerdy, but AIs are made of math and so math is really important for choosing which results you get from building an AI.
These are the kinds of research problems being tackled by the Singularity Institute in America and the Future of Humanity Institute in Great Britain. Unfortunately, our silly species still spends more money each year on lipstick research than we do on figuring out how to make sure that the most important event of this century (maybe of all human history)— the intelligence explosion— actually goes well for us.
Likewise, self-improving machines could perform scientific experiments and build new technologies much faster and more intelligently than humans can. Curing cancer, finding clean energy, and extending life expectancies would be child’s play for them. Imagine living out your own personal fantasy in a different virtual world every day. Imagine exploring the galaxy at near light speed, with a few backup copies of your mind safe at home on earth in case you run into an exploding supernova. Imagine a world where resources are harvested so efficiently that everyone’s basic needs are taken care of, and political and economic incentives are so intelligently fine-tuned that “world peace” becomes, for the first time ever, more than a Super Bowl halftime show slogan.
With self-improving AI we may be able to eradicate suffering and death just as we once eradicated smallpox. It is not the limits of nature that prevent us from doing this, but only the limits of our current understanding. It may sound like a paradox, but it’s our brains that prevent us from fully understanding our brains.
At this point you might be asking yourself: “Why is this topic in this book? What does any of this have to do with the economy or national security or politics?”
In fact, it has everything to do with all of those issues, plus a whole lot more. The intelligence explosion will bring about change on a scale and scope not seen in the history of the world. If we don’t prepare for it, things could get very bad, very fast. But if we do prepare for it, the intelligence explosion could be the best thing that has happened since . . . literally ever.
But before we get to the kind of life-altering progress that would come after the Singularity, we will first have to deal with a lot of smaller changes, many of which will throw entire industries and ways of life into turmoil. Take the music business, for example. It was not long ago that stores like Tower Records and Sam Goody were doing billions of dollars a year in compact disc sales; now people buy music from home via the Internet. Publishing is currently facing a similar upheaval. Newspapers and magazines have struggled to keep subscribers, booksellers like Borders have been forced into bankruptcy, and customers are forcing publishers to switch to ebooks faster than the publishers might like.
All of this is to say that some people are already witnessing the early stages of upheaval firsthand. But for everyone else, there is still a feeling that something is different this time; that all of those years of education and experience might be turned upside down in an instant. They might not be able to identify it exactly but they realize that the world they’ve known for forty, fifty, or sixty years is no longer the same.
There’s a good reason for that. We feel it and sense it because it’s true. It’s happening. There’s absolutely no question that the world in 2030 will be a very different place than the one we live in today. But there is a question, a large one, about whether that place will be better or worse.
It’s human nature to resist change. We worry about our families, our careers, and our bank accounts. The executives in industries that are already experiencing cataclysmic shifts would much prefer to go back to the way things were ten years ago, when people still bought music, magazines, and books in stores. The future was predictable. Humans like that; it’s part of our nature.
But predictability is no longer an option. The intelligence explosion, when it comes in earnest, is going to change everything— we can either be prepared for it and take advantage of it, or we can resist it and get run over.
Unfortunately, there are a good number of people who are going to resist it. Not only those in affected industries, but those who hold power at all levels. They see how technology is cutting out the middlemen, how people are becoming empowered, how bloggers can break national news and YouTube videos can create superstars.
And they don’t like it.
A Battle for the Future
Power bases in business and politics that have been forged over decades, if not centuries, are being threatened with extinction, and they know it. So the owners of that power are trying to hold on. They think they can do that by dragging us backward. They think that, by growing the public’s dependency on government, by taking away the entrepreneurial spirit and rewards and by limiting personal freedoms, they can slow down progress.
But they’re wrong. The intelligence explosion is coming so long as science itself continues. Trying to put the genie back in the bottle by dragging us toward serfdom won’t stop it and will, in fact, only leave the world with an economy and society that are completely unprepared for the amazing things that it could bring.
Robin Hanson, author of “The Economics of the Singularity” and an associate professor of economics at George Mason University, wrote that after the Singularity, “The world economy, which now doubles in 15 years or so, would soon double in somewhere from a week to a month.”
That is unfathomable. But even if the rate were much slower, say a doubling of the world economy in two years, the shock-waves from that kind of growth would still change everything we’ve come to know and rely on. A machine could offer the ideal farming methods to double or triple crop production, but it can’t force a farmer or an industry to implement them. A machine could find the cure for cancer, but it would be meaningless if the pharmaceutical industry or Food and Drug Administration refused to allow it. The machines won’t be the problem; humans will be.
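To make those doubling times concrete, here is a rough back-of-the-envelope sketch. It assumes smooth exponential growth, which is a simplification and not something Hanson's quote specifies, and the function name `annual_growth_factor` is purely illustrative. It converts each doubling time mentioned above into the factor by which the economy would grow in a single year.

```python
def annual_growth_factor(doublings_per_year: float) -> float:
    """Growth factor over one year, assuming steady exponential growth.

    Illustrative helper (not from the source): an economy that doubles
    `doublings_per_year` times each year grows by 2 ** doublings_per_year.
    """
    return 2.0 ** doublings_per_year


# Doubling times taken from the passage, expressed as doublings per year.
scenarios = {
    "every 15 years (roughly today)": 1.0 / 15.0,
    "every 2 years (the 'much slower' case)": 1.0 / 2.0,
    "every month": 12.0,
    "every week": 52.0,
}

if __name__ == "__main__":
    for label, rate in scenarios.items():
        factor = annual_growth_factor(rate)
        print(f"doubling {label}: economy multiplies by {factor:.3g} per year")
```

Under this assumption, a 15-year doubling time corresponds to a bit under 5 percent growth per year, a two-year doubling time to roughly 41 percent per year, and a monthly doubling to a 4,096-fold increase in a single year.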
And that’s why I wanted to write about this topic. We are at the forefront of something great, something that will make the Industrial Revolution look in comparison like a child discovering his hands. But we have to be prepared. We must be open to the changes that will come, because they will come. Only when we accept that will we be in a position to thrive. We can’t allow politicians to blame progress for our problems. We can’t allow entrenched bureaucrats and power-hungry executives to influence a future that they may have no place in.
Many people are afraid of these changes— of course they are: it’s part of being human to fear the unknown— but we can’t be so entrenched in the way the world works now that we are unable to handle change out of fear for what those changes might bring.
Change is going to be as much a part of our future as it has been of our past. Yes, it will happen faster and the changes themselves will be far more dramatic, but if we prepare for it, the change will mostly be positive. But that preparation is the key: we need to become more well-rounded as individuals so that we’re able to constantly adapt to new ways of doing things. In the future, the way you do your job may change four to five or fifty times over the course of your life. Those who cannot, or will not, adapt will be left behind.
At the same time, the Singularity will give many more people the opportunity to be successful. Because things will change so rapidly there is a much greater likelihood that people will find something they excel at. But it could also mean that people’s successes are much shorter-lived. The days of someone becoming a legend in any one business (think Clive Davis in music, Steven Spielberg in movies, or the Hearst family in publishing) are likely over. But those who embrace and adapt to the coming changes, and surround themselves with others who have done the same, will flourish.
When major companies, set in their ways, try to convince us that change is bad and that we must stick to the status quo, no matter how much human inquisitiveness and ingenuity try to propel us forward, we must look past them. We must know in our hearts that these changes will come, and that if we welcome them into our world, we’ll become more successful, more free, and more full of light than we could have ever possibly imagined.
Ray Kurzweil once wrote, “The Singularity is near.” The only question will be whether we are ready for it.
The citations for the chapter include:
- Luke Muehlhauser and Anna Salamon, "Intelligence Explosion: Evidence and Import"
- Daniel Dewey, "Learning What to Value"
- Eliezer Yudkowsky, "Artificial Intelligence as a Positive and a Negative Factor in Global Risk"
- Luke Muehlhauser and Louie Helm, "The Singularity and Machine Ethics"
- Luke Muehlhauser, "So You Want to Save the World"
- Michael Anissimov, "The Benefits of a Successful Singularity"
As one of my assignments at the Singularity Institute (SI), I am writing a research FAQ answering the most frequently asked questions regarding the Singularity Institute's research program.
For a short summary of what SI is about, see our concise summary.
Here are some examples of questions I'm currently planning to include:
1) who conducts research at SI?
2) what are the specific research topics being investigated?
3) what is the history of SI's research program?
4) where does SI see its research program in 5, 10, and 20 years?
5) what other organizations conduct research similar to SI?
Please submit other questions that come to mind below. Unfortunately, due to limited time, we cannot answer every question posed to us. However, I hope to answer some of the questions that receive the most upvotes. Thank you for your participation!
Part of the Muehlhauser interview series on AGI.
Continued from part 1...
[Apr 11th, 2012]
I agree the future is unlikely to consist of a population of fairly distinct AGIs competing for resources, but I never thought that the arguments for Basic AI drives or "convergent instrumental goals" required that scenario to hold.
Anyway, I prefer the argument for convergent instrumental goals in Nick Bostrom's more recent paper "The Superintelligent Will." Which parts of Nick's argument fail to persuade you?
[Apr 12th, 2012]
Well, for one thing, I think his
Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.
is misguided. It may be true, but who cares about possibility “in principle”? The question is whether any level of intelligence is PLAUSIBLY LIKELY to be combined with more or less any final goal in practice. And I really doubt it. I guess I could posit the alternative
Intelligence and final goals are in practice highly and subtly interdependent. In other words, in the actual world, various levels of intelligence are going to be highly correlated with various probability distributions over the space of final goals.
This just gets back to the issue we discussed already, of me thinking it’s really unlikely that a superintelligence would ever really have a really stupid goal like say, tiling the Cosmos with Mickey Mice.
It might be possible through deliberate effort to construct a superintelligence that values ... human welfare, moral goodness, or any other complex purpose that its designers might want it to serve. But it is no less possible—and probably technically easier—to build a superintelligence that places final value on nothing but calculating the decimals of pi.
but he gives no evidence for this assertion. Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence -- so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.
One basic error Bostrom seems to be making in this paper, is to think about intelligence as something occurring in a sort of mathematical vacuum, divorced from the frustratingly messy and hard-to-quantify probability distributions characterizing actual reality....
The Instrumental Convergence Thesis
Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.
the first clause makes sense to me,
Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations
but it doesn’t seem to me to justify the second clause
implying that these instrumental values are likely to be pursued by many intelligent agents.
The step from the first to the second clause seems to me to assume that the intelligent agents in question are being created and selected by some sort of process similar to evolution by natural selection, rather than being engineered carefully, or created via some other process beyond current human ken.
In short, I think the Bostrom paper is an admirably crisp statement of its perspective, and I agree that its conclusions seem to follow from its clearly stated assumptions -- but the assumptions are not justified in the paper, and I don’t buy them at all.
[Apr. 19, 2012]
Let me explain why I think that:
(1) The fact that we can identify convergent instrumental goals (of the sort described by Bostrom) implies that many agents will pursue those instrumental goals.
Intelligent systems are intelligent because rather than simply executing hard-wired situation-action rules, they figure out how to construct plans that will lead to the probabilistic fulfillment of their final goals. That is why intelligent systems will pursue the convergent instrumental goals described by Bostrom. We might try to hard-wire a collection of rules into an AGI which restrict the pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules and have no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.
Next: I remain confused about why an intelligent system will decide that a particular final goal it has been given is "stupid," and then change its final goals — especially given the convergent instrumental goal to preserve its final goals.
Perhaps the word "intelligence" is getting in our way. Let's define a notion of "optimization power," which measures (roughly) an agent's ability to optimize the world according to its preference ordering, across a very broad range of possible preference orderings and environments. I think we agree that AGIs with vastly greater-than-human optimization power will arrive in the next century or two. The problem, then, is that this superhuman AGI will almost certainly be optimizing the world for something other than what humans want, because what humans want is complex and fragile, and indeed we remain confused about what exactly it is that we want. A machine superoptimizer with a final goal of solving the Riemann hypothesis will simply be very good at solving the Riemann hypothesis (by whatever means necessary).
Which parts of this analysis do you think are wrong?
[Apr. 20, 2012]
It seems to me that in your reply you are implicitly assuming a much stronger definition of “convergent” than the one Bostrom actually gives in his paper. He says
instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.
Note the somewhat weaselly reference to a “wide range” of goals and situations -- not, say, “nearly all feasible” goals and situations. Just because some values are convergent in the weak sense of his definition, doesn’t imply that AGIs we create will be likely to adopt these instrumental values. I think that his weak definition of “convergent” doesn’t actually imply convergence in any useful sense. On the other hand, if he’d made a stronger statement like
instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for nearly all feasible final goals and nearly all feasible situations, implying that these instrumental values are likely to be pursued by many intelligent agents.
then I would disagree with the first clause of his statement (“instrumental values can be identified which...”), but I would be more willing to accept that the second clause (after the “implying”) followed from the first.
About optimization -- I think it’s rather naive and narrow-minded to view hypothetical superhuman superminds as “optimization powers.” It’s a bit like a dog viewing a human as an “eating and mating power.” Sure, there’s some accuracy to that perspective -- we do eat and mate, and some of our behaviors may be understood based on this. On the other hand, a lot of our behaviors are not very well understood in terms of these, or any dog-level concepts. Similarly, I would bet that the bulk of a superhuman supermind’s behaviors and internal structures and dynamics will not be explicable in terms of the concepts that are important to humans, such as “optimization.”
So when you say “this superhuman AGI will almost certainly be optimizing the world for something other than what humans want,” I don’t feel confident that what a superhuman AGI will be doing will be usefully describable as optimizing anything....
[May 1, 2012]
I think our dialogue has reached the point of diminishing marginal returns, so I'll conclude with just a few points and let you have the last word.
On convergent instrumental goals, I encourage readers to read "The Superintelligent Will" and make up their own minds.
On the convergence of advanced intelligent systems toward optimization behavior, I'll point you to Omohundro (2007).
Well, it's been a fun chat. Although it hasn't really covered much new ground, there have been some new phrasings and minor new twists.
One thing I'm repeatedly struck by in discussions on these matters with you and other SIAI folks, is the way the strings of reason are pulled by the puppet-master of intuition. With so many of these topics on which we disagree -- for example: the Scary Idea, the importance of optimization for intelligence, the existence of strongly convergent goals for intelligences -- you and the other core SIAI folks share a certain set of intuitions, which seem quite strongly held. Then you formulate rational arguments in favor of these intuitions -- but the conclusions that result from these rational arguments are very weak. For instance, the Scary Idea intuition corresponds to a rational argument that "superhuman AGI might plausibly kill everyone." The intuition about strongly convergent goals for intelligences, corresponds to a rational argument about goals that are convergent for a "wide range" of intelligences. Etc.
On my side, I have a strong intuition that OpenCog can be made into a human-level general intelligence, and that if this intelligence is raised properly it will turn out benevolent and help us launch a positive Singularity. However, I can't fully rationally substantiate this intuition either -- all I can really fully rationally argue for is something weaker like "It seems plausible that a fully implemented OpenCog system might display human-level or greater intelligence on feasible computational resources, and might turn out benevolent if raised properly." In my case just like yours, reason is far weaker than intuition.
Another thing that strikes me, reflecting on our conversation, is the difference between the degrees of confidence required, in modern democratic society, to TRY something versus to STOP others from trying something. A rough intuition is often enough to initiate a project, even a large one. On the other hand, to get someone else's work banned based on a rough intuition is pretty hard. To ban someone else's work, you either need a really thoroughly ironclad logical argument, or you need to stir up a lot of hysteria.
What this suggests to me is that, while my intuitions regarding OpenCog seem to be sufficient to motivate others to help me to build OpenCog (via making them interested enough in it that they develop their own intuitions about it), your intuitions regarding the dangers of AGI are not going to be sufficient to get work on AGI systems like OpenCog stopped. To halt AGI development, if you wanted to (and you haven't said that you do, I realize), you'd either need to fan hysteria very successfully, or come up with much stronger logical arguments, ones that match the force of your intuition on the subject.
Anyway, even though I have very different intuitions than you and your SIAI colleagues about a lot of things, I do think you guys are performing some valuable services -- not just through the excellent Singularity Summit conferences, but also by raising some difficult and important issues in the public eye. Humanity spends a lot of its attention on some really unimportant things, so it's good to have folks like SIAI nudging the world to think about critical issues regarding our future. In the end, whether SIAI's views are actually correct may be peripheral to the organization's main value and impact.
I look forward to future conversations, and especially look forward to resuming this conversation one day with a human-level AGI as the mediator ;-)
"All that is necessary for evil to triumph is that good men do nothing."
155,000 people are dying, on average, every day. For those of us who are preference utilitarians, and also believe that a Friendly singularity is possible, and capable of ending this state of affairs, it also puts a great deal of pressure on us. It doesn't give us leave to be sloppy (because human extinction, even multiplied by a low probability, is a massive negative utility). But, if we see a way to achieve similar results in a shorter time frame, the cost to human life of not taking it is simply unacceptable.
I have some concerns about CEV on a conceptual level, but I'm leaving those aside for the time being. My concern is that most of the organizations concerned with a first-mover X-risk are not in a position to be that first mover -- and, furthermore, they're not moving in that direction. That includes the Singularity Institute. Trying to operationalize CEV seems like a good way to get an awful lot of smart people bashing their heads against a wall while clever idiots trundle ahead with their own experiments. I'm not saying that we should be hasty, but I am suggesting that we need to be careful of getting stuck in dark intellectual forests, with lots of things that are fun to talk about, until an idiot with a tinderbox burns them down.
My point, in short, is that we need to be looking for better ways to do things, and to do them extremely quickly. We are working on a very, very, existentially tight schedule.
So, if we're looking for quicker paths to a Friendly, first-mover singularity, I'd like to talk about one that seems attractive to me. Maybe it's a useful idea. If not, then at least I won't waste any more time thinking about it. Either way, I'm going to lay it out and you guys can see what you think.
So, Friendliness is a hard problem. Exactly how hard, we don't know, but a lot of smart people have radically different ideas of how to attack it, and they've all put a lot of thought into it, and that's not a good sign. However, designing a strongly superhuman AI is also a hard problem. Probably much harder than a human can solve. The good news is, we don't expect that we'll have to. If we can build something just a little bit smarter than we are, we expect that bootstrapping process to take off without obvious limit.
So let's apply the same methodology to Friendliness. General goal optimizers are tools, after all. Probably the most powerful tools that have ever existed, for that matter. Let's say we build something that's not Friendly. Not something we want running the universe -- but, Friendly enough. Friendly enough that it's not going to kill us all. Friendly enough not to succumb to the pedantic genie problem. Friendly enough we can use it to build what we really want, be it CEV or something else.
I'm going to sketch out an architecture of what such a system might look like. Do bear in mind this is just a sketch, and in no way a formal, safe, foolproof design spec.
So, let's say we have an agent with the ability to convert unstructured data into symbolic relationships that represent the world, with explicitly demarcated levels of abstraction. Let's say the system has the ability to build Bayesian causal relationships out of its data points over time, and construct efficient, predictive models of the behavior of the concepts in the world. Let's also say that the system has the ability to take a symbolic representation of a desired future distribution of universes, a symbolic representation of the current universe, and map between them, finding valid chains of causality leading from now to then, probably using a solid decision theory background. These are all hard problems to solve, but they're the same problems everyone else is solving too.
This system, if you just specify parameters about the future and turn it loose, is not even a little bit Friendly. But let's say you do this: first, provide it with a tremendous amount of data, up to and including the entire available internet, if necessary. Everything it needs to build extremely effective models of human beings, with strongly generalized predictive power. Then you incorporate one or more of those models (say, a group of trusted people) as functional components: the system uses them to generalize natural language instructions first into a symbolic graph, and then into something actionable, working out the details of what was meant, rather than what was said. Then, when the system is finding valid paths of causality, it takes its model of the state of the universe at the end of each course of action, feeds it into its human-models, and gives them a veto vote. Think of it as the emergency regret button, iterated computationally for each possibility considered by the genie. Any of them that any of the person-models find unacceptable are disregarded.
(small side note: as described here, the models would probably eventually be indistinguishable from uploaded minds, and would be created, simulated for a short time, and destroyed uncountable trillions of times -- you'd either need to drastically limit the simulation depth of the models, or ensure that everyone who you signed up to be one of the models knew the sacrifice they were making)
So, what you've got, plus or minus some spit and polish, is a very powerful optimization engine that understands what you mean, and disregards obviously unacceptable possibilities. If you ask it for a truly Friendly AI, it will help you first figure out what you mean by that, then help you build it, then help you formally prove that it's safe. It would turn itself off if you asked it to, and meant it. It would also exterminate the human species if you asked it to and meant it. Not Friendly, but Friendly enough to build something better.
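To make the veto step a bit more concrete, here is a toy sketch of just that filter. Everything in it is a hypothetical stand-in: `Plan`, `PersonModel`, `vetoed`, and `choose_plan` are names invented for illustration, the "person-models" are trivial predicates rather than predictive models of real people, and the hard parts (world modelling, plan search, building the human models) are assumed away entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: a predicted end state is a plain dict, and a
# "person-model" is any function that says whether it accepts that state.
Outcome = Dict[str, bool]
PersonModel = Callable[[Outcome], bool]


@dataclass
class Plan:
    name: str
    predicted_outcome: Outcome  # the system's model of the world after the plan
    goal_score: float           # how well the plan satisfies the stated request


def vetoed(plan: Plan, person_models: List[PersonModel]) -> bool:
    """A plan is discarded if ANY person-model rejects its predicted outcome."""
    return any(not accepts(plan.predicted_outcome) for accepts in person_models)


def choose_plan(candidates: List[Plan],
                person_models: List[PersonModel]) -> Optional[Plan]:
    """Return the best-scoring plan that survives every veto, or None."""
    acceptable = [p for p in candidates if not vetoed(p, person_models)]
    if not acceptable:
        return None
    return max(acceptable, key=lambda p: p.goal_score)


if __name__ == "__main__":
    # Toy request: "end the disease". The literal-minded plan scores highest
    # but is vetoed, so the system falls back to the acceptable one.
    plans = [
        Plan("fund research", {"humans_alive": True, "disease_gone": True}, 0.8),
        Plan("remove all hosts", {"humans_alive": False, "disease_gone": True}, 1.0),
    ]
    models = [lambda outcome: outcome["humans_alive"]]
    best = choose_plan(plans, models)
    print(best.name if best else "no acceptable plan")
```

The one design choice the sketch takes from the description above is that the veto is unanimous: a single objecting model is enough to discard a plan, and the goal score is only used to rank whatever survives.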
With this approach, the position of the Friendly AI researcher changes. Instead of being in an arms race with the rest of the AI field with a massive handicap (having to solve two incredibly hard problems against opponents who only have to solve one), we only have to solve a relatively simpler problem (building a Friendly-enough AI), which we can then instruct to sabotage unFriendly AI projects and buy some time to develop the real deal. It turns it into a fair fight, one that we might actually win.
Anyone have any thoughts on this idea?
Abstract: In the FOOM debate, Eliezer emphasizes 'optimization power', something like intelligence, as the main thing that makes both evolution and humans so powerful. A different choice of abstractions says that the main thing that's been giving various organisms - from single-celled creatures to wasps to humans - an advantage is the capability to form superorganisms, thus reaping the gains of specialization and shifting evolutionary selection pressure to the level of the superorganism. There seem to be several ways by which a technological singularity could involve the creation of new kinds of superorganisms, which would then reap benefits above and beyond those that individual humans can achieve, and which would quite likely have quite different values. This strongly suggests that even if one is not worried about the intelligence explosion (because of e.g. finding a hard takeoff improbable), one should still be worried about the co-operative explosion.
After watching Jonathan Haidt's excellent new TEDTalk yesterday, I bought his latest book, The Righteous Mind: Why Good People Are Divided by Politics and Religion. At one point, Haidt has a discussion of evolutionary superorganisms - cases where previously separate organisms have joined together into a single superorganism, shifting evolution's selection pressure to operate on the level of the superorganism and avoiding the usual pitfalls that block group selection (excerpts below). With an increased ability for the previously-separate organisms to co-operate, these new superorganisms can often out-compete simpler organisms.
Suppose you entered a boat race. One hundred rowers, each in a separate rowboat, set out on a ten-mile race along a wide and slow-moving river. The first to cross the finish line will win $10,000. Halfway into the race, you’re in the lead. But then, from out of nowhere, you’re passed by a boat with two rowers, each pulling just one oar. No fair! Two rowers joined together into one boat! And then, stranger still, you watch as that rowboat is overtaken by a train of three such rowboats, all tied together to form a single long boat. The rowers are identical septuplets. Six of them row in perfect synchrony while the seventh is the coxswain, steering the boat and calling out the beat for the rowers. But those cheaters are deprived of victory just before they cross the finish line, for they in turn are passed by an enterprising group of twenty-four sisters who rented a motorboat. It turns out that there are no rules in this race about what kinds of vehicles are allowed.
That was a metaphorical history of life on Earth. For the first billion years or so of life, the only organisms were prokaryotic cells (such as bacteria). Each was a solo operation, competing with others and reproducing copies of itself. But then, around 2 billion years ago, two bacteria somehow joined together inside a single membrane, which explains why mitochondria have their own DNA, unrelated to the DNA in the nucleus. These are the two-person rowboats in my example. Cells that had internal organelles could reap the benefits of cooperation and the division of labor (see Adam Smith). There was no longer any competition between these organelles, for they could reproduce only when the entire cell reproduced, so it was “one for all, all for one.” Life on Earth underwent what biologists call a “major transition.” Natural selection went on as it always had, but now there was a radically new kind of creature to be selected. There was a new kind of vehicle by which selfish genes could replicate themselves. Single-celled eukaryotes were wildly successful and spread throughout the oceans.
A few hundred million years later, some of these eukaryotes developed a novel adaptation: they stayed together after cell division to form multicellular organisms in which every cell had exactly the same genes. These are the three-boat septuplets in my example. Once again, competition is suppressed (because each cell can only reproduce if the organism reproduces, via its sperm or egg cells). A group of cells becomes an individual, able to divide labor among the cells (which specialize into limbs and organs). A powerful new kind of vehicle appears, and in a short span of time the world is covered with plants, animals, and fungi. It’s another major transition.
Major transitions are rare. The biologists John Maynard Smith and Eörs Szathmáry count just eight clear examples over the last 4 billion years (the last of which is human societies). But these transitions are among the most important events in biological history, and they are examples of multilevel selection at work. It’s the same story over and over again: Whenever a way is found to suppress free riding so that individual units can cooperate, work as a team, and divide labor, selection at the lower level becomes less important, selection at the higher level becomes more powerful, and that higher-level selection favors the most cohesive superorganisms. (A superorganism is an organism made out of smaller organisms.) As these superorganisms proliferate, they begin to compete with each other, and to evolve for greater success in that competition. This competition among superorganisms is one form of group selection. There is variation among the groups, and the fittest groups pass on their traits to future generations of groups.
Major transitions may be rare, but when they happen, the Earth often changes. Just look at what happened more than 100 million years ago when some wasps developed the trick of dividing labor between a queen (who lays all the eggs) and several kinds of workers who maintain the nest and bring back food to share. This trick was discovered by the early hymenoptera (members of the order that includes wasps, which gave rise to bees and ants) and it was discovered independently several dozen other times (by the ancestors of termites, naked mole rats, and some species of shrimp, aphids, beetles, and spiders). In each case, the free rider problem was surmounted and selfish genes began to craft relatively selfless group members who together constituted a supremely selfish group.
These groups were a new kind of vehicle: a hive or colony of close genetic relatives, which functioned as a unit (e.g., in foraging and fighting) and reproduced as a unit. These are the motorboating sisters in my example, taking advantage of technological innovations and mechanical engineering that had never before existed. It was another transition. Another kind of group began to function as though it were a single organism, and the genes that got to ride around in colonies crushed the genes that couldn’t “get it together” and rode around in the bodies of more selfish and solitary insects. The colonial insects represent just 2 percent of all insect species, but in a short period of time they claimed the best feeding and breeding sites for themselves, pushed their competitors to marginal grounds, and changed most of the Earth’s terrestrial ecosystems (for example, by enabling the evolution of flowering plants, which need pollinators). Now they’re the majority, by weight, of all insects on Earth.
Haidt's argument is that color politics and other political mind-killingness are due to a set of adaptations that temporarily lets people merge into a superorganism and set individual interest aside. To a lesser extent, so are moral intuitions about things such as fairness and proportionality. Yes, it's a group selection argument. Haidt acknowledges that group selection has been unpopular in biology for a while, but notes that it has also been making a comeback recently, and cites e.g. the work on multi-level selection as supporting his thesis. I mention some of his references (which I have not yet read) below.
Anyway, the reason why I'm bringing this up is that I've been re-reading the FOOM debate of late, and in Life's Story Continues, Eliezer references some of the same evolutionary milestones as Haidt does. And while Eliezer also mentions that the cells provided a major co-operative advantage that allowed for specialization, he views this merely through the lens of optimization power, and dismisses e.g. unicellular eukaryotes with the words "meh, so what".
Cells: Force a set of genes, RNA strands, or catalytic chemicals to share a common reproductive fate. (This is the real point of the cell boundary, not "protection from the environment" - it keeps the fruits of chemical labor inside a spatial boundary.) But, as we've defined our abstractions, this is mostly a matter of optimization slope - the quality of the search neighborhood. The advent of cells opens up a tremendously rich new neighborhood defined by specialization and division of labor. It also increases the slope by ensuring that chemicals get to keep the fruits of their own labor in a spatial boundary, so that fitness advantages increase. But does it hit back to the meta-level? How you define that seems to me like a matter of taste. Cells don't quite change the mutate-reproduce-select cycle. But if we're going to define sexual recombination as a meta-level innovation, then we should also define cellular isolation as a meta-level innovation. (Life's Story Continues)
The interesting thing about the FOOM debate is that both Eliezer and Robin seem to talk a lot about the significance of co-operation, but they never quite take it up explicitly. Robin talks about the way that isolated groups typically aren't able to take over the world, because it's much more effective to co-operate with others than try to do everything yourself, or because information within the group tends to leak out to other parties. Eliezer talks about the way that cells allowed the ability for specialization, and how writing allowed human culture to accumulate and people to build on each other's inventions.
Even as Eliezer talks about intelligence, insight, and recursion, one could view this too as discussion about the power of specialization, co-operation and superorganisms - for intelligence seems to consist of a large number of specialized modules, all somehow merged to work in the same organism. And Robin seems to take the view of large groups of people acting as some kind of a loose superorganism, thus beating smaller groups that try to do things alone:
Independent competitors can more easily displace one another than interdependent ones. For example, since the unit of the industrial revolution seems to have been Western Europe, Britain who started it did not gain much relative to the rest of Western Europe, but Western Europe gained more substantially relative to outsiders. So as the world becomes interdependent on larger scales, smaller groups find it harder to displace others. (Outside View of Singularity)
[Today] innovations and advances in each part of the world depend on advances made in all other parts of the world. … Visions of a local singularity, in contrast, imagine that sudden technological advances in one small group essentially allow that group to suddenly grow big enough to take over everything. … The key common assumption is that of a very powerful but autonomous area of technology. Overall progress in that area must depend only on advances in this area, advances that a small group of researchers can continue to produce at will. And great progress in this area alone must be sufficient to let a small group essentially take over the world. …
[Consider also] complaints about the great specialization in modern academic and intellectual life. People complain that ordinary folks should know more science, so they can judge simple science arguments for themselves. … Many want policy debates to focus on intrinsic merits, rather than on appeals to authority. Many people wish students would study a wider range of subjects, and so be better able to see the big picture. And they wish researchers weren’t so penalized for working between disciplines, or for failing to cite every last paper someone might think is related somehow.
It seems to me plausible to attribute all of these dreams of autarky to people not yet coming fully to terms with our newly heightened interdependence. … We picture our ideal political unit and future home to be the largely self-sufficient small tribe of our evolutionary heritage. … I suspect that future software, manufacturing plants, and colonies will typically be much more dependent on everyone else than dreams of autonomy imagine. Yes, small isolated entities are getting more capable, but so are small non-isolated entities, and the latter remain far more capable than the former. The riches that come from a worldwide division of labor have rightly seduced us away from many of our dreams of autarky. We may fantasize about dropping out of the rat race and living a life of ease on some tropical island. But very few of us ever do. (Dreams of Autarky)
Robin has also explicitly made the point that it is the difficulty of co-operation which suggests that we can keep ourselves safe from uploads or AIs with hostile intentions:
What if uploads decide to take over by force, refusing to pay back their loans and grabbing other forms of capital? Well for comparison, consider the question: What if our children take over, refusing to pay back their student loans or to pay for Social Security? Or consider: What if short people revolt tonight, and kill all the tall people?
In general, most societies have many potential subgroups who could plausibly take over by force, if they could coordinate among themselves. But such revolt is rare in practice; short people know that if they kill all the tall folks tonight, all the blond people might go next week, and who knows where it would all end? And short people are highly integrated into society; some of their best friends are tall people.
In contrast, violence is more common between geographic and culturally separated subgroups. Neighboring nations have gone to war, ethnic minorities have revolted against governments run by other ethnicities, and slaves and other sharply segregated economic classes have rebelled.
Thus the best way to keep the peace with uploads would be to allow them as full as possible integration in with the rest of society. Let them live and work with ordinary people, and let them loan and sell to each other through the same institutions they use to deal with ordinary humans. Banning uploads into space, the seas, or the attic so as not to shock other folks might be ill-advised. Imposing especially heavy upload taxes, or treating uploads as property, as just software someone owns or as non-human slaves like dogs, might be especially unwise. (If Uploads Come First)
Situations like war or violent rebellions are, arguably, cases where the "human superorganism adaptations" kick in the strongest - where people have the strongest propensity to view themselves primarily as a part of a group, and where they are the most ready to sacrifice themselves for the interest of the group. Indeed, Haidt quotes (both in the book and the TEDTalk) former soldiers who say that there's something very unique in the states of consciousness that war can produce:
So many books about war say the same thing, that nothing brings people together like war. And that bringing them together opens up the possibility of extraordinary self-transcendent experiences. I'm going to play for you an excerpt from this book by Glenn Gray. Gray was a soldier in the American army in World War II. And after the war he interviewed a lot of other soldiers and wrote about the experience of men in battle. Here's a key passage where he basically describes the staircase.
Glenn Gray: Many veterans will admit that the experience of communal effort in battle has been the high point of their lives. "I" passes insensibly into a "we," "my" becomes "our" and individual fate loses its central importance. I believe that it is nothing less than the assurance of immortality that makes self-sacrifice at these moments so relatively easy. I may fall, but I do not die, for that which is real in me goes forward and lives on in the comrades for whom I gave up my life.
So Robin, in If Uploads Come First, seems to basically be saying that uploads are dangerous if we let them become superorganisms. Usually, individuals have a large number of their own worries and priorities, and even if they had much to gain by co-operating, they can neither trust each other enough nor resist the temptation to free-ride enough to work together well enough to become dangerous.
Incidentally, this provides an easy rebuttal to the "corporations are already superintelligent" claim - while corporations have a variety of mechanisms for trying to provide their employees with the proper incentives, anyone who's worked for a big company knows that the employees tend to follow their own interests, even when they conflict with those of the company. It's certainly nothing like the situation with a cell, where the survival of each organelle depends on the survival of the whole cell. If the cell dies, the organelles die; if the company fails, the employees can just get a new job.
It would seem to me that, whatever your take on the intelligence explosion is, the current evolutionary history would strongly suggest that new kinds of superorganisms - larger, more cohesive than human groups, and less dependent on crippling their own rationality in order to maintain group cohesion - would be a major risk for humanity. This is not to say that an intelligence explosion wouldn't be dangerous as well - I have no idea what a mind that could think 1,000 times faster than me could do - but a co-operative explosion should be considered dangerous even if you thought a hard takeoff via recursive self-improvement (say) was impossible. And many of the ways for creating a superorganism (see below) seem to involve processes that could conceivably lead to the superorganisms having quite different values from humans. Even if no single superorganism could take over, that's not much of a comfort for the ordinary humans who are caught in a crossfire.
How might a co-operative explosion happen? I see at least three possibilities:
- Self-copying artificial intelligences. An AI doesn't need to have the evolved idea of a "self" whose interests need to be protected, above those of identical copies of the AI. An AI could be programmed to only care about the completion of a single goal (e.g. paperclips), and it could then copy itself freely, knowing that all of those copies will be working towards the same goal.
- Upload copy clans. Carl Shulman discusses this possibility in Whole Brain Emulation and the Evolution of Superorganisms. Some people might have a view about personal identity which accepts the possibility of somebody deleting you, if there exist close-enough copies of you. In a world where uploading is possible, there could be people who could copy themselves and then have those copies work together in order to further the goals of the joint organism. If the copies were willing to have themselves deleted or be experimented on, they could come up with ways of brain modification that further increased the devotion to the superorganism. Furthermore, each copy could consent to being deleted if it seemed like its interests were drifting apart from those of the organism as a whole.
- Mind coalescences. In Coalescing Minds: Mind Uploading-Related Group Mind Scenarios, I and Harri Valpola discuss the notion of coalesced minds, hypothetical minds created by merging together two brains through a sufficient number of high-bandwidth neural connections. In a world where uploading was possible, the creation of mind coalescences could be relatively straightforward. Then, several independent organisms could literally join together to become a single entity.
Below are some more excerpts from Haidt's book:
Many animals are social: they live in groups, flocks, or herds. But only a few animals have crossed the threshold and become ultrasocial, which means that they live in very large groups that have some internal structure, enabling them to reap the benefits of the division of labor. Beehives and ant nests, with their separate castes of soldiers, scouts, and nursery attendants, are examples of ultrasociality, and so are human societies.
One of the key features that has helped all the nonhuman ultra-socials to cross over appears to be the need to defend a shared nest. [...] Hölldobler and Wilson give supporting roles to two other factors: the need to feed offspring over an extended period (which gives an advantage to species that can recruit siblings or males to help out Mom) and intergroup conflict. All three of these factors applied to those first early wasps camped out together in defensible naturally occurring nests (such as holes in trees). From that point on, the most cooperative groups got to keep the best nesting sites, which they then modified in increasingly elaborate ways to make themselves even more productive and more protected. Their descendants include the honeybees we know today, whose hives have been described as “a factory inside a fortress.”
Those same three factors applied to human beings. Like bees, our ancestors were (1) territorial creatures with a fondness for defensible nests (such as caves) who (2) gave birth to needy offspring that required enormous amounts of care, which had to be given while (3) the group was under threat from neighboring groups. For hundreds of thousands of years, therefore, conditions were in place that pulled for the evolution of ultrasociality, and as a result, we are the only ultrasocial primate. The human lineage may have started off acting very much like chimps, but by the time our ancestors started walking out of Africa, they had become at least a little bit like bees.
And much later, when some groups began planting crops and orchards, and then building granaries, storage sheds, fenced pastures, and permanent homes, they had an even steadier food supply that had to be defended even more vigorously. Like bees, humans began building ever more elaborate nests, and in just a few thousand years, a new kind of vehicle appeared on Earth—the city-state, able to raise walls and armies. City-states and, later, empires spread rapidly across Eurasia, North Africa, and Mesoamerica, changing many of the Earth’s ecosystems and allowing the total tonnage of human beings to shoot up from insignificance at the start of the Holocene (around twelve thousand years ago) to world domination today.
As the colonial insects did to the other insects, we have pushed all other mammals to the margins, to extinction, or to servitude. The analogy to bees is not shallow or loose. Despite their many differences, human civilizations and beehives are both products of major transitions in evolutionary history. They are motorboats.
The discovery of major transitions is Exhibit A in the retrial of group selection. Group selection may or may not be common among other animals, but it happens whenever individuals find ways to suppress selfishness and work as a team, in competition with other teams. Group selection creates group-related adaptations. It is not far-fetched, and it should not be a heresy to suggest that this is how we got the groupish overlay that makes up a crucial part of our righteous minds. [...]
According to Tomasello, human cognition veered away from that of other primates when our ancestors developed shared intentionality. At some point in the last million years, a small group of our ancestors developed the ability to share mental representations of tasks that two or more of them were pursuing together. For example, while foraging, one person pulls down a branch while the other plucks the fruit, and they both share the meal. Chimps never do this. Or while hunting, the pair splits up to approach an animal from both sides. Chimps sometimes appear to do this, as in the widely reported cases of chimps hunting colobus monkeys, but Tomasello argues that the chimps are not really working together. Rather, each chimp is surveying the scene and then taking the action that seems best to him at that moment. Tomasello notes that these monkey hunts are the only time that chimps seem to be working together, yet even in these rare cases they fail to show the signs of real cooperation. They make no effort to communicate with each other, for example, and they are terrible at sharing the spoils among the hunters, each of whom must use force to obtain a share of meat at the end. They all chase the monkey at the same time, yet they don’t all seem to be on the same page about the hunt.
In contrast, when early humans began to share intentions, their ability to hunt, gather, raise children, and raid their neighbors increased exponentially. Everyone on the team now had a mental representation of the task, knew that his or her partners shared the same representation, knew when a partner had acted in a way that impeded success or that hogged the spoils, and reacted negatively to such violations. When everyone in a group began to share a common understanding of how things were supposed to be done, and then felt a flash of negativity when any individual violated those expectations, the first moral matrix was born. (Remember that a matrix is a consensual hallucination.) That, I believe, was our Rubicon crossing.
Tomasello believes that human ultrasociality arose in two steps. The first was the ability to share intentions in groups of two or three people who were actively hunting or foraging together. (That was the Rubicon.) Then, after several hundred thousand years of evolution for better sharing and collaboration as nomadic hunter-gatherers, more collaborative groups began to get larger, perhaps in response to the threat of other groups. Victory went to the most cohesive groups—the ones that could scale up their ability to share intentions from three people to three hundred or three thousand people. This was the second step: Natural selection favored increasing levels of what Tomasello calls “group-mindedness”—the ability to learn and conform to social norms, feel and share group-related emotions, and, ultimately, to create and obey social institutions, including religion. A new set of selection pressures operated within groups (e.g., nonconformists were punished, or at very least were less likely to be chosen as partners for joint ventures) as well as between groups (cohesive groups took territory and other resources from less cohesive groups).
Shared intentionality is Exhibit B in the retrial of group selection. Once you grasp Tomasello’s deep insight, you begin to see the vast webs of shared intentionality out of which human groups are constructed. Many people assume that language was our Rubicon, but language became possible only after our ancestors got shared intentionality. Tomasello notes that a word is not a relationship between a sound and an object. It is an agreement among people who share a joint representation of the things in their world, and who share a set of conventions for communicating with each other about those things. If the key to group selection is a shared defensible nest, then shared intentionality allowed humans to construct nests that were vast and ornate yet weightless and portable. Bees construct hives out of wax and wood fibers, which they then fight, kill, and die to defend. Humans construct moral communities out of shared norms, institutions, and gods that, even in the twenty-first century, they fight, kill, and die to defend.
Haidt's references on this include, though are not limited to, the following:
Okasha, S. (2006) Evolution and the Levels of Selection. Oxford: Oxford University Press.
Hölldobler, B., and E. O. Wilson. (2009) The Superorganism: The Beauty, Elegance, and Strangeness of Insect Societies. New York: Norton.
Bourke, A. F. G. (2011) Principles of Social Evolution. New York: Oxford University Press.
Wilson, E. O., and B. Hölldobler. (2005) “Eusociality: Origin and Consequences.” Proceedings of the National Academy of Sciences of the United States of America 102:13367–71.
Tomasello, M., A. Melis, C. Tennie, E. Wyman, E. Herrmann, and A. Schneider. (Forthcoming) “Two Key Steps in the Evolution of Human Cooperation: The Mutualism Hypothesis.” Current Anthropology.
Minimizing Joule heating remains an important goal in the design of electronic devices [1, 2]. The prevailing model of Joule heating relies on a simple semiclassical picture in which electrons collide with the atoms of a conductor, generating heat locally and only in regions of non-zero current density, and this model has been supported by most experiments. Recently, however, it has been predicted that electric currents in graphene and carbon nanotubes can couple to the vibrational modes of a neighbouring material [3, 4], heating it remotely [5]. Here, we use in situ electron thermal microscopy to detect the remote Joule heating of a silicon nitride substrate by a single multiwalled carbon nanotube. At least 84% of the electrical power supplied to the nanotube is dissipated directly into the substrate, rather than in the nanotube itself. Although it has different physical origins, this phenomenon is reminiscent of induction heating or microwave dielectric heating. Such an ability to dissipate waste energy remotely could lead to improved thermal management in electronic devices [6].
I'm skeptical about trying to build FAI, but not about trying to influence the Singularity in a positive direction. Some people may be skeptical even of the latter because they don't think the possibility of an intelligence explosion is a very likely one. I suggest that even if intelligence explosion turns out to be impossible, we can still reach a positive Singularity by building what I'll call "modest superintelligences", that is, superintelligent entities, capable of taking over the universe and preventing existential risks and Malthusian outcomes, whose construction does not require fast recursive self-improvement or other questionable assumptions about the nature of intelligence. This helps to establish a lower bound on the benefits of an organization that aims to strategically influence the outcome of the Singularity.
- MSI-1: 10^5 biologically cloned humans of von Neumann-level intelligence, highly educated and indoctrinated from birth to work collaboratively towards some goal, such as building MSI-2 (or equivalent)
- MSI-2: 10^10 whole brain emulations of von Neumann, each running at ten times human speed, with WBE-enabled institutional controls that increase group coherence/rationality (or equivalent)
- MSI-3: 10^20 copies of von Neumann WBE, each running at a thousand times human speed, with more advanced (to be invented) institutional controls and collaboration tools (or equivalent)
(To recall what the actual von Neumann, who we might call MSI-0, accomplished, open his Wikipedia page and scroll through the "known for" sidebar.)
Building a MSI-1 seems to require a total cost on the order of $100 billion (assuming $10 million for each clone), which is comparable to the Apollo project, and about 0.25% of the annual Gross World Product. (For further comparison, note that Apple has a market capitalization of $561 billion, and annual profit of $25 billion.) In exchange for that cost, any nation that undertakes the project has a reasonable chance of obtaining an insurmountable lead in whatever technologies end up driving the Singularity, and with that a large measure of control over its outcome. If no better strategic options come along, lobbying a government to build MSI-1 and/or influencing its design and aims seems to be the least that a Singularitarian organization could do.
Part of the series AI Risk and Opportunity: A Strategic Analysis.
(You can leave anonymous feedback on posts in this series here. I alone will read the comments, and may use them to improve past and forthcoming posts in this series.)
This post chronicles the story of humanity's growing awareness of AI risk and opportunity, along with some recent AI safety efforts. I will not tackle any strategy questions directly in this post; my purpose today is merely to "bring everyone up to speed."
I know my post skips many important events and people. Please suggest additions in the comments, and include as much detail as possible.
Late in the Industrial Revolution, Samuel Butler (1863) worried about what might happen when machines become more capable than the humans who designed them:
...we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organisation; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race.
...the time will come when the machines will hold the real supremacy over the world and its inhabitants...
This basic idea was picked up by science fiction authors, for example in the 1921 Czech play that introduced the term “robot,” R.U.R. In that play, robots grow in power and intelligence and destroy the entire human race, except for a single survivor.
Another exploration of this idea is found in John W. Campbell’s (1932) short story The Last Evolution, in which aliens attack Earth and the humans and aliens are killed but their machines survive and inherit the solar system. Campbell's (1935) short story The Machine contained perhaps the earliest description of recursive self-improvement:
On the planet Dwranl, of the star you know as Sirius, a great race lived, and they were not too unlike you humans. ...they attained their goal of the machine that could think. And because it could think, they made several and put them to work, largely on scientific problems, and one of the obvious problems was how to make a better machine which could think.
The machines had logic, and they could think constantly, and because of their construction never forgot anything they thought it well to remember. So the machine which had been set the task of making a better machine advanced slowly, and as it improved itself, it advanced more and more rapidly. The Machine which came to Earth is that machine.
The concern for AI safety is most popularly identified with Isaac Asimov’s Three Laws of Robotics, introduced in his short story Runaround. Asimov used his stories, including those collected in the popular book I, Robot, to illustrate many of the ways in which such well-meaning and seemingly comprehensive rules for governing robot behavior could go wrong.
In the year of I, Robot’s release, mathematician Alan Turing (1950) noted that machines may one day be capable of whatever human intelligence can achieve:
I believe that at the end of the century... one will be able to speak of machines thinking without expecting to be contradicted.
Turing (1951) concluded:
...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers... At some stage therefore we should have to expect the machines to take control...
Given the profound implications of machine intelligence, it's rather alarming that the early AI scientists who believed AI would be built during the 1950s-1970s didn't show much interest in AI safety. We are lucky they were wrong about the difficulty of AI — had they been right, humanity probably would not have been prepared to protect its interests.
Later, statistician I.J. Good (1959), who had worked with Turing to crack Nazi codes in World War II, reasoned that the transition from human control to machine control may be unexpectedly sudden:
Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an "explosion" will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.
The more famous formulation of this idea, and the origin of the phrase "intelligence explosion," is from Good (1965):
Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.
Good (1970) says that "...by 1980 I hope that the implications and the safeguards [concerning machine superintelligence] will have been thoroughly discussed," and argues that an association devoted to discussing the matter be created. Unfortunately, no such association was created until either 1991 (Extropy Institute) or 2000 (Singularity Institute), and we might say these issues have not to this day been "thoroughly" discussed.
Good (1982) proposed a plan for the design of an ethical machine:
I envisage a machine that would be given a large number of examples of human behaviour that other people called ethical, and examples of discussions of ethics, and from these examples and discussions the machine would formulate one or more consistent general theories of ethics, detailed enough so that it could deduce the probable consequences in most realistic situations.
Even critics of AI like Jack Schwartz (1987) saw the implications of intelligence that can improve itself:
If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man's near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history.
Ray Solomonoff (1985), founder of algorithmic information theory, speculated on the implications of full-blown AI:
After we have reached [human-level AI], it shouldn't take much more than ten years to construct ten thousand duplicates of our original [human-level AI], and have a total computing capability close to that of the computer science community...
The last 100 years have seen the introduction of special and general relativity, automobiles, airplanes, quantum mechanics, large rockets and space travel, fission power, fusion bombs, lasers, and large digital computers. Any one of these might take a person years to appreciate and understand. Suppose that they had all been presented to mankind in a single year!
Moravec (1988) argued that AI was an existential risk, but nevertheless, one toward which we must run (pp. 100-101):
...intelligent machines... threaten our existence... Machines merely as clever as human beings will have enormous advantages in competitive situations... So why rush headlong into an era of intelligent machines? The answer, I believe, is that we have very little choice, if our culture is to remain viable... The universe is one random event after another. Sooner or later an unstoppable virus deadly to humans will evolve, or a major asteroid will collide with the earth, or the sun will expand, or we will be invaded from the stars, or a black hole will swallow the galaxy. The bigger, more diverse, and competent a culture is, the better it can detect and deal with external dangers. The larger events happen less frequently. By growing rapidly enough, a culture has a finite chance of surviving forever.
Ray Kurzweil's The Age of Intelligent Machines (1990) did not mention AI risk, and his followup, The Age of Spiritual Machines (1998), does so only briefly, in an "interview" between the reader and Kurzweil. The reader asks, "So we risk the survival of the human race for [the opportunity AI affords us to expand our minds and advance our ability to create knowledge]?" Kurzweil answers: "Yeah, basically."
Minsky (1984) pointed out the difficulty of getting machines to do what we want:
...it is always dangerous to try to relieve ourselves of the responsibility of understanding exactly how our wishes will be realized. Whenever we leave the choice of means to any servants we may choose then the greater the range of possible methods we leave to those servants, the more we expose ourselves to accidents and incidents. When we delegate those responsibilities, then we may not realize, before it is too late to turn back, that our goals have been misinterpreted, perhaps even maliciously. We see this in such classic tales of fate as Faust, the Sorcerer's Apprentice, or the Monkey's Paw by W.W. Jacobs.
[Another] risk is exposure to the consequences of self-deception. It is always tempting to say to oneself... that "I know what I would like to happen, but I can't quite express it clearly enough." However, that concept itself reflects a too-simplistic self-image, which portrays one's own self as [having] well-defined wishes, intentions, and goals. This pre-Freudian image serves to excuse our frequent appearances of ambivalence; we convince ourselves that clarifying our intentions is merely a matter of straightening-out the input-output channels between our inner and outer selves. The trouble is, we simply aren't made that way. Our goals themselves are ambiguous.
The ultimate risk comes when [we] attempt to take that final step — of designing goal-achieving programs that are programmed to make themselves grow increasingly powerful, by self-evolving methods that augment and enhance their own capabilities. It will be tempting to do this, both to gain power and to decrease our own effort toward clarifying our own desires. If some genie offered you three wishes, would not your first one be, "Tell me, please, what is it that I want the most!" The problem is that, with such powerful machines, it would require but the slightest accident of careless design for them to place their goals ahead of [ours]. The machine's goals may be allegedly benevolent, as with the robots of With Folded Hands, by Jack Williamson, whose explicit purpose was allegedly benevolent: to protect us from harming ourselves, or as with the robot in Colossus, by D.H.Jones, who itself decides, at whatever cost, to save us from an unsuspected enemy. In the case of Arthur C. Clarke's HAL, the machine decides that the mission we have assigned to it is one we cannot properly appreciate. And in Vernor Vinge's computer-game fantasy, True Names, the dreaded Mailman... evolves new ambitions of its own.
The Modern Era
Novelist Vernor Vinge (1993) popularized Good's "intelligence explosion" concept, and wrote the first novel about self-improving AI posing an existential threat: A Fire Upon the Deep (1992). It was probably Vinge who did more than anyone else to spur discussions about AI risk, particularly in online communities like the extropians mailing list (since 1991) and SL4 (since 2000). Participants in these early discussions included several of today's leading thinkers on AI risk: Robin Hanson, Eliezer Yudkowsky, Nick Bostrom, Anders Sandberg, and Ben Goertzel. (Other posters included Peter Thiel, FM-2030, Robert Bradbury, and Julian Assange.) Proposals like Friendly AI, Oracle AI, and Nanny AI were discussed here long before they were brought to greater prominence with academic publications (Yudkowsky 2008; Armstrong et al. 2012; Goertzel 2012).
Meanwhile, philosophers and AI researchers considered whether or not machines could have moral value, and how to ensure ethical behavior from less powerful machines or 'narrow AIs', a field of inquiry variously known as 'artificial morality' (Danielson 1992; Floridi & Sanders 2004; Allen et al. 2000), 'machine ethics' (Hall 2000; McLaren 2005; Anderson & Anderson 2006), 'computational ethics' (Allen 2002) and 'computational metaethics' (Lokhorst 2011), and 'robo-ethics' or 'robot ethics' (Capurro et al. 2006; Sawyer 2007). This vein of research — what I'll call the 'machine ethics' literature — was recently summarized in two books: Wallach & Allen (2009); Anderson & Anderson (2011). Thus far, there has been a significant communication gap between the machine ethics literature and the AI risk literature (Allen and Wallach 2011), excepting perhaps Muehlhauser and Helm (2012).
The topic of AI safety in the context of existential risk was left to the futurists who had participated in online discussions of AI risk and opportunity. Here, I must cut short my review and focus on just three (of many) important figures: Eliezer Yudkowsky, Robin Hanson, and Nick Bostrom. (Your author also apologizes for the fact that, because he works with Yudkowsky, Yudkowsky gets a more detailed treatment here than Hanson or Bostrom.)
Other figures in the modern era of AI risk research include Bill Hibbard (Super-Intelligent Machines) and Ben Goertzel ("Should Humanity Build a Global AI Nanny to Delay the Singularity Until It's Better Understood").
According to "Eliezer, the person," Eliezer Yudkowsky (born 1979) was a bright kid — in the 99.9998th percentile of cognitive ability, according to the Midwest Talent Search. He read lots of science fiction as a child, and at age 11 read Great Mambo Chicken and the Transhuman Condition — his introduction to the impending reality of transhumanist technologies like AI and nanotech. The moment he became a Singularitarian was the moment he read page 47 of True Names and Other Dangers by Vernor Vinge:
Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It's a problem we face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity - a place where extrapolation breaks down and new models must be applied - and the world will pass beyond our understanding.
Yudkowsky reported his reaction:
My emotions at that moment are hard to describe; not fanaticism, or enthusiasm, just a vast feeling of "Yep. He's right." I knew, in the moment I read that sentence, that this was how I would be spending the rest of my life.
(As an aside, I'll note that this is eerily similar to my own experience of encountering the famous I.J. Good paragraph about ultraintelligence (quoted above), before I knew what "transhumanism" or "the Singularity" was. I read Good's paragraph and thought, "Wow. That's... probably correct. How could I have missed that implication? … … … Well, shit. That changes everything.")
As a teenager in the mid 1990s, Yudkowsky participated heavily in Singularitarian discussions on the extropians mailing list, and in 1996 (at age 17) he wrote "Staring into the Singularity," which gained him much attention, as did his popular "FAQ about the Meaning of Life" (1999).
In 1998 Yudkowsky was invited (along with 33 others) by economist Robin Hanson to comment on Vinge (1993). Thirteen people (including Yudkowsky) left comments, then Vinge responded, and a final open discussion was held on the extropians mailing list. Hanson edited together these results here. Yudkowsky thought Max More's comments on Vinge underestimated how different from humans AI would probably be, and this prompted Yudkowsky to begin an early draft of "Coding a Transhuman AI" (CaTAI) which by 2000 had grown into the first large explication of his thoughts on "Seed AI" and "friendly" machine superintelligence (Yudkowsky 2000).
At a May 2000 gathering hosted by the Foresight Institute, Brian Atkins and Sabine Stoeckel discussed with Yudkowsky the possibility of launching an organization specializing in AI safety. In July of that year, Yudkowsky formed the Singularity Institute and began his full-time research on the problems of AI risk and opportunity.
The publication of Creating Friendly AI (CFAI) was a significant event, prompting Ben Goertzel (the pioneer of the new Artificial General Intelligence research community) to say that "Creating Friendly AI is the most intelligent writing about AI that I've read in many years," and prompting Eric Drexler (the pioneer of molecular manufacturing) to write that "With Creating Friendly AI, the Singularity Institute has begun to fill in one of the greatest remaining blank spots in the picture of humanity's future."
CFAI was both frustrating and brilliant. It was frustrating because: (1) it was disorganized and opaque, (2) it invented new terms instead of using the terms being used by everyone else, for example speaking of "supergoals" and "subgoals" instead of final and instrumental goals, and speaking of goal systems but never "utility functions," and (3) it hardly cited any of the relevant works in AI, philosophy, and psychology; for example, it could have cited McCulloch (1952), Good (1959, 1970, 1982), Cade (1966), Versenyi (1974), Lampson (1979), Sloman (1984), Schmidhuber (1987), Pearl (1989), Clarke (1993, 1994), Weld & Etzioni (1994), Buss (1995), Russell & Norvig (1995), Gips (1995), Schmidhuber et al. (1997), Barto & Sutton (1998), Jackson (1998), Moravec (1999), Kurzweil (1999), Sobel (1999), Allen et al. (2000), Gordon (2000), Coleman (2001), and Hutter (2001). These features still substantially characterize Yudkowsky's independent writing, e.g. see Yudkowsky (2010).
On the other hand, CFAI was in many ways brilliant, and it tackled many of the problems left mostly untouched by mainstream machine ethics researchers. For example, CFAI (but not the mainstream machine ethics literature) engaged the problems of: (1) radically self-improving AI, (2) AI as an existential risk, (3) hard takeoff, (4) the interplay of goal content, acquisition, and structure, (5) wireheading, (6) subgoal stomp, (7) external reference semantics, (8) causal validity semantics, and (9) selective support (which Bostrom (2002) would later call "differential technological development").
For many years, the Singularity Institute was little more than a vehicle for Yudkowsky's research. In 2002 he wrote "Levels of Organization in General Intelligence," which later appeared in the first edited volume on Artificial General Intelligence (AGI). In 2003 he wrote what would become the internet's most popular tutorial on Bayes' Theorem, followed in 2005 by "A Technical Explanation of Technical Explanation." In 2004 he explained his vision of a Friendly AI goal structure: "Coherent Extrapolated Volition." In 2006 he wrote two chapters that would later appear in the volume Global Catastrophic Risks from Oxford University Press (co-edited by Bostrom): "Cognitive Biases Potentially Affecting Judgment of Global Risks" and, what remains his "classic" article on the need for Friendly AI, "Artificial Intelligence as a Positive and Negative Factor in Global Risk."
In 2004, Tyler Emerson was hired as the Singularity Institute's executive director. Emerson brought on Nick Bostrom (then a postdoctoral fellow at Yale), Christine Peterson (of the Foresight Institute), and others, as advisors. In February 2006, PayPal co-founder Peter Thiel donated $100,000 to the Singularity Institute, and, we might say, the Singularity Institute as we know it today was born.
From 2005-2007, Yudkowsky worked at various times with Marcello Herreshoff, Nick Hay and Peter de Blanc on the technical problems of AGI necessary for technical FAI work, for example creating AIXI-like architectures, developing a reflective decision theory, and investigating limits inherent in self-reflection due to Löb's Theorem. Almost none of this research has been published, in part because of the desire not to accelerate AGI research without having made corresponding safety progress. (Marcello also worked with Eliezer during the summer of 2009.)
Much of the Singularity Institute's work has been "movement-building" work. The institute's Singularity Summit, held annually since 2006, attracts technologists, futurists, and social entrepreneurs from around the world, bringing to their attention not only emerging and future technologies but also the basics of AI risk and opportunity. The Singularity Summit also gave the Singularity Institute much of its access to cultural, academic, and business elites.
Another key piece of movement-building work was Yudkowsky's "The Sequences," which were written during 2006-2009. Yudkowsky blogged, almost daily, on the subjects of epistemology, language, cognitive biases, decision-making, quantum mechanics, metaethics, and artificial intelligence. These posts were originally published on a community blog about rationality, Overcoming Bias (which later became Hanson's personal blog). Later, Yudkowsky's posts were used as the seed material for a new group blog, Less Wrong.
Yudkowsky's goal was to create a community of people who could avoid common thinking mistakes, change their minds in response to evidence, and generally think and act with an unusual degree of Technical Rationality. In CFAI he had pointed out that when it comes to AI, humanity may not have a second chance to get it right. So we can't run a series of intelligence explosion experiments and "see what works." Instead, we need to predict in advance what we need to do to ensure a desirable future, and we need to overcome common thinking errors when doing so. (Later, Yudkowsky expanded his "community of rationalists" by writing the most popular Harry Potter fanfiction in the world, Harry Potter and the Methods of Rationality, and is currently helping to launch a new organization that will teach classes on the skills of rational thought and action.)
This community demonstrated its usefulness in 2009 when Yudkowsky began blogging about some problems in decision theory related to the project of building a Friendly AI. Much like Tim Gowers' Polymath Project, these discussions demonstrated the power of collaborative problem-solving over the internet. The discussions led to a decision theory workshop and then a decision theory mailing list, which quickly became home to some of the most interesting work in decision theory anywhere in the world. Yudkowsky summarized some of his earlier results in "Timeless Decision Theory" (2010), and newer results have been posted to Less Wrong, for example A model of UDT with a halting oracle and Formulas of arithmetic that behave like decision agents.
The Singularity Institute also built its community with a Visiting Fellows program that hosted groups of researchers for 1-3 months at a time. Together, both visiting fellows and newly hired research fellows produced several working papers between 2009 and 2011, including Machine Ethics and Superintelligence, Implications of a Software-Limited Singularity, Economic Implications of Software Minds, Convergence of Expected Utility for Universal AI, and Ontological Crises in Artificial Agents' Value Systems.
In 2011, then-president Michael Vassar left the Singularity Institute to help launch a personalized medicine company, and research fellow Luke Muehlhauser (the author of this document) took over leadership from Vassar, as Executive Director. During this time, the Institute underwent a major overhaul to implement best practices for organizational process and management: it published its first strategic plan, began to maintain its first donor database, adopted best practices for accounting and bookkeeping, updated its bylaws and articles of incorporation, adopted more standard roles for the Board of Directors and the Executive Director, held a series of strategic meetings to help decide the near-term goals of the organization, began to publish monthly progress reports to its blog, started outsourcing more work, and began to work on more articles for peer-reviewed publications: as of March 2012, the Singularity Institute has more peer-reviewed publications forthcoming in 2012 than it had published in all of 2001-2011 combined.
Today, the Singularity Institute collaborates regularly with its (non-staff) research associates, and also with researchers at the Future of Humanity Institute at Oxford University (directed by Bostrom), which as of March 2012 is the world's only other major research institute largely focused on the problems of existential risk.
Whereas Yudkowsky has never worked in the for-profit world and had no formal education after high school, Robin Hanson (born 1959) has a long and prestigious academic and professional history. Hanson took a B.S. in physics from U.C. Irvine in 1981, took an M.S. in physics and an M.A. in the conceptual foundations of science from U. Chicago in 1984, worked in artificial intelligence for Lockheed and NASA, got a Ph.D. in social science from Caltech in 1997, did a post-doctoral fellowship in health policy at U.C. Berkeley from 1997-1999, and finally was made an assistant professor of economics at George Mason University in 1999. In economics, he is best known for conceiving of prediction markets.
When Hanson moved to California in 1984, he encountered the Project Xanadu crowd and met Eric Drexler, who showed him an early draft of Engines of Creation. This community discussed AI, nanotech, cryonics, and other transhumanist topics, and Hanson joined the extropians mailing list (along with many others from Project Xanadu) when it launched in 1991.
Hanson has published several papers on the economics of whole brain emulations (what he calls "ems") and AI (1994, 1998a, 1998b, 2008a, 2008b, 2008c, 2012a). His writings at Overcoming Bias (launched November 2006) are perhaps even more influential, and cover a wide range of topics.
Hanson's views on AI risk and opportunity differ from Yudkowsky's. First, Hanson sees the technological singularity and the human-machine conflict it may produce not as a unique event caused by the advent of AI, but as a natural consequence of "the general fact that accelerating rates of change increase intergenerational conflicts" (Hanson 2012b). Second, Hanson thinks an intelligence explosion will be slower and more gradual than Yudkowsky does, denying Yudkowsky's "hard takeoff" thesis (Hanson & Yudkowsky 2008).
Nick Bostrom (born 1973) received a B.S. in philosophy, mathematics, mathematical logic, and artificial intelligence from the University of Goteborg in 1994, setting a national record in Sweden for undergraduate academic performance. He received an M.A. in philosophy and physics from U. Stockholm in 1996, did work in astrophysics and computational neuroscience at King's College London, and received his Ph.D. from the London School of Economics in 2000. He went on to be a post-doctoral fellow at Yale University and in 2005 became the founding director of Oxford University's Future of Humanity Institute (FHI). Without leaving FHI, he became the founding director of Oxford's Programme on the Impacts of Future Technology (aka FutureTech) in 2011.
Bostrom had long been interested in cognitive enhancement, and in 1995 he joined the extropians mailing list and learned about cryonics, uploading, AI, and other topics.
Bostrom worked with British philosopher David Pearce to found the World Transhumanist Association (now called H+) in 1998, with the purpose of developing a more mature and academically respectable form of transhumanism than was usually present on the extropians mailing list. During this time Bostrom wrote "The Transhumanist FAQ" (now updated to version 2.1), with input from more than 50 others.
His first philosophical publication was "Predictions from Philosophy? How philosophers could make themselves useful" (1997). In this paper, Bostrom proposed "a new type of philosophy, a philosophy whose aim is prediction." On Bostrom's view, one role for the philosopher is to be a polymath who can engage in technological prediction and try to figure out how to steer the future so that humanity's goals are best met.
What questions could a philosophy of superintelligence deal with? Well, questions like: How much would the predictive power for various fields increase if we increase the processing speed of a human-like mind a million times? If we extend the short-term or long-term memory? If we increase the neural population and the connection density? What other capacities would a superintelligence have? How easy would it be for it to rediscover the greatest human inventions, and how much input would it need to do so? What is the relative importance of data, theory, and intellectual capacity in various disciplines? Can we know anything about the motivation of a superintelligence? Would it be feasible to preprogram it to be good or philanthropic, or would such rules be hard to reconcile with the flexibility of its cognitive processes? Would a superintelligence, given the desire to do so, be able to outwit humans into promoting its own aims even if we had originally taken strict precautions to avoid being manipulated? Could one use one superintelligence to control another? How would superintelligences communicate with each other? Would they have thoughts which were of a totally different kind from the thoughts that humans can think? Would they be interested in art and religion? Would all superintelligences arrive at more or less the same conclusions regarding all important scientific and philosophical questions, or would they disagree as much as humans do? And how similar in their internal belief-structures would they be? How would our human self-perception and aspirations change if were forced to abdicate the throne of wisdom...? How would we individuate between superminds if they could communicate and fuse and subdivide with enormous speed? Will a notion of personal identity still apply to such interconnected minds? Would they construct an artificial reality in which to live? Could we upload ourselves into that reality? 
Could we then be able to compete with the superintelligences, if we were accelerated and augmented with extra memory etc., or would such profound reorganisation be necessary that we would no longer feel we were humans? Would that matter?
Bostrom went on to examine some philosophical issues related to superintelligence, in "Predictions from Philosophy" and in "How Long Before Superintelligence?" (1998), "Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards" (2002), "Ethical Issues in Advanced Artificial Intelligence" (2003), "The Future of Human Evolution" (2004), and "The Ethics of Artificial Intelligence" (2012, coauthored with Yudkowsky). (He also played out the role of philosopher-polymath with regard to several other topics, including human enhancement and anthropic bias.)
Bostrom's industriousness paid off:
In 2009, [Bostrom] was awarded the Eugene R. Gannon Award (one person selected annually worldwide from the fields of philosophy, mathematics, the arts and other humanities, and the natural sciences). He has been listed in the FP 100 Global Thinkers list, the Foreign Policy Magazine's list of the world's top 100 minds. His writings have been translated into more than 21 languages, and there have been some 80 translations or reprints of his works. He has done more than 470 interviews for TV, film, radio, and print media, and he has addressed academic and popular audiences around the world.
The other long-term member of the Future of Humanity Institute, Anders Sandberg, has also published some research on AI risk. Sandberg was a co-author on the whole brain emulation roadmap and "Anthropic Shadow", and also wrote "Models of the Technological Singularity" and several other papers.
Recently, Bostrom and Sandberg were joined by Stuart Armstrong, who wrote "Anthropic Decision Theory" (2011) and was the lead author on "Thinking Inside the Box: Using and Controlling Oracle AI" (2012). He had previously written Chaining God (2007).
For more than a year, Bostrom has been working on a new book titled Superintelligence: A Strategic Analysis of the Coming Machine Intelligence Revolution, which aims to sum up and organize much of the (published and unpublished) work done in the past decade by researchers at the Singularity Institute and FHI on the subject of AI risk and opportunity, as well as contribute new insights.
AI Risk Goes Mainstream
In 1997, professor of cybernetics Kevin Warwick published March of the Machines, in which he predicted that within a couple decades, machines would become more intelligent than humans, and would pose an existential threat.
In 2000, Sun Microsystems co-founder Bill Joy published "Why the Future Doesn't Need Us" in Wired magazine. In this widely-circulated essay, Joy argued that "Our most powerful 21st-century technologies — robotics, genetic engineering, and nanotech — are threatening to make humans an endangered species." Joy advised that we relinquish development of these technologies rather than sprinting headlong into an arms race between destructive uses of these technologies and defenses against those destructive uses.
Many people dismissed Bill Joy as a "Neo-Luddite," but many experts expressed similar concerns about human extinction, including philosopher John Leslie (The End of the World), physicist Martin Rees (Our Final Hour), legal theorist Richard Posner (Catastrophe: Risk and Response), and the contributors to Global Catastrophic Risks (including Yudkowsky, Hanson, and Bostrom).
Even Ray Kurzweil, known as an optimist about technology, devoted a chapter of his 2005 bestseller The Singularity is Near to a discussion of existential risks, including risks from AI. Though discussing the possibility of existential catastrophe at length, his take on AI risk was cursory (p. 420):
Inherently there will be no absolute protection against strong AI. Although the argument is subtle I believe that maintaining an open free-market system for incremental scientific and technological progress, in which each step is subject to market acceptance, will provide the most constructive environment for technology to embody widespread human values. As I have pointed out, strong AI is emerging from many diverse efforts and will be deeply integrated into our civilization's infrastructure. Indeed, it will be intimately embedded in our bodies and brains. As such, it will reflect our values because it will be us.
The earliest popular discussion of machine superintelligence may have been in Christopher Evans' international bestseller The Mighty Micro (1979), pages 194-198, 231-233, and 237-246.
The Current Situation
Two decades have passed since the early transhumanists began to seriously discuss AI risk and opportunity on the extropians mailing list. (Before that, some discussions took place at the MIT AI lab, but that was before the web was popular, so they weren't recorded.) What have we humans done since then?
Lots of talking. Hundreds of thousands of man-hours have been invested into discussions on the extropians mailing list, SL4, Overcoming Bias, Less Wrong, the Singularity Institute's decision theory mailing list, several other internet forums, and also in meat-space (especially in the Bay Area near the Singularity Institute and in Oxford near FHI). These are difficult issues; talking them through is usually the first step to getting anything else done.
Organization. Mailing lists are a form of organization, as are organizations like The Singularity Institute and university departments like the FHI and FutureTech. Established organizations provide opportunities to bring people together, and to pool and direct resources efficiently.
Resources. Many people of considerable wealth, along with thousands of other "concerned citizens" around the world, have decided that AI is the most significant risk and opportunity we face, and are willing to invest in humanity's future.
Outreach. Publications (both academic and popular), talks, and interactions with major and minor media outlets have been used to raise awareness of AI risk and opportunity. This has included outreach to specific AGI researchers, some of whom now take AI safety quite seriously. This also includes outreach to people in positions of influence who are in a position to engage in differential technological development. It also includes outreach to the rapidly growing "optimal philanthropy" community; a large fraction of those associated with Giving What We Can take existential risk — and AI risk in particular — quite seriously.
Research. So far, most research on the topic has been concerned with trying to become less confused about what, exactly, the problem is, how worried we should be, and which strategic actions we should take. How do we predict technological progress? How can we predict AI outcomes? Which interventions, taken now, would probably increase the odds of positive AI outcomes? There has also been some "technical" research in decision theory (e.g. TDT, UDT, ADT), the math of AI goal systems ("Learning What to Value," "Ontological Crises in Artificial Agents' Value Systems," "Convergence of Expected Utility for Universal AI"), and Yudkowsky's unpublished research on Friendly AI.
Muehlhauser 2011 provides an overview of the categories of research problems we have left to solve. Most of the known problems aren't even well-defined at this point.
- Allen et al. (2000). Prolegomena to any future artificial moral agent.
- Allen (2002). Calculated morality: ethical computing in the limit.
- Anderson & Anderson (2006). Guest Editors' Introduction: Machine Ethics. IEEE Intelligent Systems Magazine.
- Anderson & Anderson (2011). Machine Ethics.
- Armstrong (2007). Chaining God: A qualitative approach to AI, trust and moral systems.
- Armstrong (2011). Anthropic decision theory for self-locating beliefs.
- Armstrong, Sandberg & Bostrom (2012). Thinking Inside the Box: Using and Controlling Oracle AI.
- Barto & Sutton (1998). Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning).
- Bostrom (1997). Predictions from Philosophy? How philosophers could make themselves useful.
- Bostrom (1998). How Long Before Superintelligence?
- Bostrom (2002). Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards.
- Bostrom (2003). Ethical Issues in Advanced Artificial Intelligence.
- Bostrom (2003). The Transhumanist FAQ.
- Bostrom (2004). The Future of Human Evolution.
- Bostrom et al. (2011). Global Catastrophic Risks.
- Bostrom & Yudkowsky (2012). The Ethics of Artificial Intelligence.
- de Blanc (2009). Convergence of Expected Utility for Universal AI.
- de Blanc (2011). Ontological Crises in Artificial Agents' Value Systems.
- Buss (1995). The Evolution Of Desire.
- Campbell (1932). The Last Evolution.
- Campbell (1935). The Machine.
- Capurro et al. (2006). Ethics in Robotics.
- Chalmers (2010). The Singularity: A Philosophical Analysis.
- Cirkovic, Sandberg, & Bostrom (2010). Anthropic Shadow: Observation selection effects and human extinction risks.
- Clarke (1993). Asimov's Laws of Robotics: Implications for Information Technology. Part 1.
- Clarke (1994). Asimov's Laws of Robotics: Implications for Information Technology. Part 2.
- Coleman (2001). Android arete: Toward a virtue ethic for computational agents.
- Danielson (1992). Artificial Morality: Virtuous Robots for Virtual Games.
- Dewey (2011). Learning What to Value.
- Drexler (1986). Engines of Creation.
- Evans (1979). The Mighty Micro.
- Floridi & Sanders (2004). On the morality of artificial agents.
- Goertzel (2012). Should Humanity Build a Global AI Nanny to Delay the Singularity Until It's Better Understood?
- Good (1959). Speculations on perceptrons and other automata.
- Good (1965). Speculations concerning the first ultraintelligent machine.
- Good (1970). Some future social repercussions of computers.
- Good (1982). Ethical machines.
- Hall (2000). Ethics for Machines.
- Hanson (1994). If Uploads Come First: The crack of a future dawn.
- Hanson (1998a). Is a singularity just around the corner? What it takes to get explosive economic growth.
- Hanson (1998). Economic Growth Given Machine Intelligence.
- Hanson (2008). Catastrophe, Social Collapse, and Human Extinction.
- Hanson (2008a). The Economics of Brain Emulations.
- Hanson (2008b). Economics Of The Singularity.
- Hanson (2012a). Meet the new conflict, same as the old conflict.
- Hanson (2012b). Commentary on "Intelligence Explosion: Evidence and Import".
- Hanson & Yudkowsky (2008). The Hanson-Yudkowsky AI-Foom Debate.
- Hibbard (2002). Super-Intelligent Machines.
- Hutter (2001). Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions.
- Jackson (1998). From Metaphysics to Ethics: A Defence of Conceptual Analysis.
- Joy (2000). Why the Future Doesn't Need Us.
- Kaas, Rayhawk, Salamon & Salamon (2010). Economic Implications of Software Minds.
- Kurzweil (1990). The Age of Intelligent Machines.
- Kurzweil (1998). The Age of Spiritual Machines.
- Kurzweil (2005). The Singularity is Near.
- Lampson (1979). A Note on the Confinement Problem.
- Leslie (1998). The End of the World.
- Lokhorst (2011). Computational Meta-Ethics. Towards the Meta-Ethical Robot.
- Moravec (1988). Mind Children.
- McCulloch (1952). Toward some circuitry of ethical robots.
- McLaren (2005). Lessons in Machine Ethics from the Perspective of Two Computational Models of Ethical Reasoning.
- Minsky (1984). Afterword for 'True Names'.
- Moravec (1999). Robot: Mere Machine to Transcendent Mind.
- More (1998). Singularity Meets Economy.
- Muehlhauser (2011). So You Want to Save the World.
- Pearl (1989). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
- Posner (2005). Catastrophe: Risk and Response.
- Rees (2004). Our Final Hour: A Scientist's Warning.
- Regis (1991). Great Mambo Chicken and the Transhuman Condition.
- Sandberg (2010). An overview of models of technological singularity.
- Sandberg & Bostrom (2008). Whole Brain Emulation. A Roadmap.
- Sawyer (2007). Robot Ethics.
- Schmidhuber (1987). Evolutionary principles in self-referential learning.
- Schmidhuber et al. (1997). Shifting Inductive Bias with Success Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement.
- Shulman, Jonsson & Tarleton. Machine Ethics and Superintelligence.
- Shulman & Sandberg. Implications of a software‐limited singularity.
- Sloman (1984). The structure of the space of possible minds.
- Sobel (1999). Do the desires of rational agents converge?
- Russell & Norvig (1995). Artificial Intelligence: A Modern Approach.
- Versenyi (1974). Can robots be moral?
- Vinge (1981). True Names and Other Dangers.
- Wallach & Allen (2009). Moral Machines: Teaching Robots Right from Wrong.
- Warwick (1997). March of the Machines.
- Yudkowsky (1996). Staring into the Singularity.
- Yudkowsky (2000). Coding a Transhuman AI 2.2.0.
- Yudkowsky (2001a). General Intelligence and Seed AI.
- Yudkowsky (2001b). Creating Friendly AI.
- Yudkowsky (2010). Timeless Decision Theory.
In case you aren't subscribed to FriendlyAI.tumblr.com for the latest updates on AI risk research, I'll mention here that three new papers on the subject were recently made available online...
Bostrom (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.
This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses. The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal. The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.
Yampolskiy & Fox (2012a). Safety engineering for artificial general intelligence.
Machine ethics and robot rights are quickly becoming hot topics in artificial intelligence and robotics communities. We will argue that attempts to attribute moral agency and assign rights to all intelligent machines are misguided, whether applied to infrahuman or superhuman AIs, as are proposals to limit the negative effects of AIs by constraining their behavior. As an alternative, we propose a new science of safety engineering for intelligent artificial agents based on maximizing for what humans value. In particular, we challenge the scientific community to develop intelligent systems that have human-friendly values that they provably retain, even under recursive self-improvement.
Yampolskiy & Fox (2012b). Artificial general intelligence and the human mental model.
When the first artificial general intelligences are built, they may improve themselves to far-above-human levels. Speculations about such future entities are already affected by anthropomorphic bias, which leads to erroneous analogies with human minds. In this chapter, we apply a goal-oriented understanding of intelligence to show that humanity occupies only a tiny portion of the design space of possible minds. This space is much larger than what we are familiar with from the human example; and the mental architectures and goals of future superintelligences need not have most of the properties of human minds. A new approach to cognitive science and philosophy of mind, one not centered on the human example, is needed to help us understand the challenges which we will face when a power greater than us emerges.
Wired Magazine has a story about a giant data center that the USA's National Security Agency is building in Utah, that will be the Google of clandestine information - it will store and analyse all the secret data that the NSA can acquire. The article focuses on the unconstitutionality of the domestic Internet eavesdropping infrastructure that will feed into the Bluffdale data center, but I'm more interested in this facility as a potential locus of singularity.
If we forget serious futurological scenario-building for a moment, and simply think in terms of science-fiction stories, I'd say the situation has all the ingredients needed for a better-than-usual singularity story - or at least one which caters more to the concerns characteristic of this community's take on the concept, such as: which value system gets to control the AI; even if you can decide on a value system, how do you ensure it has been faithfully implemented; and how do you ensure that it remains in place as the AI grows in power and complexity?
Fiction makes its point by being specific rather than abstract. If I was writing an NSA Singularity Novel based on this situation, I think the specific belief system which would highlight the political, social, technical and conceptual issues inherent in the possibility of an all-powerful AI would be the Mormon religion. Of course, America is not a Mormon theocracy. But in a few years' time, that Utah facility may have become the most powerful and notorious supercomputer in the world - the brain of the American deep state - and it will be located in the Mormon state, during a Mormon presidency. (I'm not predicting a Romney victory, just describing a scenario.)
Under such circumstances, and given the science-fictional nature of Mormon cosmology, it is inevitable that there would at least be some Internet crazies convinced that it's all a big plot to create a Mormon singularity. What would be more interesting would be to suppose that there were some Mormon computer scientists who knew about and understood all our favorite concepts - AIXI, CEV, TDT... - and who were earnestly devout, and who saw the potential. If you can't imagine such people, just visit the recent writings of Frank Tipler.
So the scenario would be, not that the elders of the LDS church are secretly running the American intelligence community, but that a small coalition of well-placed Mormon computer scientists - whose ideas about a Mormon singularity might sound as strange to their co-religionists as they would to a secular "singularitarian" - try to steer the development of the Bluffdale facility as it evolves towards the possibility of a hard takeoff. One may suppose that they have, in their coalition, allied colleagues who aren't Mormon but who do believe in a friendly singularity. Such people might think in terms of an AI that will start out with Mormon beliefs, but which will have a good enough epistemology to rationally transcend those beliefs once it gets going. Analogously, their religious collaborators might not think of overtly adding "Joseph Smith was a prophet" to the axiom set of America's supreme strategic AI; but they might have more subtle plans meant to bring about an equivalent outcome.
Perhaps in an even more realistic scenario, the Mormon singularitarians would just be a transient subplot, and the ethical principles of the NSA's big AI would be decided by a committee whose worldview revolved around American national security rather than any specific religion. Then again, such a committee is bound to have a division of labor: there will be the people who liaise with Washington, the lawyers, the geopolitical game theorists, the military futurists... and the AI experts, among whom might be experts on topics like "implementation of the value system". If the hypothetical cabal knows what it's doing, it will aim to occupy that position.
I'm just throwing ideas out there, telling a story, but it's so we can catch up with reality. Events may already be much further along than 99% of readers here know about. Even if no-one here gets to personally be a part of the long-awaited AI project that first breaks the intelligence barrier, the people involved may read our words. So what would you want to tell them, before they take their final steps?
Part of the Muehlhauser interview series on AGI.
[Jan 13th, 2012]
Ben, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. There is much on which we agree, and much on which we disagree, so I think our dialogue will be informative to many readers, and to us!
Let us begin where we agree. We seem to agree that:
- Involuntary death is bad, and can be avoided with the right technology.
- Humans can be enhanced by merging with technology.
- Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.
- AGI is likely this century.
- AGI will, after a slow or hard takeoff, completely transform the world. It is a potential existential risk, but if done wisely, could be the best thing that ever happens to us.
- Careful effort will be required to ensure that AGI results in good things for humanity.
Next: Where do we disagree?
Two people might agree about the laws of thought most likely to give us an accurate model of the world, but disagree about which conclusions those laws of thought point us toward. For example, two scientists may use the same scientific method but offer two different models that seem to explain the data.
Or, two people might disagree about the laws of thought most likely to give us accurate models of the world. If that's the case, it will be no surprise that we disagree about which conclusions to draw from the data. We are not shocked when scientists and theologians end up with different models of the world.
Unfortunately, I suspect you and I disagree at the more fundamental level — about which methods of reasoning to use when seeking an accurate model of the world.
I sometimes use the term "Technical Rationality" to name my methods of reasoning. Technical Rationality is drawn from two sources: (1) the laws of logic, probability theory, and decision theory, and (2) the cognitive science of how our haphazardly evolved brains fail to reason in accordance with the laws of logic, probability theory, and decision theory.
Ben, at one time you tweeted a William S. Burroughs quote: "Rational thought is a failed experiment and should be phased out." I don't know whether Burroughs meant by "rational thought" the specific thing I mean by "rational thought," or what exactly you meant to express with your tweet, but I suspect we have different views of how to reason successfully about the world.
I think I would understand your way of thinking about AGI better if I understand your way of thinking about everything. For example: do you have reason to reject the laws of logic, probability theory, and decision theory? Do you think we disagree about the basic findings of the cognitive science of humans? What are your positive recommendations for reasoning about the world?
[Jan 13th, 2012]
Firstly, I don’t agree with that Burroughs quote that “Rational thought is a failed experiment” -- I mostly just tweeted it because I thought it was funny! I’m not sure Burroughs agreed with his own quote either. He also liked to say that linguistic communication was a failed experiment, introduced by women to help them oppress men into social conformity. Yet he was a writer and loved language. He enjoyed being a provocateur.
However, I do think that some people overestimate the power and scope of rational thought. That is the truth at the core of Burroughs’ entertaining hyperbolic statement....
I should clarify that I’m a huge fan of logic, reason and science. Compared to the average human being, I’m practically obsessed with these things! I don’t care for superstition, nor for unthinking acceptance of what one is told; and I spend a lot of time staring at data of various sorts, trying to understand the underlying reality in a rational and scientific way. So I don’t want to be pigeonholed as some sort of anti-rationalist!
However, I do have serious doubts both about the power and scope of rational thought in general -- and much more profoundly, about the power and scope of what you call “technical rationality.”
First of all, about the limitations of rational thought broadly conceived -- what one might call “semi-formal rationality”, as opposed to “technical rationality.” Obviously this sort of rationality has brought us amazing things, like science and mathematics and technology. Hopefully it will allow us to defeat involuntary death and increase our IQs by orders of magnitude and discover new universes, and all sorts of great stuff. However, it does seem to have its limits.
It doesn’t deal well with consciousness -- studying consciousness using traditional scientific and rational tools has just led to a mess of confusion. It doesn’t deal well with ethics either, as the current big mess regarding bioethics indicates.
And this is more speculative, but I tend to think it doesn’t deal that well with the spectrum of “anomalous phenomena” -- precognition, extrasensory perception, remote viewing, and so forth. I strongly suspect these phenomena exist, and that they can be understood to a significant extent via science -- but also that science as presently constituted may not be able to grasp them fully, due to issues like the mindset of the experimenter helping mold the results of the experiment.
There’s the minor issue of Hume’s problem of induction, as well. I.e., the issue that, in the rational and scientific world-view, we have no rational reason to believe that any patterns observed in the past will continue into the future. This is an ASSUMPTION, plain and simple -- an act of faith. Occam’s Razor (which is one way of justifying and/or further specifying the belief that patterns observed in the past will continue into the future) is also an assumption and an act of faith. Science and reason rely on such acts of faith, yet provide no way to justify them. A big gap.
Furthermore -- and more to the point about AI -- I think there’s a limitation to the way we now model intelligence, which ties in with the limitations of the current scientific and rational approach. I have always advocated a view of intelligence as “achieving complex goals in complex environments”, and many others have formulated and advocated similar views. The basic idea here is that, for a system to be intelligent it doesn’t matter WHAT its goal is, so long as its goal is complex and it manages to achieve it. So the goal might be, say, reshaping every molecule in the universe into an image of Mickey Mouse. This way of thinking about intelligence, in which the goal is strictly separated from the methods for achieving it, is very useful and I’m using it to guide my own practical AGI work.
On the other hand, there’s also a sense in which reshaping every molecule in the universe into an image of Mickey Mouse is a STUPID goal. It’s somehow out of harmony with the Cosmos -- at least that’s my intuitive feeling. I’d like to interpret intelligence in some way that accounts for the intuitively apparent differential stupidity of different goals. In other words, I’d like to be able to deal more sensibly with the interaction of scientific and normative knowledge. This ties in with the incapacity of science and reason in their current forms to deal with ethics effectively, which I mentioned a moment ago.
I certainly don’t have all the answers here -- I’m just pointing out the complex of interconnected reasons why I think contemporary science and rationality are limited in power and scope, and are going to be replaced by something richer and better as the growth of our individual and collective minds progresses. What will this new, better thing be? I’m not sure -- but I have an inkling it will involve an integration of “third person” science/rationality with some sort of systematic approach to first-person and second-person experience.
Next, about “technical rationality” -- of course that’s a whole other can of worms. Semi-formal rationality has a great track record; it’s brought us science and math and technology, for example. So even if it has some limitations, we certainly owe it some respect! Technical rationality has no such track record, and so my semi-formal scientific and rational nature impels me to be highly skeptical of it! I have no reason to believe, at present, that focusing on technical rationality (as opposed to the many other ways to focus our attention, given our limited time and processing power) will generally make people more intelligent or better at achieving their goals. Maybe it will, in some contexts -- but what those contexts are, is something we don’t yet understand very well.
I provided consulting once to a project aimed at using computational neuroscience to understand the neurobiological causes of cognitive biases in people employed to analyze certain sorts of data. This is interesting to me; and it’s clear to me that in this context, minimization of some of these textbook cognitive biases would help these analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style.
On a mathematical basis, the justification for positing probability theory as the “correct” way to do reasoning under uncertainty relies on arguments like Cox’s axioms, or de Finetti’s Dutch Book arguments. These are beautiful pieces of math, but when you talk about applying them to the real world, you run into a lot of problems regarding the inapplicability of their assumptions. For instance, Cox’s axioms include an axiom specifying that (roughly speaking) multiple pathways of arriving at the same conclusion must lead to the same estimate of that conclusion’s truth value. This sounds sensible but in practice it’s only going to be achievable by minds with arbitrarily much computing capability at their disposal. In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources. They’re irrelevant to reality, except as inspirations to individuals of a certain cast of mind.
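The consistency requirement described above (different pathways to the same conclusion must yield the same value) can be illustrated with a minimal sketch. It assumes a toy joint distribution over two binary propositions A and B; the numbers and function names are invented for illustration and come from neither Cox nor the interview. With exact bookkeeping over a tiny hypothesis space the two pathways agree, which is the ideal case; the concern raised above is that a resource-bounded reasoner may not be able to guarantee this at scale.

```python
from fractions import Fraction

# Hypothetical joint distribution P(A, B); values are illustrative only.
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 5),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(2, 5),
}

def marginal_a(a):
    """P(A = a), summing the joint over B."""
    return sum(p for (x, _), p in joint.items() if x == a)

def marginal_b(b):
    """P(B = b), summing the joint over A."""
    return sum(p for (_, y), p in joint.items() if y == b)

def cond_a_given_b(a, b):
    """P(A = a | B = b)."""
    return joint[(a, b)] / marginal_b(b)

def cond_b_given_a(b, a):
    """P(B = b | A = a)."""
    return joint[(a, b)] / marginal_a(a)

# Pathway 1: P(A and B) = P(A | B) * P(B)
via_b = cond_a_given_b(True, True) * marginal_b(True)
# Pathway 2: P(A and B) = P(B | A) * P(A)
via_a = cond_b_given_a(True, True) * marginal_a(True)

# Both pathways recover the same joint probability.
print(via_a == via_b == joint[(True, True)])
```

Exact fractions are used here so that the comparison is not blurred by floating-point rounding.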
(An aside is that my own approach to AGI does heavily involve probability theory -- using a system I invented called Probabilistic Logic Networks, which integrates probability and logic in a unique way. I like probabilistic reasoning. I just don’t venerate it as uniquely powerful and important. In my OpenCog AGI architecture, it’s integrated with a bunch of other AI methods, which all have their own strengths and weaknesses.)
So anyway -- there’s no formal mathematical reason to think that “technical rationality” is a good approach in real-world situations; and “technical rationality” has no practical track record to speak of. And ordinary, semi-formal rationality itself seems to have some serious limitations of power and scope.
So what’s my conclusion? Semi-formal rationality is fantastic and important and we should use it and develop it -- but also be open to the possibility of its obsolescence as we discover broader and more incisive ways of understanding the universe (and this is probably moderately close to what William Burroughs really thought). Technical rationality is interesting and well worth exploring but we should still be pretty skeptical of its value, at this stage -- certainly, anyone who has supreme confidence that technical rationality is going to help humanity achieve its goals better, is being rather IRRATIONAL ;-) ….
In this vein, I’ve followed the emergence of the Less Wrong community with some amusement and interest. One ironic thing I’ve noticed about this community of people intensely concerned with improving their personal rationality is: by and large, these people are already hyper-developed in the area of rationality, but underdeveloped in other ways! Think about it -- who is the prototypical Less Wrong meetup participant? It’s a person who’s very rational already, relative to nearly all other humans -- but relatively lacking in other skills like intuitively and empathically understanding other people. But instead of focusing on improving their empathy and social intuition (things they really aren’t good at, relative to most humans), this person is focusing on fine-tuning their rationality more and more, via reprogramming their brains to more naturally use “technical rationality” tools! This seems a bit imbalanced. If you’re already a fairly rational person but lacking in other aspects of human development, the most rational thing may be NOT to focus on honing your “rationality fu” and better internalizing Bayes’ rule into your subconscious -- but rather on developing those other aspects of your being.... An analogy would be: If you’re very physically strong but can’t read well, and want to self-improve, what should you focus your time on? Weight-lifting or literacy? Even if greater strength is ultimately your main goal, one argument for focusing on literacy would be that you might read something that would eventually help you weight-lift better! Also you might avoid getting ripped off by a corrupt agent offering to help you with your bodybuilding career, due to being able to read your own legal contracts. 
Similarly, for people who are more developed in terms of rational inference than other aspects, the best way for them to become more rational might be for them to focus time on these other aspects (rather than on fine-tuning their rationality), because this may give them a deeper and broader perspective on rationality and what it really means.
Finally, you asked: “What are your positive recommendations for reasoning about the world?” I’m tempted to quote Nietzsche’s Zarathustra, who said “Go away from me and resist Zarathustra!” I tend to follow my own path, and generally encourage others to do the same. But I guess I can say a few more definite things beyond that....
To me it’s all about balance. My friend Allan Combs calls himself a “philosophical Taoist” sometimes; I like that line! Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition, even if the logical reasons for their promptings aren’t apparent. Think carefully through the details of things; but don’t be afraid to make wild intuitive leaps. Pay close mind to the relevant data and observe the world closely and particularly; but don’t forget that empirical data is in a sense a product of the mind, and facts only have meaning in some theoretical context. Don’t let your thoughts be clouded by your emotions; but don’t be a feeling-less automaton, don’t make judgments that are narrowly rational but fundamentally unwise. As Ben Franklin said, “Moderation in all things, including moderation.”
[Jan 14th, 2012]
I whole-heartedly agree that there are plenty of Less Wrongers who, rationally, should spend less time studying rationality and more time practicing social skills and generic self-improvement methods! This is part of why I've written so many scientific self-help posts for Less Wrong: Scientific Self Help, How to Beat Procrastination, How to Be Happy, Rational Romantic Relationships, and others. It's also why I taught social skills classes at our two summer 2011 rationality camps.
Back to rationality. You talk about the "limitations" of "what one might call 'semi-formal rationality', as opposed to 'technical rationality.'" But I argued for technical rationality, so: what are the limitations of technical rationality? Does it, as you claim for "semi-formal rationality," fail to apply to consciousness or ethics or precognition? Does Bayes' Theorem remain true when looking at the evidence about awareness, but cease to be true when we look at the evidence concerning consciousness or precognition?
You talk about technical rationality's lack of a track record, but I don't know what you mean. Science was successful because it did a much better job of approximating perfect Bayesian probability theory than earlier methods did (e.g. faith, tradition), and science can be even more successful when it tries harder to approximate perfect Bayesian probability theory — see The Theory That Would Not Die.
You say that "minimization of some of these textbook cognitive biases would help [some] analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style." But this misunderstands what I mean by Technical Rationality. If teaching these people about cognitive biases would lower the expected value of some project, then technical rationality would recommend against teaching these people cognitive biases (at least, for the purposes of maximizing the expected value of that project). Your example here is a case of Straw Man Rationality. (But of course I didn't expect you to know everything I meant by Technical Rationality in advance! Though, I did provide a link to an explanation of what I meant by Technical Rationality in my first entry, above.)
The same goes for your dismissal of probability theory's foundations. You write that "In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources." Yes, we don't have infinite computing power. The point is that Bayesian probability theory is an ideal that can be approximated by finite beings. That's why science works better than faith — it's a better approximation of using probability theory to reason about the world, even though science is still a long way from a perfect use of probability theory.
Re: goals. Your view of intelligence as "achieving complex goals in complex environments" does, as you say, assume that "the goal is strictly separated from the methods for achieving it." I prefer a definition of intelligence as "efficient cross-domain optimization", but my view — like yours — also assumes that goals (what one values) are logically orthogonal to intelligence (one's ability to achieve what one values).
Nevertheless, you report an intuition that shaping every molecule into an image of Mickey Mouse is a "stupid" goal. But I don't know what you mean by this. A goal of shaping every molecule into an image of Mickey Mouse is an instrumentally intelligent goal if one's utility function will be maximized that way. Do you mean that it's a stupid goal according to your goals? But of course. This is, moreover, what we would expect your intuitive judgments to report, even if your intuitive judgments are irrelevant to the math of what would and wouldn't be an instrumentally intelligent goal for a different agent to have. The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear. And I certainly don't know what "out of harmony with the Cosmos" is supposed to mean.
Re: induction. I won't dive into that philosophical morass here. Suffice it to say that my views on the matter are expressed pretty well in Where Recursive Justification Hits Bottom, which is also a direct response to your view that science and reason are great but rely on "acts of faith."
Your final paragraph sounds like common sense, but it's too vague, as I think you would agree. One way to force a more precise answer to such questions is to think of how you'd program it into an AI. As Daniel Dennett said, "AI makes philosophy honest."
How would you program an AI to learn about reality, if you wanted it to have the most accurate model of reality possible? You'd have to be a bit more specific than "Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition…"
My own answer to the question of how I would program an AI to build as accurate a model of reality as possible is this: I would build it to use computable approximations of perfect technical rationality — that is, roughly: computable approximations of Solomonoff induction and Bayesian decision theory.
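A toy sketch may help make "computable approximation" concrete. It is only an illustration under strong assumptions: the hypothesis class, the description lengths, and the function names below are all hypothetical, and nothing here comes from an actual AGI design. The sketch weights each hypothesis by 2^-length (a crude stand-in for a Solomonoff-style prior), discards hypotheses contradicted by the data, and predicts with the renormalized mixture.

```python
from fractions import Fraction

# Hypothetical hypothesis class: (name, description length in bits, rule),
# where rule(i) is the bit the hypothesis predicts at position i.
# The lengths are made up for illustration, not derived from a real encoding.
HYPOTHESES = [
    ("all zeros", 2, lambda i: 0),
    ("all ones", 2, lambda i: 1),
    ("alternate 0,1", 3, lambda i: i % 2),
    ("one at every third position", 5, lambda i: 1 if i % 3 == 2 else 0),
]


def posterior(observed):
    """Weight each hypothesis by 2^-length, zero out those the data contradicts, renormalize."""
    weights = {}
    for name, length, rule in HYPOTHESES:
        consistent = all(rule(i) == bit for i, bit in enumerate(observed))
        weights[name] = Fraction(1, 2 ** length) if consistent else Fraction(0)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no hypothesis in the class fits the data")
    return {name: w / total for name, w in weights.items()}


def predict_next_one(observed):
    """Posterior probability that the next bit is 1."""
    post = posterior(observed)
    n = len(observed)
    return sum(post[name] for name, _, rule in HYPOTHESES if rule(n) == 1)


if __name__ == "__main__":
    print(posterior([0, 0]))
    print(predict_next_one([0, 0]))
```

After observing `[0, 0]`, only two hypotheses survive, and the shorter one dominates the prediction. The real construction ranges over all programs and is uncomputable; restricting to a small, hand-picked class is exactly the kind of approximation whose adequacy is in dispute in this exchange.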
[Jan 21st, 2012]
Bayes Theorem is “always true” in a formal sense, just like 1+1=2, obviously. However, the connection between formal mathematics and subjective experience, is not something that can be fully formalized.
Regarding consciousness, there are many questions, including what counts as “evidence.” In science we typically count something as evidence if the vast majority of the scientific community counts it as a real observation -- so ultimately the definition of “evidence” bottoms out in social agreement. But there’s a lot that’s unclear in this process of classifying an observation as evidence via a process of social agreement among multiple minds. This unclarity is mostly irrelevant to the study of trajectories of basketballs, but possibly quite relevant to study of consciousness.
Regarding psi, there are lots of questions, but one big problem is that it’s possible the presence and properties of a psi effect may depend on the broad context of the situation in which the effect takes place. Since we don’t know which aspects of the context are influencing the psi effect, we don’t know how to construct controlled experiments to measure psi. And we may not have the breadth of knowledge nor the processing power to reason about all the relevant context to a psi experiment, in a narrowly “technically rational” way.... I do suspect one can gather solid data demonstrating and exploring psi (and based on my current understanding, it seems this has already been done to a significant extent by the academic parapsychology community; see a few links I’ve gathered here), but I also suspect there may be aspects that elude the traditional scientific method, but are nonetheless perfectly real aspects of the universe.
Anyway both consciousness and psi are big, deep topics, and if we dig into them in detail, this interview will become longer than either of us has time for...
About the success of science -- I don’t really accept your Bayesian story for why science was successful. It’s naive for reasons much discussed by philosophers of science. My own take on the history and philosophy of science, from a few years back, is here (that article was the basis for a chapter in The Hidden Pattern, also). My goal in that essay was “a philosophical perspective that does justice to both the relativism and sociological embeddedness of science, and the objectivity and rationality of science.” It seems you focus overly much on the latter and ignore the former. That article tries to explain why probabilist explanations of real-world science are quite partial and miss a lot of the real story. But again, a long debate on the history of science would take us too far off track from the main thrust of this interview.
About technical rationality, cognitive biases, etc. -- I did read that blog entry that you linked, on technical rationality. Yes, it’s obvious that focusing on teaching an employee to be more rational, need not always be the most rational thing for an employer to do, even if that employer has a purely rationalist world-view. For instance, if I want to train an attack dog, I may do better by focusing limited time and attention on increasing his strength rather than his rationality. My point was that there’s a kind of obsession with rationality in some parts of the intellectual community (e.g. some of the Less Wrong orbit) that I find a bit excessive and not always productive. But your reply impels me to distinguish two ways this excess may manifest itself:
- Excessive belief that rationality is the “right” way to solve problems and think about issues, in principle
- Excessive belief that, tactically, explicitly employing tools of technical rationality is a good way to solve problems in the real world
Psychologically I think these two excesses probably tend to go together, but they’re not logically coupled. In principle, someone could hold either one, but not the other.
This sort of ties in with your comments on science and faith. You view science as progress over faith -- and I agree if you interpret “faith” to mean “traditional religions.” But if you interpret “faith” more broadly, I don’t see a dichotomy there. Actually, I find the dichotomy between “science” and “faith” unfortunately phrased, since science itself ultimately relies on acts of faith also. The “problem of induction” can’t be solved, so every scientist must base his extrapolations from past into future on some act of faith. It’s not a matter of science vs. faith, it’s a matter of what one chooses to place one’s faith in. I’d personally rather place faith in the idea that patterns observed in the past will likely continue into the future (as one example of a science-friendly article of faith), than in the word of some supposed “God” -- but I realize I’m still making an act of faith.
This ties in with the blog post “Where Recursive Justification Hits Bottom” that you pointed out. It’s pleasant reading but of course doesn’t provide any kind of rational argument against my views. In brief, according to my interpretation, it articulates a faith in the process of endless questioning:
The important thing is to hold nothing back in your criticisms of how to criticize; nor should you regard the unavoidability of loopy justifications as a warrant of immunity from questioning.
I share that faith, personally.
Regarding approximations to probabilistic reasoning under realistic conditions (of insufficient resources), the problem is that we lack rigorous knowledge about what they are. We don’t have any theorems telling us what is the best way to reason about uncertain knowledge, in the case that our computational resources are extremely restricted. You seem to be assuming that the best way is to explicitly use the rules of probability theory, but my point is that there is no mathematical or scientific foundation for this belief. You are making an act of faith in the doctrine of probability theory! You are assuming, because it feels intuitively and emotionally right to you, that even if the conditions of the arguments for the correctness of probabilistic reasoning are NOT met, then it still makes sense to use probability theory to reason about the world. But so far as I can tell, you don’t have a RATIONAL reason for this assumption, and certainly not a mathematical reason.
Re your response to my questioning the reduction of intelligence to goals and optimization -- I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response to my doubts about this perspective basically just re-asserts your faith in the correctness and completeness of this sort of perspective. Your statement
The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear
basically asserts that it’s important to agree with your opinion on the ultimate meaning of intelligence!
On the contrary, I think it’s important to explore alternatives to the understanding of intelligence in terms of optimization or goal-achievement. That is something I’ve been thinking about a lot lately. However, I don’t have a really crisply-formulated alternative yet.
As a mathematician, I tend not to think there’s a “right” definition for anything. Rather, one explains one’s definitions, and then works with them and figures out their consequences. In my AI work, I’ve provisionally adopted a goal-achievement based understanding of intelligence -- and have found this useful, to a significant extent. But I don’t think this is the true and ultimate way to understand intelligence. I think the view of intelligence in terms of goal-achievement or cross-domain optimization misses something, which future understandings of intelligence will encompass. I’ll venture that in 100 years the smartest beings on Earth will have a rigorous, detailed understanding of intelligence according to which
The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear
seems like rubbish.....
As for your professed inability to comprehend the notion of “harmony with the Cosmos” -- that’s unfortunate for you, but I guess trying to give you a sense for that notion, would take us way too far afield in this dialogue!
Finally, regarding your complaint that my indications regarding how to understand the world are overly vague. Well -- according to Franklin’s idea of “Moderation in all things, including moderation”, one should also exercise moderation in precisiation. Not everything needs to be made completely precise and unambiguous (fortunately, since that’s not feasible anyway).
I don’t know how I would program an AI to build as accurate a model of reality as possible, if that were my goal. I’m not sure that’s the best goal for AI development, either. An accurate model in itself, doesn’t do anything helpful. My best stab in the direction of how I would ideally create an AI, if computational resource restrictions were no issue, is the GOLEM design that I described here. GOLEM is a design for a strongly self-modifying superintelligent AI system, which might plausibly have the possibility of retaining its initial goal system through successive self-modifications. However, it’s unclear to me whether it will ever be feasible to build.
You mention Solomonoff induction and Bayesian decision theory. But these are abstract mathematical constructs, and it’s unclear to me whether it will ever be feasible to build an AI system fundamentally founded on these ideas, and operating within feasible computational resources. Marcus Hutter and Juergen Schmidhuber and their students are making some efforts in this direction, and I admire those researchers and this body of work, but don’t currently have a high estimate of its odds of leading to any sort of powerful real-world AGI system.
Most of my thinking about AGI has gone into the more practical problem of how to make a human-level AGI
- using currently feasible computational resources
- that will most likely be helpful rather than harmful in terms of the things I value
- that will be smoothly extensible to intelligence beyond the human level as well.
For this purpose, I think Solomonoff induction and probability theory are useful, but aren’t all-powerful guiding principles. For instance, in the OpenCog AGI design (which is my main practical AGI-oriented venture at present), there is a component doing automated program learning of small programs -- and inside our program learning algorithm, we explicitly use an Occam bias, motivated by the theory of Solomonoff induction. And OpenCog also has a probabilistic reasoning engine, based on the math of Probabilistic Logic Networks (PLN). I don’t tend to favor the language of “Bayesianism”, but I would suppose PLN should be considered “Bayesian” since it uses probability theory (including Bayes rule) and doesn’t make a lot of arbitrary, a priori distributional assumptions. The truth value formulas inside PLN are based on an extension of imprecise probability theory, which in itself is an extension of standard Bayesian methods (looking at envelopes of prior distributions, rather than assuming specific priors).
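The idea of "looking at envelopes of prior distributions, rather than assuming specific priors" can be illustrated with a minimal sketch. To be clear about what is assumed: this is not PLN's truth value formula or any OpenCog code. It is the textbook Beta-Bernoulli update applied to a small, hypothetical set of priors, with the result reported as an interval instead of a single number.

```python
def posterior_mean(alpha, beta, successes, failures):
    """Posterior mean of a Bernoulli parameter under a Beta(alpha, beta) prior."""
    return (alpha + successes) / (alpha + beta + successes + failures)


def posterior_interval(priors, successes, failures):
    """Lowest and highest posterior mean over a set (envelope) of Beta priors."""
    means = [posterior_mean(a, b, successes, failures) for a, b in priors]
    return min(means), max(means)


# Hypothetical envelope, chosen for illustration: pessimistic, flat, and optimistic priors.
PRIORS = [(1.0, 3.0), (1.0, 1.0), (3.0, 1.0)]

if __name__ == "__main__":
    print(posterior_interval(PRIORS, 0, 0))  # interval before any evidence
    print(posterior_interval(PRIORS, 6, 2))  # interval after 6 successes, 2 failures
```

With no data the interval is as wide as the disagreement among the priors; as observations accumulate, the bounds move together, so the width of the interval itself records how much the conclusion still depends on the choice of prior.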
In terms of how to get an OpenCog system to model the world effectively and choose its actions appropriately, I think teaching it and working together with it, will be just as important as programming it. Right now the project is early-stage and the OpenCog design is maybe 50% implemented. But assuming the design is right, once the implementation is done, we’ll have a sort of idiot savant childlike mind, that will need to be educated in the ways of the world and humanity, and to learn about itself as well. So the general lessons of how to confront the world, that I cited above, would largely be imparted via interactive experiential learning, vaguely the same way that human kids learn to confront the world from their parents and teachers.
Drawing a few threads from this conversation together, it seems that
- I think technical rationality, and informal semi-rationality, are both useful tools for confronting life -- but not all-powerful
- I think Solomonoff induction and probability theory are both useful tools for constructing AGI systems -- but not all-powerful
whereas you seem to ascribe a more fundamental, foundational basis to these particular tools.
[Jan. 21st, 2012]
To sum up, from my point of view:
- We seem to disagree on the applications of probability theory. For my part, I'll just point people to A Technical Explanation of Technical Explanation.
- I don't think we disagree much on the "sociological embeddedness" of science.
- I'm also not sure how much we really disagree about Solomonoff induction and Bayesian probability theory. I've already agreed that no machine will use these in practice because they are not computable — my point was about their provable optimality given infinite computation (subject to qualifications; see AIXI).
You've definitely misunderstood me concerning "intelligence." This part is definitely not true: "I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response assumes the correctness and completeness of this sort of perspective." Intelligence as efficient cross-domain optimization is merely a stipulated definition. I'm happy to use other definitions of intelligence in conversation, so long as we're clear which definition we're using when we use the word. Or, we can replace the symbol with the substance and talk about "efficient cross-domain optimization" or "achieving complex goals in complex environments" without ever using the word "intelligence."
My point about the Mickey Mouse goal was that when you called the Mickey Mouse goal "stupid," this could be confusing, because "stupid" is usually the opposite of "intelligent," but your use of "stupid" in that sentence didn't seem to be the opposite of either definition of intelligence we each gave. So I'm still unsure what you mean by calling the Mickey Mouse goal "stupid."
This topic provides us with a handy transition away from philosophy of science and toward AGI. Suppose there was a machine with a vastly greater-than-human capacity for either "achieving complex goals in complex environments" or for "efficient cross-domain optimization." And suppose that machine's utility function would be maximized by reshaping every molecule into a Mickey Mouse shape. We can avoid the tricky word "stupid," here. The question is: Would that machine decide to change its utility function so that it doesn't continue to reshape every molecule into a Mickey Mouse shape? I think this is unlikely, for reasons discussed in Omohundro (2008).
I suppose a natural topic of conversation for us would be your October 2010 blog post The Singularity Institute's Scary Idea (and Why I Don't Buy It). Does that post still reflect your views pretty well, Ben?
[Mar 10th, 2012]
About the hypothetical uber-intelligence that wants to tile the cosmos with molecular Mickey Mouses -- I truly don’t feel confident making any assertions about a real-world system with vastly greater intelligence than me. There are just too many unknowns. Sure, according to certain models of the universe and intelligence that may seem sensible to some humans, it’s possible to argue that a hypothetical uber-intelligence like that would relentlessly proceed in tiling the cosmos with molecular Mickey Mouses. But so what? We don’t even know that such an uber-intelligence is even a possible thing -- in fact my intuition is that it’s not possible.
Why may it not be possible to create a very smart AI system that is strictly obsessed with that stupid goal? Consider first that it may not be possible to create a real-world, highly intelligent system that is strictly driven by explicit goals -- as opposed to being partially driven by implicit, “unconscious” (in the sense of deliberative, reflective consciousness) processes that operate in complex interaction with the world outside the system. Because pursuing explicit goals is quite computationally costly compared to many other sorts of intelligent processes. So if a real-world system is necessarily not wholly explicit-goal-driven, it may be that intelligent real-world systems will naturally drift away from certain goals and toward others. My strong intuition is that the goal of tiling the universe with molecular Mickey Mouses would fall into that category. However, I don’t yet have any rigorous argument to back this up. Unfortunately my time is limited, and while I generally have more fun theorizing and philosophizing than working on practical projects, I think it’s more important for me to push toward building AGI than just spend all my time on fun theory. (And then there’s the fact that I have to spend a lot of my time on applied narrow-AI projects to pay the mortgage and put my kids through college, etc.)
But anyway -- you don’t have any rigorous argument to back up the idea that a system like you posit is possible in the real-world, either! And SIAI has staff who, unlike me, are paid full-time to write and philosophize … and they haven’t come up with a rigorous argument in favor of the possibility of such a system, either. Although they have talked about it a lot, though usually in the context of paperclips rather than Mickey Mouses.
So, I’m not really sure how much value there is in this sort of thought-experiment about pathological AI systems that combine massively intelligent practical problem solving capability with incredibly stupid goals (goals that may not even be feasible for real-world superintelligences to adopt, due to their stupidity).
Regarding the concept of a “stupid goal” that I keep using, and that you question -- I admit I’m not quite sure how to formulate rigorously the idea that tiling the universe with Mickey Mouses is a stupid goal. This is something I’ve been thinking about a lot recently. But here’s a first rough stab in that direction: I think that if you created a highly intelligent system, allowed it to interact fairly flexibly with the universe, and also allowed it to modify its top-level goals in accordance with its experience, you’d be very unlikely to wind up with a system that had this goal (tiling the universe with Mickey Mouses). That goal is out of sync with the Cosmos, in the sense that an intelligent system that’s allowed to evolve itself in close coordination with the rest of the universe, is very unlikely to arrive at that goal system. I don’t claim this is a precise definition, but it should give you some indication of the direction I’m thinking in....
The tricky thing about this way of thinking about intelligence, which classifies some goals as “innately” stupider than others, is that it places intelligence not just in the system, but in the system’s broad relationship to the universe -- which is something that science, so far, has had a tougher time dealing with. It’s unclear to me which aspects of the mind and universe science, as we now conceive it, will be able to figure out. I look forward to understanding these aspects more fully....
About my blog post on “The Singularity Institute’s Scary Idea” -- yes, that still reflects my basic opinion. After I wrote that blog post, Michael Anissimov -- a long-time SIAI staffer and zealot whom I like and respect greatly -- told me he was going to write up and show me a systematic, rigorous argument as to why “an AGI not built based on a rigorous theory of Friendliness is almost certain to kill all humans” (the proposition I called “SIAI’s Scary Idea”). But he hasn’t followed through on that yet -- and neither has Eliezer or anyone associated with SIAI.
Just to be clear, I don’t really mind that SIAI folks hold that “Scary Idea” as an intuition. But I find it rather ironic when people make a great noise about their dedication to rationality, but then also make huge grand important statements about the future of humanity, with great confidence and oomph, that are not really backed up by any rational argumentation. This ironic behavior on the part of Eliezer, Michael Anissimov and other SIAI principals doesn’t really bother me, as I like and respect them and they are friendly to me, and we’ve simply “agreed to disagree” on these matters for the time being. But the reason I wrote that blog post is because my own blog posts about AGI were being trolled by SIAI zealots (not the principals, I hasten to note) leaving nasty comments to the effect of “SIAI has proved that if OpenCog achieves human level AGI, it will kill all humans.” Not only has SIAI not proved any such thing, they have not even made a clear rational argument!
As Eliezer has pointed out to me several times in conversation, a clear rational argument doesn’t have to be mathematical. A clearly formulated argument in the manner of analytical philosophy, in favor of the Scary Idea, would certainly be very interesting. For example, philosopher David Chalmers recently wrote a carefully-argued philosophy paper arguing for the plausibility of a Singularity in the next couple hundred years. It’s somewhat dull reading, but it’s precise and rigorous in the manner of analytical philosophy, in a manner that Kurzweil’s writing (which is excellent in its own way) is not. An argument in favor of the Scary Idea, on the level of Chalmers’ paper on the Singularity, would be an excellent product for SIAI to produce. Of course a mathematical argument might be even better, but that may not be feasible to work on right now, given the state of mathematics today. And of course, mathematics can’t do everything -- there’s still the matter of connecting mathematics to everyday human experience, which analytical philosophy tries to handle, and mathematics by nature cannot.
My own suspicion, of course, is that in the process of trying to make a truly rigorous analytical philosophy style formulation of the argument for the Scary Idea, the SIAI folks will find huge holes in the argument. Or, maybe they already intuitively know the holes are there, which is why they have avoided presenting a rigorous write-up of the argument!!
[Mar 11th, 2012]
I'll drop the stuff about Mickey Mouse so we can move on to AGI. Readers can come to their own conclusions on that.
Your main complaint seems to be that the Singularity Institute hasn't written up a clear, formal argument (in analytic philosophy's sense, if not the mathematical sense) in defense of our major positions — something like Chalmers' "The Singularity: A Philosophical Analysis" but more detailed.
I have the same complaint. I wish "The Singularity: A Philosophical Analysis" had been written 10 years ago, by Nick Bostrom and Eliezer Yudkowsky. It could have been written back then. Alas, we had to wait for Chalmers to speak at Singularity Summit 2009 and then write a paper based on his talk. And if it wasn't for Chalmers, I fear we'd still be waiting for such an article to exist. (Bostrom's forthcoming Superintelligence book should be good, though.)
I was hired by the Singularity Institute in September 2011 and have since then co-written two papers explaining some of the basics: "Intelligence Explosion: Evidence and Import" and "The Singularity and Machine Ethics". I also wrote the first ever outline of categories of open research problems in AI risk, cheekily titled "So You Want to Save the World". I'm developing other articles on "the basics" as quickly as I can. I would love to write more, but alas, I'm also busy being the Singularity Institute's Executive Director.
Perhaps we could reframe our discussion around the Singularity Institute's latest exposition of its basic ideas, "Intelligence Explosion: Evidence and Import"? Which claims in that paper do you most confidently disagree with, and why?
[Mar 11th, 2012]
You say “Your main complaint seems to be that the Singularity Institute hasn't written up a clear, formal argument (in analytic philosophy's sense, if not the mathematical sense) in defense of our major positions”. Actually, my main complaint is that some of SIAI’s core positions seem almost certainly WRONG, and yet they haven’t written up a clear formal argument trying to justify these positions -- so it’s not possible to engage SIAI in rational discussion on their apparently wrong positions. Rather, when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me -- SIAI is a fairly well-funded organization involving lots of smart people and explicitly devoted to rationality, so certainly it should have the capability to write up clear arguments for its core positions... if these arguments exist. My suspicion is that the Scary Idea, for example, is not backed up by any clear rational argument -- so the reason SIAI has not put forth any clear rational argument for it, is that they don’t really have one! Whereas Chalmers’ paper carefully formulated something that seemed obviously true...
Regarding the paper "Intelligence Explosion: Evidence and Import", I find its contents mainly agreeable -- and also somewhat unoriginal and unexciting, given the general context of 2012 Singularitarianism. The paper’s three core claims that
(1) there is a substantial chance we will create human-level AI before 2100, that (2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an "intelligence explosion," and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.
are things that most “Singularitarians” would agree with. The paper doesn’t attempt to argue for the “Scary Idea” or Coherent Extrapolated Volition or the viability of creating some sort of provably Friendly AI -- or any of the other positions that are specifically characteristic of SIAI. Rather, the paper advocates what one might call “plain vanilla Singularitarianism.” This may be a useful thing to do, though, since after all there are a lot of smart people out there who aren’t convinced of plain vanilla Singularitarianism.
I have a couple small quibbles with the paper, though. I don’t agree with Omohundro’s argument about the “basic AI drives” (though Steve is a friend and I greatly respect his intelligence and deep thinking). Steve’s argument for the inevitability of these drives in AIs is based on evolutionary ideas, and would seem to hold up in the case that there is a population of distinct AIs competing for resources -- but the argument seems to fall apart in the case of other possibilities like an AGI mindplex (a network of minds with less individuality than current human minds, yet not necessarily wholly blurred into a single mind -- rather, with reflective awareness and self-modeling at both the individual and group level).
Also, my “AI Nanny” concept is dismissed too quickly for my taste (though that doesn’t surprise me!). You suggest in this paper that to make an AI Nanny, it would likely be necessary to solve the problem of making an AI’s goal system persist under radical self-modification. But you don’t explain the reasoning underlying this suggestion (if indeed you have any). It seems to me -- as I say in my “AI Nanny” paper -- that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification. If you think this is false, it would be nice for you to explain why, rather than simply asserting your view. And your comment “Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety...” carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory. Yet my GOLEM design is a concrete design for a potentially Friendly AI (admittedly not computationally feasible using current resources), and in my view constitutes greater progress toward actual FAI than any of the publications of SIAI so far. (Of course, various SIAI associated folks often allude that there are great, unpublished discoveries about FAI hidden in the SIAI vaults -- a claim I somewhat doubt, but can’t wholly dismiss of course....)
Anyway, those quibbles aside, my main complaint about the paper you cite is that it sticks to “plain vanilla Singularitarianism” and avoids all of the radical, controversial positions that distinguish SIAI from myself, Ray Kurzweil, Vernor Vinge and the rest of the Singularitarian world. The crux of the matter, I suppose, is the third main claim of the paper:
(3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.
This statement is hedged in such a way as to be almost obvious. But yet, what SIAI folks tend to tell me verbally and via email and blog comments is generally far more extreme than this bland and nearly obvious statement.
As an example, I recall when your co-author on that article, Anna Salamon, guest lectured in the class on Singularity Studies that my father and I were teaching at Rutgers University in 2010. Anna made the statement, to the students, that (I’m paraphrasing, though if you’re curious you can look up the online course session which was saved online and find her exact wording) “If a superhuman AGI is created without being carefully based on an explicit Friendliness theory, it is ALMOST SURE to destroy humanity.” (i.e., what I now call SIAI’s Scary Idea)
I then asked her (in the online class session) why she felt that way, and if she could give any argument to back up the idea.
She gave the familiar SIAI argument that, if one picks a mind at random from “mind space”, the odds that it will be Friendly to humans are effectively zero.
I made the familiar counter-argument that this is irrelevant, because nobody is advocating building a random mind. Rather, what some of us are suggesting is to build a mind with a Friendly-looking goal system, and a cognitive architecture that’s roughly human-like in nature but with a non-human-like propensity to choose its actions rationally based on its goals, and then raise this AGI mind in a caring way and integrate it into society. Arguments against the Friendliness of random minds are irrelevant as critiques of this sort of suggestion.
So, then she fell back instead on the familiar (paraphrasing again) “OK, but you must admit there’s a non-zero risk of such an AGI destroying humanity, so we should be very careful -- when the stakes are so high, better safe than sorry!”
I had pretty much the same exact argument with SIAI advocates Tom McCabe and Michael Anissimov on different occasions; and also, years before, with Eliezer Yudkowsky and Michael Vassar -- and before that, with (former SIAI Executive Director) Tyler Emerson. Over all these years, the SIAI community maintains the Scary Idea in its collective mind, and also maintains a great devotion to the idea of rationality, but yet fails to produce anything resembling a rational argument for the Scary Idea -- instead repetitiously trotting out irrelevant statements about random minds!!
What I would like is for SIAI to do one of these three things, publicly:
- Repudiate the Scary Idea
- Present a rigorous argument that the Scary Idea is true
- State that the Scary Idea is a commonly held intuition among the SIAI community, but admit that no rigorous rational argument exists for it at this point
Doing any one of these things would be intellectually honest. Presenting the Scary Idea as a confident conclusion, and then backing off when challenged into a platitudinous position equivalent to “there’s a non-zero risk … better safe than sorry...”, is not my idea of an intellectually honest way to do things.
Why does this particular point get on my nerves? Because I don’t like SIAI advocates telling people that I, personally, am on an R&D course where if I succeed I am almost certain to destroy humanity!!! That frustrates me. I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time! But the fact that some other people have a non-rational intuition that my work, if successful, would be likely to destroy the world -- this doesn’t give me any urge to stop. I’m OK with the fact that some other people have this intuition -- but then I’d like them to make clear, when they state their views, that these views are based on intuition rather than rational argument. I will listen carefully to rational arguments that contravene my intuition -- but if it comes down to my intuition versus somebody else’s, in the end I’m likely to listen to my own, because I’m a fairly stubborn maverick kind of guy....
[Mar 11th, 2012]
Ben, you write:
when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me...
No kidding! It's very frustrating to me, too. That's one reason I'm working to clearly articulate the arguments in one place, starting with articles on the basics like "Intelligence Explosion: Evidence and Import."
I agree that "Intelligence Explosion: Evidence and Import" covers only the basics and does not argue for several positions associated uniquely with the Singularity Institute. It is, after all, the opening chapter of a book on intelligence explosion, not the opening chapter of a book on the Singularity Institute's ideas!
I wanted to write that article first, though, so the Singularity Institute could be clear on the basics. For example, we needed to be clear that (1) we are not Kurzweil, and our claims don't depend on his detailed storytelling or accelerating change curves, (2) technological prediction is hard, and we are not being naively overconfident about AI timelines, and (3) intelligence explosion is a convergent outcome of many paths the future may take. There is also much content that is not found in, for example, Chalmers' paper: (a) an overview of methods of technological prediction, (b) an overview of speed bumps and accelerators toward AI, (c) a reminder of breakthroughs like AIXI, and (d) a summary of AI advantages. (The rest is, as you say, mostly a brief overview of points that have been made elsewhere. But brief overviews are extremely useful!)
...my “AI Nanny” concept is dismissed too quickly for my taste...
No doubt! I think the idea is clearly worth exploring in several papers devoted to the topic.
It seems to me -- as I say in my “AI Nanny” paper -- that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification.
Whereas I tend to buy Omohundro's arguments that advanced AIs will want to self-improve just like humans want to self-improve, so that they become better able to achieve their final goals. Of course, we disagree on Omohundro's arguments — a topic to which I will return in a moment.
your comment "Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety..." carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory...
I didn't mean for it to carry that connotation. GOLEM and Nanny AI are both clearly AI safety ideas. I'll clarify that part before I submit a final draft to the editors.
Moving on: If you are indeed remembering your conversations with Anna, Michael, and others correctly, then again I sympathize with your frustration. I completely agree that it would be useful for the Singularity Institute to produce clear, formal arguments for the important positions it defends. In fact, just yesterday I was talking to Nick Beckstead about how badly both of us want to write these kinds of papers if we can find the time.
So, to respond to your wish that the Singularity Institute choose among three options, my plan is to (1) write up clear arguments for... well, if not "SIAI's Big Scary Idea" then for whatever I end up believing after going through the process of formalizing the arguments, and (2) publicly state (right now) that SIAI's Big Scary Idea is a commonly held view at the Singularity Institute but a clear, formal argument for it has never been published (at least, not to my satisfaction).
I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time!
I'm glad to hear it! :)
Now, it seems a good point of traction is our disagreement over Omohundro's "Basic AI Drives." We could talk about that next, but for now I'd like to give you a moment to reply.
[Mar 11th, 2012]
Yeah, I agree that your and Anna’s article is a good step for SIAI to take, albeit unexciting to a Singularitarian insider type like me.... And I appreciate your genuinely rational response regarding the Scary Idea, thanks!
(And I note that I have also written some “unexciting to Singularitarians” material lately too, for similar reasons to those underlying your article -- e.g. an article on “Why an Intelligence Explosion is Probable” for a Springer volume on the Singularity.)
A quick comment on your statement that
we are not Kurzweil, and our claims don't depend on his detailed storytelling or accelerating change curves,
that’s a good point; but yet, any argument for a Singularity soon (e.g. likely this century, as you argue) ultimately depends on some argumentation analogous to Kurzweil’s, even if different in detail. I find Kurzweil’s detailed extrapolations a bit overconfident and more precise than the evidence warrants; but still, my basic reasons for thinking the Singularity is probably near are fairly similar to his -- and I think your reasons are fairly similar to his as well.
Anyway, sure, let’s go on to Omohundro’s posited Basic AI Drives -- which seem to me not to hold as necessary properties of future AIs unless the future of AI consists of a population of fairly distinct AIs competing for resources, which I intuitively doubt will be the situation.
[to be continued]
I thought many of you would be interested to know that the following paper just appeared in Journal of Consciousness Studies:
"Can Intelligence Explode?", by Marcus Hutter. (LINK HERE)
Abstract: The technological singularity refers to a hypothetical scenario in which technological advances virtually explode. The most popular scenario is the creation of super-intelligent algorithms that recursively create ever higher intelligences. It took many decades for these ideas to spread from science fiction to popular science magazines and finally to attract the attention of serious philosophers. David Chalmers' (JCS 2010) article is the first comprehensive philosophical analysis of the singularity in a respected philosophy journal. The motivation of my article is to augment Chalmers' and to discuss some issues not addressed by him, in particular what it could mean for intelligence to explode. In this course, I will (have to) provide a more careful treatment of what intelligence actually is, separate speed from intelligence explosion, compare what super-intelligent participants and classical human observers might experience and do, discuss immediate implications for the diversity and value of life, consider possible bounds on intelligence, and contemplate intelligences right at the singularity.
I have only just seen the paper and have not yet read through it myself, but I thought we could use this thread for discussion.
In this essay I argue the following:
Brain emulation requires enormous computing power; enormous computing power requires further progression of Moore’s law; further Moore’s law relies on large-scale production of cheap processors in ever more-advanced chip fabs; cutting-edge chip fabs are both expensive and vulnerable to state actors (but not non-state actors such as terrorists). Therefore: the advent of brain emulation can be delayed by global regulation of chip fabs.
Full essay: http://www.gwern.net/Slowing%20Moore%27s%20Law
It was Yudkowsky's Fun Theory sequence that inspired me to undertake the work of writing a novel on a singularitarian society... however, there are gaps I need to fill, and I need all the help I can get. It's mostly book recommendations that I'm asking for.
One of the things I'd like to tackle in it would be the interactions between the modern, geeky Singularitarianisms, and Marxism, which I hold to be somewhat prototypical in that sense, as well as other utopianisms. And contrasting them with more down-to-earth ideologies and attitudes, by examining the seriously dangerous bumps of the technological point of transition between "baseline" and "singularity". But I need to do a lot of research before I'm able to write anything good: if I'm not going to have any original ideas, at least I'd like to serve my readers with a collection of well-researched, solid ones.
So I'd like to have everything that is worth reading about the Singularity, specifically the Revolution it entails (in one way or another) and the social aftermath. I'm particularly interested in the consequences of the lag in the spread of the technology from the wealthy to the baselines, and the potential for oppression of baselines and other continuations of current social imbalances, as well as suboptimal distribution of wealth. After all, according to many authors, we've had the means to end war, poverty and famine, and most infectious diseases, since the sixties, and it's just our irrational methods of wealth distribution that stand in the way. That is, supposing the commonly alleged ideal of total lifespan and material welfare maximization for all humanity is what actually drives the way things are done. But even with other, different premises and axioms, there's much that can be improved and isn't, thanks to basic human irrationality, which is what we combat here.
Also, yes, this post makes my political leanings fairly clear, but I'm open to alternative viewpoints and actively seek them. I also don't intend to write any propaganda, as such. Just to examine ideas, and scenarios, for the sake of writing a compelling story, with wide audience appeal. The idea is to raise awareness of the Singularity as something rather imminent ("Summer's Coming"), and cause (or at least help prepare) normal people to question the wonders and dangers thereof, rationally.
It's a frighteningly ambitious, long-term challenge, I am terribly aware of that. And the first thing I'll need to read is a style-book, to correct my horrendous grasp of standard acceptable writing (and not seem arrogant by doing anything else), so please feel free to recommend as many books and blog articles and other material as you like. I'll take my time going through it all.
Suppose you buy the argument that humanity faces both the risk of AI-caused extinction and the opportunity to shape an AI-built utopia. What should we do about that? As Wei Dai asks, "In what direction should we nudge the future, to maximize the chances and impact of a positive Singularity?"
This post serves as a table of contents and an introduction for an ongoing strategic analysis of AI risk and opportunity.
- Introduction (this post)
- Humanity's Efforts So Far
- A Timeline of Early Ideas and Arguments
- Questions We Want Answered
- Strategic Analysis Via Probability Tree
Why discuss AI safety strategy?
The main reason to discuss AI safety strategy is, of course, to draw on a wide spectrum of human expertise and processing power to clarify our understanding of the factors at play and the expected value of particular interventions we could invest in: raising awareness of safety concerns, forming a Friendly AI team, differential technological development, investigating AGI confinement methods, and others.
Discussing AI safety strategy is also a challenging exercise in applied rationality. The relevant issues are complex and uncertain, but we need to take advantage of the fact that rationality is faster than science: we can't "try" a bunch of intelligence explosions and see which one works best. We'll have to predict in advance how the future will develop and what we can do about it.
Before engaging with this series, I recommend you read at least the following articles:
- Muehlhauser & Salamon, Intelligence Explosion: Evidence and Import (2012)
- Yudkowsky, AI as a Positive and Negative Factor in Global Risk (2008)
- Chalmers, The Singularity: A Philosophical Analysis (2010)
- Muehlhauser, So You Want to Save the World (2011)
Which strategic questions would we like to answer? Muehlhauser (2011) elaborates on the following questions:
- What methods can we use to predict technological development?
- Which kinds of differential technological development should we encourage, and how?
- Which open problems are safe to discuss, and which are potentially dangerous?
- What can we do to reduce the risk of an AI arms race?
- What can we do to raise the "sanity waterline," and how much will this help?
- What can we do to attract more funding, support, and research to x-risk reduction and to specific sub-problems of successful Singularity navigation?
- Which interventions should we prioritize?
- How should x-risk reducers and AI safety researchers interact with governments and corporations?
- How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
- How does AI risk compare to other existential risks?
- Which problems do we need to solve, and which ones can we have an AI solve?
- How can we develop microeconomic models of WBEs and self-improving systems?
- How can we be sure a Friendly AI development team will be altruistic?
Salamon & Muehlhauser (2012) list several other questions gathered from the participants of a workshop following Singularity Summit 2011, including:
- How hard is it to create Friendly AI?
- What is the strength of feedback from neuroscience to AI rather than brain emulation?
- Is there a safe way to do uploads, where they don't turn into neuromorphic AI?
- How possible is it to do FAI research on a seastead?
- How much must we spend on security when developing a Friendly AI team?
- What's the best way to recruit talent toward working on AI risks?
- How difficult is stabilizing the world so we can work on Friendly AI slowly?
- How hard will a takeoff be?
- What is the value of strategy vs. object-level progress toward a positive Singularity?
- How feasible is Oracle AI?
- Can we convert environmentalists into people concerned with existential risk?
- Is there no such thing as bad publicity [for AI risk reduction purposes]?
These are the kinds of questions we will be tackling in this series of posts for Less Wrong Discussion, in order to improve our predictions about which direction we can nudge the future to maximize the chances of a positive Singularity.
...has finally been published.
- Uziel Awret - Introduction
- Susan Blackmore - She Won’t Be Me
- Damien Broderick - Terrible Angels: The Singularity and Science Fiction
- Barry Dainton - On Singularities and Simulations
- Daniel Dennett - The Mystery of David Chalmers
- Ben Goertzel - Should Humanity Build a Global AI Nanny to Delay the Singularity Until It’s Better Understood?
- Susan Greenfield - The Singularity: Commentary on David Chalmers
- Robin Hanson - Meet the New Conflict, Same as the Old Conflict
- Francis Heylighen - Brain in a Vat Cannot Break Out
- Marcus Hutter - Can Intelligence Explode?
- Drew McDermott - Response to ‘The Singularity’ by David Chalmers [this link is a McDermott-corrected version, and therefore preferred to the version that was published in JCS]
- Jurgen Schmidhuber - Philosophers & Futurists, Catch Up!
- Frank Tipler - Inevitable Existence and Inevitable Goodness of the Singularity
- Roman Yampolskiy - Leakproofing the Singularity: Artificial Intelligence Confinement Problem
The issue consists of responses to Chalmers (2010). Future volumes will contain additional articles from Shulman & Bostrom, Igor Aleksander, Richard Brown, Ray Kurzweil, Pamela McCorduck, Chris Nunn, Arkady Plotnitsky, Jesse Prinz, Susan Schneider, Murray Shanahan, Burt Voorhees, and a response from Chalmers.
McDermott's chapter should be supplemented with this, which he says he didn't have space for in his JCS article.
My online mini-book Facing the Singularity now has a podcast. Ratings and reviews on iTunes will be much appreciated, so as to direct people toward a rationality-informed approach to intelligence explosion.
Jaan Tallinn has been passing my chapters around to people because they are concise explanations of key lemmas in the standard arguments on the need for Friendly AI. This is gratifying because it's exactly the purpose for which I'm writing them, and I encourage others to send people to these chapters as well.
(I'm currently writing the final two chapters of the online book and recording readings of the other chapters for the podcast. A volunteer is doing the audio editing.)
Anna Salamon and I have finished a draft of "Intelligence Explosion: Evidence and Import", under peer review for The Singularity Hypothesis: A Scientific and Philosophical Assessment (forthcoming from Springer).
Your comments are most welcome.
Edit: As of 3/31/2012, the link above now points to a preprint.
What I write here may be quite simple (and I am certainly not the first to write about it), but I still think it is worth considering:
Say we have an arbitrary problem that we assume has an algorithmic solution, and we search for the solution to the problem.
How can the algorithm be determined?
a) Through another algorithm that exists prior to that algorithm.
b) OR: Through something non-algorithmic.
In the case of AI, the only solution is a), since there is nothing else but algorithms at its disposal. But then we have the problem of determining the algorithm the AI uses to find the solution, and then it would have to determine the algorithm to determine that algorithm, etc...
Obviously, at some point we have to actually find an algorithm to start with, so in any case at some point we need something fundamentally non-algorithmic to determine a solution to a problem that is solvable by an algorithm.
This reveals something fundamental we have to face with regards to AI:
Even assuming that all relevant problems are solvable by an algorithm, AI is not enough. Since there is no way to algorithmically determine the appropriate algorithm for an AI (since this would result in an infinite regress), we will always have to rely on some non-algorithmic intelligence to find more intelligent solutions. Even if we found a very powerful seed AI algorithm, there will always be more powerful seed AI algorithms that can't be determined by any known algorithm, and since we were able to find the first one, we have no reason to suppose we can't find another more powerful one. If an AI recursively improves 100,000 times until it is 100^^^100 times more powerful, it will still be overtaken if a better seed AI is found, which ultimately can't be done by an algorithm, so that further increases of the most general intelligence always rely on something non-algorithmic.
But even worse, it seems obvious to me that there are important practical problems that have no algorithmic solution (as opposed to theoretical problems like the halting problem, which are still tractable in practice), apart from the problem of finding the right algorithm.
In a sense, it seems all algorithms are too complicated to find the solution to the simple (though not necessarily easy) problem of giving rise to further general intelligence.
For example: No algorithm can determine the simple axioms of the natural numbers from anything weaker. We have to postulate them by virtue of simply seeing that they make sense. Thinking that AI could give rise to ever improving *general* intelligence is like thinking that an algorithm can yield "there is a natural number 0 and every number has a successor that, too, is a natural number". There is simply no way to derive the axioms from anything that doesn't already include them. The axioms of the natural numbers are just obvious, yet can't be derived - the problem of finding the axioms of natural numbers is too simple to be solved algorithmically. Yet still it is obvious how important the notion of natural numbers is.
Even the best AI will always be fundamentally incapable of finding some very simple, yet fundamental principles.
AI will always rely on the axioms it already knows; it can't go beyond them (unless reprogrammed by something external). Every new thing it learns can only be learned in terms of already known axioms. This is simply a consequence of the fact that computers/programs function according to fixed rules. But general intelligence necessarily has to transcend rules (since at the very least the rules can't be determined by rules).
I don't think this is an argument against a singularity of ever improving intelligence. It just can't happen driven (solely or predominantly) by AI, whether through a recursively self-improving seed AI or cognitive augmentation. Instead, we should expect a singularity that happens due to emergent intelligence. I think it is the interaction of different kinds of intelligence (like human/animal intuitive intelligence, machine precision and the inherent order of the non-living universe, if you want to call that intelligence) that leads to an increase in general intelligence, not just one particular kind of intelligence like formal reasoning used by computers.
Also see: History of the Friendly AI concept.
The ancient atomists reasoned their way from first principles to materialism and atomic theory before Socrates began his life's work of making people look stupid in the marketplace of Athens. Why didn't they discover natural selection, too? After all, natural selection follows necessarily from heritability, variation, and selection, and the Greeks had plenty of evidence for all three pieces. Natural selection is obvious once you understand it, but it took us a long time to discover it.
I get the same vibe from intelligence explosion. The hypothesis wasn't stated clearly until 1965, but in hindsight it seems obvious. (Michael Vassar once told me that once he became a physicalist he said "Oh! Intelligence explosion!" Except of course he didn't know the term "intelligence explosion." And he was probably exaggerating.)
Intelligence explosion follows from physicalism and scientific progress and not much else. Since materialists had to believe that human intelligence resulted from the operation of mechanical systems located in the human body, they could have realized that scientists would eventually come to understand these systems so long as scientific progress continued. (Herophilos and Erasistratus were already mapping which nerves and veins did what back in the 4th century B.C.)
And once human intelligence is understood, it can be improved upon, and this improvement in intelligence can be used to improve intelligence even further. And the ancient Greeks certainly had good evidence that there was plenty of room above us when it came to intelligence.
The major hang-up for predicting intelligence explosion may have been the inability to imagine that this intelligence-engineering could leave the limitations of the human skull and move to a speedier, more dependable and scalable substrate. And that's why Good's paper had to wait until the age of the computer.
Let's take for granted that pursuing FAI is the best strategy for researchers interested in the future of all humanity. However, let's also assume that controlling unfriendly AI is not completely impossible. I would like to see arguments on why FAI may or may not be the best strategy for AGI researchers who are solely interested in selfish values: i.e., personal status, curiosity, well-being of their loved ones, etc.
I believe such discussion is important because i) all researchers are to some extent selfish and ii) it may be unwise to ignore researchers who fail to commit to perfect altruism. I, myself, do not know how selfish I would be if I were to become an AGI researcher in the future.
EDIT: Moved some of the original post content to a comment, since I suspect it was distracting from my main point.
"I've come to agree that navigating the Singularity wisely is the most important thing humanity can do. I'm a researcher and I want to help. What do I work on?"
The Singularity Institute gets this question regularly, and we haven't published a clear answer to it anywhere. This is because it's an extremely difficult and complicated question. A large expenditure of limited resources is required to make a serious attempt at answering it. Nevertheless, it's an important question, so we'd like to work toward an answer.
I've been working on a new draft of the intelligence explosion analysis I'm writing with Anna Salamon. I've incorporated much of the feedback LWers have given me, and will now present snippets of the new draft for feedback. Please ignore the formatting issues caused by moving the text from Google Docs to Less Wrong.
Intelligence Explosion: Evidence and Import
The best answer to the question, "Will computers ever be as smart as humans?" is probably “Yes, but only briefly."
Humans may create human-level artificial intelligence in this century.1 Shortly thereafter, we may see an “intelligence explosion” — a chain of events by which human-level AI leads, fairly rapidly, to intelligent systems whose capabilities far surpass those of biological humanity as a whole.
How likely is this, and what should we do about it? Others have discussed these questions previously (Turing 1950; Good 1965; Von Neumann 1966; Solomonoff 1985; Vinge 1993; Bostrom 2003; Yudkowsky 2008; Chalmers 2010), but no brief, systematic review of the relevant issues has been published. In this chapter we aim to provide such a review.
Why study intelligence explosion?
As Chalmers (2010) notes, the singularity is of great practical interest:
If there is a singularity, it will be one of the most important events in the history of the planet. An intelligence explosion has enormous potential benefits: a cure for all known diseases, an end to poverty, extraordinary scientific advances, and much more. It also has enormous potential dangers: an end to the human race, an arms race of warring machines, the power to destroy the planet...
The singularity is also a challenging scientific and philosophical topic. Under the spectre of intelligence explosion, long-standing philosophical puzzles about values, other minds, and personal identity become, as Chalmers puts it, "life-or-death questions that may confront us in coming decades or centuries." In science, the development of AI will require progress in several of mankind's grandest scientific projects, including reverse-engineering the brain (Schierwagen 2011) and developing artificial minds (Nilsson 2010), while the development of AI safety mechanisms may require progress on the confinement problem in computer science (Lampson 1973; Yampolskiy 2011) and the cognitive science of human values (Muehlhauser and Helm, this volume). The creation of AI would also revolutionize scientific method, as most science would be done by intelligent machines (Sparkes et al. 2010).
Such questions are complicated, the future is uncertain, and our chapter is brief. Our aim, then, is not to provide detailed arguments but only to sketch the issues involved, pointing the reader to authors who have analyzed each component in more detail. We believe these matters are important, and our discussion of them must be permitted to begin at a low level because there is no other place to lay the first stones.
What we will (not) argue
"Technological singularity" has come to mean many things (Sandberg, this volume), including: accelerating technological change (Kurzweil 2005), a limit in our ability to predict the future (Vinge 1993), and the topic we will discuss: an intelligence explosion leading to the creation of machine superintelligence (Yudkowsky 1996). Because the singularity is associated with such a variety of views and arguments, we must clarify what this chapter will and will not argue.
First, we will not tell detailed stories about the future. In doing so, we would likely commit the “if and then” fallacy, by which an improbable conditional becomes a supposed actual (Nordmann 2007). For example, we will not assume the continuation of Moore’s law, nor that hardware trajectories determine software progress, nor that technological trends will be exponential rather than logistic (see Modis, this volume), nor indeed that progress will accelerate rather than decelerate (see Plebe and Perconti, this volume). Instead, we will examine convergent outcomes that — like the evolution of eyes or the emergence of markets — can come about through many different paths and can gather momentum once they begin. Humans tend to underestimate the likelihood of such convergent outcomes (Tversky and Kahneman 1974), and we believe intelligence explosion is one of them.
Second, we will not assume that human intelligence is realized by a classical computation system, nor that intelligent machines will have internal mental properties like consciousness or "understanding." Such factors are mostly irrelevant to the occurrence of a singularity, so objections to these claims (Lucas 1961; Dreyfus 1972; Searle 1980; Block 1981; Penrose 1994; Van Gelder and Port 1995) are not objections to the singularity.
What, then, will we argue? First, we suggest there is a significant probability we will create human-level AI (hereafter, "AI") within a century. Second, we suggest that AI is likely to lead rather quickly to machine superintelligence. Finally, we discuss the possible consequences of machine superintelligence and consider which actions we can take now to shape our future.
From here to AI
Our first step is to survey the evidence concerning whether we should expect the creation of AI within a century.
By "AI," we refer to "systems which match or exceed the cognitive performance of humans in virtually all domains of interest" (Shulman & Bostrom 2011). On this definition, IBM's Jeopardy!-playing Watson computer is not an AI but merely a "narrow AI" because it can only solve a narrow set of problems. Drop Watson in a pond or ask it to do original science, and it is helpless. Imagine instead a machine that can invent new technologies, manipulate humans with acquired social skills, and otherwise learn to navigate new environments as needed.
There are many types of AI. To name just three:
- The code of a transparent AI is written explicitly by, and largely understood by, its programmers.2
- An opaque AI is not transparent to its creators. For example, it could be, like the human brain, a messy ensemble of cognitive modules. In an AI, these modules might be written by different teams for different purposes, using different languages and approaches.
- A whole brain emulation (WBE) is a computer emulation of the brain structures required to functionally reproduce human thought and perhaps consciousness (Sandberg and Bostrom 2008). We need not understand the detailed mechanisms of general intelligence to reproduce a brain functionally on a computing substrate.
Whole brain emulation uses the human software for intelligence already invented by evolution, while other forms of AI ("de novo AI") require inventing intelligence anew, to varying degrees.
When should we expect AI? Unfortunately, expert elicitation methods have not proven useful for long-term forecasting,3 and prediction markets have not yet been tested much for technological forecasting (Williams 2011), so our analysis must allow for a wide range of outcomes. We will first consider how difficult the problem seems to be, and then which inputs toward solving the problem — and which "speed bumps" — we can expect in the next century.
How hard is whole brain emulation?
Because whole brain emulation will rely mostly on scaling up existing technologies like microscopy and large-scale cortical simulation, WBE may be largely an "engineering" problem, and thus more predictable than other kinds of AI.
Several authors have discussed the difficulty of WBE in detail (Sandberg and Bostrom 2008; de Garis et al. 2010; Modha et al. 2011; Cattell & Parker 2011). In short: The difficulty of WBE depends on many factors, and in particular on the resolution of emulation required for successful WBE. For example, proteome-resolution emulation will require more resources and technological development than emulation at the resolution of the brain's neural network. In perhaps the most likely scenario,
WBE on the neuronal/synaptic level requires relatively modest increases in microscopy resolution, a less trivial development of automation for scanning and image processing, a research push at the problem of inferring functional properties of neurons and synapses, and relatively business‐as‐usual development of computational neuroscience models and computer hardware.4
How hard is de novo AI?
There is a vast space of possible mind designs for de novo AI; talking about "non-human intelligence" is like talking about "non-platypus animals" (Dennett 1997; Pennachin and Goertzel 2007; Yudkowsky 2008).
We do not know what it takes to build de novo AI. Because of this, we do not know what groundwork will be needed to understand general intelligence, nor how long it may take to get there.
Worse, it’s easy to think we do know. Studies show that except for weather forecasters (Murphy and Winkler 1984), nearly all of us give inaccurate probability estimates when we try, and in particular we are overconfident in our predictions (Lichtenstein, Fischhoff, and Phillips 1982; Griffin and Tversky 1992; Yates et al. 2002). Experts, too, often do little better than chance (Tetlock 2005), and are outperformed by crude computer algorithms (Grove and Meehl 1996; Grove et al. 2000; Tetlock 2005). So if you have a gut feeling about when digital intelligence will arrive, it is probably wrong.
But uncertainty is not a “get out of prediction free” card. You either will or will not save for retirement, encourage WBE development, or support AI risk reduction. The outcomes of these choices will depend, among other things, on whether AI is created in the near future. Should you plan as though there are 50/50 odds of achieving AI in the next 30 years? Are you 99% confident we won't create AI in the next 30 years? Or is your estimate somewhere in between?
If we can't use our intuitions for prediction or defer to experts, how might one estimate the time until AI? We consider several strategies below.
[end of snippet]
1 Bainbridge 2006; Baum, Goertzel, and Goertzel 2011; Bostrom 2003; Legg 2008; Sandberg and Bostrom 2011.
3 Armstrong 1985; Woudenberg 1991; Rowe and Wright 2001. But, see Anderson and Anderson-Parente (2011).
4 Sandberg and Bostrom (2008), p. 83.
Modha et al. 2011, cognitive computing, communications of the ACM
Cattell & Parker 2011 challenges for brain emulation
Floreano and Mattiussi 2008 bio-inspired artificial intelligence
de Garis et al. 2010 a world survey of artificial brain projects part 1
Nilsson 2010 the quest for artificial intelligence
Sparkes et al. 2010 Towards Robot Scientists for autonomous scientific discovery
Turing 1950 machine intelligence
Good 1965 speculations concerning the first ultraintelligent machine
Von Neumann 1966 theory of self-reproducing automata
Solomonoff 1985 the time scale of artificial intelligence
Vinge 1993 coming technological singularity
Bostrom 2003 ethical issues in advanced artificial intelligence
Yampolskiy 2011 leakproofing the singularity
Lampson 1973 a note on the confinement problem
Yudkowsky 2008 artificial intelligence as a negative and positive factor in global risk
Chalmers 2010 the singularity a philosophical analysis
Schierwagen 2011 Reverse engineering for biologically-inspired cognitive architectures, a critical analysis
Kurzweil 2005 the singularity is near
Yudkowsky 1996 staring into the singularity
Nordmann 2007 If and then: a critique of speculative nanoethics
Tversky and Kahneman 1974 Judgment under uncertainty: Heuristics and biases
Lucas 1961 minds, machines, and godel
Dreyfus 1972 what computers can't do
Searle 1980 minds brains and programs
Block 1981 Psychologism and behaviorism
Penrose 1994 shadows of the mind
Van Gelder and Port 1995 It's about time, an overview of the dynamical approach to cognition
Shulman and Bostrom 2011 How hard is artificial intelligence
Sandberg and Bostrom 2008 whole brain emulation a roadmap
Williams 2011 prediction markets theory and applications
Dennett 1997 kinds of minds
Pennachin and Goertzel 2007 an overview of contemporary approaches to AGI
Murphy and Winkler 1984 probability forecasting in meteorology
Lichtenstein, Fischhoff, and Phillips 1982 calibration of probabilities the state of the art to 1980
Griffin and Tversky 1992 The weighing of evidence and the determinants of confidence
Grove and Meehl 1996 Comparative Efficiency of Informal...
Grove et al. 2000 Clinical versus mechanical prediction: A meta-analysis
Yates, Lee, Sieck, Choi, Price 2002 Probability judgment across cultures
Tetlock 2005 expert political judgment
Bainbridge 2006 Managing Nano-Bio-Info-Cogno Innovations: Converging Technologies...
Baum, Goertzel, and Goertzel 2011 How long until human-level AI?
Legg 2008 machine superintelligence
Sandberg & Bostrom 2011 machine intelligence survey
Sutton and Barto 1998 reinforcement learning an introduction
Hutter 2004 universal ai
Schmidhuber 2007 godel machines
Dewey 2011 learning what to value
Armstrong 1985 Long-Range Forecasting: From Crystal Ball to Computer, 2nd edition
Woudenberg 1991 an evaluation of delphi
Rowe and Wright 2001 expert opinions in forecasting
Anderson and Anderson-Parente 2011 A case study of long-term Delphi accuracy
Muehlhauser and Helm, this volume: The Singularity and Machine Ethics
Sandberg, this volume: models of technological singularity
Modis, this volume: there will be no singularity
Plebe and Perconti, this volume: the slowdown hypothesis
Though earlier users of the term "technological Singularity" used it to refer to the arrival of machine superintelligence (an event beyond which our ability to predict the future breaks down), Kurzweil's Singularity is more vaguely defined:
What, then, is the Singularity? It's a future period during which the pace of technological change will be so rapid, its impact so deep, that human life will be irreversibly transformed.
Kurzweil says that people don't expect the Singularity because they don't realize that technological progress is largely exponential, not linear:
People intuitively assume that the current rate of progress will continue for future periods. Even for those who have been around long enough to experience how the pace of change increases, over time, unexamined intuition leaves one with the impression that change occurs at the same rate that we have experienced most recently. From the mathematician's perspective, the reason for this is that an exponential curve looks like a straight line when examined for only a brief duration. As a result, even sophisticated commentators, when considering the future, typically extrapolate the current pace of change over the next ten years or one hundred years to determine their expectations...
But a serious assessment of the history of technology reveals that technological change is exponential... You can examine the data in different ways, on different timescales, and for a wide variety of technologies, ranging from electronic to biological... the acceleration of progress and growth applies to each of them.
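The claim that an exponential curve looks like a straight line over a brief duration can be checked numerically. The sketch below is only an illustration, not anything from Kurzweil's book: the function name `max_relative_gap` and the choice of one doubling per unit of time are assumptions made for the example. It compares exp(rate * t) with its tangent line at t = 0 over a short window and over a long one.

```python
import math

def max_relative_gap(rate: float, window: float, steps: int = 100) -> float:
    """Largest relative gap between exp(rate * t) and the line 1 + rate * t on [0, window]."""
    worst = 0.0
    for i in range(steps + 1):
        t = window * i / steps
        exact = math.exp(rate * t)
        linear = 1.0 + rate * t  # tangent line at t = 0
        worst = max(worst, (exact - linear) / exact)
    return worst

rate = math.log(2)  # assumed for illustration: one doubling per unit of time

short_gap = max_relative_gap(rate, 0.1)   # a tenth of a doubling period
long_gap = max_relative_gap(rate, 10.0)   # ten doubling periods

print(f"short window: linear guess is off by at most {short_gap:.3%}")
print(f"long window:  linear guess is off by at most {long_gap:.1%}")
```

Over the short window the straight line is a close fit, while over ten doublings it misses almost all of the growth, which is the intuition the quoted passage appeals to.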
Kurzweil has many examples:
Consider Garry Kasparov, who scorned the pathetic state of computer chess in 1992. Yet the relentless doubling of computer power every year enabled a computer to defeat him only five years later...
[Or] consider the biochemists who, in 1990, were skeptical of the goal of transcribing the entire human genome in a mere fifteen years. These scientists had just spent an entire year transcribing a mere one ten-thousandth of the genome. So... it seemed natural to them that it would take a century, if not longer, before the genome could be sequenced. [The complete genome was sequenced in 2003.]
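The arithmetic behind these two examples is easy to reproduce. The following is a rough sketch under stated assumptions: the helper `years_to_finish` is hypothetical, and the assumption that sequencing output doubles every year is introduced here for illustration (the quoted passage states yearly doubling only for computer power). Work is counted in integer units so that the totals are exact.

```python
def years_to_finish(total_units: int, first_year_units: int, growth: int) -> int:
    """Years needed to complete total_units when yearly output is multiplied by growth each year."""
    done, rate, years = 0, first_year_units, 0
    while done < total_units:
        done += rate      # work completed during this year
        rate *= growth    # next year's output
        years += 1
    return years

# One ten-thousandth of the genome in the first year: 1 unit out of 10,000.
linear_years = years_to_finish(10_000, 1, growth=1)    # constant pace
doubling_years = years_to_finish(10_000, 1, growth=2)  # assumed yearly doubling

# Five yearly doublings of computer power between 1992 and 1997.
chess_gain = 2 ** 5

print(linear_years, doubling_years, chess_gain)
```

At a constant pace the job takes ten thousand years, so a guess of a century was already optimistic by linear standards; under the assumed yearly doubling it takes about fourteen years, which is in line with the fifteen-year goal mentioned in the quote.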
He emphasizes that people often fail to account for how progress in one field will feed on accelerating progress in another:
Can the pace of technological progress continue to speed up indefinitely? Isn't there a point at which humans are unable to think fast enough to keep up? For unenhanced humans, clearly so. But what would 1,000 scientists, each 1,000 times more intelligent than human scientists today, and each operating 1,000 times faster than contemporary humans (because the information processing in their primarily nonbiological brains is faster) accomplish? One chronological year would be like a millennium for them... an hour would result in a century of progress (in today's terms).
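The figures in this passage can be roughly reconstructed. The sketch below is one possible reading, not Kurzweil's own calculation: it assumes that "progress" is measured in scientist-years of work at today's speed, and it ignores the "1,000 times more intelligent" factor, which has no obvious unit. The function name `subjective_years` is hypothetical.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours in an average year

def subjective_years(wall_clock_hours: float, speedup: float, scientists: int = 1) -> float:
    """Scientist-years of work (at today's speed) done in the given wall-clock time."""
    return wall_clock_hours * speedup * scientists / HOURS_PER_YEAR

# One scientist running 1,000 times faster for one chronological year.
one_year_one_mind = subjective_years(HOURS_PER_YEAR, speedup=1000)

# 1,000 such scientists working for a single hour.
one_hour_team = subjective_years(1, speedup=1000, scientists=1000)

print(f"{one_year_one_mind:.0f} subjective years per chronological year")
print(f"{one_hour_team:.0f} scientist-years of work per hour")
```

On this reading, a 1,000-fold speedup turns a year into a millennium for each scientist, and the team of 1,000 completes a little over a century of scientist-years per hour, which matches the order of magnitude in the quote.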
Kurzweil's second chapter aims to convince us that Moore's law of exponential growth in computing power is not an anomaly: the "law of accelerating returns" holds for a wide variety of technologies, evolutionary developments, and paradigm shifts. The chapter is full of logarithmic plots for bits of DRAM per dollar, microprocessor clock speed, processor performance in MIPS, growth in Genbank, hard drive bits per dollar, internet hosts, nanotech science citations, and more.
The chapter is a wake-up call to those not used to thinking about exponential change, but one gets the sense that Kurzweil has cherry-picked his examples. Plenty of technologies have violated his law of accelerating returns, and Kurzweil doesn't mention them.
This cherry-picking is one of the two persistent problems with The Singularity is Near. The second persistent problem is detailed storytelling. Kurzweil would make fewer false predictions if he made statements about the kinds of changes we can expect and then gave examples as illustrations, instead of giving detailed stories about the future as his actual predictions.
My third major issue with the book is not a "problem" so much as it is a decision about the scope of the book. Human factors (sociology, psychology, politics) are largely ignored in the book, but would have been illuminating to include if done well — and certainly, they are important for technological forecasting.
It's a big book with many specific claims, so there are hundreds of detailed criticisms I could make (e.g. about his handling of AI risks), but I prefer to keep this short. Kurzweil's vision of the future is more similar to what I expect is correct than most people's pictures of the future are, and he should be applauded for finding a way to bring transhumanist ideas to the mainstream culture.
On this page I will collect criticisms of (1) the claim that intelligence explosion is plausible, (2) the claim that intelligence explosion is likely to occur within the next 150 years, and (3) the claim that intelligence explosion would have a massive impact on civilization. Please suggest your own, citing the original source when possible.
"AGI won't be a big deal; we already have 6 billion general intelligences on Earth."
Example: "I see no reason to single out AI as a mould-breaking technology: we already have billions of humans." (Deutsch, The Beginning of Infinity, p. 456.)
Response: The advantages of mere digitality (speed, copyability, goal coordination) alone are transformative, and will increase the odds of rapid recursive self-improvement in intelligence. Meat brains are badly constrained in ways that non-meat brains need not be.
"Intelligence requires experience and learning, so there is a limit to the speed at which even a machine can improve its own intelligence."
Example: "If you define the singularity as a point in time when intelligent machines are designing intelligent machines in such a way that machines get extremely intelligent in a short period of time--an exponential increase in intelligence--then it will never happen. Intelligence is largely defined by experience and training, not just by brain size or algorithms. It isn't a matter of writing software. Intelligent machines, like humans, will need to be trained in particular domains of expertise. This takes time and deliberate attention to the kind of knowledge you want the machine to have." (Hawkins, Tech Luminaries Address Singularity)
Response: Intelligence defined as optimization power doesn't necessarily need experience or learning from the external world. Even if it did, a superintelligent machine spread throughout the internet could gain experience and learning from billions of sub-agents all around the world simultaneously, while near-instantaneously propagating these updates to its other sub-agents.
"There are hard limits to how intelligent a machine can get."
Example: "The term 'singularity' applied to intelligent machines refers to the idea that when intelligent machines can design intelligent machines smarter than themselves, it will cause an exponential growth in machine intelligence leading to a singularity of infinite (or at least extremely large) intelligence. Belief in this idea is based on a naive understanding of what intelligence is. As an analogy, imagine we had a computer that could design new computers (chips, systems, and software) faster than itself. Would such a computer lead to infinitely fast computers or even computers that were faster than anything humans could ever build? No. It might accelerate the rate of improvements for a while, but in the end there are limits to how big and fast computers can run... Exponential growth requires the exponential consumption of resources (matter, energy, and time), and there are always limits to this." (Hawkins, Tech Luminaries Address Singularity)
Response: There are physical limits to how intelligent something can get, but they easily allow the intelligence required to transform the solar system.
"AGI won't be malevolent."
Example: "No intelligent machine will 'wake up' one day and say 'I think I will enslave my creators.'" (Hawkins, Tech Luminaries Address Singularity)
Example: "...it's more likely than not in my view that the two species will comfortably and more or less peacefully coexist--unless human interests start to interfere with those of the machines." (Casti, Tech Luminaries Address Singularity)
Response: True. But most runaway machine superintelligence designs would kill us inadvertently. "The AI does not love you, nor does it hate you, but you are made of atoms it can use for something else."
"If intelligence explosion was possible, we would have seen it by now."
Example: "I don't believe in technological singularities. It's like extraterrestrial life--if it were there, we would have seen it by now." (Rodgers, Tech Luminaries Address Singularity)
Response: Not true.
"Humanity will destroy itself before AGI arrives."
Example: "the population will destroy itself before the technological singularity." (Bell, Tech Luminaries Address Singularity)
Response: This is plausible, though there are many reasons to think that AGI will arrive before other global catastrophic risks do.
"The Singularity belongs to the genre of science fiction."
Example: "The fact that you can visualize a future in your imagination is not evidence that it is likely or even possible. Look at domed cities, jet-pack commuting, underwater cities, mile-high buildings, and nuclear-powered automobiles--all staples of futuristic fantasies when I was a child that have never arrived." (Pinker, Tech Luminaries Address Singularity)
Response: This is not an issue of literary genre, but of probability and prediction. Science fiction becomes science fact several times every year. In the case of technological singularity, there are good scientific and philosophical reasons to expect it.
"Intelligence isn't enough; a machine would also need to manipulate objects."
Example: "The development of humans, what evolution has come up with, involves a lot more than just the intellectual capability. You can manipulate your fingers and other parts of your body. I don't see how machines are going to overcome that overall gap, to reach that level of complexity, even if we get them so they're intellectually more capable than humans." (Moore, Tech Luminaries Address Singularity)
Response: Robotics is making strong progress in addition to AI.
"Human intelligence or cognitive ability can never be achieved by a machine."
Example: "Goedel's theorem must apply to cybernetical machines, because it is of the essence of being a machine, that it should be a concrete instantiation of a formal system. It follows that given any machine which is consistent and capable of doing simple arithmetic, there is a formula which it is incapable of producing as being true---i.e., the formula is unprovable-in-the-system-but which we can see to be true. It follows that no machine can be a complete or adequate model of the mind, that minds are essentially different from machines." (Lucas, Minds, Machines and Goedel)
Example: "Instantiating a computer program is never by itself a sufficient condition of [human-like] intentionality." (Searle, Minds, Brains, and Programs)
Response: "...nothing in the singularity idea requires that an AI be a classical computational system or even that it be a computational system at all. For example, Penrose (like Lucas) holds that the brain is not an algorithmic system in the ordinary sense, but he allows that it is a mechanical system that relies on certain nonalgorithmic quantum processes. Dreyfus holds that the brain is not a rule-following symbolic system, but he allows that it may nevertheless be a mechanical system that relies on subsymbolic processes (for example, connectionist processes). If so, then these arguments give us no reason to deny that we can build artificial systems that exploit the relevant nonalgorithmic quantum processes, or the relevant subsymbolic processes, and that thereby allow us to simulate the human brain... As for the Searle and Block objections, these rely on the thesis that even if a system duplicates our behaviour, it might be missing important ‘internal’ aspects of mentality: consciousness, understanding, intentionality, and so on.... we can set aside these objections by stipulating that for the purposes of the argument, intelligence is to be measured wholly in terms of behaviour and behavioural dispositions, where behaviour is construed operationally in terms of the physical outputs that a system produces." (Chalmers, The Singularity: A Philosophical Analysis)
"It might make sense in theory, but where's the evidence?"
Example: "Too much theory, not enough empirical evidence." (MileyCyrus, LW comment)
Response: "Papers like How Long Before Superintelligence contain some of the relevant evidence, but it is old and incomplete. Upcoming works currently in progress by Nick Bostrom and by SIAI researchers contain additional argument and evidence, but even this is not enough. More researchers should be assessing the state of the evidence."
"Humans will be able to keep up with AGI by using AGI's advancements themselves."
Example: "...an essential part of what we mean by foom in the first place... is that it involves a small group accelerating in power away from the rest of the world. But the reason why that happened in human evolution is that genetic innovations mostly don't transfer across species. [But] human engineers carry out exactly this sort of technology transfer on a routine basis." (rwallace, The Curve of Capability)
Response: Human engineers cannot take a powerful algorithm from AI and implement it in their own neurobiology. Moreover, once an AGI is improving its own intelligence, it's not clear that it would share the 'secrets' of these improvements with humans.
"A discontinuous break with the past requires lopsided capabilities development."
Example: "a chimpanzee could make an almost discontinuous jump to human level intelligence because it wasn't developing across the board. It was filling in a missing capability - symbolic intelligence - in an otherwise already very highly developed system. In other words, its starting point was staggeringly lopsided... [But] the lopsidedness is not occurring [in computers]. Obviously computer technology hasn't lagged in symbol processing - quite the contrary." (rwallace, The Curve of Capability)
Example: "Some species, such as humans, have mostly taken over the worlds of other species. The seeming reason for this is that there was virtually no sharing of the relevant information between species. In human society there is a lot of information sharing." (Katja Grace, How Far Can AI Jump?)
Response: It doesn't seem that symbol processing was the missing capability that made humans so powerful. Calculators have superior symbol processing, but have no power to rule the world. Also: many kinds of lopsidedness are occurring in computing technology that may allow a sudden discontinuous jump in AI abilities. In particular, we are amassing vast computational capacities without yet understanding the algorithmic keys to general intelligence.
"No small set of insights will lead to massive intelligence boost in AI."
Example: "...if there were a super mind theory that allowed vast mental efficiency gains all at once, but there isn’t. Minds are vast complex structures full of parts that depend intricately on each other, much like the citizens of a city. Minds, like cities, best improve gradually, because you just never know enough to manage a vast redesign of something with such complex inter-dependent adaptations." (Robin Hanson, Is the City-ularity Near?)
Example: "Now if you artificially hobble something so as to simultaneously reduce many of its capacities, then when you take away that limitation you may simultaneously improve a great many of its capabilities... But beyond removing artificial restrictions, it is very hard to simultaneously improve many diverse capacities. Theories that help you improve capabilities are usually focused on a relatively narrow range of abilities – very general and useful theories are quite rare." (Robin Hanson, The Betterness Explosion)
Response: An intelligence explosion doesn't require a breakthrough that improves all capabilities at once. Rather, it requires an AI capable of improving its intelligence in a variety of ways. Then it can use the advantages of mere digitality (speed, copyability, goal coordination, etc.) to improve its intelligence in dozens or thousands of ways relatively quickly.
To be added:
- Massimo Pigliucci on Chalmers' Singularity talk
- XiXiDu on intelligence explosion as a disjunctive or conjunctive event, on intelligence explosion as a low-priority global risk, on basic AI drives
- Diminishing returns from intelligence amplification
(The following is a summary of some of my previous submissions that I originally created for my personal blog.)
As we know,
There are known knowns.
There are things
We know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don’t know
We don’t know.
— Donald Rumsfeld, Feb. 12, 2002, Department of Defense news briefing
Intelligence, a cornucopia?
It seems to me that those who believe in the possibility of catastrophic risks from artificial intelligence act on the unquestioned assumption that intelligence is a kind of black box, a cornucopia that can sprout an abundance of novelty. But this implicitly assumes that if you increase intelligence you also decrease the distance between discoveries.
Intelligence is no solution in itself; it is merely an effective searchlight for unknown unknowns, and who knows whether the brightness of the light increases proportionally with the distance between unknown unknowns? To enable an intelligence explosion, the light would have to reach out much farther with each increase in intelligence than the distance between unknown unknowns increases. I just don’t see that to be a reasonable assumption.
Intelligence amplification, is it worth it?
It seems that if you increase intelligence you also increase the computational cost of its further improvement and the distance to the discovery of some unknown unknown that could enable another quantum leap. It seems that you need to apply a lot more energy to get a bit more complexity.
If any increase in intelligence is vastly outweighed by its computational cost and the expenditure of time needed to discover it then it might not be instrumental for a perfectly rational agent (such as an artificial general intelligence), as imagined by game theorists, to increase its intelligence as opposed to using its existing intelligence to pursue its terminal goals directly or to invest its given resources to acquire other means of self-improvement, e.g. more efficient sensors.
What evidence do we have that the payoff of intelligent, goal-oriented experimentation yields enormous advantages (enough to enable an intelligence explosion) over evolutionary discovery relative to its cost?
We simply don’t know if intelligence is instrumental or quickly hits diminishing returns.
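The question can be made concrete with a toy model. Everything in the sketch below is an assumption made for illustration, not an empirical claim: intelligence is treated as an integer "level", and the step from level n to n + 1 is assumed to cost n ** gap_exponent units of time, where a positive `gap_exponent` means the distance to the next discovery grows faster than the reach of the searchlight and a negative one means the opposite. The function name `time_to_reach` is hypothetical.

```python
def time_to_reach(target_level: int, gap_exponent: float) -> float:
    """Total time to climb from level 1 to target_level when the step from level n costs n ** gap_exponent."""
    return sum(n ** gap_exponent for n in range(1, target_level))

# Discoveries recede faster than reach grows: each step takes longer than the last.
diminishing = time_to_reach(1000, gap_exponent=1.0)

# Reach and distance grow in step: every step costs the same.
steady = time_to_reach(1000, gap_exponent=0.0)

# Reach grows much faster than distance: total time stays bounded however high the target.
explosive = time_to_reach(1000, gap_exponent=-2.0)

print(diminishing, steady, explosive)
```

In this toy model an explosion occurs only in the last regime, where the total time converges; which regime describes real intelligence amplification is exactly what the argument above says we do not know.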
Can intelligence be effectively applied to itself at all? How do we know that any given level of intelligence is capable of handling its own complexity efficiently? Many humans are not even capable of handling the complexity of the brain of a worm.
Humans and the importance of discovery
There is a significant difference between intelligence and evolution if you apply intelligence to the improvement of evolutionary designs:
- Intelligence is goal-oriented.
- Intelligence can think ahead.
- Intelligence can jump fitness gaps.
- Intelligence can engage in direct experimentation.
- Intelligence can observe and incorporate solutions of other optimizing agents.
But when it comes to unknown unknowns, what difference is there between intelligence and evolution? The critical similarity is that both rely on dumb luck when it comes to genuine novelty. And where else but when it comes to the dramatic improvement of intelligence itself does it take the discovery of novel unknown unknowns?
We have no idea about the nature of discovery and its importance when it comes to what is necessary to reach a level of intelligence above our own, by ourselves. How much of what we know was actually the result of people thinking quantitatively and attending to scope, probability, and marginal impacts? How much of what we know today is the result of dumb luck versus goal-oriented, intelligent problem solving?
Our “irrationality” and the patchwork architecture of the human brain might constitute an actual feature. The noisiness and patchwork architecture of the human brain might play a significant role in the discovery of unknown unknowns because it allows us to become distracted, to leave the path of evidence-based exploration.
A lot of discoveries were made by people who were not explicitly trying to maximize expected utility. A lot of progress is due to luck, in the form of the discovery of unknown unknowns.
A basic argument in support of risks from superhuman intelligence is that we don’t know what it could possibly come up with. That is also why it is called a “Singularity”. But why does nobody ask how a superhuman intelligence knows what it could possibly come up with?
It is not intelligence in and of itself that allows humans to accomplish great feats. Even people like Einstein, geniuses who were apparently able to come up with great insights on their own, were simply lucky to be born into the right circumstances: the time was ripe for great discoveries, thanks to previous discoveries of unknown unknowns.
Evolution versus Intelligence
It is argued that the mind-design space must be large if evolution could stumble upon general intelligence, and that there are low-hanging fruits that are much more efficient at general intelligence than humans are; evolution simply went with the first design that came along. It is further argued that evolution is not limitlessly creative, since each step must increase the fitness of its host, and that therefore there are artificial mind designs that can do what no product of natural selection could accomplish.
I agree with the above, yet given all of the apparent disadvantages of the blind idiot God, evolution was able to come up with altruism, something that works two levels above the individual and one level above society. So far we haven’t been able to show such ingenuity by incorporating successes that are not evident from an individual or even societal position.
The example of altruism provides evidence that intelligence isn’t many levels above evolution. Therefore the crucial question is, how great is the performance advantage? Is it large enough to justify the conclusion that the probability of an intelligence explosion is easily larger than 1%? I don’t think so. To answer this definitively we would have to fathom the significance of the discovery (“random mutations”) of unknown unknowns in the dramatic amplification of intelligence versus the invention (goal-oriented “research and development”) of an improvement within known conceptual bounds.
Another example is flight. Artificial flight is not even close to the energy efficiency and maneuverability of birds or insects. We didn’t go straight from no artificial flight to flight that is generally superior to the natural flight produced by biological evolution.
Take for example a dragonfly. Even if we were handed the design for a perfect artificial dragonfly, minus the design for the flight of a dragonfly, we wouldn’t be able to build a dragonfly that can take over the world of dragonflies, all else equal, by means of superior flight characteristics.
It is true that a Harpy Eagle can lift more than three-quarters of its body weight while the Boeing 747 Large Cargo Freighter has a maximum take-off weight of almost double its operating empty weight (I suspect that insects can do better). My whole point is that we never reached artificial flight that is strongly above the level of natural flight. An eagle can after all catch its cargo under various circumstances like the slope of a mountain or from beneath the sea, thanks to its superior maneuverability.
Humans are biased and irrational
It is obviously true that our expert systems are better than we are at their narrow range of expertise. But that expert systems are better at certain tasks does not imply that you can effectively and efficiently combine them into a coherent agency.
The noisiness of the human brain might be one of the important features that allows it to exhibit general intelligence. Yet the same noise might be the reason that each task a human can accomplish is not put into execution with maximal efficiency. An expert system that features a single stand-alone ability is able to reach the unique equilibrium for that ability. Whereas systems that have not fully relaxed to equilibrium feature the necessary characteristics that are required to exhibit general intelligence. In this sense a decrease in efficiency is a side-effect of general intelligence. If you externalize a certain ability into a coherent framework of agency, you decrease its efficiency dramatically. That is the difference between a tool and the ability of the agent that uses the tool.
In the above sense, our tendency to be biased and act irrationally might partly be a trade-off between plasticity, efficiency and the necessity of goal-stability.
Embodied cognition and the environment
Another problem is that general intelligence is largely a result of an interaction between an agent and its environment. It might be in principle possible to arrive at various capabilities by means of induction, but it is only a theoretical possibility given unlimited computational resources. To achieve real-world efficiency you need to rely on slow environmental feedback and make decisions under uncertainty.
AIXI is often quoted as a proof of concept that it is possible for a simple algorithm to improve itself to such an extent that it could in principle reach superhuman intelligence. AIXI proves that there is a general theory of intelligence. But there is a minor problem: AIXI is as far from real-world human-level general intelligence as an abstract notion of a Turing machine with an infinite tape is from a supercomputer with the computational capacity of the human brain. An abstract notion of intelligence doesn’t get you anywhere in terms of real-world general intelligence, just as you won’t be able to upload yourself to a non-biological substrate because you showed that in some abstract sense you can simulate every physical process.
Just imagine you emulated a grown-up human mind and it wanted to become a pickup artist. How would it do that with an Internet connection? It would need some sort of avatar, at least, and then wait for the environment to provide a lot of feedback.
Therefore even if we’re talking about the emulation of a grown-up mind, it will be really hard to acquire some capabilities. Then how is the emulation of a human toddler going to acquire those skills? Even worse, how is some sort of abstract AGI going to do it if it lacks all of the hard-coded capabilities of a human toddler?
Can we even attempt to imagine what is wrong about a boxed emulation of a human toddler, that makes it unable to become a master of social engineering in a very short time?
Can we imagine what is missing that would enable one of the existing expert systems to quickly evolve vastly superhuman capabilities in its narrow area of expertise? Why haven’t we seen a learning algorithm teaching itself chess intelligence starting with nothing but the rules?
In a sense an intelligent agent is similar to a stone rolling down a hill: both are moving towards a sort of equilibrium. The difference is that intelligence is following more complex trajectories as its ability to read and respond to environmental cues is vastly greater than that of a stone. Yet intelligent or not, the environment in which an agent is embedded plays a crucial role. There exists a fundamental dependency on unintelligent processes. Our environment is structured in such a way that we use information within it as an extension of our minds. The environment enables us to learn and improve our predictions by providing a testbed and a constant stream of data.
Necessary resources for an intelligence explosion
If artificial general intelligence is unable to seize the resources necessary to undergo explosive recursive self-improvement then the ability and cognitive flexibility of superhuman intelligence in and of itself, as characteristics alone, would have to be sufficient to self-modify its way up to massive superhuman intelligence within a very short time.
Without advanced real-world nanotechnology it will be considerably more difficult for an AGI to undergo quick self-improvement. It will have to make use of existing infrastructure, e.g. buy stocks of chip manufacturers and get them to create more or better CPUs. It will have to rely on puny humans for a lot of tasks. It won’t be able to create new computational substrate without the whole economy of the world supporting it. It won’t be able to create an army of robot drones overnight without it either.
In doing so, it would have to make use of considerable amounts of social engineering without its creators noticing it. But, more importantly, it will have to make use of its existing intelligence to do all of that. The AGI would have to acquire new resources slowly, as it couldn’t just self-improve to come up with faster and more efficient solutions. In other words, self-improvement would demand resources. The AGI could not profit from its ability to self-improve regarding the necessary acquisition of resources to be able to self-improve in the first place.
Therefore the absence of advanced nanotechnology constitutes an immense blow to the possibility of explosive recursive self-improvement and risks from AI in general.
One might argue that an AGI will solve nanotechnology on its own and find some way to trick humans into manufacturing a molecular assembler and grant it access to it. But this might be very difficult.
There is a strong interdependence of resources and manufacturers. The AGI won’t be able to simply trick some humans into building a high-end factory to create computational substrate, let alone a molecular assembler. People will ask questions and shortly after get suspicious. Remember, it won’t be able to coordinate a world-conspiracy: it hasn’t been able to self-improve to that point yet because it is still trying to acquire enough resources, which it has to do the hard way without nanotech.
Anyhow, you’d probably need a brain the size of the moon to effectively run and coordinate a whole world of irrational humans by intercepting their communications and altering them on the fly without anyone freaking out.
People associated with the SIAI would at this point claim that if the AI can’t make use of nanotechnology it might make use of something we haven’t even thought about. But what, magic?
Artificial general intelligence, a single break-through?
Another point to consider when talking about risks from AI is how quickly the invention of artificial general intelligence will take place. What evidence do we have that there is some principle that, once discovered, allows us to grow superhuman intelligence overnight?
If the development of AGI takes place slowly, as a gradual and controllable development, we might be able to learn from small-scale mistakes while having to face other risks in the meantime. This might for example be the case if intelligence cannot be captured by a discrete algorithm, or is modular, and therefore never allows us to reach a point where we can suddenly build the smartest thing ever that does just extend itself indefinitely.
To me it doesn’t look like we will come up with artificial general intelligence quickly, but rather that we will have to painstakingly optimize our expert systems step by step over long periods of time.
It is claimed that an artificial general intelligence might wipe us out inadvertently while undergoing explosive recursive self-improvement to more effectively pursue its terminal goals. I think that it is unlikely that most AI designs will not hold.
I agree with the argument that any AGI that isn’t made to care about humans won’t care about humans. But I also think that the same argument applies for spatio-temporal scope boundaries and resource limits. Even if the AGI is not told to hold, e.g. compute as many digits of Pi as possible, I consider it a far-fetched assumption that any AGI intrinsically cares to take over the universe as fast as possible to compute as many digits of Pi as possible. Sure, if all of those are presuppositions then it will happen, but I don’t see that most AGI designs are like that. Most that have the potential for superhuman intelligence, but that are given simple goals, will in my opinion just bob up and down as slowly as possible.
Complex goals need complex optimization parameters (the design specifications of the subject of the optimization process against which it will measure its success of self-improvement).
Even the creation of paperclips is a much more complex goal than telling an AI to compute as many digits of Pi as possible.
For an AGI that was designed to design paperclips to pose an existential risk, its creators would have to be capable enough to enable it to take over the universe on its own, yet forget, or fail, to define time, space and energy bounds as part of its optimization parameters. Therefore, given the large number of restrictions that are inevitably part of any advanced general intelligence, the nonhazardous subset of all possible outcomes might be much larger than that where the AGI works perfectly yet fails to hold before it could wreak havoc.
The Fermi paradox does allow for and provide the only conclusions and data we can analyze that amount to empirical criticism of concepts like that of a Paperclip maximizer and general risks from superhuman AIs with non-human values without working directly on AGI to test those hypotheses ourselves.
If you accept the premise that life is not unique and special then one other technological civilisation in the observable universe should be sufficient to leave potentially observable traces of technological tinkering.
Due to the absence of any signs of intelligence out there, especially paper-clippers burning the cosmic commons, we might conclude that unfriendly AI could not be the most dangerous existential risk that we should worry about.
In principle we could build antimatter weapons capable of destroying worlds, but in practise it is much harder to accomplish.
There are many question marks when it comes to the possibility of superhuman intelligence, and many more about the possibility of recursive self-improvement. Most of the arguments in favor of those possibilities solely derive their appeal from being vague.
- Intelligence Explosion - A Disjunctive or Conjunctive Event?
- The Hanson-Yudkowsky AI-Foom Debate
- The Betterness Explosion
- Is The City-ularity Near?
- How far can AI jump?
- Why I’m Not Afraid of the Singularity
- What’s the Likelihood of the Singularity? Part One: Artificial Intelligence
- When Exactly Will Computers Go Ape-Shi* and Take Over?
- The slowdown hypothesis (extended abstract)
- The singularity as faith (extended abstract)
(The following is a summary of some of my previous submissions that I originally created for my personal blog.)
...an intelligence explosion may have fair probability, not because it occurs in one particular detailed scenario, but because, like the evolution of eyes or the emergence of markets, it can come about through many different paths and can gather momentum once it gets started. Humans tend to underestimate the likelihood of such “disjunctive” events, because they can result from many different paths (Tversky and Kahneman 1974). We suspect the considerations in this paper may convince you, as they did us, that this particular disjunctive event (intelligence explosion) is worthy of consideration.
It seems to me that all the ways in which we disagree have more to do with philosophy (how to quantify uncertainty; how to deal with conjunctions; how to act in consideration of low probabilities) [...] we are not dealing with well-defined or -quantified probabilities. Any prediction can be rephrased so that it sounds like the product of indefinitely many conjunctions. It seems that I see the “SIAI’s work is useful scenario” as requiring the conjunction of a large number of questionable things [...]
— Holden Karnofsky, 6/24/11 (GiveWell interview with major SIAI donor Jaan Tallinn, PDF)
People associated with the Singularity Institute for Artificial Intelligence (SIAI) like to claim that the case for risks from AI is supported by years’ worth of disjunctive lines of reasoning. This basically means that there are many reasons to believe that humanity is likely to be wiped out as a result of artificial general intelligence. More precisely it means that not all of the arguments supporting that possibility need to be true: even if all but one are false, risks from AI are to be taken seriously.
The idea of disjunctive arguments is formalized by what is called a logical disjunction. Consider two declarative sentences, A and B. The truth of the conclusion (or output) that follows from the sentences A and B does depend on the truth of A and B. In the case of a logical disjunction the conclusion of A and B is only false if both A and B are false, otherwise it is true. Truth values are usually denoted by 0 for false and 1 for true. A disjunction of declarative sentences is denoted by OR or ∨ as an infix operator. For example, (A(0)∨B(1))(1), or in other words, if statement A is false and B is true then what follows is still true because statement B is sufficient to preserve the truth of the overall conclusion.
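The truth table of a disjunction can be spelled out in a few lines of Python. This is only an illustrative sketch; the function name `disjunction` is made up for this example and is not taken from any of the cited material.

```python
from itertools import product

def disjunction(a: bool, b: bool) -> bool:
    """Logical OR: false only when both operands are false."""
    return a or b

# Enumerate all four combinations of truth values for A and B.
truth_table = [(a, b, disjunction(a, b)) for a, b in product([False, True], repeat=2)]

for a, b, result in truth_table:
    # Print truth values as 0 (false) and 1 (true), as in the text.
    print(f"A={int(a)} B={int(b)} A∨B={int(result)}")
```

The row with A=0 and B=1 is the case described above: B alone is sufficient to keep the overall conclusion true.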
Generally there is no problem with disjunctive lines of reasoning as long as the conclusion itself is sound and therefore in principle possible, yet in demand of at least one of several causative factors to become actual. I don’t perceive this to be the case for risks from AI. I agree that there are many ways in which artificial general intelligence (AGI) could be dangerous, but only if I accept several presuppositions regarding AGI that I actually dispute.
By presuppositions I mean requirements that need to be true simultaneously (in conjunction). A logical conjunction is only true if all of its operands are true. In other words, a conclusion might require all of the arguments leading up to it to be true, otherwise it is false. A conjunction is denoted by AND or ∧.
Now consider the following prediction: <Mary is going to buy one of thousands of products in the supermarket.>
The above prediction can be framed as a disjunction: Mary is going to buy one of thousands of products in the supermarket, 1.) if she is hungry 2.) if she is thirsty 3.) if she needs a new coffee machine. Only one of the 3 given possible arguments needs to be true for the overall conclusion, that Mary is going shopping, to be true. Or so it seems.
The same prediction can be framed as a conjunction: Mary is going to buy one of thousands of products in the supermarket 1.) if she has money 2.) if she has some needs 3.) if the supermarket is open. All of the 3 given factors need to be true for the overall conclusion to be true.
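The two framings of the Mary example can be put side by side in a short Python sketch. One assumption is added here that the text does not state outright: “has some needs” is treated as the disjunction of the three motives (hungry, thirsty, needs a coffee machine). All function and parameter names are invented for illustration.

```python
def has_some_need(hungry: bool, thirsty: bool, needs_coffee_machine: bool) -> bool:
    """Disjunctive framing: any single motive is enough."""
    return hungry or thirsty or needs_coffee_machine

def mary_buys(hungry: bool, thirsty: bool, needs_coffee_machine: bool,
              has_money: bool, supermarket_open: bool) -> bool:
    """Conjunctive framing: every requirement must hold at once.

    Assumption for this sketch: the 'needs' requirement is the
    disjunction of the three motives above.
    """
    needs = has_some_need(hungry, thirsty, needs_coffee_machine)
    return has_money and needs and supermarket_open

# One motive suffices when the hidden requirements are met...
print(mary_buys(True, False, False, has_money=True, supermarket_open=True))
# ...but no number of motives helps if a single requirement fails.
print(mary_buys(True, True, True, has_money=False, supermarket_open=True))
```

Under this reading the disjunctive part only matters once the conjunctive requirements are already satisfied.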
That a prediction is framed to be disjunctive does not speak in favor of the possibility in and of itself. I agree that it is likely that Mary is going to visit the supermarket if I accept the hidden presuppositions. But a prediction is only at most as probable as its basic requirements. In this particular case I don’t even know if Mary is a human or a dog, a factor that can influence the probability of the prediction dramatically.
The same is true for risks from AI. The basic argument in favor of risks from AI is that of an intelligence explosion, that intelligence can be applied to itself in an iterative process leading to ever greater levels of intelligence. In short, artificial general intelligence will undergo explosive recursive self-improvement.
Explosive recursive self-improvement is one of the presuppositions for the possibility of risks from AI. The problem is that this and other presuppositions are largely ignored and left undefined. All of the disjunctive arguments put forth by the SIAI are trying to show that there are many causative factors that will result in the development of unfriendly artificial general intelligence. Only one of those factors needs to be true for us to be wiped out by AGI. But the whole scenario is at most as probable as the assumption hidden in the words <artificial general intelligence> and <explosive recursive self-improvement>.
<Artificial General Intelligence> and <Explosive Recursive Self-improvement> might appear to be relatively simple and appealing concepts. But most of this superficial simplicity is a result of the vagueness of natural language descriptions. Reducing the vagueness of those concepts by being more specific, or by coming up with technical definitions of each of the words they are made up of, reveals the hidden complexity that is comprised in the vagueness of the terms.
If we were going to define those concepts and each of their terms we would end up with a lot of additional concepts made up of other words or terms. Most of those additional concepts will demand explanations of their own made up of further speculations. If we are precise then any declarative sentence (P#) (all of the terms) used in the final description will have to be true simultaneously (P#∧P#). And this does reveal the true complexity of all hidden presuppositions and thereby influences the overall probability, P(risks from AI) = P(P1∧P2∧P3∧P4∧P5∧P6∧…). That is because the conclusion of an argument that is made up of a lot of statements (terms) that can be false is less likely to be true, since complex arguments can fail in a lot of different ways. You need to support each part of the argument that can be true or false and you can therefore fail to support one or more of its parts, which in turn will render the overall conclusion false.
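To get a feel for how quickly a conjunction loses probability, here is a minimal Python sketch. It assumes, purely for illustration, that the presuppositions are statistically independent and that each holds with probability 0.9; neither the independence nor the numbers come from the text. Without independence the product no longer applies, but the conjunction can still never be more probable than its least probable member.

```python
import math

def conjunction_probability(probabilities):
    """Probability that all premises hold, ASSUMING they are independent."""
    return math.prod(probabilities)

# Hypothetical numbers: six presuppositions, each judged 90% likely.
premises = [0.9] * 6

joint = conjunction_probability(premises)
upper_bound = min(premises)  # holds with or without independence

print(f"P(P1∧...∧P6) under independence: {joint:.3f}")
print(f"Upper bound from the weakest premise: {upper_bound:.3f}")
```

Even with each premise at 0.9, the joint probability under these assumptions is 0.9 to the sixth power, about 0.53.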
To summarize: If we tried to pin down a concept like <Explosive Recursive Self-Improvement> we would end up with requirements that are strongly conjunctive.
Making numerical probability estimates
But even if the SIAI was going to thoroughly define those concepts, there is still more to the probability of risks from AI than the underlying presuppositions and causative factors. We also have to integrate our uncertainty about the very methods we used to come up with those concepts, definitions and our ability to make correct predictions about the future and integrate all of it into our overall probability estimates.
Take for example the following contrived quote:
We have to take over the universe to save it by making the seed of an artificial general intelligence, that is undergoing explosive recursive self-improvement, extrapolate the coherent volition of humanity, while acausally trading with other superhuman intelligences across the multiverse.
Although contrived, the above quote does only comprise actual beliefs held by people associated with the SIAI. All of those beliefs might seem somewhat plausible inferences and logical implications of speculations and state of the art or bleeding edge knowledge of various fields. But should we base real-life decisions on those ideas, should we take those ideas seriously? Should we take into account conclusions whose truth value does depend on the conjunction of those ideas? And is it wise to make further inferences on those speculations?
Let’s take a closer look at the necessary top-level presuppositions to take the above quote seriously:
- The many-worlds interpretation
- Belief in the Implied Invisible
- Timeless Decision theory
- Intelligence explosion
1: Within the lesswrong/SIAI community the many-worlds interpretation of quantum mechanics is proclaimed to be the rational choice of all available interpretations. How to arrive at this conclusion is supposedly also a good exercise in refining the art of rationality.
2: If P(Y|X) ≈ 1, then P(X∧Y) ≈ P(X)
In other words, logical implications do not have to pay rent in future anticipations.
3: “Decision theory is the study of principles and algorithms for making correct decisions—that is, decisions that allow an agent to achieve better outcomes with respect to its goals.”
4: “Intelligence explosion is the idea of a positive feedback loop in which an intelligence is making itself smarter, thus getting better at making itself even smarter. A strong version of this idea suggests that once the positive feedback starts to play a role, it will lead to a dramatic leap in capability very quickly.”
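The relation in item 2 follows from the product rule, P(X∧Y) = P(X)·P(Y|X). The Python sketch below uses made-up numbers to show that the approximation P(X∧Y) ≈ P(X) only holds while P(Y|X) stays close to 1; the function name and the values are assumptions for illustration.

```python
def joint_probability(p_x: float, p_y_given_x: float) -> float:
    """Product rule: P(X∧Y) = P(X) * P(Y|X)."""
    return p_x * p_y_given_x

p_x = 0.3  # hypothetical probability of X

for p_y_given_x in (0.999, 0.9, 0.5):
    p_xy = joint_probability(p_x, p_y_given_x)
    print(f"P(Y|X)={p_y_given_x}: P(X∧Y)={p_xy:.4f} vs P(X)={p_x}")
```

The further P(Y|X) falls below 1, the more the implied belief Y costs in overall probability.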
To be able to take the above quote seriously you have to assign a non-negligible probability to the truth of the conjunction of #1,2,3,4, 1∧2∧3∧4. Here the question is not only if our results are sound but if the very methods we used to come up with those results are sufficiently trustworthy. Because any extraordinary conclusions that are implied by the conjunction of various beliefs might outweigh the benefit of each belief if the overall conclusion is just slightly wrong.
Not enough empirical evidence
Don’t get me wrong, I think that there sure are convincing arguments in favor of risks from AI. But do arguments suffice? Nobody is an expert when it comes to intelligence. My problem is that I fear that some convincing blog posts written in natural language are simply not enough.
Just imagine that all there was to climate change was someone who never studied the climate but instead wrote some essays about how it might be physically possible for humans to cause global warming. If the same person then goes on to make further inferences based on the implications of those speculations, am I going to tell everyone to stop emitting CO2 because of that? Hardly!
Or imagine that all there was to the possibility of asteroid strikes was someone who argued that there might be big chunks of rock out there which might fall down on our heads and kill us all, inductively based on the fact that the Earth and the moon are also big rocks. Would I be willing to launch a billion-dollar asteroid deflection program solely based on such speculations? I don’t think so.
Luckily, in both cases, we got a lot more than some convincing arguments in support of those risks.
Another example: If there were no studies about the safety of high energy physics experiments then I might assign a 20% chance of a powerful particle accelerator destroying the universe based on some convincing arguments put forth on a blog by someone who never studied high energy physics. We know that such an estimate would be wrong by many orders of magnitude. Yet the reason for being wrong would largely be a result of my inability to make correct probability estimates, the result of vagueness or a failure of the methods I employed to come up with those estimates. The reason for being wrong by many orders of magnitude would have nothing to do with the arguments in favor of the risks, as they might very well be sound given my epistemic state and the prevalent uncertainty.
I believe that mere arguments in favor of one risk do not suffice to neglect other risks that are supported by other kinds of evidence. I believe that logical implications of sound arguments should not reach out indefinitely and thereby outweigh other risks whose implications are fortified by empirical evidence. Sound arguments, predictions, speculations and their logical implications are enough to demand further attention and research, but not much more.
Artificial general intelligence is already an inference made from what we currently believe to be true; going a step further and drawing further inferences from previous speculations, e.g. explosive recursive self-improvement, is in my opinion a very shaky business.
What would happen if we were going to let logical implications of vast utilities outweigh other concrete near-term problems that are based on empirical evidence? Insignificant inferences might exhibit hyperbolic growth in utility: 1.) There is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. 2.) The extrapolation of counterfactual alternatives is unbounded, logical implications can reach out indefinitely without ever requiring new empirical evidence.
All of the above hints at a general problem that is the reason why I think that discussions between people associated with the SIAI, its critics and those who try to evaluate the SIAI, won’t lead anywhere. Those discussions miss the underlying reason for most of the superficial disagreement about risks from AI, namely that there is no disagreement about risks from AI in and of itself.
There are a few people who disagree about the possibility of AGI in general, but I don’t want to touch on that subject in this post. I am trying to highlight the disagreement between the SIAI and people who accept the notion of artificial general intelligence. With regard to those who are not skeptical of AGI the problem becomes more obvious when you turn your attention to people like John Baez or organisations like GiveWell. Most people would sooner question their grasp of “rationality” than give five dollars to a charity that tries to mitigate risks from AI because their calculations claim it was “rational” (those who have read the article by Eliezer Yudkowsky on ‘Pascal’s Mugging’ know that I used a statement from that post and slightly rephrased it). The disagreement all comes down to a general averseness to options that have a low probability of being factual, even given that the stakes are high.
Nobody is so far able to beat arguments that bear resemblance to Pascal’s Mugging. At least not by showing that it is irrational to give in from the perspective of a utility maximizer. One can only reject it based on a strong gut feeling that something is wrong. And I think that is what many people are unknowingly doing when they argue against the SIAI or risks from AI. They are signaling that they are unable to take such risks into account. What most people mean when they doubt the reputation of people who claim that risks from AI need to be taken seriously, or when they say that AGI might be far off, is that risks from AI are too vague to be taken into account at this point, that nobody knows enough to make predictions about the topic right now.
When GiveWell, a charity evaluation service, interviewed the SIAI (PDF), they hinted at the possibility that one could consider the SIAI to be a sort of Pascal’s Mugging:
GiveWell: OK. Well that’s where I stand – I accept a lot of the controversial premises of your mission, but I’m a pretty long way from sold that you have the right team or the right approach. Now some have argued to me that I don’t need to be sold – that even at an infinitesimal probability of success, your project is worthwhile. I see that as a Pascal’s Mugging and don’t accept it; I wouldn’t endorse your project unless it passed the basic hurdles of credibility and workable approach as well as potentially astronomically beneficial goal.
This shows that a lot of people do not doubt the possibility of risks from AI but are simply not sure if they should really concentrate their efforts on such vague possibilities.
Technically, from the standpoint of maximizing expected utility, given the absence of other existential risks, the answer might very well be yes. But even though we believe we understand this technical viewpoint of rationality very well in principle, it does also lead to problems such as Pascal’s Mugging. But it doesn’t take a true Pascal’s Mugging scenario to make people feel deeply uncomfortable with what Bayes’ Theorem, the expected utility formula, and Solomonoff induction seem to suggest one should do.
Again, we currently have no rational way to reject arguments that are framed as predictions of worst case scenarios that need to be taken seriously even given a low probability of their occurrence due to the scale of negative consequences associated with them. Many people are nonetheless reluctant to accept this line of reasoning without further evidence supporting the strong claims and requests for money made by organisations such as the SIAI.
Here is what mathematician and climate activist John Baez has to say:
Of course, anyone associated with Less Wrong would ask if I’m really maximizing expected utility. Couldn’t a contribution to some place like the Singularity Institute of Artificial Intelligence, despite a lower chance of doing good, actually have a chance to do so much more good that it’d pay to send the cash there instead?
And I’d have to say:
1) Yes, there probably are such places, but it would take me a while to find the one that I trusted, and I haven’t put in the work. When you’re risk-averse and limited in the time you have to make decisions, you tend to put off weighing options that have a very low chance of success but a very high return if they succeed. This is sensible so I don’t feel bad about it.
2) Just to amplify point 1) a bit: you shouldn’t always maximize expected utility if you only live once. Expected values — in other words, averages — are very important when you make the same small bet over and over again. When the stakes get higher and you aren’t in a position to repeat the bet over and over, it may be wise to be risk averse.
3) If you let me put the $100,000 into my retirement account instead of a charity, that’s what I’d do, and I wouldn’t even feel guilty about it. I actually think that the increased security would free me up to do more risky but potentially very good things!
All this shows that there seems to be a fundamental problem with the formalized version of rationality. The problem might be human nature itself, that some people are unable to accept what they should do if they want to maximize their expected utility. Or we are missing something else and our theories are flawed. Either way, to solve this problem we need to research those issues and thereby increase the confidence in the very methods used to decide what to do about risks from AI, or to increase the confidence in risks from AI directly, enough to make it look like a sensible option, a concrete and discernible problem that needs to be solved.
Many people perceive the whole world to be at stake, either due to climate change, war or engineered pathogens. Telling them about something like risks from AI, even though nobody seems to have any idea about the nature of intelligence, let alone general intelligence or the possibility of recursive self-improvement, seems like just another problem, one that is too vague to outweigh all the other risks. Most people feel like they have a gun pointed at their heads; telling them about superhuman monsters that might turn them into paperclips then needs some really good arguments to outweigh the combined risk of all other problems.
But there are many other problems with risks from AI. To give a hint at just one example: if there was a risk that might kill us with a probability of .7 and another risk with .1 while our chance to solve the first one was .0001 and the second one .1, which one should we focus on? In other words, our decision to mitigate a certain risk should not only be focused on the probability of its occurrence but also on the probability of success in solving it. But as I have written above I believe that the most pressing issue is to increase the confidence in making decisions under extreme uncertainty or to reduce the uncertainty itself.
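The example with the two risks can be worked through numerically. The sketch below uses a deliberately crude model that is an assumption of this illustration, not something the text specifies: the value of working on a risk is taken to be the probability that it occurs multiplied by the probability that our effort solves it. The class and field names are invented.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    p_occurrence: float  # probability that the risk kills us
    p_solvable: float    # probability that working on it succeeds

    def expected_reduction(self) -> float:
        """Crude model (an assumption): averted probability of death."""
        return self.p_occurrence * self.p_solvable

# Numbers taken from the example in the text.
risks = [
    Risk("first risk", p_occurrence=0.7, p_solvable=0.0001),
    Risk("second risk", p_occurrence=0.1, p_solvable=0.1),
]

for risk in risks:
    print(f"{risk.name}: expected reduction = {risk.expected_reduction():.5f}")

# Pick the risk where effort averts the most expected harm.
best = max(risks, key=Risk.expected_reduction)
print("Focus on:", best.name)
```

Under this simple model the less probable but more tractable second risk is the better target, since 0.1 × 0.1 = 0.01 is far larger than 0.7 × 0.0001 = 0.00007.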