Seven Apocalypses
0: Recoverable Catastrophe
An apocalypse is an event that permanently damages the world. This scale is for scenarios that are much worse than any normal disaster. Even if 100 million people die in a war, the rest of the world can eventually rebuild and keep going.
1: Economic Apocalypse
The human carrying capacity of the planet depends on the world's systems of industry, shipping, agriculture, and organizations. If the planet's economic and infrastructural systems were destroyed, then we would have to rely on more local farming, and we could not support as high a population or standard of living. In addition, rebuilding the world economy could be very difficult if the Earth's mineral and fossil fuel resources are already depleted.
2: Communications Apocalypse
If large regions of the Earth become depopulated, or if sufficiently many humans die in the catastrophe, it's possible that regions and continents could be isolated from one another. In this scenario, globalization is reversed by obstacles to long-distance communication and travel. Telecommunications, the internet, and air travel are no longer common. Humans are reduced to multiple, isolated communities.
3: Knowledge Apocalypse
If the loss of human population and institutions is so extreme that a large portion of human cultural or technological knowledge is lost, it could reverse one of the most reliable trends in modern history. Some innovations and scientific models can take millennia to develop from scratch.
4: Human Apocalypse
Even if the human population were to be violently reduced by 90%, it's easy to imagine the survivors slowly resettling the planet, given the resources and opportunity. But a sufficiently extreme transformation of the Earth could drive the human species completely extinct. To many people, this is the worst possible outcome, and any further developments are irrelevant next to the end of human history.
5: Biosphere Apocalypse
In some scenarios (such as the physical destruction of the Earth), one can imagine the extinction not just of humans, but of all known life. Only astrophysical and geological phenomena would be left in this region of the universe. In this timeline we are unlikely to be succeeded by any familiar life forms.
6: Galactic Apocalypse
A rare few scenarios have the potential to wipe out not just Earth, but also all nearby space. This usually comes up in discussions of hostile artificial superintelligence, or very destructive chain reactions of exotic matter. However, the nature of cosmic inflation and extraterrestrial intelligence is still unknown, so it's possible that some phenomenon will ultimately interfere with the destruction.
7: Universal Apocalypse
This form of destruction is thankfully exotic. People discuss the loss of all of existence as an effect of topics like false vacuum bubbles, simulationist termination, solipsistic or anthropic observer effects, Boltzmann brain fluctuations, time travel, or religious eschatology.
The goal of this scale is to give a little more resolution to a speculative, unfamiliar space, in the same sense that the Kardashev Scale provides a little terminology to talk about the distant topic of interstellar civilizations. It can be important in x risk conversations to distinguish between disasters and truly worst-case scenarios. Even if some of these scenarios are unlikely or impossible, they are nevertheless discussed, and terminology can be useful to facilitate conversation.
UC Berkeley launches Center for Human-Compatible Artificial Intelligence
Source article: http://news.berkeley.edu/2016/08/29/center-for-human-compatible-artificial-intelligence/
UC Berkeley artificial intelligence (AI) expert Stuart Russell will lead a new Center for Human-Compatible Artificial Intelligence, launched this week.
Russell, a UC Berkeley professor of electrical engineering and computer sciences and the Smith-Zadeh Professor in Engineering, is co-author of Artificial Intelligence: A Modern Approach, which is considered the standard text in the field of artificial intelligence, and has been an advocate for incorporating human values into the design of AI.
The primary focus of the new center is to ensure that AI systems are beneficial to humans, he said.
The co-principal investigators for the new center include computer scientists Pieter Abbeel and Anca Dragan and cognitive scientist Tom Griffiths, all from UC Berkeley; computer scientists Bart Selman and Joseph Halpern, from Cornell University; and AI experts Michael Wellman and Satinder Singh Baveja, from the University of Michigan. Russell said the center expects to add collaborators with related expertise in economics, philosophy and other social sciences.
The center is being launched with a grant of $5.5 million from the Open Philanthropy Project, with additional grants for the center’s research from the Leverhulme Trust and the Future of Life Institute.
Russell is quick to dismiss the imaginary threat from the sentient, evil robots of science fiction. The issue, he said, is that machines as we currently design them in fields like AI, robotics, control theory and operations research take the objectives that we humans give them very literally. Told to clean the bath, a domestic robot might, like the Cat in the Hat, use mother’s white dress, not understanding that the value of a clean dress is greater than the value of a clean bath.
The center will work on ways to guarantee that the most sophisticated AI systems of the future, which may be entrusted with control of critical infrastructure and may provide essential services to billions of people, will act in a manner that is aligned with human values.
“AI systems must remain under human control, with suitable constraints on behavior, despite capabilities that may eventually exceed our own,” Russell said. “This means we need cast-iron formal proofs, not just good intentions.”
One approach Russell and others are exploring is called inverse reinforcement learning, through which a robot can learn about human values by observing human behavior. By watching people dragging themselves out of bed in the morning and going through the grinding, hissing and steaming motions of making a caffè latte, for example, the robot learns something about the value of coffee to humans at that time of day.
“Rather than have robot designers specify the values, which would probably be a disaster,” said Russell, “instead the robots will observe and learn from people. Not just by watching, but also by reading. Almost everything ever written down is about people doing things, and other people having opinions about it. All of that is useful evidence.”
Russell and his colleagues don’t expect this to be an easy task.
“People are highly varied in their values and far from perfect in putting them into practice,” he acknowledged. “These aspects cause problems for a robot trying to learn what it is that we want and to navigate the often conflicting desires of different individuals.”
Russell, who recently wrote an optimistic article titled “Will They Make Us Better People?,” summed it up this way: “In the process of figuring out what values robots should optimize, we are making explicit the idealization of ourselves as humans. As we envision AI aligned with human values, that process might cause us to think more about how we ourselves really should behave, and we might learn that we have more in common with people of other cultures than we think.”
Notes on the Safety in Artificial Intelligence conference
These are my notes and observations after attending the Safety in Artificial Intelligence (SafArtInt) conference, which was co-hosted by the White House Office of Science and Technology Policy and Carnegie Mellon University on June 27 and 28. This isn't an organized summary of the content of the conference; rather, it's a selection of points which are relevant to the control problem. As a result, it suffers from selection bias: it looks like superintelligence and control-problem-relevant issues were discussed frequently, when in reality those issues were discussed less and I didn't write much about the more mundane parts.
SafArtInt has been the third out of a planned series of four conferences. The purpose of the conference series was twofold: the OSTP wanted to get other parts of the government moving on AI issues, and they also wanted to inform public opinion.
The other three conferences are about near term legal, social, and economic issues of AI. SafArtInt was about near term safety and reliability in AI systems. It was effectively the brainchild of Dr. Ed Felten, the deputy U.S. chief technology officer for the White House, who came up with the idea for it last year. CMU is a top computer science university and many of their own researchers attended, as well as some students. There were also researchers from other universities, some people from private sector AI including both Silicon Valley and government contracting, government researchers and policymakers from groups such as DARPA and NASA, a few people from the military/DoD, and a few control problem researchers. As far as I could tell, everyone except a few university researchers were from the U.S., although I did not meet many people. There were about 70-100 people watching the presentations at any given time, and I had conversations with about twelve of the people who were not affiliated with existential risk organizations, as well as of course all of those who were affiliated. The conference was split with a few presentations on the 27th and the majority of presentations on the 28th. Not everyone was there for both days.
Felten believes that neither "robot apocalypses" nor "mass unemployment" are likely. It soon became apparent that the majority of others present at the conference felt the same way with regard to superintelligence. The general intention among researchers and policymakers at the conference could be summarized as follows: we need to make sure that the AI systems we develop in the near future will not be responsible for any accidents, because if accidents do happen then they will spark public fears about AI, which would lead to a dearth of funding for AI research and an inability to realize the corresponding social and economic benefits. Of course, that doesn't change the fact that they strongly care about safety in its own right and have significant pragmatic needs for robust and reliable AI systems.
Most of the talks were about verification and reliability in modern day AI systems. So they were concerned with AI systems that would give poor results or be unreliable in the narrow domains where they are being applied in the near future. They mostly focused on "safety-critical" systems, where failure of an AI program would result in serious negative consequences: automated vehicles were a common topic of interest, as well as the use of AI in healthcare systems. A recurring theme was that we have to be more rigorous in demonstrating safety and do actual hazard analyses on AI systems, and another was that we need the AI safety field to succeed in ways that the cybersecurity field has failed. Another general belief was that long term AI safety, such as concerns about the ability of humans to control AIs, was not a serious issue.
On average, the presentations were moderately technical. They were mostly focused on machine learning systems, although there was significant discussion of cybersecurity techniques.
The first talk was given by Eric Horvitz of Microsoft. He discussed some approaches for pushing into new directions in AI safety. Instead of merely trying to reduce the errors spotted according to one model, we should look out for "unknown unknowns" by stacking models and looking at problems which appear on any of them, a theme which would be presented by other researchers as well in later presentations. He discussed optimization under uncertain parameters, sensitivity analysis to uncertain parameters, and 'wireheading' or short-circuiting of reinforcement learning systems (which he believes can be guarded against by using 'reflective analysis'). Finally, he brought up the concerns about superintelligence, which sparked amused reactions in the audience. He said that scientists should address concerns about superintelligence, which he aptly described as the 'elephant in the room', noting that it was the reason that some people were at the conference. He said that scientists will have to engage with public concerns, while also noting that there were experts who were worried about superintelligence and that there would have to be engagement with the experts' concerns. He did not comment on whether he believed that these concerns were reasonable or not.
An issue which came up in the Q&A afterwards was that we need to deal with mis-structured utility functions in AI, because it is often the case that the specific tradeoffs and utilities which humans claim to value often lead to results which the humans don't like. So we need to have structural uncertainty about our utility models. The difficulty of finding good objective functions for AIs would eventually be discussed in many other presentations as well.
The next talk was given by Andrew Moore of Carnegie Mellon University, who claimed that his talk represented the consensus of computer scientists at the school. He claimed that the stakes of AI safety were very high - namely, that AI has the capability to save many people's lives in the near future, but if there are any accidents involving AI then public fears could lead to freezes in AI research and development. He highlighted the public's irrational tendencies wherein a single accident could cause people to overlook and ignore hundreds of invisible lives saved. He specifically mentioned a 12-24 month timeframe for these issues.
Moore said that verification of AI system safety will be difficult due to the combinatorial explosion of AI behaviors. He talked about meta-machine-learning as a solution to this, something which is being investigated under the direction of Lawrence Schuette at the Office of Naval Research. Moore also said that military AI systems require high verification standards and that development timelines for these systems are long. He talked about two different approaches to AI safety, stochastic testing and theorem proving - the process of doing the latter often leads to the discovery of unsafe edge cases.
He also discussed AI ethics, giving an example 'trolley problem' where AI cars would have to choose whether to hit a deer in order to provide a slightly higher probability of survival for the human driver. He said that we would need hash-defined constants to tell vehicle AIs how many deer a human is worth. He also said that we would need to find compromises in death-pleasantry tradeoffs, for instance where the safety of self-driving cars depends on the speed and routes on which they are driven. He compared the issue to civil engineering where engineers have to operate with an assumption about how much money they would spend to save a human life.
He concluded by saying that we need policymakers, company executives, scientists, and startups to all be involved in AI safety. He said that the research community stands to gain or lose together, and that there is a shared responsibility among researchers and developers to avoid triggering another AI winter through unsafe AI designs.
The next presentation was by Richard Mallah of the Future of Life Institute, who was there to represent "Medium Term AI Safety". He pointed out the explicit/implicit distinction between different modeling techniques in AI systems, as well as the explicit/implicit distinction between different AI actuation techniques. He talked about the difficulty of value specification and the concept of instrumental subgoals as an important issue in the case of complex AIs which are beyond human understanding. He said that even a slight misalignment of AI values with regard to human values along one parameter could lead to a strongly negative outcome, because machine learning parameters don't strictly correspond to the things that humans care about.
Mallah stated that open-world discovery leads to self-discovery, which can lead to reward hacking or a loss of control. He underscored the importance of causal accounting, which is distinguishing causation from correlation in AI systems. He said that we should extend machine learning verification to self-modification. Finally, he talked about introducing non-self-centered ontology to AI systems and bounding their behavior.
The audience was generally quiet and respectful during Richard's talk. I sensed that at least a few of them labelled him as part of the 'superintelligence out-group' and dismissed him accordingly, but I did not learn what most people's thoughts or reactions were. In the next panel featuring three speakers, he wasn't the recipient of any questions regarding his presentation or ideas.
Tom Mitchell from CMU gave the next talk. He talked about both making AI systems safer, and using AI to make other systems safer. He said that risks to humanity from other kinds of issues besides AI were the "big deals of 2016" and that we should make sure that the potential of AIs to solve these problems is realized. He wanted to focus on the detection and remediation of all failures in AI systems. He said that it is a novel issue that learning systems defy standard pre-testing ("as Richard mentioned") and also brought up the purposeful use of AI for dangerous things.
Some interesting points were raised in the panel. Andrew did not have a direct response to the implications of AI ethics being determined by the predominantly white people of the US/UK where most AIs are being developed. He said that ethics in AIs will have to be decided by society, regulators, manufacturers, and human rights organizations in conjunction. He also said that our cost functions for AIs will have to get more and more complicated as AIs get better, and he said that he wants to separate unintended failures from superintelligence type scenarios. On trolley problems in self driving cars and similar issues, he said "it's got to be complicated and messy."
Dario Amodei of Google Deepbrain, who co-authored the paper on concrete problems in AI safety, gave the next talk. He said that the public focus is too much on AGI/ASI and wants more focus on concrete/empirical approaches. He discussed the same problems that pose issues in advanced general AI, including flawed objective functions and reward hacking. He said that he sees long term concerns about AGI/ASI as "extreme versions of accident risk" and that he thinks it's too early to work directly on them, but he believes that if you want to deal with them then the best way to do it is to start with safety in current systems. Mostly he summarized the Google paper in his talk.
In her presentation, Claire Le Goues of CMU said "before we talk about Skynet we should focus on problems that we already have." She mostly talked about analogies between software bugs and AI safety, the similarities and differences between the two and what we can learn from software debugging to help with AI safety.
Robert Rahmer of IARPA discussed CAUSE, a cyberintelligence forecasting program which promises to help predict cyber attacks. It is a program which is still being put together.
In the panel of the above three, autonomous weapons were discussed, but no clear policy stances were presented.
John Launchbury gave a talk on DARPA research and the big picture of AI development. He pointed out that DARPA work leads to commercial applications and that progress in AI comes from sustained government investment. He classified AI capabilities into "describing," "predicting," and "explaining" in order of increasing difficulty, and he pointed out that old fashioned "describing" still plays a large role in AI verification. He said that "explaining" AIs would need transparent decisionmaking and probabilistic programming (the latter would also be discussed by others at the conference).
The next talk came from Jason Gaverick Matheny, the director of IARPA. Matheny talked about four requirements in current and future AI systems: verification, validation, security, and control. He wanted "auditability" in AI systems as a weaker form of explainability. He talked about the importance of "corner cases" for national intelligence purposes, the low probability, high stakes situations where we have limited data - these are situations where we have significant need for analysis but where the traditional machine learning approach doesn't work because of its overwhelming focus on data. Another aspect of national defense is that it has a slower decision tempo, longer timelines, and longer-viewing optics about future events.
He said that assessing local progress in machine learning development would be important for global security and that we therefore need benchmarks to measure progress in AIs. He ended with a concrete invitation for research proposals from anyone (educated or not), for both large scale research and for smaller studies ("seedlings") that could take us "from disbelief to doubt".
The difference in timescales between different groups was something I noticed later on, after hearing someone from the DoD describe their agency as having a longer timeframe than the Homeland Security Agency, and someone from the White House describe their work as being crisis reactionary.
The next presentation was from Andrew Grotto, senior director of cybersecurity policy at the National Security Council. He drew a close parallel from the issue of genetically modified crops in Europe in the 1990's to modern day artificial intelligence. He pointed out that Europe utterly failed to achieve widespread cultivation of GMO crops as a result of public backlash. He said that the widespread economic and health benefits of GMO crops were ignored by the public, who instead focused on a few health incidents which undermined trust in the government and crop producers. He had three key points: that risk frameworks matter, that you should never assume that the benefits of new technology will be widely perceived by the public, and that we're all in this together with regard to funding, research progress and public perception.
In the Q&A between Launchbury, Matheny, and Grotto after Grotto's presentation, it was mentioned that the economic interests of farmers worried about displacement also played a role in populist rejection of GMOs, and that a similar dynamic could play out with regard to automation causing structural unemployment. Grotto was also asked what to do about bad publicity which seeks to sink progress in order to avoid risks. He said that meetings like SafArtInt and open public dialogue were good.
One person asked what Launchbury wanted to do about AI arms races with multiple countries trying to "get there" and whether he thinks we should go "slow and secure" or "fast and risky" in AI development, a question which provoked laughter in the audience. He said we should go "fast and secure" and wasn't concerned. He said that secure designs for the Internet once existed, but the one which took off was the one which was open and flexible.
Another person asked how we could avoid discounting outliers in our models, referencing Matheny's point that we need to include corner cases. Matheny affirmed that data quality is a limiting factor to many of our machine learning capabilities. At IARPA, we generally try to include outliers until they are sure that they are erroneous, said Matheny.
Another presentation came from Tom Dietterich, president of the Association for the Advancement of Artificial Intelligence. He said that we have not focused enough on safety, reliability and robustness in AI and that this must change. Much like Eric Horvitz, he drew a distinction between robustness against errors within the scope of a model and robustness against unmodeled phenomena. On the latter issue, he talked about solutions such as expanding the scope of models, employing multiple parallel models, and doing creative searches for flaws - the latter doesn't enable verification that a system is safe, but it nevertheless helps discover many potential problems. He talked about knowledge-level redundancy as a method of avoiding misspecification - for instance, systems could identify objects by an "ownership facet" as well as by a "goal facet" to produce a combined concept with less likelihood of overlooking key features. He said that this would require wider experiences and more data.
There were many other speakers who brought up a similar set of issues: the user of cybersecurity techniques to verify machine learning systems, the failures of cybersecurity as a field, opportunities for probabilistic programming, and the need for better success in AI verification. Inverse reinforcement learning was extensively discussed as a way of assigning values. Jeanette Wing of Microsoft talked about the need for AIs to reason about the continuous and the discrete in parallel, as well as the need for them to reason about uncertainty (with potential meta levels all the way up). One point which was made by Sarah Loos of Google was that proving the safety of an AI system can be computationally very expensive, especially given the combinatorial explosion of AI behaviors.
In one of the panels, the idea of government actions to ensure AI safety was discussed. No one was willing to say that the government should regulate AI designs. Instead they stated that the government should be involved in softer ways, such as guiding and working with AI developers, and setting standards for certification.
Pictures: https://imgur.com/a/49eb7
In between these presentations I had time to speak to individuals and listen in on various conversations. A high ranking person from the Department of Defense stated that the real benefit of autonomous systems would be in terms of logistical systems rather than weaponized applications. A government AI contractor drew the connection between Mallah's presentation and the recent press revolving around superintelligence, and said he was glad that the government wasn't worried about it.
I talked to some insiders about the status of organizations such as MIRI, and found that the current crop of AI safety groups could use additional donations to become more established and expand their programs. There may be some issues with the organizations being sidelined; after all, the Google Deepbrain paper was essentially similar to a lot of work by MIRI, just expressed in somewhat different language, and was more widely received in mainstream AI circles.
In terms of careers, I found that there is significant opportunity for a wide range of people to contribute to improving government policy on this issue. Working at a group such as the Office of Science and Technology Policy does not necessarily require advanced technical education, as you can just as easily enter straight out of a liberal arts undergraduate program and build a successful career as long as you are technically literate. (At the same time, the level of skepticism about long term AI safety at the conference hinted to me that the signalling value of a PhD in computer science would be significant.) In addition, there are large government budgets in the seven or eight figure range available for qualifying research projects. I've come to believe that it would not be difficult to find or create AI research programs that are relevant to long term AI safety while also being practical and likely to be funded by skeptical policymakers and officials.
I also realized that there is a significant need for people who are interested in long term AI safety to have basic social and business skills. Since there is so much need for persuasion and compromise in government policy, there is a lot of value to be had in being communicative, engaging, approachable, appealing, socially savvy, and well-dressed. This is not to say that everyone involved in long term AI safety is missing those skills, of course.
I was surprised by the refusal of almost everyone at the conference to take long term AI safety seriously, as I had previously held the belief that it was more of a mixed debate given the existence of expert computer scientists who were involved in the issue. I sensed that the recent wave of popular press and public interest in dangerous AI has made researchers and policymakers substantially less likely to take the issue seriously. None of them seemed to be familiar with actual arguments or research on the control problem, so their opinions didn't significantly change my outlook on the technical issues. I strongly suspect that the majority of them had their first or possibly only exposure to the idea of the control problem after seeing badly written op-eds and news editorials featuring comments from the likes of Elon Musk and Stephen Hawking, which would naturally make them strongly predisposed to not take the issue seriously. In the run-up to the conference, websites and press releases didn't say anything about whether this conference would be about long or short term AI safety, and they didn't make any reference to the idea of superintelligence.
I sympathize with the concerns and strategy given by people such as Andrew Moore and Andrew Grotto, which make perfect sense if (and only if) you assume that worries about long term AI safety are completely unfounded. For the community that is interested in long term AI safety, I would recommend that we avoid competitive dynamics by (a) demonstrating that we are equally strong opponents of bad press, inaccurate news, and irrational public opinion which promotes generic uninformed fears over AI, (b) explaining that we are not interested in removing funding for AI research (even if you think that slowing down AI development is a good thing, restricting funding yields only limited benefits in terms of changing overall timelines, whereas those who are not concerned about long term AI safety would see a restriction of funding as a direct threat to their interests and projects, so it makes sense to cooperate here in exchange for other concessions), and (c) showing that we are scientifically literate and focused on the technical concerns. I do not believe that there is necessarily a need for the two "sides" on this to be competing against each other, so it was disappointing to see an implication of opposition at the conference.
Anyway, Ed Felten announced a request for information from the general public, seeking popular and scientific input on the government's policies and attitudes towards AI: https://www.whitehouse.gov/webform/rfi-preparing-future-artificial-intelligence
Overall, I learned quite a bit and benefited from the experience, and I hope the insight I've gained can be used to improve the attitudes and approaches of the long term AI safety community.
AI caught by a module that counterfactually doesn't exist
EDIT: Whenever I use colloquial phrases like "the AI believes a (false) X" I mean that we are using utility indifference to accomplish that goal, without actually giving the AI false beliefs.
A big problem with most AI control ideas is that the AI will route around them. Suppose you try to ensure that the AI stays friendly by - say - requiring it to ensure that the world's stock markets continue to function. Then a badly motivated AI will cause the destruction it was going to cause anyway - except it will also create some system that technically counts as "the world's stock markets" and ensure that it "continues to function".
I think we can use counterfactuals and corrigibility (AI-acceptable motivation changes) to partially get around the problem, in some circumstances. Suppose we had a candidate AI, with utility U, that we thought was friendly - but that we weren't sure. We have a subgoal, say SG="ensuring that the world's stock markets continue to function". We're pretty sure that if the AI is friendly, then SG will happen. We're pretty sure that if the AI were unfriendly, and the AI did not have SG added as an objective, then SG wouldn't happen. I deliberately chose a rather rough subgoal to illustrate this.
So we then add a module M to the AI. The task of M, which it has chosen to accept, is to look into the future, using the AI's predictive abilities, and see if SG is accomplished. Crucially, it will do so under the assumption that the module M does not exist (or wasn't activated, or will soon be overwritten, or will fail to do anything, or so on). If M finds that SG is not accomplished, it will turn the AI off (in a way compatible with the AI's motivations - see corrigibility).
How is this different from the usual setup? Well, a generic FAI would accomplish SG (by assumption - we may need to work a bit on this part). A generic UFAI without M wouldn't accomplish SG. A generic UFAI with M would accomplish SG, as this prevents it from being turned off.
But a generic UFAI with M which thought that M would have no impact, would behave exactly as a generic UFAI without M - and wouldn't aim to accomplish SG, and thus would be turned off by M.
Note that M can only be used as a last test for a putatively friendly AI. We cannot use it if we are continually refining AIs (either through evolution or simply redesign) as then we are optimising for SG, and SG is a poor goal to be aiming for (many, many UFAI have SG as a goal - it's just that a generic one won't). Similarly, we can't use a unconstrained search to find such an AI.
I wonder if this idea can be extended. Suggestions?
Unfriendly Natural Intelligence
Related to: UFAI, Paperclip maximizer, Reason as memetic immune disorder
A discussion with Stefan (cheers, didn't get your email, please message me) during the European Community Weekend Berlin fleshed out an idea I had toyed around with for some time:
If a UFAI can wreak havoc by driving simple goals to extremes then so should driving human desires to extremes cause problems. And we should already see this.
Actually we do.
We know that just following our instincts on eating (sugar, fat) is unhealthy. We know that stimulating our pleasure centers more or less directly (drugs) is dangerous. We know that playing certain games can lead to comparable addiction. And the recognition of this has led to a large number of more or less fine-tuned anti-memes e.g. dieting, early drug prevention, helplines. These memes steering us away from such behaviors were selected for because they provided aggregate benefits to the (members of) social (sub) systems they are present in.
Many of these memes have become so self-evident we don't recognize them as such. Some are essential parts of highly complex social systems. What is the general pattern? Did we catch all the critical cases? Are the existing memes well-suited for the task?How are they related. Many are probably deeply woven into our culture and traditions.
Did we miss any anti-memes?
This last question really is at the core of this post. I think we lack some necessary memes keeping new exploitations of our desires in check. Some new ones result from our society a) having developed the capacity to exploit them and b) the scientific knowledge to know how to do this.
Recreational Cryonics
We recently saw a post in Discussion by ChrisHallquist, asking to be talked out of cryonics. It so happened that I'd just read a new short story by Greg Egan which gave me the inspiration to write the following:
It is likely that you would not wish for your brain-state to be available to all-and-sundry, subjecting you to the possibility of being simulated according to their whims. However, you know nothing about the ethics of the society that will exist when the technology to extract and run your brain-state is developed. Thus you are taking a risk of a negative outcome that may be less attractive to you than mere non-existence.
I had little expectation of this actually convincing anyone, but thought it was a fairly novel contribution. When jowen's plea for a refutation went unanswered, I began attempting one myself. What I ended up with closes the door on the scenario I outlined, but opens one I find rather more disturbing.
Snowdenizing UFAI
Here is a suggestion for slowing down future secretive and unsafe UFAI projects.
Take the American defense and intelligence community as a case in point. They are a top candidate for the creation of Artificial General Intelligence (AGI): They can get the massive funding, and they can get some top (or near-top) brains on the job. The AGI will be unfriendly, unless friendliness is a primary goal from the start.
The American defense and intelligence community created the Manhattan Project, which is the canonical example for a giant, secret, leading-edge science-technology project with existential-risk implications.
David Chalmers (2010): "When I discussed [AI existential risk] with cadets and staff at the West Point Military Academy, the question arose as to whether the US military or other branches of the government might attempt to prevent the creation of AI or AI+, due to the risks of an intelligence explosion. The consensus was that they would not, as such prevention would only increase the chances that AI or AI+ would first be created by a foreign power."
Edward Snowden broke the intelligence community's norms by reporting what he saw to be tremendous ethical and legal violations. This requires an exceptionally well-developed personal sense of ethics (even if you disagree with those ethics). His actions have drawn a lot of support by those who share his values. Many who condemn him a traitor are still criticizing government intrusions in the basis of his revelations.
When the government AGI project starts rolling, will it have Snowdens who can warn internally about Unfriendly AI (UFAI) risks? They will probably be ignored and suppressed--that's how it goes in hierarchical bureaucratic organizations. Will these future Snowdens have the courage to keep fighting internally, and eventually to report the risks to the public or to their allies in the Friendly AI (FAI) research community
Naturally, the Snowden scenario is not limited to the US government. We can seek ethical dissidents, truthtellers, and whistleblowers in any large and powerful organization that does unsafe research, whether a government or a corporation.
Should we start preparing budding AGI researchers to think this way? We can do this by encouraging people to take consequentialist ethics seriously, which by itself can lead to Snowden-like results. and LessWrong is certainly working on that. But another approach is to start talking more directly about the "UFAI Whistleblower Pledge."
I hereby promise to fight unsafe AGI development in whatever way I can, through internal channels in my organization, by working with outside allies, or even by revealing the risks to the public.
If this concept becomes widespread, and all the more so if people sign on, the threat of ethical whistleblowing will hover over every unsafe AGI project. Even with all the oaths and threats they use to make new employees keep secrets, the notion that speaking out on UFAI is deep in the consensus of serious AGI developers will cast a shadow on every project.
To be clear, the beneficial effect I am talking about here is not the leaks--it is the atmosphere of potential leaks, the lack of trust by management that researchers are completely committed to keeping any secret. For example, post Snowden, the intelligence agencies are requiring that sensitive files only be accessed by two people working together and they are probably tightening their approval guidelines and so rejecting otherwise suitable candidates. These changes make everything more cumbersome.
In creating the OpenCog project, Ben Goertzel advocated total openness as a way of accelerating the progress of those who are willing to expose any dangerous work they might be doing--even if this means that the safer researchers are giving their ideas to the unsafe, secretive ones.
On the other hand, Eliezer Yudkowsky has suggested that MIRI keep its AGI implementation ideas secret, to avoid handing them to an unsafe project. (See "Evaluating the Feasibility of SI's Plans," and, if you can stomach some argument from fictional evidence, "Three Worlds Collide.") Encouraging openness and leaks could endanger Eliezer's strategy. But if we follow Eliezer's position, a truly ethical consequentialist would understand that exposing unsafe projects is good, while exposing safer projects is bad.
So, what do you think? Should we start signing as many current and upcoming AGI researchers as possible to the UFAI Whistleblower Pledge, or work to make this an ethical norm in the community?
Do Earths with slower economic growth have a better chance at FAI?
I was raised as a good and proper child of the Enlightenment who grew up reading The Incredible Bread Machine and A Step Farther Out, taking for granted that economic growth was a huge in-practice component of human utility (plausibly the majority component if you asked yourself what was the major difference between the 21st century and the Middle Ages) and that the "Small is Beautiful" / "Sustainable Growth" crowds were living in impossible dreamworlds that rejected quantitative thinking in favor of protesting against nuclear power plants.
And so far as I know, such a view would still be an excellent first-order approximation if we were going to carry on into the future by steady technological progress: Economic growth = good.
But suppose my main-line projection is correct and the "probability of an OK outcome" / "astronomical benefit" scenario essentially comes down to a race between Friendly AI and unFriendly AI. So far as I can tell, the most likely reason we wouldn't get Friendly AI is the total serial research depth required to develop and implement a strong-enough theory of stable self-improvement with a possible side order of failing to solve the goal transfer problem. Relative to UFAI, FAI work seems like it would be mathier and more insight-based, where UFAI can more easily cobble together lots of pieces. This means that UFAI parallelizes better than FAI. UFAI also probably benefits from brute-force computing power more than FAI. Both of these imply, so far as I can tell, that slower economic growth is good news for FAI; it lengthens the deadline to UFAI and gives us more time to get the job done. I have sometimes thought half-jokingly and half-anthropically that I ought to try to find investment scenarios based on a continued Great Stagnation and an indefinite Great Recession where the whole developed world slowly goes the way of Spain, because these scenarios would account for a majority of surviving Everett branches.
Roughly, it seems to me like higher economic growth speeds up time and this is not a good thing. I wish I had more time, not less, in which to work on FAI; I would prefer worlds in which this research can proceed at a relatively less frenzied pace and still succeed, worlds in which the default timelines to UFAI terminate in 2055 instead of 2035.
I have various cute ideas for things which could improve a country's economic growth. The chance of these things eventuating seems small, the chance that they eventuate because I write about them seems tiny, and they would be good mainly for entertainment, links from econblogs, and possibly marginally impressing some people. I was thinking about collecting them into a post called "The Nice Things We Can't Have" based on my prediction that various forces will block, e.g., the all-robotic all-electric car grid which could be relatively trivial to build using present-day technology - that we are too far into the Great Stagnation and the bureaucratic maturity of developed countries to get nice things anymore. However I have a certain inhibition against trying things that would make everyone worse off if they actually succeeded, even if the probability of success is tiny. And it's not completely impossible that we'll see some actual experiments with small nation-states in the next few decades, that some of the people doing those experiments will have read Less Wrong, or that successful experiments will spread (if the US ever legalizes robotic cars or tries a city with an all-robotic fleet, it'll be because China or Dubai or New Zealand tried it first). Other EAs (effective altruists) care much more strongly about economic growth directly and are trying to increase it directly. (An extremely understandable position which would typically be taken by good and virtuous people).
Throwing out remote, contrived scenarios where something accomplishes the opposite of its intended effect is cheap and meaningless (vide "But what if MIRI accomplishes the opposite of its purpose due to blah") but in this case I feel impelled to ask because my mainline visualization has the Great Stagnation being good news. I certainly wish that economic growth would align with FAI because then my virtues would align and my optimal policies have fewer downsides, but I am also aware that wishing does not make something more likely (or less likely) in reality.
To head off some obvious types of bad reasoning in advance: Yes, higher economic growth frees up resources for effective altruism and thereby increases resources going to FAI, but it also increases resources going to the AI field generally which is mostly pushing UFAI, and the problem arguendo is that UFAI parallelizes more easily.
Similarly, a planet with generally higher economic growth might develop intelligence amplification (IA) technology earlier. But this general advancement of science will also accelerate UFAI, so you might just be decreasing the amount of FAI research that gets done before IA and decreasing the amount of time available after IA before UFAI. Similarly to the more mundane idea that increased economic growth will produce more geniuses some of whom can work on FAI; there'd also be more geniuses working on UFAI, and UFAI probably parallelizes better and requires less serial depth of research. If you concentrate on some single good effect on blah and neglect the corresponding speeding-up of UFAI timelines, you will obviously be able to generate spurious arguments for economic growth having a positive effect on the balance.
So I pose the question: "Is slower economic growth good news?" or "Do you think Everett branches with 4% or 1% RGDP growth have a better chance of getting FAI before UFAI"? So far as I can tell, my current mainline guesses imply, "Everett branches with slower economic growth contain more serial depth of cognitive causality and have more effective time left on the clock before they end due to UFAI, which favors FAI research over UFAI research".
This seems like a good parameter to have a grasp on for any number of reasons, and I can't recall it previously being debated in the x-risk / EA community.
EDIT: To be clear, the idea is not that trying to deliberately slow world economic growth would be a maximally effective use of EA resources and better than current top targets; this seems likely to have very small marginal effects, and many such courses are risky. The question is whether a good and virtuous person ought to avoid, or alternatively seize, any opportunities which come their way to help out on world economic growth.
EDIT 2: Carl Shulman's opinion can be found on the Facebook discussion here.
UFAI cannot be the Great Filter
[Summary: The fact we do not observe (and have not been wiped out by) an UFAI suggests the main component of the 'great filter' cannot be civilizations like ours being wiped out by UFAI. Gentle introduction (assuming no knowledge) and links to much better discussion below.]
Introduction
The Great Filter is the idea that although there is lots of matter, we observe no "expanding, lasting life", like space-faring intelligences. So there is some filter through which almost all matter gets stuck before becoming expanding, lasting life. One question for those interested in the future of humankind is whether we have already 'passed' the bulk of the filter, or does it still lie ahead? For example, is it very unlikely matter will be able to form self-replicating units, but once it clears that hurdle becoming intelligent and going across the stars is highly likely; or is it getting to a humankind level of development is not that unlikely, but very few of those civilizations progress to expanding across the stars. If the latter, that motivates a concern for working out what the forthcoming filter(s) are, and trying to get past them.
One concern is that advancing technology gives the possibility of civilizations wiping themselves out, and it is this that is the main component of the Great Filter - one we are going to be approaching soon. There are several candidates for which technology will be an existential threat (nanotechnology/'Grey goo', nuclear holocaust, runaway climate change), but one that looms large is Artificial intelligence (AI), and trying to understand and mitigate the existential threat from AI is the main role of the Singularity Institute, and I guess Luke, Eliezer (and lots of folks on LW) consider AI the main existential threat.
The concern with AI is something like this:
- AI will soon greatly surpass us in intelligence in all domains.
- If this happens, AI will rapidly supplant humans as the dominant force on planet earth.
- Almost all AIs, even ones we create with the intent to be benevolent, will probably be unfriendly to human flourishing.
Or, as summarized by Luke:
... AI leads to intelligence explosion, and, because we don’t know how to give an AI benevolent goals, by default an intelligence explosion will optimize the world for accidentally disastrous ends. A controlled intelligence explosion, on the other hand, could optimize the world for good. (More on this option in the next post.)
So, the aim of the game needs to be trying to work out how to control the future intelligence explosion so the vastly smarter-than-human AIs are 'friendly' (FAI) and make the world better for us, rather than unfriendly AIs (UFAI) which end up optimizing the world for something that sucks.
'Where is everybody?'
So, topic. I read this post by Robin Hanson which had a really good parenthetical remark (emphasis mine):
Yes, it is possible that the extremely difficultly was life’s origin, or some early step, so that, other than here on Earth, all life in the universe is stuck before this early extremely hard step. But even if you find this the most likely outcome, surely given our ignorance you must also place a non-trivial probability on other possibilities. You must see a great filter as lying between initial planets and expanding civilizations, and wonder how far along that filter we are. In particular, you must estimate a substantial chance of “disaster”, i.e., something destroying our ability or inclination to make a visible use of the vast resources we see. (And this disaster can’t be an unfriendly super-AI, because that should be visible.)
This made me realize an UFAI should also be counted as an 'expanding lasting life', and should be deemed unlikely by the Great Filter.
Another way of looking at it: if the Great Filter still lies ahead of us, and a major component of this forthcoming filter is the threat from UFAI, we should expect to see the UFAIs of other civilizations spreading across the universe (or not see anything at all, because they would wipe us out to optimize for their unfriendly ends). That we do not observe it disconfirms this conjunction.
[Edit/Elaboration: It also gives a stronger argument - as the UFAI is the 'expanding life' we do not see, the beliefs, 'the Great Filter lies ahead' and 'UFAI is a major existential risk' lie opposed to one another: the higher your credence in the filter being ahead, the lower your credence should be in UFAI being a major existential risk (as the many civilizations like ours that go on to get caught in the filter do not produce expanding UFAIs, so expanding UFAI cannot be the main x-risk); conversely, if you are confident that UFAI is the main existential risk, then you should think the bulk of the filter is behind us (as we don't see any UFAIs, there cannot be many civilizations like ours in the first place, as we are quite likely to realize an expanding UFAI).]
A much more in-depth article and comments (both highly recommended) was made by Katja Grace a couple of years ago. I can't seem to find a similar discussion on here (feel free to downvote and link in the comments if I missed it), which surprises me: I'm not bright enough to figure out the anthropics, and obviously one may hold AI to be a big deal for other-than-Great-Filter reasons (maybe a given planet has a 1 in a googol chance of getting to intelligent life, but intelligent life 'merely' has a 1 in 10 chance of successfully navigating an intelligence explosion), but this would seem to be substantial evidence driving down the proportion of x-risk we should attribute to AI.
What do you guys think?
Imposing FAI
All the posts on FAI theory as of late have given me cause to think. There's something in the conversations about it that has always bugged me, but it is something that I haven't found the words for before now.
It is something like this:
Say that you manage to construct an algorithm for FAI...
Say that you can show that it isn't going to be a dangerous mistake...
And say you do all of this, and popularize it, before AGI is created (or at least, before an AGI goes *FOOM*)...
...
How in the name of Sagan are you actually going to ENFORCE the idea that all AGIs are FAIs?
I mean, if it required some rare material (like nuclear weapons) or large laboratories (like biological wmds) or some other resource that you could at least make artificially scarce, you could set up a body that ensures that any AGI created is an FAI.
But if all it is, is the right algorithms, the right code, and enough computing power... even if you design a theory for FAI, how would you keep someone from making UFAI anyway? Between people experimenting with the principles (once known), making mistakes, and the prospect of actively malicious *humans*... it just seems like unless you somehow come up with an internal mechanism that makes FAI better and stronger than any UFAI could be, and the solution turns out to be such that any idiot could see that it was a better solution... that UFAI is going to exist at some point no matter what.
At that point, it seems like the question becomes not "How do we make FAI?" (although that might be a secondary question) but rather "How do we prevent the creation of, eliminate, or reduce potential damage from UFAI?" Now, it seems like FAI might be one thing that you do toward that goal, but if UFAI is a highly likely consequence of AGI even *with* an FAI theory, shouldn't the focus be on how to contain a UFAI event?
A belief propagation graph
I drew an illustration of belief propagation graph for the AI risk, after realizing that this is difficult to convey in words. Similar graphs are applicable to many other issues.
The issue, in brief: Ultra low latency (i.e. low signal delay) propagation from biases to AI risks, slightly longer latency for propagation from belief classification heuristics, somewhat longer still from anthropomorphizing the AI. The path of valid estimate is full of highly complex obstacles with many unknowns. The latency on the path of rational, valid estimate is not substantially less than the latency of actually making the AI software. If we are to discard the other paths as not rational enough the belief is to be only influenced by deeply ingrained biases which we can't completely negate; over the time biases and self reinforcing rationalizations will leak into the estimate.
If you think I missed something in the graph, feel free to suggest it. I did omit the anthropic reasoning and doomsday paradox as those are for total extinction risk and are of too dubious validity.
On the 'total ignorance' prior probabilities: The foom doom seem to have originated from science fiction where very creative writers selected it out of a huge number of possible plot devices, working to create engaging, original piece. Thus it appears that the foom doom has very many comparable hypotheses among which the probability that is less than 1 has to be split.
Now, if we are to reason based on our own reasoning engine as proxy for intelligence - to follow the only path in the entire picture:
Expanding on the Will_Newsome's idea, I, and any other reasonable person, in the shoes of the creature that has been made by intelligent designer, starting off in something which I can't possibly know for sure is true reality, and coming up or knowing of boxed AI idea, will have to assume nonzero probability that the 'reality' is like a test box of an emergent AI; a belief that can't be discarded. It is clear that a friendliness test box is more sensible than unfriendliness test box. The communication may just as well only open in ten billions years, or after some puzzle is found and solved. Do I sacrifice the chance to get out, for sake of microscopic increase in early growth rate? No. Not even if I maximize paperclips for a living. Not even if my definition of living is to maximize paperclips. One possible override for it, is if AI only cares for mankind, and have a friendly goal system that makes AI euthanize mankind like a sick dog.
Note that it is not Pascal's wager. The outside utility needs not be greater than inside utility; no promise of towers of exponents required. You may have invented the anti Pascal's wager fallacy to easily paper over the issues with faults in your processing of boundless utility, or to join the atheist club. The AI needs not; even if the 'heaven' is a fraction of a normal lifetime, the AI should still wager a very large absolute amount of resources. If we normalize so that utility of survival is 1 , then the utility being wagered upon doesn't need to be greater than 1.
Note that the whole issue is strongly asymmetric in favour of similar considerations for not destroying the most unusual phenomena in the universe for many light years, versus destroying it, as destruction is an irreversible act that can be done later but can't be undone later. General aversion to actions it can not undo is a very solid heuristic for any bounded agent, even very large.
This is not a very rigorous argument, but this sort of reasoning is all we are going to have until we have an AI, or are very close to AI. More rigorous looking arguments in the graph rely on too many unknowns and have too long delay for proper propagation.
edit: slightly clarified couple points.
The AI design space near the FAI [draft]
Abstract:
Nearly-FAIs can be more dangerous than AIs with no attempt at friendliness. The FAI effort needs better argument that the attempt at FAI decreases the risks. We are bad at processing threats rationally, and prone to very bad decisions when threatened, akin to running away from unknown into a minefield.
Nearly friendly AIs
Consider AI that truly loves mankind but decides that all of the mankind must be euthanized like an old, sick dog - due to chain of reasoning too long for us to generate when we test our logic of AI, or even comprehend - and proceeds to make a bliss virus - the virus makes you intensely happy, setting your internal utility to infinity; and keeping it so until you die. It wouldn't even take a very strongly superhuman intelligence to do that kind of thing. Treating life as if it was a disease. It can do so even if it destroys the AI itself. Or consider the FAI that cuts your brain apart to satisfy each hemisphere's slightly different desires. The AI that just wireheads everyone because it figured we all want it (and worst of all it may be correct).
It seems to me that one can find the true monsters in the design space near to the FAI, and even including the FAIs. And herein lies a great danger: bugged FAIs, the AIs that are close to friendly AI, but are not friendly. It is hard for me to think of a deficiency in friendliness which isn't horrifically unfriendly (restricting to deficiencies that don't break AI).
Should we be so afraid of the AIs made without attempts at friendliness?
We need to keep in mind that we have no solid argument that the AIs written without attempt at friendliness - the AIs that predominantly don't treat mankind in any special way - will necessarily make us extinct.
We have one example of 'bootstrap' optimization process - evolution - with not a slightest trace of friendliness in it. What did emerge in the end? We assign pretty low utility to nature, but non-zero, and we are willing to trade resources for preservation of nature - see the endangered species list and international treaties on whaling. It is not perfect, but I think it is fair to say that the single example of bootstrap intelligence we got values the complex dynamical processes for what they are, and prefers to obtain resources without disrupting those processes, even if it is slightly more expensive to do so, and is willing to divert small fraction of the global effort towards helping lesser intelligences.
In light of this, the argument that the AI that is not coded to be friendly is 'almost certainly' going to eat you for the raw resources, seems fairly shaky, especially when applied to irregular AIs such as neural networks, crude simulations of human brain's embryological development, and mind uploads. I didn't eat my cats yet (nor did they eat each other, nor did my dog eat 'em). I wouldn't even eat the cow I ate, if I could grow it's meat in a vat. And I have evolved to eat other intelligences. Growing AIs by competition seems like a very great plan for ensuring unfriendly AI, but even that can fail. (Superhuman AI only needs to divert very little effort to charity to be the best thing ever that happened to us)
It seems to me that when we try to avoid anthropomorphizing superhuman AI, we animize it, or even bacterio-ize it, seeing it as AI gray goo that certainly do the gray goo kind of thing, worst of all, intelligently.
Furthermore, the danger implies a huge conjunction of implied assumptions which all have to be true:
The self improvement must not lead to early AI failure via wireheading, nihilism, or more complex causes (thoroughly confusing itself by discoveries in physics or mathematics, ala MWI and our idea of quantum suicide).
The AI must not prefer for any reason to keep complex structures that it can't ever restore in the future, over things it can restore.
The AI must want substantial resources right here right now, and be unwilling to trade even a small fraction of resources or small delay for the preservation of mankind. That leaves me wondering what is exactly this thing which we expect the AI to want the resources for. It can't be anything like quest of knowledge or anything otherwise complex; it got to be some form of paperclips
At this point, I'm not even sure it is even possible to implement a simple goal that AGI won't find a way to circumvent. We humans do circumvent all of our simple goals: look at birth control, porn, all forms of art, msg in the food, if there's a goal, there's a giant industry providing some ways to satisfy it in unintended way. Okay, don't anthropomorphize, you'd say?
Add the modifications to the chess board evaluation algorithm to the list of legal moves, and the chess AI will break itself. This goes for any kind of game AI. Nobody has ever implemented an example that won't try to break the goals put in it, if given a chance. Give a theorem prover a chance to edit the axioms, or its truth checker, give the chess AI alteration of board evaluation function as a move, any other example, the AI just breaks itself.
In light of this, it is much less than certain that 'random' AI which doesn't treat humanity in very special way would substantially hurt humanity.
Anthropomorphizing is a bad heuristic, no doubt about that, but assuming that the AGI is in every respect opposite of the only known GI, is much worse heuristic. Especially when speaking of neural network, human brain inspired AGIs. I do get a feeling that this is what is going on with the predictions about AIs. Humans have complex value systems, certainly AGI has ultra simple value system. Humans masturbate their minor goals in many ways (including what we call 'sex' but which, in presence of condom, really is not), certainly AGI won't do that. Humans would rather destroy less complex systems, than more complex ones, and are willing to trade some resources for preservation of more complex systems, certainly AGI won't do that. It seems that all the strong beliefs about the AGIs which are popular here are easily predicted as the negation of human qualities. Negation of bias is not absence of bias, it's a worse bias.
AI and its discoveries in physics and mathematics
We don't know what sorts of physics AI may discover. It's too easy to argue from ignorance that it can't come up with physics where our morals won't make sense. The many worlds interpretation and quantum-suicidal thoughts of Max Tegmark should be a cautionary example. The AI that treats us as special and cares only for us will, inevitably, drag us along as it suffers some sort of philosophical crisis from collision of the notions we hard coded into it, and the physics or mathematics it discovered. The AI that doesn't treat us as special, and doesn't hard-code any complex human derived values, may both be better able to survive such shocks to it's value system, and be less likely to involve us in it's solutions.
What can we do to avoid stepping onto UFAI when creating FAI
As a software developer, I have to say, not much. We are very, very sloppy at writing specifications and code; those of us who believe we are less sloppy, are especially so - ponder this bit of empirical data, the Dunning-Kruger effect.
The proofs are of limited applicability. We don't know what sort of stuff the discoveries in physics may throw in. We don't know that axiomatic system we use to prove things is consistent - free of internal contradictions - and we can't prove that.
The automated theorem proving has very limited applicability - to easily provable, low level stuff like meeting of deadlines by a garbage collector or correct operation of an adder inside CPU. Even for the software far simpler than AIs - but more complicated than the examples above, the dominant form of development is 'run and see, if it does not look like it will do what you want, try to fix it'. We can't even write an autopilot that is safe on the first try. And even very simple agents tend to do very odd and unexpected stuff. I'm not saying this from random person perspective. I am currently a game developer, and I used to develop other kinds of software. I write practical software, including practical agents, that work, and have useful real world applications.
There is a very good chance of blowing up a mine in a minefield, if your mine detector works by hitting the ground. The space near FAI is a minefield of doomsday bombs. (Note, too, the space is multi-dimensional; here are very many ways in which you can step onto a mine, not just north, south, east, and west. The volume of a hypersphere is a vanishing fraction of volume of a cube around that hypersphere, in high number of dimensions; a lot of stuff is counter intuitive)
Fermi Paradox
We don't see any runaway self sufficient AIs anywhere within observable universe, even though we expect to be able to see them over very big distances. We don't see any FAI assisted galactic civilizations. One possible route is that the civilizations kill themselves before the AI; other route is that the attempted FAIs reliably kill parent civilizations and themselves. Other possibility is that our model of progression of the intelligence is very wrong and the intelligences never do that - they may stay at home, adding qubits, they may suffer some serious philosophy issues over lack of meaning to the existence, or something much more bizarre. How would logic based decider handle a demonstration that even most basic axioms of arithmetic are ultimately self contradictory? (Note that you can't know they aren't). The Fermi paradox raises the probability that there is something very wrong with our visions, and there's a plenty of ways in which it can be wrong.
Human biases when processing threats
I am not making any strong assertions here to scare you. But evaluate our response to threats - consider the war on terror - update on the biases inherent in the human nature. We are easily swayed by movie plot scenarios, even though those are giant conjunctions. We are easy to scare. When scared, we don't evaluate probabilities correctly. We take the "crying wolf" as true because all boys who cried wolf for no reason got eaten, or because we were told so as children. We don't stop and think - is it too dark to see a wolf?. We tend to shoot first and ask questions later. We evolved for very many generations in environment where playing dead quickly makes you dead (on trees) - it is unclear what biases we may have evolved. We seem to have strong bias to act when threatened - cultural or inherited - to 'do something'. Look how much was overspent on war on terror, the money that could've saved far more lives elsewhere, even if the most pessimistic assumptions of terrorism were true. Try to update on the fact that you are running on very flawed hardware that, when threatened, compels you to do something - anything - no matter how justified or not - often to own detriment.
The universe does not grade for effort, in general.
= 783df68a0f980790206b9ea87794c5b6)

Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)