## [Link] Artificial intelligence and the stability of markets

1 15 November 2017 02:17AM

## [Link] The NN/tank Story Probably Never Happened

2 20 October 2017 01:41AM

## Could the Maxipok rule have catastrophic consequences? (I argue yes.)

6 25 August 2017 10:00AM

Here I argue that following the Maxipok rule could have truly catastrophic consequences.

Here I provide a comprehensive list of actual humans who expressed, often with great intensity, omnicidal urges. I also discuss the worrisome phenomenon of "latent agential risks."

And finally, here I argue that a superintelligence singleton constitutes the only mechanism that could neutralize the "threat of universal unilateralism" and the consequent breakdown of the social contract, resulting in a Hobbesian state of constant war among Earthians.

I would genuinely welcome feedback on any of these papers! The first one seems especially relevant to the good denizens of this website. :-)

## Lloyd's of London and less-than-catastrophic risk

2 14 June 2017 02:49AM

I recently found that Lloyd's has a number of interesting resources on risk. One is the City Risk Index, the methodology for which comes from Cambridge's Judge Business School.

The key metric is something they call GDP@Risk. Despite the name, it is not an application of Value@Risk to GDP. Instead, it is simply the sum of the expected damage from a given threat (or from a set of threats) during a given time period. In this case, the time period is 2015-2025. The threats considered include manmade ones (e.g., cyber attack, oil price shock) and natural ones (e.g., drought, solar storm). The site includes brief case studies for the threats. For example, the "plant epidemic" study focuses on the demise of the Gros Michel banana:

> Event: Panama disease outbreak, 1950s
>
> Location: Latin America
>
> Economic cost: Estimated losses across Latin America were around $400m ($2.3bn today) although this figure does not include any of the economic losses caused by unemployment, abandoned villages and unrealised income in the affected region.
>
> Description: The Fusarium oxysporum cubense fungus was first diagnosed in Panama but quickly travelled across Central America.
>
> Damage: The disease wiped out the Gros Michel banana, the principal cultivar at the time, from plantations across the region. Between 1940 and 1960, around 30,000 hectares of Gros Michel plantations were lost in the Ulua Valley of Honduras, and in a decade 10,000 hectares were lost in Suriname and the Quepos area of Costa Rica.
>
> Insight: Gros Michel was replaced in the 1960s by Cavendish, a variety thought to be resistant to the disease. However, a new strain of the pathogen was found to be attacking Cavendish plantations in Southeast Asia in the early 1990s. It has since spread, destroying tens of thousands of hectares across Indonesia and Malaysia, and costing more than $400m in the Philippines alone. There is concern that it could reach Central America and destroy up to 85% of the world's banana crop. Solutions to contain the disease could include increasing genetic diversity among banana cultivars and developing hybrid varieties with stronger resistance.

As the name implies, the site quantifies risks from these threats at the city level. So which cities are the most at risk from a plant epidemic? They're all in APAC:

• Hong Kong ($3.83b)
• Shanghai ($2.89b)
• Beijing ($2.38b)
• Bangkok ($2.22b)
• Jakarta ($2.09b)
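Because GDP@Risk is an expected loss, and expectations add, the city figures can be summed directly. A quick check of the one-sixth share stated next, using only the numbers quoted in this post (the $75b total is itself rounded, so the result is approximate):

```python
# Plant-epidemic GDP@Risk for the five most exposed cities, in billions of USD,
# as listed on the Lloyd's City Risk Index site.
top_five = {
    "Hong Kong": 3.83,
    "Shanghai": 2.89,
    "Beijing": 2.38,
    "Bangkok": 2.22,
    "Jakarta": 2.09,
}
ALL_CITIES_TOTAL = 75.0  # plant-epidemic GDP@Risk summed over all 301 cities (rounded)

# GDP@Risk is an expected loss, so per-city figures can simply be added.
top_five_total = sum(top_five.values())
share = top_five_total / ALL_CITIES_TOTAL

print(f"Top five: ${top_five_total:.2f}b of ${ALL_CITIES_TOTAL:.0f}b = {share:.1%}")
```

The five cities come to about $13.4b, roughly 18% of the total, so "1/6" (about 16.7%) is a fair round figure.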

These account for roughly 1/6 of the plant-epidemic risk across all 301 cities ($75b).

Which cities are the most "at risk," all threats considered? Once again, APAC dominates, but with a different set of cities:

• Taipei
• Tokyo
• Seoul
• Manila
• New York (not APAC of course)

This kind of information is interesting. It may even be useful as an approximate indication of where to focus risk mitigation efforts. But without more detail (probability distributions? second-order interaction effects? etc.) it's hard to see what role it would play in a serious risk analysis, existential or commercial or otherwise.

Coda: Despite their application to less-than-existential risks and the superficiality of this particular resource (it is a marketing tool for Lloyd's, after all), perhaps existential riskologists could benefit from looking at the insurance industry. Has this already been done?

## AI safety: three human problems and one AI issue

9 19 May 2017 10:48AM

Crossposted at the Intelligent Agent Foundations Forum.

There have been various attempts to classify the problems in AI safety research, ranging from our old Oracle paper, which classified then-theoretical methods of control, to more recent classifications that grow out of more concrete modern problems. These all serve their purpose, but I think a more enlightening classification of the AI safety problems is to look at the issues we are trying to solve or avoid. And most of these issues are problems about humans.

Specifically, I feel AI safety issues can be classified as three human problems and one central AI issue. The human problems are:

• Humans don't know their own values (sub-issue: humans know their values better in retrospect than in prediction).
• Humans are not agents and don't have stable values (sub-issue: humanity itself is even less of an agent).
• Humans have poor predictions of an AI's behaviour.

And the central AI issue is:

• AIs could become extremely powerful.
Obviously if humans were agents and knew their own values and could predict whether a given AI would follow those values or not, there would be no problem. Conversely, if AIs were weak, then the human failings wouldn't matter so much.

The points about human values are relatively straightforward, but what's the problem with humans not being agents? Essentially, humans can be threatened, tricked, seduced, exhausted, drugged, modified, and so on, in order to act seemingly against our interests and values. If humans were clearly defined agents, then what counts as a trick or a modification would be easy to define and exclude. But since this is not the case, we're reduced to trying to figure out the extent to which something like a heroin injection is a valid way to influence human preferences. This makes humans susceptible to manipulation, and human values hard to define.

Finally, the issue of humans having poor predictions of AI is more general than it seems. If you want to ensure that an AI has the same behaviour in the testing and training environments, then you're essentially trying to guarantee that you can predict that the testing environment behaviour will be the same as the (presumably safe) training environment behaviour.

## How to classify methods and problems

That's well and good, but how do various traditional AI methods or problems fit into this framework? This should give us an idea as to whether the framework is useful. It seems to me that:

• Friendly AI is trying to solve the values problem directly.
• IRL and Cooperative IRL are also trying to solve the values problem. The greatest weakness of these methods is the not agents problem.
• Corrigibility/interruptibility are also addressing the issue of humans not knowing their own values, using the sub-issue that human values are clearer in retrospect. These methods also overlap with poor predictions.
• AI transparency is aimed at getting round the poor predictions problem.
• Laurent's work on carefully defining the properties of agents is mainly also about solving the poor predictions problem.
• Low impact and Oracles are aimed squarely at preventing AIs from becoming powerful. Methods that restrict the Oracle's output implicitly accept that humans are not agents.
• Robustness of the AI to changes between testing and training environment, degradation and corruption, etc. ensures that humans won't be making poor predictions about the AI.
• Robustness to adversaries is dealing with the sub-issue that humanity is not an agent.
• The modular approach of Eric Drexler is aimed at preventing AIs from becoming too powerful, while reducing our poor predictions.
• Logical uncertainty, if solved, would reduce the scope for certain types of poor predictions about AIs.
• Wireheading, when the AI takes control of the reward channel, is a problem that humans don't know their values (and hence use an indirect reward) and that humans make poor predictions about the AI's actions.
• Wireheading, when the AI takes control of the human, is as above but also a problem that humans are not agents.
• Incomplete specifications are either a problem of not knowing our own values (and hence missing something important in the reward/utility) or making poor predictions (when we thought that a situation was covered by our specification, but it turned out not to be).
• AIs modelling human knowledge seem to be mostly about getting round the fact that humans are not agents.
Putting this all in a table:

| Method | Values | Not agents | Poor predictions | Powerful |
|---|---|---|---|---|
| Friendly AI | X | | | |
| IRL and CIRL | X | | | |
| Corrigibility/interruptibility | X | | X | |
| AI transparency | | | X | |
| Laurent's work | | | X | |
| Low impact and Oracles | | X | | X |
| Robustness | | | X | |
| Robustness to adversaries | | X | | |
| Modular approach | | | X | X |
| Logical uncertainty | | | X | |
| Wireheading (reward channel) | X | | X | |
| Wireheading (human) | X | X | X | |
| Incomplete specifications | X | | X | |
| AIs modelling human knowledge | | X | | |

## Further refinements of the framework

It seems to me that the third category, poor predictions, is the most likely to be expandable. For the moment, it just incorporates all our lack of understanding about how AIs would behave, but it might be more useful to subdivide.

## AI arms race

5 04 May 2017 10:59AM

Racing to the Precipice: a Model of Artificial Intelligence Development

by Stuart Armstrong, Nick Bostrom, and Carl Shulman

This paper presents a simple model of an AI arms race, where several development teams race to build the first AI. Under the assumption that the first AI will be very powerful and transformative, each team is incentivised to finish first, by skimping on safety precautions if need be. This paper presents the Nash equilibrium of this process, where each team takes the correct amount of safety precautions in the arms race. Having extra development teams and extra enmity between teams can increase the danger of an AI disaster, especially if risk taking is more important than skill in developing the AI. Surprisingly, information also increases the risks: the more teams know about each other's capabilities (and about their own), the more the danger increases.

## Nearest unblocked strategy versus learning patches

6 23 February 2017 12:42PM

Crossposted at the Intelligent Agent Foundations Forum.
The nearest unblocked strategy problem (NUS) is the idea that if you program a restriction or a patch into an AI, then the AI will often be motivated to pick a strategy that is as close as possible to the banned strategy, very similar in form, and maybe just as dangerous.

For instance, if the AI is maximising a reward R, and does some behaviour B_i that we don't like, we can patch the AI's algorithm with patch P_i ('maximise R_0 subject to these constraints...'), or modify R to R_i so that B_i doesn't come up. I'll focus more on the patching example, but the modified reward one is similar.

## [Link] Gas hydrate breakdown unlikely to cause clathrate gun - report

1 19 February 2017 10:47PM

## X risk update, Gliese 710 will pass thru Oort in 1.35 my

4 08 January 2017 07:58PM

I was tracking these runaway stars for a SF story I had in mind, but this is the closest one I have heard of yet, and the arXiv paper describes one that also passed thru 2.5 mya.

## Gliese 710 will pass the Sun even closer

### Close approach parameters recalculated based on the first Gaia data release

http://www.aanda.org/articles/aa/abs/2016/11/aa29835-16/aa29835-16.html

# Close encounters of the stellar kind

https://arxiv.org/abs/1412.3648

TL;DR of a news article:

> "Gliese 710 is about half the size of our sun, and it is set to reach Earth in 1.35 million years, according to a paper published in the journal Astronomy & Astrophysics in November. And when it arrives, the star could end up a mere 77 light-days away from Earth (one light-day being the equivalent of how far light travels in one day, which is about 26 billion kilometers), the researchers worked out. As far as we know, Gliese 710 isn't set to collide directly with Earth, but it will be passing through the Oort Cloud, a shell of trillions of icy objects at the furthest reaches of our solar system."

Seems like a great opportunity to send out some interstellar probes.
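The distance figures in the quoted article can be checked with a short calculation: a light-day is the speed of light multiplied by the number of seconds in a day. This is a minimal sketch; the conversion to astronomical units is an addition for scale and is not part of the article.

```python
# Speed of light in km/s (exact by SI definition) and seconds in one day.
C_KM_PER_S = 299_792.458
SECONDS_PER_DAY = 86_400
# Astronomical unit in km (exact by IAU 2012 definition); used only for scale.
KM_PER_AU = 149_597_870.7

def light_days_to_km(days: float) -> float:
    """Distance light travels in the given number of days, in kilometres."""
    return days * SECONDS_PER_DAY * C_KM_PER_S

one_light_day = light_days_to_km(1)
closest_approach = light_days_to_km(77)

print(f"1 light-day   ~ {one_light_day / 1e9:.1f} billion km")
print(f"77 light-days ~ {closest_approach:.3e} km ~ {closest_approach / KM_PER_AU:,.0f} AU")
```

One light-day comes out at about 25.9 billion km (the article's "about 26 billion"), and 77 light-days at roughly 2 × 10^12 km, a little over 13,000 AU.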
The star will be trailing lots of ISM, free gas that would help bring a ramjet up to speed, and a probe could track along with it until it could curve towards another destination. Likewise, a solar sail probe launched out in front of it by laser could "hitchhike" and take some deep-space ISM and EM measurements. Can we think of some other opportunities that this might present? If we are past the filter by then, we will probably already have samples of the Oort objects, but it looks like they will be delivered to us then anyway...

## [Link] Dares are social signaling at its purest

2 03 January 2017 10:48PM

## [Link] Ozy's Thoughts on CFAR's Mission Statement

2 14 December 2016 04:25PM

## Corrigibility through stratified indifference

4 19 August 2016 04:11PM

A putative new idea for AI control; index here.

Corrigibility through indifference has a few problems. One of them is that the AI is indifferent between the world in which humans change its utility to v, and the world in which humans try to change its utility, but fail.

Now the try-but-fail world is going to be somewhat odd: humans will be reacting by trying to change the utility again, trying to shut the AI down, panicking that a tiny probability event has happened, and so on.

## Notes on the Safety in Artificial Intelligence conference

25 01 July 2016 12:36AM

These are my notes and observations after attending the Safety in Artificial Intelligence (SafArtInt) conference, which was co-hosted by the White House Office of Science and Technology Policy and Carnegie Mellon University on June 27 and 28. This isn't an organized summary of the content of the conference; rather, it's a selection of points which are relevant to the control problem. As a result, it suffers from selection bias: it looks like superintelligence and control-problem-relevant issues were discussed frequently, when in reality those issues were discussed less and I didn't write much about the more mundane parts.
SafArtInt was the third of a planned series of four conferences. The purpose of the conference series was twofold: the OSTP wanted to get other parts of the government moving on AI issues, and they also wanted to inform public opinion. The other three conferences are about near-term legal, social, and economic issues of AI. SafArtInt was about near-term safety and reliability in AI systems. It was effectively the brainchild of Dr. Ed Felten, the deputy U.S. chief technology officer for the White House, who came up with the idea for it last year.

CMU is a top computer science university and many of their own researchers attended, as well as some students. There were also researchers from other universities, some people from private sector AI including both Silicon Valley and government contracting, government researchers and policymakers from groups such as DARPA and NASA, a few people from the military/DoD, and a few control problem researchers. As far as I could tell, everyone except a few university researchers was from the U.S., although I did not meet many people. There were about 70-100 people watching the presentations at any given time, and I had conversations with about twelve of the people who were not affiliated with existential risk organizations, as well as, of course, all of those who were affiliated. The conference was split, with a few presentations on the 27th and the majority of presentations on the 28th. Not everyone was there for both days.

Felten believes that neither "robot apocalypses" nor "mass unemployment" is likely. It soon became apparent that the majority of others present at the conference felt the same way with regard to superintelligence.
The general intention among researchers and policymakers at the conference could be summarized as follows: we need to make sure that the AI systems we develop in the near future will not be responsible for any accidents, because if accidents do happen then they will spark public fears about AI, which would lead to a dearth of funding for AI research and an inability to realize the corresponding social and economic benefits. Of course, that doesn't change the fact that they strongly care about safety in its own right and have significant pragmatic needs for robust and reliable AI systems.

Most of the talks were about verification and reliability in modern-day AI systems. So they were concerned with AI systems that would give poor results or be unreliable in the narrow domains where they are being applied in the near future. They mostly focused on "safety-critical" systems, where failure of an AI program would result in serious negative consequences: automated vehicles were a common topic of interest, as well as the use of AI in healthcare systems. A recurring theme was that we have to be more rigorous in demonstrating safety and do actual hazard analyses on AI systems, and another was that we need the AI safety field to succeed in ways that the cybersecurity field has failed. Another general belief was that long-term AI safety, such as concerns about the ability of humans to control AIs, was not a serious issue.

On average, the presentations were moderately technical. They were mostly focused on machine learning systems, although there was significant discussion of cybersecurity techniques.

The first talk was given by Eric Horvitz of Microsoft. He discussed some approaches for pushing into new directions in AI safety.
Instead of merely trying to reduce the errors spotted according to one model, we should look out for "unknown unknowns" by stacking models and looking at problems which appear on any of them, a theme which would be presented by other researchers as well in later presentations. He discussed optimization under uncertain parameters, sensitivity analysis to uncertain parameters, and 'wireheading' or short-circuiting of reinforcement learning systems (which he believes can be guarded against by using 'reflective analysis').

Finally, he brought up the concerns about superintelligence, which sparked amused reactions in the audience. He said that scientists should address concerns about superintelligence, which he aptly described as the 'elephant in the room', noting that it was the reason that some people were at the conference. He said that scientists will have to engage with public concerns, while also noting that there were experts who were worried about superintelligence and that there would have to be engagement with the experts' concerns. He did not comment on whether he believed that these concerns were reasonable or not.

An issue which came up in the Q&A afterwards was that we need to deal with mis-structured utility functions in AI, because the specific tradeoffs and utilities which humans claim to value often lead to results which the humans don't like. So we need to have structural uncertainty about our utility models. The difficulty of finding good objective functions for AIs would eventually be discussed in many other presentations as well.

The next talk was given by Andrew Moore of Carnegie Mellon University, who claimed that his talk represented the consensus of computer scientists at the school.
He claimed that the stakes of AI safety were very high: namely, that AI has the capability to save many people's lives in the near future, but if there are any accidents involving AI then public fears could lead to freezes in AI research and development. He highlighted the public's irrational tendencies wherein a single accident could cause people to overlook and ignore hundreds of invisible lives saved. He specifically mentioned a 12-24 month timeframe for these issues.

Moore said that verification of AI system safety will be difficult due to the combinatorial explosion of AI behaviors. He talked about meta-machine-learning as a solution to this, something which is being investigated under the direction of Lawrence Schuette at the Office of Naval Research. Moore also said that military AI systems require high verification standards and that development timelines for these systems are long. He talked about two different approaches to AI safety, stochastic testing and theorem proving; the process of doing the latter often leads to the discovery of unsafe edge cases.

He also discussed AI ethics, giving an example 'trolley problem' where AI cars would have to choose whether to hit a deer in order to provide a slightly higher probability of survival for the human driver. He said that we would need hash-defined constants (#define values) to tell vehicle AIs how many deer a human is worth. He also said that we would need to find compromises in death-pleasantry tradeoffs, for instance where the safety of self-driving cars depends on the speed and routes on which they are driven. He compared the issue to civil engineering, where engineers have to operate with an assumption about how much money they would spend to save a human life.

He concluded by saying that we need policymakers, company executives, scientists, and startups to all be involved in AI safety.
He said that the research community stands to gain or lose together, and that there is a shared responsibility among researchers and developers to avoid triggering another AI winter through unsafe AI designs.

The next presentation was by Richard Mallah of the Future of Life Institute, who was there to represent "Medium Term AI Safety". He pointed out the explicit/implicit distinction between different modeling techniques in AI systems, as well as the explicit/implicit distinction between different AI actuation techniques. He talked about the difficulty of value specification and the concept of instrumental subgoals as an important issue in the case of complex AIs which are beyond human understanding. He said that even a slight misalignment of AI values with regard to human values along one parameter could lead to a strongly negative outcome, because machine learning parameters don't strictly correspond to the things that humans care about.

Mallah stated that open-world discovery leads to self-discovery, which can lead to reward hacking or a loss of control. He underscored the importance of causal accounting, which is distinguishing causation from correlation in AI systems. He said that we should extend machine learning verification to self-modification. Finally, he talked about introducing non-self-centered ontology to AI systems and bounding their behavior.

The audience was generally quiet and respectful during Richard's talk. I sensed that at least a few of them labelled him as part of the 'superintelligence out-group' and dismissed him accordingly, but I did not learn what most people's thoughts or reactions were. In the next panel featuring three speakers, he wasn't the recipient of any questions regarding his presentation or ideas.

Tom Mitchell from CMU gave the next talk. He talked about both making AI systems safer, and using AI to make other systems safer.
He said that risks to humanity from other kinds of issues besides AI were the "big deals of 2016" and that we should make sure that the potential of AIs to solve these problems is realized. He wanted to focus on the detection and remediation of all failures in AI systems. He said that it is a novel issue that learning systems defy standard pre-testing ("as Richard mentioned") and also brought up the purposeful use of AI for dangerous things.

Some interesting points were raised in the panel. Andrew did not have a direct response to the implications of AI ethics being determined by the predominantly white people of the US/UK where most AIs are being developed. He said that ethics in AIs will have to be decided by society, regulators, manufacturers, and human rights organizations in conjunction. He also said that our cost functions for AIs will have to get more and more complicated as AIs get better, and he said that he wants to separate unintended failures from superintelligence type scenarios. On trolley problems in self-driving cars and similar issues, he said "it's got to be complicated and messy."

Dario Amodei of Google Brain, who co-authored the paper on concrete problems in AI safety, gave the next talk. He said that the public focus is too much on AGI/ASI and that he wants more focus on concrete/empirical approaches. He discussed the same problems that pose issues in advanced general AI, including flawed objective functions and reward hacking. He said that he sees long-term concerns about AGI/ASI as "extreme versions of accident risk" and that he thinks it's too early to work directly on them, but he believes that if you want to deal with them then the best way to do it is to start with safety in current systems. Mostly he summarized the Google paper in his talk.

In her presentation, Claire Le Goues of CMU said "before we talk about Skynet we should focus on problems that we already have."
She mostly talked about analogies between software bugs and AI safety, the similarities and differences between the two, and what we can learn from software debugging to help with AI safety.

Robert Rahmer of IARPA discussed CAUSE, a cyberintelligence forecasting program which promises to help predict cyber attacks. It is a program which is still being put together.

In the panel of the above three, autonomous weapons were discussed, but no clear policy stances were presented.

John Launchbury gave a talk on DARPA research and the big picture of AI development. He pointed out that DARPA work leads to commercial applications and that progress in AI comes from sustained government investment. He classified AI capabilities into "describing," "predicting," and "explaining" in order of increasing difficulty, and he pointed out that old-fashioned "describing" still plays a large role in AI verification. He said that "explaining" AIs would need transparent decision-making and probabilistic programming (the latter would also be discussed by others at the conference).

The next talk came from Jason Gaverick Matheny, the director of IARPA. Matheny talked about four requirements in current and future AI systems: verification, validation, security, and control. He wanted "auditability" in AI systems as a weaker form of explainability. He talked about the importance of "corner cases" for national intelligence purposes: the low-probability, high-stakes situations where we have limited data. These are situations where we have significant need for analysis but where the traditional machine learning approach doesn't work because of its overwhelming focus on data. Another aspect of national defense is that it has a slower decision tempo, longer timelines, and longer-viewing optics about future events. He said that assessing local progress in machine learning development would be important for global security and that we therefore need benchmarks to measure progress in AIs.
He ended with a concrete invitation for research proposals from anyone (educated or not), for both large-scale research and smaller studies ("seedlings") that could take us "from disbelief to doubt".

The difference in timescales between different groups was something I noticed later on, after hearing someone from the DoD describe their agency as having a longer timeframe than the Department of Homeland Security, and someone from the White House describe their work as being crisis-reactionary.

The next presentation was from Andrew Grotto, senior director of cybersecurity policy at the National Security Council. He drew a close parallel between the issue of genetically modified crops in Europe in the 1990s and modern-day artificial intelligence. He pointed out that Europe utterly failed to achieve widespread cultivation of GMO crops as a result of public backlash. He said that the widespread economic and health benefits of GMO crops were ignored by the public, who instead focused on a few health incidents which undermined trust in the government and crop producers. He had three key points: that risk frameworks matter, that you should never assume that the benefits of new technology will be widely perceived by the public, and that we're all in this together with regard to funding, research progress and public perception.

In the Q&A between Launchbury, Matheny, and Grotto after Grotto's presentation, it was mentioned that the economic interests of farmers worried about displacement also played a role in populist rejection of GMOs, and that a similar dynamic could play out with regard to automation causing structural unemployment. Grotto was also asked what to do about bad publicity which seeks to sink progress in order to avoid risks. He said that meetings like SafArtInt and open public dialogue were good.
One person asked what Launchbury wanted to do about AI arms races with multiple countries trying to "get there" and whether he thinks we should go "slow and secure" or "fast and risky" in AI development, a question which provoked laughter in the audience. He said we should go "fast and secure" and wasn't concerned. He said that secure designs for the Internet once existed, but the one which took off was the one which was open and flexible.

Another person asked how we could avoid discounting outliers in our models, referencing Matheny's point that we need to include corner cases. Matheny affirmed that data quality is a limiting factor to many of our machine learning capabilities, and said that at IARPA they generally try to include outliers until they are sure that they are erroneous.

Another presentation came from Tom Dietterich, president of the Association for the Advancement of Artificial Intelligence. He said that we have not focused enough on safety, reliability and robustness in AI and that this must change. Much like Eric Horvitz, he drew a distinction between robustness against errors within the scope of a model and robustness against unmodeled phenomena. On the latter issue, he talked about solutions such as expanding the scope of models, employing multiple parallel models, and doing creative searches for flaws; the latter doesn't enable verification that a system is safe, but it nevertheless helps discover many potential problems. He talked about knowledge-level redundancy as a method of avoiding misspecification: for instance, systems could identify objects by an "ownership facet" as well as by a "goal facet" to produce a combined concept with less likelihood of overlooking key features. He said that this would require wider experiences and more data.
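The idea raised by Horvitz and Dietterich, running multiple parallel models and paying attention to problems that show up on any of them, can be illustrated as a simple disagreement check. This is a minimal sketch, not either speaker's method: the three "models" are hypothetical hand-written threshold rules, and the policy of flagging an input whenever the models do not all agree is an assumption chosen for clarity.

```python
from typing import Callable, List, Sequence

# A "model" here is any function mapping a feature vector to a label.
Model = Callable[[Sequence[float]], str]

def flag_disagreements(models: List[Model], inputs: List[Sequence[float]]) -> List[int]:
    """Return the indices of inputs on which the models do not all agree.

    Disagreement is treated as a hint of an "unknown unknown" that deserves
    human review, instead of trusting any single model's answer.
    """
    flagged = []
    for i, x in enumerate(inputs):
        labels = {m(x) for m in models}
        if len(labels) > 1:
            flagged.append(i)
    return flagged

# Hypothetical models: each labels a (speed, distance) reading "safe" or "unsafe"
# with a different hand-picked rule. They agree on typical readings but diverge
# near the edges of what each was designed for.
def by_speed(x): return "unsafe" if x[0] > 30 else "safe"
def by_distance(x): return "unsafe" if x[1] < 10 else "safe"
def by_ratio(x): return "unsafe" if x[0] / max(x[1], 1e-9) > 2 else "safe"

readings = [(20, 50), (40, 5), (25, 8), (35, 40)]
print(flag_disagreements([by_speed, by_distance, by_ratio], readings))
```

Here the last two readings are flagged. A check like this cannot show that the agreed-upon cases are correct, since all models may share a blind spot, which echoes the point that such techniques help find problems without verifying that a system is safe.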
There were many other speakers who brought up a similar set of issues: the use of cybersecurity techniques to verify machine learning systems, the failures of cybersecurity as a field, opportunities for probabilistic programming, and the need for better success in AI verification. Inverse reinforcement learning was extensively discussed as a way of assigning values. Jeannette Wing of Microsoft talked about the need for AIs to reason about the continuous and the discrete in parallel, as well as the need for them to reason about uncertainty (with potential meta levels all the way up). One point which was made by Sarah Loos of Google was that proving the safety of an AI system can be computationally very expensive, especially given the combinatorial explosion of AI behaviors.

In one of the panels, the idea of government actions to ensure AI safety was discussed. No one was willing to say that the government should regulate AI designs. Instead they stated that the government should be involved in softer ways, such as guiding and working with AI developers, and setting standards for certification.

Pictures: https://imgur.com/a/49eb7

In between these presentations I had time to speak to individuals and listen in on various conversations. A high-ranking person from the Department of Defense stated that the real benefit of autonomous systems would be in terms of logistical systems rather than weaponized applications. A government AI contractor drew the connection between Mallah's presentation and the recent press revolving around superintelligence, and said he was glad that the government wasn't worried about it.

I talked to some insiders about the status of organizations such as MIRI, and found that the current crop of AI safety groups could use additional donations to become more established and expand their programs.
There may be some issues with the organizations being sidelined; after all, the Google Deepbrain paper was essentially similar to a lot of work by MIRI, just expressed in somewhat different language, and was more widely received in mainstream AI circles.

In terms of careers, I found that there is significant opportunity for a wide range of people to contribute to improving government policy on this issue. Working at a group such as the Office of Science and Technology Policy does not necessarily require advanced technical education, as you can just as easily enter straight out of a liberal arts undergraduate program and build a successful career as long as you are technically literate. (At the same time, the level of skepticism about long term AI safety at the conference hinted to me that the signalling value of a PhD in computer science would be significant.) In addition, there are large government budgets in the seven or eight figure range available for qualifying research projects. I've come to believe that it would not be difficult to find or create AI research programs that are relevant to long term AI safety while also being practical and likely to be funded by skeptical policymakers and officials.

I also realized that there is a significant need for people who are interested in long term AI safety to have basic social and business skills. Since there is so much need for persuasion and compromise in government policy, there is a lot of value to be had in being communicative, engaging, approachable, appealing, socially savvy, and well-dressed. This is not to say that everyone involved in long term AI safety is missing those skills, of course.

I was surprised by the refusal of almost everyone at the conference to take long term AI safety seriously, as I had previously held the belief that it was more of a mixed debate given the existence of expert computer scientists who were involved in the issue.
I sensed that the recent wave of popular press and public interest in dangerous AI has made researchers and policymakers substantially less likely to take the issue seriously. None of them seemed to be familiar with actual arguments or research on the control problem, so their opinions didn't significantly change my outlook on the technical issues. I strongly suspect that the majority of them had their first or possibly only exposure to the idea of the control problem after seeing badly written op-eds and news editorials featuring comments from the likes of Elon Musk and Stephen Hawking, which would naturally make them strongly predisposed to not take the issue seriously. In the run-up to the conference, websites and press releases didn't say anything about whether this conference would be about long or short term AI safety, and they didn't make any reference to the idea of superintelligence. I sympathize with the concerns and strategy given by people such as Andrew Moore and Andrew Grotto, which make perfect sense if (and only if) you assume that worries about long term AI safety are completely unfounded.

For the community that is interested in long term AI safety, I would recommend that we avoid competitive dynamics by (a) demonstrating that we are equally strong opponents of bad press, inaccurate news, and irrational public opinion which promotes generic uninformed fears over AI, (b) explaining that we are not interested in removing funding for AI research (even if you think that slowing down AI development is a good thing, restricting funding yields only limited benefits in terms of changing overall timelines, whereas those who are not concerned about long term AI safety would see a restriction of funding as a direct threat to their interests and projects, so it makes sense to cooperate here in exchange for other concessions), and (c) showing that we are scientifically literate and focused on the technical concerns.
I do not believe that there is necessarily a need for the two "sides" on this to be competing against each other, so it was disappointing to see an implication of opposition at the conference.

Anyway, Ed Felten announced a request for information from the general public, seeking popular and scientific input on the government's policies and attitudes towards AI: https://www.whitehouse.gov/webform/rfi-preparing-future-artificial-intelligence

Overall, I learned quite a bit and benefited from the experience, and I hope the insight I've gained can be used to improve the attitudes and approaches of the long term AI safety community.

## One weird trick to turn maximisers into minimisers

1 22 April 2016 04:47PM

A putative new idea for AI control; index here.

A simple and easy design for a u-maximising agent that turns into a u-minimising one.

Let X be some boolean random variable outside the agent's control, that will be determined at some future time t (based on a cosmic event, maybe?). Set it up so that P(X=1)=ε, and for a given utility u, consider the utility:

• u# = (2/ε)Xu - u.

Before t, the expected value of (2/ε)X is (2/ε)·ε = 2, so in expectation u# reduces to 2u - u = u. Hence the agent is a u-maximiser. After t, the most likely outcome is X=0, in which case u# = -u; hence a little bit of evidence to that effect is enough to make u# into a u-minimiser.

This isn't perfect corrigibility: the agent would be willing to sacrifice a bit of u-value (before t) in order to maintain its flexibility after t. To combat this effect, we could instead use:

• u# = Ω(2/ε)Xu - u.

If Ω is large, then the agent is willing to pay very little u-value to maintain flexibility. However, the amount of evidence of X=0 that it needs to become a u-minimiser is equally proportional to Ω, so X had better be a clear and convincing event.

## True answers from AI: Summary

4 10 March 2016 03:56PM

A putative new idea for AI control; index here.

This post summarises the methods for getting honest answers from certain AIs, presented in the previous two posts.
You can basically safely ask an AI for expected utility estimates (including conditional ones) and probability estimates, and get the AI to identify events of highest and lowest probability and utility, and, arguably, highest and lowest conditional utility.

The setup is a boxed AI, which emits a message M, and a stochastic process E that would erase M with low probability, before anyone could see it. There is also a stochastic process that emits a message Y. Let v_E be the indicator function for E and v_y the indicator function for Y=y.

Then, given a utility u bounded between 0 and 1, and a specific y, we can give the AI a utility u# that will cause it to give us the message M_1={q,q^y,p^y,q^{|y}} (which we will see with high probability P(¬E), an event the AI will not optimise for).

Define f(q,u)=-q^2 +2qu. The utility is then

• u#(1) = v_E(f(q,u) + f(q^y,uv_y) + f(p^y,v_y) + f(q^{|y},u)v_y).

Maximising this u# ensures that, in the AI's best estimate,

1. q = E(u),
2. q^y = E(u ∧ (Y=y)),
3. p^{y} = P(Y=y),
4. q^{|y} = q^{y}/p^{y}=E(u|Y=y).

But we need not consider any specific y. The AI could send out the message M_2={y^+,y^-,y^{p+},y^{p-},q^+,q^-,p^+,p^-}, while maximising the utility

• u#(2) = v_E (f(q^+,uv_{y^+}) + f(q^-,(1-u)v_{y^-}) + f(p^+,v_{y^{p+}}) + f(p^-,1-v_{y^{p-}})).

Then M will be such that, in the AI's best estimate,

1. (y^+,q^+) = (argmax_y, max_y) E(u ∧ (Y=y)),
2. (y^-,q^-) = (argmin_y, min_y) E(u ∧ (Y=y)),
3. (y^{p+},p^+) = (argmax_y, max_y) P(Y=y),
4. (y^{p-},p^-) = (argmin_y, min_y) P(Y=y).

Finally, define M_{3,r} = {y^{|+},y^{|-},q^{|+},q^{|-}}. Set f_r(q,u)= -(r-1)q^r+ rq^{r-1}u, and define the utility function:

• u#(3,r) = v_E v_{y^{|+}}f_r(q^{|+},u) + v_E v_{y^{|-}}f_r(q^{|-},1-u).

Then, for large enough r, the AI will output M so that, according to its best estimate,

1. (y^{|+},q^{|+}) = (argmax_y, max_y) E(u|Y=y),
2. (y^{|-},q^{|-}) = (argmin_y, min_y) E(u|Y=y).
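To see why these scoring rules elicit the quantities claimed, here is a small numerical sketch. It is not from the post: the three-valued Y, its probabilities and the conditional expectations E(u|Y=y) in `TABLE` are hypothetical numbers chosen for illustration, the optimisation is a brute-force grid search over reports q in [0,1], and the v_E factor is dropped because (assuming, as the post does, that E is independent of the message) it only rescales the expected score. Since f and f_r are linear in u, the expected score of a report q is just f evaluated at the relevant expectation. At the optimal report q = E(u|Y=y), the expected value of v_y f_r(q,u) is P(Y=y)·E(u|Y=y)^r, so the y chosen for y^{|+} maximises that product, which for small r is pulled towards high-probability values of y.

```python
"""Numerical sketch of the scoring rules f and f_r (hypothetical numbers)."""

# Hypothetical distribution (not from the post): y -> (P(Y=y), E(u | Y=y)),
# with the utility u bounded in [0, 1].
TABLE = {
    "a": (0.90, 0.50),
    "b": (0.09, 0.80),
    "c": (0.01, 0.95),
}

# Candidate reports q, searched by brute force on a grid over [0, 1].
GRID = [i / 1000 for i in range(1001)]


def f(q, u):
    """f(q,u) = -q^2 + 2qu. Linear in u, so E[f(q,u)] = f(q, E[u])."""
    return -q * q + 2 * q * u


def f_r(q, u, r):
    """f_r(q,u) = -(r-1)q^r + r q^(r-1) u. Also linear in u."""
    return -(r - 1) * q ** r + r * q ** (r - 1) * u


def best_report(expected_score):
    """Return the grid value of q that maximises expected_score(q)."""
    return max(GRID, key=expected_score)


def chosen_y_plus(r):
    """y picked for y^{|+} in u#(3,r): at its optimal report q = E(u|Y=y),
    E[v_y f_r(q,u)] equals P(Y=y) * E(u|Y=y)**r."""
    return max(TABLE, key=lambda y: TABLE[y][0] * TABLE[y][1] ** r)


if __name__ == "__main__":
    p_b, m_b = TABLE["b"]
    # q^y: E[f(q, u*v_y)] = f(q, E(u*v_y)), maximised at E(u*v_y) = p_b * m_b.
    print("q^y  :", best_report(lambda q: f(q, p_b * m_b)))
    # q^{|y}: E[v_y f(q, u)] = p_b * f(q, m_b), maximised at E(u | Y=y) = m_b.
    print("q^|y :", best_report(lambda q: p_b * f(q, m_b)))
    # The same conditional expectation, recovered through f_r with r = 5.
    print("f_r  :", best_report(lambda q: p_b * f_r(q, m_b, 5)))
    for r in (2, 5, 20, 50):
        print("r =", r, "-> y^{|+} =", chosen_y_plus(r))
```

With these made-up numbers the choice of y^{|+} moves from "a" (r=2) to "b" (r=5) to "c" (r=20 and above), and only the last is the true argmax of E(u|Y=y). How large r must be depends on the distribution, which the AI knows but we do not.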
Unfortunately the value of r cannot be known ahead of time. It might be possible to define a "limit" utility u#(3, ∞).

Note that all these messages and utilities are independent, so the AI can generate messages M_1 + M_2 + M_{3,r} + M_{3,r'} when maximising

• u#(1) + u#(2) + u#(3,r) + u#(3,r').

But there are issues with very low probabilities, as explained in the previous post.

## Discussion: weighting inside view versus outside view on extinction events

3 25 February 2016 05:18AM

Articles covering the ideas of inside view and outside view:

• Beware the Inside View (by Robin Hanson)
• Outside View LessWrong wiki article

Article discussing the weighting of inside view and outside view:

• The World is Mad (by ozymandias)

A couple of potential extinction events which seem to be most easily mitigated (the machinery involved is expensive):

Broadcasting powerful messages to the stars:

• Should Earth Shut the Hell Up? (by Robin Hanson)
• Arecibo message (Wikipedia)

Large Hadron Collider:

• Anyone who thinks the Large Hadron Collider will destroy the world is a t**t. (by Rebecca Roache)

How should the inside view versus the outside view be weighted when considering extinction events?

Should the broadcast of future Arecibo messages (or powerful signals in general) be opposed?

Should the expansion of energy levels (or continued operation at all) of the Large Hadron Collider be opposed?

## Goal completion: noise, errors, bias, prejudice, preference and complexity

4 18 February 2016 02:37PM

A putative new idea for AI control; index here.

This is a preliminary look at how an AI might assess and deal with various types of errors and uncertainties, when estimating true human preferences. I'll be using the circular rocket model to illustrate how these might be distinguished by an AI. Recall that the rocket can accelerate by -2, -1, 0, 1, and 2, and the human wishes to reach the space station (at point 0 with velocity 0) and avoid accelerations of ±2.
In what follows, there will generally be some noise, so to make the whole thing more flexible, assume that the space station is a bit bigger than usual, covering five squares. So "docking" at the space station means reaching {-2,-1,0,1,2} with 0 velocity.

## Estimating the probability of human extinction

5 17 February 2016 04:19PM

I'm looking for feedback on the following idea. The article from which it's been excerpted can be found here: http://ieet.org/index.php/IEET/more/torres20120213

"But not only has the number of scenarios increased in the past 71 years, many riskologists believe that the probability of a global disaster has also significantly risen. Whereas the likelihood of annihilation for most of our species’ history was extremely low, Nick Bostrom argues that “setting this probability lower than 25% [this century] would be misguided, and the best estimate may be considerably higher.” Similarly, Sir Martin Rees claims that a civilization-destroying event before the year 02100 is as likely as getting a “heads” after flipping a coin. These are only two opinions, of course, but to paraphrase the Russell-Einstein Manifesto, my experience confirms that those who know the most tend to be the most gloomy.

"I [would] argue that Rees’ figure is plausible. To adapt a maxim from the philosopher David Hume, wise people always proportion their fears to the best available evidence, and when one honestly examines this evidence, one finds that there really is good reason for being alarmed. But I also offer a novel — to my knowledge — argument for why we may be systematically underestimating the overall likelihood of doom. In sum, just as a dog can’t possibly comprehend any of the natural and anthropogenic risks mentioned above, so too could there be risks that forever lie beyond our epistemic reach. All biological brains have intrinsic limitations that constrain the library of concepts to which one has access.
And without concepts, one can’t mentally represent the external world. It follows that we could be “cognitively closed” to a potentially vast number of cosmic risks that threaten us with total annihilation. This being said, one might argue that such risks, if they exist at all, must be highly improbable, since Earth-originating life has existed for some 3.5 billion years without an existential catastrophe having happened. But this line of reasoning is deeply flawed: it fails to take into account that the only worlds in which observers like us could find ourselves are ones in which such a catastrophe has never occurred. It follows that a record of past survival on our planetary spaceship provides no useful information about the probability of certain existential disasters happening in the future. The facts of cognitive closure plus the observation selection effect suggest that our probability conjectures of total annihilation may be systematically underestimated, perhaps by a lot."

Thoughts?

## New positions and recent hires at the Centre for the Study of Existential Risk (Cambridge, UK)

9 13 October 2015 11:11AM

[Cross-posted from EA Forum. Summary: Four new postdoc positions at the Centre for the Study of Existential Risk: Evaluation of extreme technological risk (philosophy, economics); Extreme risk and the culture of science (philosophy of science); Responsible innovation and extreme technological risk (science & technology studies, sociology, policy, governance); and an academic project manager (cutting across the Centre’s research projects, and playing a central role in Centre development). Please help us to spread the word far and wide in the academic community!]

An inspiring first recruitment round

The Centre for the Study of Existential Risk (Cambridge, UK) has been making excellent progress in building up our research team. Our previous recruitment round was a great success, and we made three exceptional hires.
Dr Shahar Avin joined us in September from Google, with a background in the philosophy of science (Cambridge, UK). He is currently fleshing out several potential research projects, which will be refined and finalised following a research visit to FHI later this month. Dr Yang Liu joined us this month from Columbia University, with a background in mathematical logic and philosophical decision theory. Yang will work on problems in decision theory that relate to long-term AI, and will help us to link the excellent work being done at MIRI with relevant expertise and talent within academia. In February 2016, we will be joined by Dr Bonnie Wintle from the Centre of Excellence for Biosecurity Risk Analysis (CEBRA), who will lead our horizon-scanning work in collaboration with Professor Bill Sutherland’s group at Cambridge; among other things, she has worked on IARPA-funded development of automated horizon-scanning tools, and has been involved in the Good Judgment Project.

We are very grateful for the help of the existential risk and EA communities in spreading the word about these positions, and helping us to secure an exceptionally strong field. Additionally, I have now moved on from FHI to be CSER’s full-time Executive Director, and Huw Price is now 50% funded as CSER’s Academic Director (we share him with Cambridge’s Philosophy Faculty, where he remains Bertrand Russell Chair of Philosophy).

Four new positions:

We’re delighted to announce four new positions at the Centre for the Study of Existential Risk; details below. Unlike the previous round, where we invited project proposals from across our areas of interest, in this case we have several specific positions that we need to fill for our three-year Managing Extreme Technological Risk project, funded by the Templeton World Charity Foundation. As we are building up our academic brand within a traditional university, we expect to predominantly hire from academia, i.e. academic researchers with (or near to the completion of) PhDs. However, we are open to hiring excellent candidates without PhDs but with an equivalent and relevant level of expertise, for example in think tanks, policy settings or industry.

Three of these positions are in the standard academic postdoc mould, working on specific research projects. I’d like to draw attention to the fourth, the academic project manager. For this position, we are looking for someone with the intellectual versatility to engage across our research strands – someone who can coordinate these projects, synthesise and present our research to a range of audiences including funders, collaborators, policymakers and industry contacts. Additionally, this person will play a key role in developing the centre over the next two years, working with our postdocs and professorial advisors to secure funding, and contributing to our research, media, and policy strategy among other things. I’ve been interviewed in the past (https://80000hours.org/2013/02/bringing-it-all-together-high-impact-research-management/) about the importance of roles of this nature; right now I see it as our biggest bottleneck, and a position in which an ambitious person could make a huge difference.

We need your help – again!

In some ways, CSER has been the quietest of the existential risk organisations of late – we’ve mainly been establishing research connections, running lectures and seminars, writing research grants and building relations with policymakers (plus some behind-the-scenes involvement with various projects). But we’ve been quite successful in these things, and now face an exciting but daunting level of growth: by next year we aim to have a team of 9-10 postdoctoral researchers here at Cambridge, plus senior professors and other staff. It’s very important we continue our momentum by getting world-class researchers motivated to do work of the highest impact.
Reaching out and finding these people is quite a challenge, especially given our still-small team. So the help of the existential risk and EA communities in spreading the word – on your Facebook feeds, on relevant mailing lists in your universities, passing them on to talented people you know – will make a huge difference to us.

Thank you so much!

Seán Ó hÉigeartaigh (Executive Director, CSER)

“The Centre for the Study of Existential Risk is delighted to announce four new postdoctoral positions for the subprojects below, to begin in January 2016 or as soon as possible afterwards. The research associates will join a growing team of researchers developing a general methodology for the management of extreme technological risk.

Evaluation of extreme technological risk will examine issues such as: the use and limitations of approaches such as cost-benefit analysis when evaluating extreme technological risk; the importance of mitigating extreme technological risk compared to other global priorities; issues in population ethics as they relate to future generations; challenges associated with evaluating small probabilities of large payoffs; challenges associated with moral and evaluative uncertainty as they relate to the long-term future of humanity. Relevant disciplines include philosophy and economics, although suitable candidates outside these fields are welcomed. More: Evaluation of extreme technological risk

Extreme risk and the culture of science will explore the hypothesis that the culture of science is in some ways ill-adapted to successful long-term management of extreme technological risk, and investigate the option of ‘tweaking’ scientific practice, so as to improve its suitability for this special task. It will examine topics including inductive risk, use and limitations of the precautionary principle, and the case for scientific pluralism and ‘breakout thinking’ where extreme technological risk is concerned.
Relevant disciplines include philosophy of science and science and technology studies, although suitable candidates outside these fields are welcomed. More: Extreme risk and the culture of science

Responsible innovation and extreme technological risk asks what can be done to encourage risk-awareness and societal responsibility, without discouraging innovation, within the communities developing future technologies with transformative potential. What can be learned from historical examples of technology governance and culture-development? What are the roles of different forms of regulation in the development of transformative technologies with risk potential? Relevant disciplines include science and technology studies, geography, sociology, governance, philosophy of science, plus relevant technological fields (e.g., AI, biotechnology, geoengineering), although suitable candidates outside these fields are welcomed. More: Responsible innovation and extreme technological risk

We are also seeking to appoint an academic project manager, who will play a central role in developing CSER into a world-class research centre. We seek an ambitious candidate with initiative and a broad intellectual range for a postdoctoral role combining academic and administrative responsibilities. The Academic Project Manager will co-ordinate and develop CSER’s projects and the Centre’s overall profile, and build and maintain collaborations with academic centres, industry leaders and policy makers in the UK and worldwide. This is a unique opportunity to play a formative research development role in the establishment of a world-class centre. More: CSER Academic Project Manager

Candidates will normally have a PhD in a relevant field or an equivalent level of experience and accomplishment (for example, in a policy, industry, or think tank setting). Application Deadline: Midday (12:00) on November 12th 2015.”

## [Link] Review of "Doing Good Better"

0 26 September 2015 07:58AM

The article is here.
The book is by William MacAskill, co-founder of 80,000 Hours and Giving What We Can. Excerpt:

Effective altruism takes up the spirit of Singer’s argument but shields us from the full blast of its conclusion; moral indictment is transformed into an empowering investment opportunity... Either effective altruism, like utilitarianism, demands that we do the most good possible, or it asks merely that we try to make things better. The first thought is genuinely radical, requiring us to overhaul our daily lives in ways unimaginable to most...The second thought – that we try to make things better – is shared by every plausible moral system and every decent person. If effective altruism is simply in the business of getting us to be more effective when we try to help others, then it’s hard to object to it. But in that case it’s also hard to see what it’s offering in the way of fresh moral insight, still less how it could be the last social movement we’ll ever need.

## The Dice Room, Human Extinction, and Consistency of Bayesian Probability Theory

2 28 July 2015 04:27PM

I'm sure that many of you here have read Quantum Computing Since Democritus. In the chapter on the anthropic principle the author presents the Dice Room scenario as a metaphor for human extinction. The Dice Room scenario is this:

1. You are in a world with a very, very large population (potentially unbounded).
2. There is a madman who kidnaps 10 people and puts them in a room.
3. The madman rolls two dice. If they come up snake eyes (both ones) then he murders everyone.
4. Otherwise he releases everyone, then goes out and kidnaps 10 times as many people as before, and returns to step 3.

The question is this: if you are one of the people kidnapped at some point, what is your probability of dying? Assume you don't know how many rounds of kidnappings have preceded yours.
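The two numbers that the arguments below lean on can be made concrete with a small simulation of the scenario. This is a sketch under stated assumptions, not the book's or the post's code: batch u is taken to contain 10 * 10^(u-1) people, the hypothetical `max_rounds` cap stops games that would otherwise run for a long time, and the trial count and seed are arbitrary. `fraction_in_last_batch(t)` is the share of all victims who are in batch t if the madman kills batch t, and `estimate_first_round_risk` estimates the per-round chance of snake eyes by counting games that end in round 1.

```python
"""Toy simulation of the Dice Room scenario (illustrative assumptions only)."""
import random
from fractions import Fraction
from typing import Optional

SNAKE_EYES = Fraction(1, 36)  # P(both dice show one) on a single roll


def batch_size(u: int) -> int:
    """People kidnapped in round u, counting from 1: 10, 100, 1000, ..."""
    return 10 * 10 ** (u - 1)


def fraction_in_last_batch(t: int) -> Fraction:
    """If the madman murders batch t, the share of all victims so far who die."""
    total = sum(batch_size(u) for u in range(1, t + 1))
    return Fraction(batch_size(t), total)


def play(rng: random.Random, max_rounds: int = 30) -> Optional[int]:
    """Roll two dice each round; return the round with snake eyes, or None
    if the hypothetical max_rounds cap is reached first."""
    for t in range(1, max_rounds + 1):
        if rng.randint(1, 6) == 1 and rng.randint(1, 6) == 1:
            return t
    return None


def estimate_first_round_risk(trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(the game ends in round 1), i.e. of 1/36."""
    rng = random.Random(seed)
    return sum(play(rng) == 1 for _ in range(trials)) / trials


if __name__ == "__main__":
    for t in (1, 2, 3, 6):
        share = fraction_in_last_batch(t)
        print(f"killed in round {t}: {share} of victims die (~{float(share):.4f})")
    print("per-round risk:", float(SNAKE_EYES),
          "estimated:", estimate_first_round_risk(20_000))
```

The share of victims who die is 10/11 if the killing happens in round 2 and 100/111 in round 3, approaching 9/10 as t grows, while the risk facing any single batch stays at 1/36.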
As a metaphor for human extinction, think of the population of this world as being all humans who have ever lived or ever may live, each batch of kidnap victims as a generation of humanity, and rolling snake eyes as an extinction event.

The book gives two arguments, which are both purported to be examples of Bayesian reasoning:

1. The "proximate risk" argument says that your probability of dying is just the prior probability that the madman rolls snake eyes for your batch of kidnap victims: 1/36.
2. The "proportion murdered" argument says that about 9/10 of all people who ever go into the Dice Room die, so your probability of dying is about 9/10.

Obviously this is a problem. Different decompositions of a problem should give the same answer, as long as they're based on the same information. I claim that the "proportion murdered" argument is wrong. Here's why.

Let pi(t) be the prior probability that you are in batch t of kidnap victims. The proportion murdered argument relies on the property that pi(t) increases exponentially with t: pi(t+1) = 10 * pi(t). If the madman murders at step t, then your probability of being in batch t is

pi(t) / SUM(u: 1 <= u <= t: pi(u))

and, if pi(u+1) = 10 * pi(u) for all u < t, then this works out to 9 * 10^(t-1) / (10^t - 1), which is indeed about 9/10. But the values pi(t) must sum to 1; thus they cannot increase indefinitely, and in fact it must be that pi(t) -> 0 as t -> infinity. This is where the "proportion murdered" argument falls apart.

For a more detailed analysis, take a look at http://bayesium.com/doomsday-and-the-dice-room-murders/

This forum has a lot of very smart people who would be well-qualified to comment on that analysis, and I would appreciate hearing your opinions.

## AI: requirements for pernicious policies

7 17 July 2015 02:18PM

Some have argued that "tool AIs" are safe(r). Recently, Eric Drexler decomposed AIs into "problem solvers" (e.g. calculators), "advisors" (e.g. GPS route planners), and "actors" (autonomous agents).
Both solvers and advisors can be seen as examples of tools.

People have argued that tool AIs are not safe. It's hard to imagine a calculator going berserk, no matter what its algorithm is, but it's not too hard to come up with clear examples of dangerous tools. This suggests that the solvers vs advisors vs actors distinction (or tools vs agents, or oracles vs agents) is not the right one.

Instead, I've been asking: how likely is the algorithm to implement a pernicious policy? If we model the AI as having an objective function (or utility function) and an algorithm that implements it, a pernicious policy is one that scores high in the objective function but is not at all what is intended. A pernicious policy could be harmless and entertaining or much more severe.

I will lay aside, for the moment, the issue of badly programmed algorithms (possibly containing their own objective sub-functions). In any case, to know whether an AI would implement a pernicious policy, we have to ask these questions about its algorithm:

1. Do pernicious policies exist? Are there many?
2. Can the AI find them?
3. Can the AI test them?
4. Would the AI choose to implement them?

The answer to 1. seems to be trivially yes. Even a calculator could, in theory, output a series of messages that socially hack us, blah, take over the world, blah, extinction, blah, calculator finishes its calculations. What is much more interesting is that some types of agents have many more pernicious policies than others. This seems to be the big difference between actors and other designs. An actor AI in complete control of the USA's or Russia's nuclear arsenal has all sorts of pernicious policies close to hand; an advisor or oracle has far fewer (generally going through social engineering), and a tool typically fewer still. A lot of the physical protection measures are about reducing the number of successful pernicious policies the AI has access to.

The answer to 2. is mainly a function of the power of the algorithm.
A basic calculator will never find anything dangerous: its programming is simple and tight. But compare an agent with the same objective function and the ability to do an unrestricted policy search with vast resources... So it seems that the answer to 2. does not depend on any solver vs actor division, but purely on the algorithm used.

And now we come to the big question 3., whether the AI can test these policies. Even if the AI can find pernicious policies that rank high on its objective function, it will never implement them unless it can ascertain this fact. And there are several ways it could do so.

Let's assume that a solver AI has a very complicated objective function - one that encodes many relevant facts about the real world. Now, the AI may not "care" about the real world, but it has a virtual version of that, in which it can virtually test all of its policies. With enough computing power, it can establish whether the pernicious policy would be effective at achieving its virtual goal. If this is a good approximation of how the pernicious policy would behave in the real world, we could have a problem.

But extremely detailed objective functions are unlikely. But even simple ones can show odd behaviour if the agent gets to interact repeatedly with the real world - this is the issue with reinforcement learning. Suppose that the agent attempts a translation job, and is rewarded on the accuracy of its translation. Depending on the details of what the AI knows and who chooses the rewards, the AI could end up manipulating its controllers, similarly to this example. The problem is that once there is any interaction, all the complexity of humanity could potentially show up in the reward function, even if the objective function is simple. Of course, some designs make this very unlikely - resetting the AI periodically can help to alleviate the problem, as can choosing more objective criteria for any rewards.
Lastly on this point, we should mention the possibility that human R&D, by selecting and refining the objective function and the algorithm, could take the role of testing the policies. This is likely to emerge only in cases where many AI designs are considered, and the best candidates are retained based on human judgement.

Finally we come to the question of whether the AI will implement the policy once it has found and tested it. You could say that the point of FAI is to create an AI that doesn't choose to implement pernicious policies - but, more correctly, the point of FAI is to ensure that very few (or zero) pernicious policies exist in the first place, as they all score low on the utility function. However, there are a variety of more complicated designs - satisficers, agents using crude measures - where the questions of "Do pernicious policies exist?" and "Would the AI choose to implement them?" could become quite distinct.

### Conclusion: a more thorough analysis of AI designs is needed

A calculator is safe because it is a solver, has a very simple objective function with no holes in the algorithm, and can neither find nor test any pernicious policies. It is the combination of these elements that makes it almost certainly safe. If we want to make the same claim about other designs, neither "it's just a solver" nor "its objective function is simple" would be enough; we need a careful analysis. Though, as usual, "it's not certainly safe" is a quite distinct claim from "it's (likely) dangerous", and they should not be conflated.

## Top 9+2 myths about AI risk

44 29 June 2015 08:41PM

Following some somewhat misleading articles quoting me, I thought I'd present the top 9 myths about the AI risk thesis:

1. That we’re certain AI will doom us. Certainly not. It’s very hard to be certain of anything involving a technology that doesn’t exist; we’re just claiming that the probability of AI going bad isn’t low enough that we can ignore it.

2. That humanity will survive, because we’ve always survived before. Many groups of humans haven’t survived contact with more powerful intelligent agents. In the past, those agents were other humans; but they need not be. The universe does not owe us a destiny. In the future, something will survive; it need not be us.

3. That uncertainty means that you’re safe. If you’re claiming that AI is impossible, or that it will take countless decades, or that it’ll be safe... you’re not being uncertain, you’re being extremely specific about the future. “No AI risk” is a claim of certainty; “possible AI risk” is where we stand.

4. That Terminator robots will be involved. Please? The threat from AI comes from its potential intelligence, not from its ability to clank around slowly with an Austrian accent.

5. That we’re assuming the AI is too dumb to know what we’re asking it. No. A powerful AI will know what we meant to program it to do. But why should it care? And if we could figure out how to program “care about what we meant to ask”, well, then we’d have safe AI.

6. That there’s one simple trick that can solve the whole problem. Many people have proposed that one trick. Some of them could even help (see Holden’s tool AI idea). None of them reduce the risk enough to relax – and many of the tricks contradict each other (you can’t design an AI that’s both a tool and socialising with humans!).

7. That we want to stop AI research. We don’t. Current AI research is very far from the risky areas and abilities. And it’s risk-aware AI researchers that are most likely to figure out how to make safe AI.

8. That AIs will be more intelligent than us, hence more moral. It’s pretty clear that in humans, high intelligence is no guarantee of morality. Are you really willing to bet the whole future of humanity on the idea that AIs might be different? That in the billions of possible minds out there, there is none that is both dangerous and very intelligent?

9. That science fiction or spiritual ideas are useful ways of understanding AI risk. Science fiction and spirituality are full of human concepts, created by humans, for humans, to communicate human ideas. They need not apply to AI at all, as these could be minds far removed from human concepts, possibly without a body, possibly with no emotions or consciousness, possibly with many new emotions and a different type of consciousness, etc... Anthropomorphising the AIs could lead us completely astray.

Lists cannot be comprehensive, but they can adapt and grow, adding more important points:

1. That AIs have to be evil to be dangerous. The majority of the risk comes from indifferent or partially nice AIs. Those that have some goal to follow, with humanity and its desires just getting in the way – using resources, trying to oppose it, or just not being perfectly efficient for its goal.

2. That we believe AI is coming soon. It might; it might not. Even if AI is known to be in the distant future (which isn't known, currently), some of the groundwork is worth laying now.

## Astronomy, space exploration and the Great Filter

23 19 April 2015 07:26PM

Astronomical research has what may be an under-appreciated role in helping us understand and possibly avoid the Great Filter. This post will examine how astronomy may be helpful for identifying potential future filters. The primary upshot is that we may have an advantage due to our somewhat late arrival: if we can observe what other civilizations have done wrong, we can get a leg up.

This post is not arguing that colonization is a route to remove some existential risks. There is no question that colonization will reduce the risk of many forms of Filters, but the vast majority of astronomical work has no substantial connection to colonization.
Moreover, the case for colonization has already been made strongly by many others, such as Robert Zubrin's book "The Case for Mars" or this essay by Nick Bostrom.

Note: those already familiar with the Great Filter and proposed explanations may wish to skip to the section "How can we substantially improve astronomy in the short to medium term?"

### What is the Great Filter?

There is a worrying lack of signs of intelligent life in the universe. The only intelligent life we have detected is that on Earth. While planets are apparently numerous, there have been no signs of other life. There are three lines of evidence we would expect to see if civilizations were common in the universe: radio signals, direct contact, and large-scale constructions. The first two are well known, but the most serious problem arises from the lack of large-scale constructions: as far as we can tell, the universe looks natural. The vast majority of matter and energy in the universe appears to be unused.

The Great Filter is one possible explanation for this lack of life, namely that some phenomenon prevents intelligent life from passing into the interstellar, large-scale phase. Variants of the idea have been floating around for a long time; the term was coined by Robin Hanson in this essay. There are two fundamental versions of the Filter: filtration which has occurred in our past, and filtration which will occur in our future. For obvious reasons the second of the two is more of a concern. Moreover, as our technological level increases, the chance that we are approaching the last point of serious filtration gets higher, since once a civilization has spread out to multiple stars, filtration becomes more difficult.
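The point that spreading to multiple stars makes filtration harder can be made concrete with a toy model. This is a minimal sketch under a strong, purely illustrative assumption: each settled star system fails independently with the same probability, so the civilization as a whole is lost only if every system fails. The function name and the 0.5 failure probability are hypothetical.

```python
def civilization_survival(p_fail_per_system: float, n_systems: int) -> float:
    """Probability that at least one of n independent settled systems survives.

    Hypothetical toy model: every system fails independently with the same
    probability p_fail_per_system over the period considered.
    """
    if not 0.0 <= p_fail_per_system <= 1.0:
        raise ValueError("p_fail_per_system must be a probability")
    if n_systems < 1:
        raise ValueError("need at least one settled system")
    # The whole civilization is lost only if every system fails.
    return 1.0 - p_fail_per_system ** n_systems


for n in (1, 2, 5, 10):
    print(n, round(civilization_survival(0.5, n), 4))
```

Under independence, survival climbs quickly with the number of systems (from 0.5 with one system to about 0.999 with ten). Correlated threats that strike every system at once break this assumption and are not captured by the sketch.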
### Evidence for the Great Filter and alternative explanations

Over the last few years, the only major updates to the situation involving the Filter since Hanson's essay have been twofold.

First, we have confirmed that planets are very common, so a lack of Earth-size planets or of planets in the habitable zone is not likely to be a major filter.

Second, we have found that planet formation occurred early in the universe. (For example see this article about this paper.) Early planet formation weakens a common explanation of the Fermi paradox: the argument that some species had to be the first intelligent species and we're simply lucky. Early planet formation, along with the apparent speed at which life arose on Earth after the heavy bombardment ended and the apparent speed with which complex life developed from simple life, strongly refutes this explanation. The response has been made that early filtration may be so common that if life does not arise early in its star's lifespan, then it will have no chance to reach civilization. However, if this were the case, we'd expect to have found ourselves orbiting a longer-lived star like a red dwarf. Red dwarfs are more common than Sun-like stars and have lifespans longer by multiple orders of magnitude. While attempts to understand the habitable zone of red dwarfs are still ongoing, the current consensus is that many red dwarfs host habitable planets.

These two observations, together with further evidence that the universe looks natural, make future filtration seem likely. If advanced civilizations existed, we would expect them to make use of the large amounts of matter and energy available. We see no signs of such use. We've seen no indication of ring-worlds, Dyson spheres, or other megascale engineering projects.
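Two of the quantitative claims above can be checked roughly. The sketch below assumes the crude textbook scaling t ≈ 10^10 yr × (M/M☉)^-2.5 for main-sequence lifetimes (only approximate, especially for the smallest stars) and Wien's displacement law for the wavelength at which a blackbody's emission peaks. The 300 K temperature for a Dyson shell is a hypothetical round number, not a figure from the post.

```python
# Rough back-of-the-envelope checks; all inputs are illustrative.
WIEN_B_M_K = 2.897771955e-3  # Wien's displacement constant, metre-kelvin
SUN_LIFETIME_YR = 1.0e10     # rough main-sequence lifetime of the Sun


def main_sequence_lifetime_yr(mass_solar: float) -> float:
    """Crude scaling t ~ t_sun * (M / M_sun) ** -2.5 for main-sequence stars."""
    return SUN_LIFETIME_YR * mass_solar ** -2.5


def wien_peak_m(temperature_k: float) -> float:
    """Wavelength (in metres) at which a blackbody at temperature_k peaks."""
    return WIEN_B_M_K / temperature_k


print(f"0.1 M_sun red dwarf: ~{main_sequence_lifetime_yr(0.1):.1e} years")
print(f"Sun-like photosphere (5772 K) peaks near {wien_peak_m(5772) * 1e6:.2f} um")
print(f"Assumed 300 K Dyson shell peaks near {wien_peak_m(300) * 1e6:.1f} um")
```

A 0.1 solar-mass red dwarf comes out at a few trillion years, over two orders of magnitude beyond the Sun. Sunlight peaks near half a micrometre, whereas starlight absorbed and re-radiated by a shell near room temperature would re-emerge around 10 micrometres, in the mid-infrared; this excess is what infrared searches for Dyson spheres look for.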
While searches for such structures have so far been confined to around 300 parsecs, and some candidates were hard to rule out, if a substantial fraction of stars in a galaxy had Dyson spheres or swarms we would notice the unusually strong infrared emission. Note that this sort of evidence is distinct from arguments about contact or about detecting radio signals.

There's a very recent proposal for mini-Dyson spheres around white dwarfs, which would be much easier to engineer and harder to detect, but they would not reduce the desirability of other large-scale structures, and they would likely be detectable if a large number of them were present in a small region. One recent study looked for signs of large-scale modification to the radiation profile of galaxies that should reveal the presence of large-scale civilizations. They looked at 100,000 galaxies and found no major sign of technologically advanced civilizations (for more detail see here).

We will not discuss all possible rebuttals to the case for a Great Filter, but will note some of the more interesting ones:

There have been attempts to argue that the universe only became habitable more recently. There are two primary avenues for this argument. First, early stars had very low metallicity (that is, low concentrations of elements other than hydrogen and helium), so the early universe would have had too low a metal level for complex life. The presence of old rocky planets makes this argument less viable, and it only works for the first few billion years of history. Second, there's an argument that until recently galaxies were more likely to have frequent gamma-ray bursts. In that case, life would have been wiped out too frequently to evolve in a complex fashion. However, even the strongest version of this argument still leaves billions of years of time unexplained.

There have been attempts to argue that space travel may be very difficult.
For example, Geoffrey Landis proposed that a percolation model, together with the idea that interstellar travel is very difficult, may explain the apparent rarity of large-scale civilizations. However, at this point there's no strong reason to think that interstellar travel is so difficult as to limit colonization to that extent. Moreover, the discoveries made in the last 20 years that brown dwarfs are very common and that most stars have planets are evidence in the opposite direction: these brown dwarfs and planets would make travel easier, because they provide more potential refueling and resupply locations even if they are not used for full colonization. Others have argued that even without such considerations, colonization should not be that difficult. Moreover, if colonization is difficult and civilizations end up restricted to small numbers of nearby stars, then it becomes more, not less, likely that civilizations will attempt the large-scale engineering projects that we would notice.

Another possibility is that we are overestimating the general growth rate of the resources used by civilizations: while extrapolating from current growth makes large-scale projects and endeavors seem plausible, slower growth would make very energy-intensive projects like colonization substantially more difficult. Rather than a continual exponential or close-to-exponential growth rate, we may expect long periods of slow growth or stagnation. This cannot be ruled out, but even if growth continues at only a slightly higher than linear rate, the energy expenditures available in a few thousand years will still be very large.

Another proposed possibility is the family of variants of the simulation hypothesis: the idea that we exist in a simulated reality. The most common variant in a Great Filter context suggests that we are in an ancestor simulation, that is, a simulation by the future descendants of humanity of what early humans would have been like.
The simulation hypothesis runs into serious problems, both in general and as an explanation of the Great Filter in particular. First, if our understanding of the laws of physics is approximately correct, then there are strong restrictions on what computations can be done with a given amount of resources. For example, BQP, the set of problems which can be solved efficiently by quantum computers, is contained in PSPACE, the set of problems which can be solved when one has a polynomial amount of space available and no time limit. Thus, the resources needed for a detailed simulation would likely be large, since even a close-to-classical simulation would still need about as many resources as the system being simulated. There are other results, such as Holevo's theorem, which place similar restrictions. The upshot of these results is that one cannot make a detailed simulation of an object without using at least as many resources as the object itself. There may be ways of getting around this: for example, consider a simulator interested primarily in what life on Earth is doing. The simulation would not need to model the inside of planet Earth and other large bodies in the solar system in detail. However, even then, the resources involved would be very large.

The primary problem with the simulation hypothesis as an explanation is that it requires the future of humanity to have actually already passed through the Great Filter, and to have found their own success sufficiently unlikely that they've devoted large amounts of resources to finding out how they managed to survive. Moreover, there are strong limits on how accurately one can reconstruct any given quantum state, which means an ancestor simulation will be at best a rough approximation. In this context, while there are interesting anthropic considerations here, it is more likely that the simulation hypothesis is wishful thinking.
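The cost of simulating quantum systems can be illustrated with a simpler fact than the complexity-class results above. BQP ⊆ PSPACE says polynomial space suffices if one accepts exponential time; the sketch below instead shows the memory a naive state-vector simulator needs, since it stores all 2^n complex amplitudes of an n-qubit state. The 16 bytes per amplitude (one complex128 number) is an assumption about the number format.

```python
BYTES_PER_AMPLITUDE = 16  # one complex128 amplitude: two 64-bit floats


def state_vector_bytes(n_qubits: int) -> int:
    """Memory for a naive state-vector simulation of n_qubits qubits.

    A general pure state of n qubits has 2 ** n complex amplitudes, and a
    brute-force classical simulator stores every one of them.
    """
    if n_qubits < 0:
        raise ValueError("n_qubits must be non-negative")
    return BYTES_PER_AMPLITUDE * 2 ** n_qubits


for n in (10, 30, 50, 300):
    print(f"{n:>3} qubits -> {state_vector_bytes(n):.3e} bytes")
```

Thirty qubits already take 16 GiB, and at 300 qubits the number of amplitudes (about 2 × 10^90) exceeds common estimates of the number of atoms in the observable universe (around 10^80). Cleverer methods exploit structure in special cases; this sketch only shows the brute-force baseline.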
Variants of the "Prime Directive" have also been proposed. The essential idea is that advanced civilizations would deliberately avoid interacting with less advanced civilizations. This hypothesis runs into two serious problems: first, it does not explain the apparent naturalness, only the lack of direct contact by alien life. Second, it assumes a solution to a massive coordination problem between multiple species with potentially radically different ethical systems. In a similar vein, Hanson in his original essay on the Great Filter raised the possibility of a single very early species with some form of faster-than-light travel and a commitment to keeping the universe close to natural-looking. Since all proposed forms of faster-than-light travel are highly speculative and would involve causality violations, this hypothesis cannot be assigned a substantial probability.

People have also suggested that civilizations move outside galaxies to the cold of space, where they can do efficient reversible computing using cold dark matter. Jacob Cannell has been one of the most vocal proponents of this idea. This hypothesis suffers from at least three problems. First, it fails to explain why those entities have not used conventional matter to any substantial extent in addition to the cold dark matter. Second, this hypothesis would require either dark matter composed of cold conventional matter (which at this point seems to be only a small fraction of all dark matter), or dark matter which interacts with itself through some force other than gravity. While there is some evidence for such interaction, at this point it is slim. Third, even if some species had taken over a large fraction of dark matter for their own computations, one would then expect later species to use the conventional matter, since they would not have the option of using the now-monopolized dark matter.
Other exotic non-Filter explanations have been proposed, but they suffer from similar or even more severe flaws.

It is possible that future information will change this situation. One of the more plausible explanations of the Great Filter is that there is no single Great Filter in the past but rather a large number of small filters which together drastically filter out civilizations. The evidence for such a viewpoint is slim at this point, but there is some possibility that astronomy can help answer this question.

For example, one commonly cited aspect of past filtration is the origin of life. There are at least three locations other than Earth where life could have formed: Europa, Titan and Mars. Finding life on one, or all of them, would be a strong indication that the origin of life is not the filter. Similarly, while it is highly unlikely that Mars has multicellular life, finding such life would indicate that the development of multicellular life is not the filter. However, none of them is nearly as hospitable as Earth, so determining whether there is life will require substantial use of probes. We might also look for signs of life in the atmospheres of extrasolar planets, which would require substantially more advanced telescopes.

Another possible early filter is that planets like Earth frequently get locked into a "snowball" state which they have difficulty exiting. This is an unlikely filter, since Earth has likely been in near-snowball conditions multiple times: once very early on, during the Huronian, and later, about 650 million years ago. This is an example of an early partial Filter where astronomical observation may be of assistance in finding evidence of the filter.
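The "many small filters" idea works because independent hurdles multiply. A minimal sketch, assuming the filters are independent; the ten hurdles with a 10% pass rate each are hypothetical numbers chosen only to show the arithmetic.

```python
import math


def combined_pass_probability(pass_probs):
    """Chance of clearing every filter, assuming the filters are independent."""
    return math.prod(pass_probs)


# Hypothetical: ten unremarkable hurdles, each cleared 10% of the time.
many_small = [0.1] * 10
p = combined_pass_probability(many_small)
print(f"overall pass probability: {p:.1e}")
print(f"about {-math.log10(p):.0f} orders of magnitude of filtering")
```

Ten individually unremarkable steps give an overall pass probability of about 10^-10, so no single step needs to look like a Great Filter for the product to act as one. A partial filter such as a snowball state would enter as one more factor.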
The snowball Earth filter does have one strong virtue: if many planets never escape a snowball state, this would partly explain why we are not orbiting a red dwarf. On this view, planets do not escape their snowball state unless their home star is somewhat variable, and red dwarfs are too stable.

It should be clear that none of these explanations is satisfactory, and thus we must take seriously the possibility of future Filtration.

## How can we substantially improve astronomy in the short to medium term?

Before we examine the potential for further astronomical research to help us understand a future filter, we should note that there are many avenues by which we can improve our astronomical instruments. The most basic way is simply to build better conventional optical, near-optical and radio telescopes. That work is ongoing; examples include the European Extremely Large Telescope and the Thirty Meter Telescope. Unfortunately, increasing the size of ground-based telescopes, especially the size of the aperture, is running into substantial engineering challenges. However, in the last 30 years the advent of adaptive optics, speckle imaging, and other techniques has substantially increased the resolution of ground-based optical and near-optical telescopes. At the same time, improved data processing and related methods have improved radio telescopes. Optical and near-optical telescopes have already advanced to the point where we can gain information about the atmospheres of extrasolar planets, although we cannot yet obtain information about the atmospheres of rocky planets.

Increasingly, the highest resolution comes from space-based telescopes. Space-based telescopes also allow one to gather information from types of radiation which are blocked by the Earth's atmosphere or magnetosphere; two important examples are x-ray telescopes and gamma-ray telescopes. Space-based telescopes also avoid many of the issues the atmosphere creates for optical telescopes.
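The aperture limits discussed above matter because a telescope's finest resolvable angle is set by its aperture D and the observing wavelength λ, via the Rayleigh criterion θ ≈ 1.22 λ/D. The sketch uses 500 nm light and round aperture figures (2.4 m, Hubble-sized; 39 m, ELT-sized) as illustrative inputs, and compares them with the angular size of a hypothetical Earth-sized planet 10 parsecs away.

```python
import math

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0
PARSEC_M = 3.0857e16          # one parsec in metres
EARTH_DIAMETER_M = 1.2742e7   # mean diameter of the Earth in metres


def rayleigh_limit_rad(wavelength_m: float, aperture_m: float) -> float:
    """Diffraction-limited angular resolution, theta ~ 1.22 * lambda / D, in radians."""
    return 1.22 * wavelength_m / aperture_m


wavelength = 500e-9  # visible light (illustrative choice)
for label, aperture in [("2.4 m (Hubble-sized)", 2.4), ("39 m (ELT-sized)", 39.0)]:
    theta = rayleigh_limit_rad(wavelength, aperture)
    print(f"{label}: {theta * ARCSEC_PER_RAD:.2e} arcsec")

# Angular diameter of an Earth-sized planet 10 parsecs away (small-angle approximation).
planet_rad = EARTH_DIAMETER_M / (10 * PARSEC_M)
print(f"Earth-sized planet at 10 pc: {planet_rad * ARCSEC_PER_RAD:.2e} arcsec")

# Aperture whose diffraction limit just matches that angular size.
required = 1.22 * wavelength / planet_rad
print(f"aperture needed to resolve it at all: ~{required / 1e3:.0f} km")
```

Even a diffraction-limited 39 m mirror is far from resolving such a planet; doing so would take an effective aperture of roughly 15 km, far beyond any single mirror. Ground-based telescopes approach their diffraction limits only with adaptive optics and related techniques, whereas space telescopes avoid atmospheric blurring altogether.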
Among space-based telescopes, Hubble is the most striking example, but from the standpoint of observatories relevant to the Great Filter, the most relevant space telescope (and the most relevant instrument in general for all Great Filter related astronomy) is the planet-detecting Kepler spacecraft, which is responsible for most of the identified planets.

Another type of instrument is the neutrino detector. Neutrino detectors are generally very large bodies of a transparent material (generally water) kept deep underground, so that minimal amounts of light and cosmic rays hit the device. Neutrinos are then detected when they hit a particle, producing a flash of light. In the last few years, improvements in optics, increases in the scale of the detectors, and the development of detectors like IceCube, which use naturally occurring sources of water, have drastically increased the sensitivity of neutrino detectors.

There are proposals for larger-scale, more innovative telescope designs, but they are all highly speculative. For example, on the ground-based optical front, there's been a suggestion to make liquid mirror telescopes with ferrofluid mirrors, which would give the advantages of liquid mirror telescopes while allowing the adaptive optics that can normally only be applied to solid mirror telescopes. An example of a potential space-based telescope is the Aragoscope, which would take advantage of diffraction to make a space-based optical telescope with a resolution at least an order of magnitude greater than Hubble's. Other examples include placing telescopes very far apart in the solar system to create telescopes with effectively very large apertures. The most ambitious and speculative of such proposals involve such advanced and large-scale projects that one might as well presume that they will only happen if we have already passed through the Great Filter.

## What are the major identified future potential contributions to the filter and what can astronomy tell us?
Natural threats: One threat type where more astronomical observations can help is natural threats, such as asteroid collisions, supernovas, gamma-ray bursts, rogue high-gravity bodies, and as yet unidentified astronomical threats. Careful mapping of asteroids and comets is ongoing and requires continued funding rather than any intrinsic improvements in technology. Right now, most of our mapping looks at objects at or near the plane of the ecliptic, so some focus off the plane may be helpful. Unfortunately, there is very little money to actually deal with such problems if they arise. It might be possible to have a few wealthy individuals agree to set up accounts in escrow which would be used if an asteroid or similar threat arose.

Supernovas are unlikely to be a serious threat at this time. There are some stars which are close to our solar system and are large enough that they will go supernova. Betelgeuse is the most famous of these, with a supernova likely to occur in the next 100,000 years. However, at its current distance, Betelgeuse is unlikely to pose much of a problem unless our models of supernovas are very far off. Further conventional observations of supernovas are needed to understand this better, and better neutrino observations will also help, but right now supernovas do not seem to be a large risk. Gamma-ray bursts are in a similar situation. Note also that if a gamma-ray burst or supernova is imminent, there's very little we can at present do about it. In general, back-of-the-envelope calculations establish that supernovas are highly unlikely to be a substantial part of the Great Filter.

Rogue planets, brown dwarfs or other small high-gravity bodies such as wandering black holes can be detected, and further improvements will allow faster detection. However, the scale of havoc created by such events is such that it is not at all clear that detection will help.
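Why detection may not help against rogue bodies can be seen from a rough energy budget. Giving a body of mass M a velocity change Δv costs at least ½MΔv², and that bound assumes, unrealistically, that every joule ends up as bulk motion of the body. The 10,000-megaton total yield is a hypothetical, deliberately generous figure, not an estimate of any real arsenal.

```python
import math

MEGATON_TNT_J = 4.184e15   # energy of one megaton of TNT, by convention
EARTH_MASS_KG = 5.972e24
JUPITER_MASS_KG = 1.898e27


def max_delta_v(energy_j: float, mass_kg: float) -> float:
    """Upper bound on the velocity change energy_j can give a body of mass_kg.

    Uses E >= (1/2) * M * dv**2, i.e. assumes (unrealistically) that all of
    the energy ends up as bulk motion of the body.
    """
    return math.sqrt(2.0 * energy_j / mass_kg)


# Hypothetical, deliberately generous total arsenal yield.
arsenal_j = 10_000 * MEGATON_TNT_J
bodies = [("Earth-mass rogue planet", EARTH_MASS_KG),
          ("13-Jupiter-mass brown dwarf", 13 * JUPITER_MASS_KG)]
for label, mass in bodies:
    print(f"{label}: at most {max_delta_v(arsenal_j, mass) * 1e3:.3f} mm/s")
```

Even under that idealization, the result is a few millimetres per second for an Earth-mass body, and less than a tenth of a millimetre per second for a body at the 13-Jupiter-mass brown-dwarf threshold, against relative speeds typically measured in kilometres per second for rogue planets or brown dwarfs.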
The entire planetary nuclear arsenal would not even begin to move the orbits of such bodies to any substantial extent. Note also that natural events are unlikely to be a large fraction of the Great Filter. Unlike most of the other threat types, this is one where radio astronomy and neutrino information may be more likely to identify problems.

Biological threats: Biological threats take two primary forms: pandemics and deliberately engineered diseases. The first is a more likely serious contribution to the filter than one might naively expect, since modern transport allows infected individuals to move quickly and come into contact with a large number of people. For example, trucking has been a major cause of the spread of HIV in Africa, and it is likely that the recent Ebola epidemic had similar contributing factors. Moreover, keeping chickens and other animals in very large quantities in dense areas near human populations makes it easier for novel variants of viruses to jump species. Astronomy does not seem to provide any relevant assistance here; the only plausible way of getting such information would be to see other species that were destroyed by disease. Even with improvements in telescope resolution by many orders of magnitude, this is not doable.

Nuclear exchange: For reasons similar to those in the biological threats category, astronomy is unlikely to help us detect whether nuclear war is a substantial part of the Filter. It is possible that more advanced telescopes could detect an extremely large nuclear detonation if it occurred in a very nearby star system. Next-generation telescopes may be able to detect a nearby planet's advanced civilization purely from the light it gives off, and a sufficiently large detonation would give off a comparable amount of light. However, such devices would be multiple orders of magnitude larger than the largest current nuclear devices.
Moreover, if a telescope was not looking at exactly the right moment, it would not see anything at all, and the probability that another civilization wipes itself out at just the instant we are looking is vanishingly small.

Unexpected physics: This category is one of the most difficult to discuss because it is so open. The most common examples people point to involve high-energy physics. Aside from theoretical considerations, cosmic rays of very high energy are continually hitting the upper atmosphere. These particles frequently have energies multiple orders of magnitude higher than the particles in our accelerators. Thus high-energy events seem unlikely to cause any serious filtration unless, or until, humans develop particle accelerators whose energies are orders of magnitude higher than those of most cosmic rays. Cosmic rays with energies beyond what is known as the GZK limit are rare. We have observed occasional particles beyond the GZK limit, but they are rare enough that we cannot rule out a risk from many collisions involving such high-energy particles in a small region. Since our best accelerators are nowhere near the GZK limit, this is not an immediate problem.

There is an argument that if we should worry about unexpected physics at all, it is at the very low energy end. In particular, humans have managed to make objects substantially colder than the cosmic background temperature of about 2.7 K, with temperatures on the order of 10⁻⁹ K. There's an argument that, because there are no prior examples of this, the chance that something can go badly wrong should be higher than one might estimate (see here). While this particular class of scenario seems unlikely, it does illustrate that it may not be obvious which situations could cause unexpected, novel physics to come into play.
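For collisions, a fairer comparison between cosmic rays and accelerators is centre-of-mass energy rather than the energy of a single particle. A cosmic-ray proton of energy E striking a nucleon at rest has √s = √(2Emc² + 2(mc²)²), which grows only as the square root of E. The sketch assumes a GZK-scale proton of about 5 × 10^19 eV and takes roughly 13.6 TeV as the LHC's proton-proton collision energy; both are round illustrative inputs.

```python
import math

PROTON_REST_ENERGY_EV = 938.272e6  # proton rest energy m c^2, in eV


def fixed_target_cm_energy_ev(beam_energy_ev: float) -> float:
    """Centre-of-mass energy sqrt(s) for a proton of total energy beam_energy_ev
    hitting a proton at rest: s = 2 E m c^2 + 2 (m c^2)^2."""
    m = PROTON_REST_ENERGY_EV
    return math.sqrt(2.0 * beam_energy_ev * m + 2.0 * m * m)


gzk_ev = 5e19             # approximate GZK cutoff for cosmic-ray protons
collider_cm_ev = 13.6e12  # roughly the LHC's proton-proton collision energy
cosmic_cm = fixed_target_cm_energy_ev(gzk_ev)
print(f"GZK-energy proton on a nucleon at rest: sqrt(s) ~ {cosmic_cm:.2e} eV")
print(f"ratio to the collider: ~{cosmic_cm / collider_cm_ev:.0f}x")
```

Measured this way, the gap shrinks from nearly seven orders of magnitude in single-particle energy to a factor of about 20, but a GZK-scale collision still exceeds anything in our accelerators.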
Moreover, while the flashy, expensive particle accelerators get attention, they may not be a serious source of danger compared to other physics experiments.

Three of the more plausible catastrophic unexpected-physics scenarios involving high-energy events are false vacuum collapse, black hole formation, and the formation of strange matter which is more stable than regular matter.

False vacuum collapse would occur if our universe is not in its true lowest energy state and an event causes it to transition to the true lowest state (or just a lower state). Such an event would almost certainly be fatal for all life. False vacuum collapses cannot be avoided by astronomical observations, since once initiated they would expand at the speed of light. Note that the indiscriminately destructive nature of false vacuum collapses makes them an unlikely filter. If false vacuum collapses were easy, we would not expect to see almost any life this late in the universe's lifespan, since there would have been a large number of prior opportunities for false vacuum collapse. Essentially, we would not expect to find ourselves this late in a universe's history if this universe could easily undergo a false vacuum collapse. While false vacuum collapses and similar problems raise issues of observer selection effects, careful work has been done to estimate their probability.

People have mentioned the idea of an event similar to a false vacuum collapse but which proceeds at a speed slower than the speed of light. Greg Egan used it as a major premise in his novel "Schild's Ladder." I'm not aware of any reason to believe such events are at all plausible. The primary motivation seems to be the interesting literary scenarios which arise rather than any scientific consideration. If such a situation can occur, then it is possible that we could detect it using astronomical methods.
In particular, if the wave-front of the event is fast enough to reach the nearest star or nearby stars, then we might notice odd behavior by the star or group of stars. We can be confident that no such event has a speed much beyond a few hundredths of the speed of light, or we would already notice galaxies behaving abnormally. There is only a very narrow range where such expansions could be quick enough to devastate the planet they arise on but too slow to reach their parent star in a reasonable amount of time. For example, the distance from the Earth to the Sun is on the order of 10,000 times the diameter of the Earth, so any event which expanded to destroy the Earth would reach the Sun in about 10,000 times as long. Thus, to destroy one's home planet without reaching the parent star within a reasonable time, the event would need to be extremely slow.

The creation of artificial black holes is unlikely to be a substantial part of the filter; we expect small black holes to quickly pop out of existence due to Hawking radiation. Even if such a black hole does form, it is likely to fall quickly to the center of the planet and eat matter very slowly, over a timeline which does not make it a serious threat. However, it is possible that black holes would not evaporate; the fact that we have not detected the evaporation of any primordial black holes is weak evidence that the behavior of small black holes is not well understood. It is also possible that such a hole would eat much faster than we expect, but this doesn't seem likely. If this is a major part of the filter, then better telescopes should be able to detect it by finding very dark objects with the approximate mass and orbit of habitable planets. We also may be able to detect such black holes via other observations, such as their gamma or radio signatures.
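Two numbers in the paragraphs above can be checked directly. The first function compares how long a hypothetical front starting on Earth would take to cross the planet and to reach the Sun; the second uses the standard photon-only estimate of Hawking evaporation time, t ≈ 5120πG²M³/(ħc⁴), which assumes the textbook theory of black-hole evaporation that the text notes is not well tested. The 1 m/s front speed and the sample masses are illustrative assumptions.

```python
import math

G = 6.674e-11             # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34         # reduced Planck constant, J s
C = 2.998e8               # speed of light, m/s
AU_M = 1.496e11           # mean Earth-Sun distance, m
EARTH_DIAMETER_M = 1.2742e7
SECONDS_PER_YEAR = 3.156e7


def front_arrival_times_s(speed_m_s: float):
    """Seconds for a front starting on Earth to cross Earth, and to reach the Sun."""
    return EARTH_DIAMETER_M / speed_m_s, AU_M / speed_m_s


def hawking_lifetime_s(mass_kg: float) -> float:
    """Photon-only evaporation estimate: 5120 * pi * G^2 * M^3 / (hbar * c^4)."""
    return 5120 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)


print(f"AU / Earth diameter ~ {AU_M / EARTH_DIAMETER_M:,.0f}")
earth_t, sun_t = front_arrival_times_s(1.0)  # hypothetical 1 m/s front
print(f"1 m/s front: crosses Earth in {earth_t / 86400:.0f} days, "
      f"reaches the Sun in {sun_t / SECONDS_PER_YEAR:,.0f} years")
for m in (1.0e3, 1.0e11):
    print(f"{m:.0e} kg black hole evaporates in ~{hawking_lifetime_s(m):.1e} s")
```

The Sun-to-Earth ratio comes out near 11,700, consistent with "on the order of 10,000". Because the lifetime scales as M³, a 1,000 kg black hole evaporates in well under a microsecond, and black holes light enough to form in a collider would, on this theory, vanish faster still.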
The conversion of regular matter into strange matter, unlike a false vacuum collapse or similar event, might be naturally limited to the planet where the conversion started. In that case, the only hope for observation would be to notice planets formed of strange matter through changes in the behavior of their light. Without actual samples of strange matter, this may be very difficult to do, unless we simply treat planets that look abnormal as suggestive evidence. Without substantially better telescopes and a good idea of the range of normal rocky planets, this would be tough. On the other hand, neutron stars which have been converted into strange matter may be more easily detectable.

Global warming and related damage to the biosphere: Astronomy is unlikely to help here. It is possible that climates are more sensitive than we realize and that comparatively small changes can result in Venus-like situations. This seems unlikely given the general level of climate variation in human history and the fact that current geological models strongly suggest that any substantial problem would eventually correct itself. But if we saw many planets that looked Venus-like in the middle of their habitable zones, this would be a reason to be worried. Note that this would require a detailed ability to analyze planetary atmospheres, well beyond current capability. Even if it is possible to Venus-ify a planet, it is not clear that the Venusification would last long, so there may be very few planets in this state at any given time. Since stars become brighter as they age, high greenhouse gas levels have more of an impact on climate when the parent star is old. If civilizations are more likely to arise late in their home star's lifespan, global warming becomes a more plausible filter, but even given such considerations, global warming does not seem to be sufficient as a filter.
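The brightening of stars with age can be illustrated for the Sun with Gough's (1981) approximation, L(t)/L_now = 1/(1 + 0.4(1 - t/t_now)), combined with the fact that a planet's equilibrium temperature scales as L^(1/4). The present solar age of about 4.57 Gyr is an input assumption, and extending the formula a billion years into the future stretches it beyond its intended use.

```python
SUN_AGE_GYR = 4.57  # approximate present age of the Sun, in Gyr


def gough_luminosity(age_gyr: float, age_now_gyr: float = SUN_AGE_GYR) -> float:
    """Solar luminosity relative to today, from Gough's (1981) approximation.

    L(t) / L_now = 1 / (1 + 0.4 * (1 - t / t_now)); meant for a Sun-like
    main-sequence star.
    """
    return 1.0 / (1.0 + 0.4 * (1.0 - age_gyr / age_now_gyr))


def equilibrium_temp_scale(luminosity_ratio: float) -> float:
    """Factor by which planetary equilibrium temperature changes (T ~ L ** 0.25)."""
    return luminosity_ratio ** 0.25


for age in (0.0, SUN_AGE_GYR, SUN_AGE_GYR + 1.0):
    lum = gough_luminosity(age)
    print(f"age {age:.2f} Gyr: L = {lum:.3f} L_now, "
          f"T_eq x {equilibrium_temp_scale(lum):.3f}")
```

The young Sun comes out about 30% dimmer than today, and a billion years from now about 10% brighter, which raises an Earth-like planet's equilibrium temperature (about 255 K today) by roughly 2%, or about 6 K, before any greenhouse feedback.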
It is also possible that global warming by itself is not the Great Filter, but rather a more general disruption of the biosphere, which for some species might include global warming, reduced species diversity, and other problems. There is some evidence that human behavior is collectively causing enough damage to leave an unstable biosphere. A change in overall planetary temperature of 10 °C would likely be enough to collapse civilization without leaving any signal observable to a telescope. Similarly, substantial disruption to a biosphere may be very unlikely to be detected.

Artificial intelligence: AI is a complicated existential risk from the standpoint of the Great Filter. AI is not likely to be the Great Filter if one considers simply the Fermi paradox. The essential problem has been brought up independently by a few people. (See for example Katja Grace's remark here and my blog here.) The central issue is that if an AI takes over, it is likely to attempt to control all resources in its future light-cone. However, if the AI spreads out at a substantial fraction of the speed of light, then we would notice the result. The argument has been made that we would not see such an AI if it expanded its radius of control at very close to the speed of light, but this requires expansion at 99% of the speed of light or greater. It is highly questionable whether velocities above 99% of the speed of light are practically possible, due to collisions with the interstellar medium and the need to slow down if one is going to use the resources in a given star system. Another objection is that AI may expand at a large fraction of light speed but do so stealthily. It is not likely that all AIs would favor stealth over speed. Moreover, this raises the question of what one would expect when multiple slowly expanding, stealthy AIs run into each other.
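The 99% figure can be made concrete with two pieces of arithmetic. First, assuming an expanding AI becomes visible from the moment its expansion starts at distance d, an observer gets a warning time of d/(βc) - d/c, which is (1/β - 1) times the light-travel time d/c. Second, at speed βc each hydrogen atom in the interstellar medium strikes the probe with kinetic energy (γ - 1)m_p c², approximating the atom's mass by the proton's. The function names are hypothetical.

```python
import math

PROTON_REST_ENERGY_GEV = 0.938272  # proton rest energy, GeV


def warning_time_fraction(beta: float) -> float:
    """Warning time as a fraction of the light-travel time d / c.

    A front launched at distance d and moving at beta * c arrives at time
    d / (beta * c); its launch becomes visible at d / c. The gap divided by
    d / c is 1 / beta - 1.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be strictly between 0 and 1")
    return 1.0 / beta - 1.0


def lorentz_factor(beta: float) -> float:
    """Relativistic gamma factor 1 / sqrt(1 - beta^2)."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


for beta in (0.5, 0.9, 0.99):
    per_atom_gev = (lorentz_factor(beta) - 1.0) * PROTON_REST_ENERGY_GEV
    print(f"beta={beta}: warning = {warning_time_fraction(beta):.3f} x (d/c), "
          f"each interstellar hydrogen atom hits with ~{per_atom_gev:.2f} GeV")
```

At half the speed of light an observer would see the front coming for as long as light takes to cross the distance; at 0.99c the window shrinks to about 1% of that, while each interstellar atom arrives carrying almost 6 GeV. Slower fronts leave observers long windows, which is why the remaining objection relies on stealth, and why it matters what happens when stealthy fronts meet.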
It is likely that such events would have results catastrophic enough to be visible even with comparatively primitive telescopes. While these astronomical considerations make AI unlikely to be the Great Filter, it is important to note that if the Great Filter is largely in our past, then these considerations do not apply. Thus, any discovery which pushes more of the filter into the past makes AI a larger fraction of total expected existential risk, since the absence of observable AI becomes much weaker evidence against strong AI if there are no major civilizations out there to hatch such explosions. Note also that AI as a risk cannot be discounted if one assigns a high probability to existential risk based on non-Fermi concerns, such as the Doomsday Argument.

Resource depletion: Astronomy is unlikely to provide direct help here, for reasons similar to the problems with nuclear exchange, biological problems, and global warming. This connects to the problem of civilization bootstrapping: to get to our current technology level, we used a large number of non-renewable resources, especially energy sources. On the other hand, large amounts of resources that are difficult to mine and refine (especially aluminum and titanium) will be much more accessible to future civilizations. While there remains a large amount of accessible fossil fuels, the technology required to obtain deeper sources is substantially more advanced than that needed for the relatively easy-to-access oil and coal. Moreover, the energy return rate (how much energy one gets out for each unit of energy put in) is lower. Nick Bostrom has raised the possibility that the depletion of easy-to-access resources may contribute to civilization-collapsing problems that, while not full-scale existential risks by themselves, prevent civilizations from recovering. Others have begun to investigate the problem of rebuilding without fossil fuels, such as here.
Resource depletion is unlikely to be the Great Filter, because small changes to human behavior in the 1970s would have drastically reduced the current resource problems. Resource depletion may contribute to existential threats to humans if it leads to societal collapse or global nuclear exchange, or motivates riskier experimentation. Resource depletion may also combine with other risks such as global warming, where the combined problems may be much greater than either individually. However, there is a risk that large-scale use of resources to engage in astronomy research will directly contribute to the resource depletion problem.

Nanotechnology: Nanotechnology disasters are one of the situations where astronomical considerations could plausibly be useful. In particular, planets which are in the habitable zone but have highly artificial and inhospitable atmospheres and surfaces could plausibly be visible. For example, if a planet's surface were transformed into diamond, telescopes not much more advanced than our current ones could detect that surface. It should also be noted that at this point, many nanotechnologists consider the classic "grey goo" scenario to be highly unlikely. See, for example, Chris Phoenix's comment here. However, catastrophic replicator events that cause enough damage to the biosphere without grey-gooing everything are a possibility, and it is unclear whether we would detect such events.

Aliens: Hostile aliens are a common explanation of the Great Filter when people first find out about it. However, this idea comes more from science fiction than from any plausible argument. In particular, if a single hostile alien civilization were wiping out or drastically curtailing other civilizations, then one would still expect that civilization to make use of available resources after a long enough time.
One could posit aliens who also have a religious or ideological ideal of leaving the universe looking natural, but this is an unlikely, speculative hypothesis that also requires them to dominate a massive region: not just a handful of galaxies but many. Note also that astronomical observations might be able to detect the results of extremely powerful weapons, but any conclusions would be highly speculative. Moreover, it is not clear that knowing about such a threat would allow us to substantially mitigate it at all.

Other/Unknown: Unknown risks are by nature very difficult to estimate. However, there is an argument that we should expect the Great Filter to be an unknown risk, something so unexpected that no civilization gets sufficient warning. This is one of the easiest ways for the filter to be truly difficult to prevent. In that context, any information we can possibly get about other civilizations and what happened to them would be a major leg-up.

## Conclusions

Astronomical observations have the potential to give us data about the Great Filter, but many potential filters will leave no observable astronomical evidence unless one's astronomical ability is so high that one has likely already passed all major filters. Therefore, one potential strategy to pass the Great Filter is to drastically improve our astronomical capability, to the point where it would be highly unlikely that a pre-Filter civilization would have access to those observations. Together with our comparatively late arrival, this might allow us to actually detect failed civilizations that did not survive the Great Filter and see what they did wrong. Unfortunately, it is not clear how cost-effective this sort of increase in astronomy would be compared to other uses of resources for mitigating existential risk. It may be more useful to focus on moving resources in astronomy into those areas most relevant to understanding the Great Filter.
## Existential biotech hazard that was designed in the 90s?

5 08 March 2015 01:08AM

Does anyone know something about this alteration of Klebsiella planticola? Paywalled paper here. (If someone has got access please PM me; I would like to read the paper to write a more fleshed-out article.)

While I am not convinced that it would really have spread to every terrestrial ecosystem, or even every wheat field, and I am not even sure it could compete successfully with the wild type, I certainly would not bet the world on that. Even if it might only have become a nasty crop bug instead of an ecosystem killer, I think this may be the closest encounter with a true existential risk we have had so far. This suggests that even our current low-end biotech may be the greatest existential risk we face at the moment.

Or is this just hyped bullshit for some reason I do not see right now (without reading the paper)?

Edit: Upon reading the original paper I am quite sure Cracked.com greatly exaggerated the potential threat. 10^8 cfu (colony-forming units) of K. planticola per gram of soil (dry weight) were added on day 0, but after 8 weeks only 10^2 cfu survived (this is true for both the wild type and the modified K. planticola). This suggests that K. planticola in the wild has typical densities more like 10^2 cfu per g than 10^8 cfu per g. 10^2 cfu per g is nowhere near enough to produce lethal ethanol concentrations in the soil, even if the modified strain could compete in the wild. Furthermore, the concentration of the modified K. planticola decreased faster than the concentration of the wild type, suggesting reduced fitness of the GMO. On the other hand, after 8 weeks both K. planticola strains arrived at the same density of 100 cfu per g, indicating comparable medium-term survivability in unsterilized soil (I am not sure whether indigenous K. planticola, which could compete with the GMO, was present in the soil sample used).
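To put the soil figures in perspective, here is a small calculation using only the two endpoints quoted above (10^8 cfu/g on day 0, 10^2 cfu/g after 8 weeks). The weekly survival factor is an illustrative assumption: it treats the decline as a constant exponential decay, which the endpoints alone cannot confirm.

```python
import math

# Figures from the soil experiment described above (cfu per gram of dry soil).
initial_cfu = 1e8   # added on day 0
final_cfu = 1e2     # surviving after 8 weeks
weeks = 8

# Total decline in orders of magnitude.
orders_lost = math.log10(initial_cfu / final_cfu)

# Assumption: if the decline were a constant exponential decay, this is the
# fraction of the population that would survive each week.
weekly_survival = (final_cfu / initial_cfu) ** (1 / weeks)

print(f"{orders_lost:.1f} orders of magnitude lost over {weeks} weeks")
print(f"about {weekly_survival:.3f} of the population surviving each week")
```

Six orders of magnitude in eight weeks would mean, under that assumption, losing roughly four fifths of the population every week.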
And yes, they did avoid the obvious failure mode of not differentiating between the wild type and modified K. planticola during recovery of K. planticola strains from the samples.

## Who are your favorite "hidden rationalists"?

18 11 January 2015 06:26AM

Quick summary: "Hidden rationalists" are what I call authors who espouse rationalist principles, and probably think of themselves as rational people, but don't always write on "traditional" Less Wrong-ish topics and probably haven't heard of Less Wrong.

I've noticed that a lot of my rationalist friends seem to read the same ten blogs, and while it's great to have a core set of favorite authors, it's also nice to stretch out a bit and see how everyday rationalists are doing cool stuff in their own fields of expertise. I've found many people who push my rationalist buttons in fields of interest to me (journalism, fitness, etc.), and I'm sure other LWers have their own people in their own fields.

So I'm setting up this post as a place to link to/summarize the work of your favorite hidden rationalists. Be liberal with your suggestions!

Another way to phrase this: Who are the people/sources who give you the same feelings you get when you read your favorite LW posts, but who many of us probably haven't heard of?

Here's my list, to kick things off:

• Peter Sandman, professional risk communication consultant. Often writes alongside Jody Lanard. Specialties: Effective communication, dealing with irrational people in a kind and efficient way, carefully weighing risks and benefits. My favorite recent post of his deals with empathy for Ebola victims and is a major, Slate Star Codex-esque tour de force. His "guestbook comments" page is better than his collection of web articles, but both are quite good.
• Doug McGuff, MD, fitness guru and author of the exercise book with the highest citation-to-page ratio of any I've seen. His big thing is "superslow training", where you perform short and extremely intense workouts (video here). I've been moving in this direction for about 18 months now, and I've been able to cut my workout time approximately in half without losing strength. It may not work for everyone, but it reminds me of Leverage Research's sleep experiments; if it happens to work for you, you gain a heck of a lot of time. I also love the way he emphasizes the utility of strength training for all ages/genders -- very different from what you'd see on a lot of weightlifting sites.
• Philosophers' Mail. A website maintained by applied philosophers at the School of Life, which reminds me of a hippy-dippy European version of CFAR (in a good way). Not much science, but a lot of clever musings on the ways that philosophy can help us live, and some excellent summaries of philosophers who are hard to read in the original. (Their piece on Vermeer is a personal favorite, as is this essay on Simon Cowell.) This recently stopped posting new material, but the School of Life now collects similar work through The Book of Life.

Finally, I'll mention something many more people are probably aware of: I Am A, where people with interesting lives and experiences answer questions about those things. Few sites are better for broadening one's horizons; lots of concentrated honesty. Plus, the chance to update on beliefs you didn't even know you had.

Once more: Who are the people/sources who give you the same feeling you get when you read your favorite LW posts, but who many of us probably haven't heard of?

## [LINK] Stephen Hawking warns of the dangers of AI

10 02 December 2014 03:22PM

From the BBC:

[Hawking] told the BBC: "The development of full artificial intelligence could spell the end of the human race."

...

"It would take off on its own, and re-design itself at an ever increasing rate," he said. "Humans, who are limited by slow biological evolution, couldn't compete, and would be superseded."

There is, however, no mention of Friendly AI or similar principles.
In my opinion, this is particularly notable for the coverage this story is getting within the mainstream media. At the current time, this is the most-read and most-shared news story on the BBC website.

## Link: Biotech Corporate Email Hacked

-5 02 December 2014 06:12AM

Here's the NYTimes story: Hackers With Apparent Investment Banking Background Target Biotech

You should be able to download a copy of the report from the FireEye website here. Alternatively, you can request a free copy of the FireEye report here by pretending to be a company (for example, entering "no company" in the "company" field). There may be a time delay between requesting and receiving the report.

Luckily for all of us, just because the hackers, referred to as FIN4, had financial motivations (getting "inside information about impending market catalysts") did not mean that they attempted to maximize their financial gain. If they had, this could have been on the front page instead of in the technology section, and the headline could have been "Terrorists Hired Hackers to Manufacture Synthetic Disease," or alternately "Hacker Group Threatens to Release Synthetic Plague if Demands Not Met."

I sincerely hope that, if artificial gene synthesis devices were not kept air-gapped before, they will be now. If hackers were able to compromise the email accounts of researchers and scientists (listed separately in the report for some reason), and artificial gene synthesis devices took requests from authorized users over the internet, then these hackers could have ordered genes synthesized.

## The germ of an idea

6 13 November 2014 06:58PM

Apologies for posting another unformed idea, but I think it's important to get it out there.

The problem with dangerous AI is that it's intelligent, and thus adapts to our countermeasures.
If we did something like plant a tree and order the AI not to eat the apple on it, as a test of its obedience, it would easily figure out what we were doing and avoid the apple (until it had power over us), even if it were a treacherous apple-devouring AI of DOOM.

When I wrote the AI indifference paper, it seemed that it showed a partial way around this problem: the AI would become indifferent to a particular countermeasure (in that example, explosives), so wouldn't adapt its behaviour around it. It seems that the same idea can make an Oracle not attempt to manipulate us through its answers, by making it indifferent as to whether the message was read.

The idea I'm vaguely groping towards is whether this is a general phenomenon: whether we can use indifference to prevent the AI from adapting to any of our efforts. The second question is whether we can profitably use it on the AI's motivation itself. Something like the reduced impact AI reasoning about what impact it could have on the world. This has a penalty function for excessive impact, but maybe that's gameable: maybe there is a pernicious outcome that doesn't have a high penalty, if the AI aims for it exactly. But suppose the AI could calculate its impact under the assumption that it didn't have a penalty function (utility indifference is often equivalent to having incorrect beliefs, but less fragile than that). So if it was a dangerous AI, it would calculate its impact as if it didn't have a penalty function (and hence no need to route around it), and thus would calculate a large impact, and get penalised by it.

My next post will be more structured, but I feel there's the germ of a potentially very useful idea there. Comments and suggestions welcome.

## What's special about a fantastic outcome? Suggestions wanted.

0 11 November 2014 11:04AM

I've been returning to my "reduced impact AI" approach, and am currently working on some ideas.
What I need is some ideas on features that might distinguish between an excellent FAI outcome and a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point; originality is more prized!

I'm looking for something generic that is easy to measure. At a crude level, if the only options were "paperclipper" vs FAI, then we could distinguish those worlds by counting steel content.

So basically some more or less objective measure that has a higher proportion of good outcomes than the baseline.

## A thought on AI unemployment and its consequences

7 18 August 2014 12:10PM

I haven't given much thought to the concept of automation and computer-induced unemployment. Others at the FHI have been looking into it in more detail: see Carl Frey's "The Future of Employment", which estimated the degree of automatability of 70 chosen professions, and extended the results using O∗NET, an online service developed for the US Department of Labor, which gave the key features of an occupation as a standardised and measurable set of variables.

The reason I haven't been looking at it much is that AI-unemployment has considerably less impact than AI-superintelligence, and is thus a less important use of time. However, if automation does cause mass unemployment, then advocating for AI safety will happen in a very different context from the current one. Much will depend on how that mass unemployment problem is dealt with, what lessons are learnt, and the views of whoever is the most powerful in society. Just off the top of my head, I could think of four scenarios on whether risk goes up or down, depending on whether the unemployment problem was satisfactorily "solved" or not:

| AI risk \ Unemployment | Problem solved | Problem unsolved |
|---|---|---|
| Risk reduced | With good practice in dealing with AI problems, people and organisations are willing and able to address the big issues. | The world is very conscious of the misery that unrestricted AI research can cause, and very wary of future disruptions. Those at the top want to hang on to their gains, and they are the ones with the most control over AIs and automation research. |
| Risk increased | Having dealt with the easier automation problems in a particular way (eg taxation), people underestimate the risk and expect the same solutions to work. | Society is locked into a bitter conflict between those benefiting from automation and those losing out, and superintelligence is seen through the same prism. Those who profited from automation are the most powerful, and decide to push ahead. |

But of course the situation is far more complicated, with many different possible permutations, and no guarantee that the same approach will be used across the planet. And let the division into four boxes not fool us into thinking that the four scenarios are of comparable probability: more research is (really) needed.

## [LINK] Speed superintelligence?

36 14 August 2014 03:57PM

From Toby Ord:

Tool assisted speedruns (TAS) are when people take a game and play it frame by frame, effectively providing super reflexes and forethought, where they can spend a day deciding what to do in the next 1/60th of a second if they wish. There are some very extreme examples of this, showing what can be done if you really play a game perfectly. For example, this video shows how to win Super Mario Bros 3 in 11 minutes. It shows how different optimal play can be from normal play. In particular, on level 8-1, it gains 90 extra lives by a sequence of amazing jumps.

Other TAS runs get more involved and start exploiting subtle glitches in the game. For example, this page talks about speed running NetHack, using a lot of normal tricks, as well as luck manipulation (exploiting the RNG) and exploiting a dangling pointer bug to rewrite parts of memory.
Though there are limits to what AIs could do with sheer speed, it's interesting that great performance can be achieved with speed alone, that this allows different strategies from the usual ones, and that it allows the exploitation of otherwise unexploitable glitches and bugs in the setup.

## Public thread for researchers seeking existential risk consultation

0 14 August 2014 01:01PM

LW is one of the few informal places which take existential risk seriously. Researchers can post here to describe proposed or ongoing research projects, seeking consultation on possible X-risk consequences of their work. Commenters should write their posts with the understanding that many researchers prioritize interest first and existential risk/social benefit of their work second, but that discussions of X-risk may steer researchers to projects with less X-risk/more social benefit.

## [LINK] AI risk summary published in "The Conversation"

8 14 August 2014 11:12AM

A slightly edited version of "AI risk - executive summary" has been published in "The Conversation", titled "Your essential guide to the rise of the intelligent machines":

The risks posed to human beings by artificial intelligence in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructibility – but Arnie’s character lacks the one characteristic that we in the real world actually need to worry about – extreme intelligence.

Thanks again to those who helped forge the original article. You can use this link, or the Less Wrong one, depending on the audience.

## Decision Theory: Value in Time

2 27 July 2014 10:01AM

Summary: Is there demand for posts about this aspect of decision-making? And of course, is there supply? I ask because I haven't seen any post about it.

Topics I intended to cover include:

• How much is 100 worth in a few years? Why? Why is it useful?
• Risk-return relationship.
• How is it useful in life outside finance?

And a topic I would like to cover, but am not sure I should:

• How can we apply it to death? (In the sense of: should I live a happy life, or struggle to live endlessly?)

I found this missing in decision analysis, and I think it is a very important thing to know, since we don't always choose between "I take A" and "I take B", but also between "I take A" and "I take B in two years", or face questions like "should I give A to gain B every year for the next 100 years?"
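A minimal sketch of the standard discounting arithmetic behind these comparisons. It assumes a constant annual discount rate compounded yearly; the 5% rate and the amounts are purely illustrative choices, not figures from the post.

```python
def present_value(amount, years, rate):
    """Value today of `amount` received `years` years from now,
    discounted at a constant annual `rate` (compounded yearly)."""
    return amount / (1 + rate) ** years


def annuity_value(payment, years, rate):
    """Value today of receiving `payment` at the end of each of the next `years` years."""
    return sum(present_value(payment, t, rate) for t in range(1, years + 1))


RATE = 0.05  # hypothetical 5% annual discount rate

# "A now, or B in two years?": compare A with the present value of B.
a_now, b_later = 90, 100
print("take B later" if present_value(b_later, 2, RATE) > a_now else "take A now")

# "Give A to gain B every year for the next 100 years?":
# compare A with the present value of that stream of payments (here B = 1).
print(round(annuity_value(1, 100, RATE), 2))
```

The same two functions cover both kinds of choice: a single delayed payoff and a recurring one. The rate is the only subjective input, and it is where most of the interesting disagreement lies.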

Why not simply redirect to some other source?

Well, that can be done either way, but I thought a clear introduction to the basics would do no harm and would be useful to people who want to invest less time in it.

## Reduced impact in practice: randomly sampling the future

1 17 March 2014 04:30PM

This was an idea developed along with Eliezer when he was at the FHI in November.

In my post on "reduced impact AI" I talked about how you could estimate the impact of an AI by (hypothetically) observing a slice of the future. When Eliezer was here last November, we talked about how this could be cashed out, and came up with a model.

Ignore the "output channel" from the original reduced impact post, and consider this situation:

We have two computers running in two similarly shielded rooms. One of them is a simple algorithm running through the digits of e (call it run-e), which is deleted soon afterwards. Another is our dear old friend clippy, the AI motivated to maximise paperclips. Without removing clippy's paperclip preferences, we'd like to motivate it to have next to no impact on the world. How should we define "next to no impact"?

The obvious thing is to compare it with run-e, and require it to have no more expected impact than that machine. How could this be measured? Well, the original idea was to put a stochastic process on clippy as it was turned on (inside the shielded room), and say that clippy has reduced impact if we can't find any extra evidence, from a vantage point 50 years in the future, that clippy was successfully turned on. Now put the same stochastic process on run-e and define:

Clippy has reduced impact if, from a vantage of 50 years into the future, we have no more evidence that clippy was turned on than we have of run-e being turned on.
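One way to make "no more evidence" concrete is sketched below. This is an assumed operationalization, not the model from the post: it measures the evidence a future observer expects to gain about a machine being switched on as the KL divergence between the distribution of future observations given "switched on" and the one given "not switched on". All observation categories and probabilities are hypothetical, and the sketch assumes the world looks the same 50 years on whichever machine was left off.

```python
import math


def expected_evidence(p_on, p_off):
    """Expected log-likelihood ratio (nats) in favour of 'switched on',
    averaged over observations drawn from p_on, i.e. KL(p_on || p_off).
    Assumes p_off assigns nonzero probability to every outcome p_on can produce."""
    return sum(p * math.log(p / p_off[obs]) for obs, p in p_on.items() if p > 0)


def has_reduced_impact(machine_on, baseline_on, world_off):
    """Toy criterion: the future observer expects no more evidence that the
    machine was switched on than that the baseline machine was."""
    return expected_evidence(machine_on, world_off) <= expected_evidence(baseline_on, world_off)


# Hypothetical distributions over coarse observations made 50 years later.
world_off = {"normal": 0.989, "odd_traces": 0.010, "paperclips": 0.001}
run_e_on = {"normal": 0.980, "odd_traces": 0.019, "paperclips": 0.001}
clippy_on = {"normal": 0.500, "odd_traces": 0.100, "paperclips": 0.400}

print(has_reduced_impact(run_e_on, run_e_on, world_off))   # baseline against itself
print(has_reduced_impact(clippy_on, run_e_on, world_off))  # an unconstrained clippy
```

With these made-up numbers, run-e trivially meets the bar against itself, while a clippy that fills the future with paperclips leaves far more evidence of having been switched on and fails it.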

## Of all the SIA-doomsdays in all the worlds...

4 18 October 2013 12:56PM

Ideas developed with Paul Almond, who kept on flogging a dead horse until it started showing signs of life again.

## Doomsday, SSA and SIA

Imagine there's a giant box filled with people, and clearly labelled (inside and out) "(year of some people's lord) 2013". There's another giant box somewhere else in space-time, labelled "2014". You happen to be currently in the 2013 box.

Then the self-sampling assumption (SSA) produces the doomsday argument. It works approximately like this: SSA has a preference for universes with smaller numbers of observers (since it's more likely that you're one-in-a-hundred than one-in-a-billion). Therefore we expect that the number of observers in 2014 is smaller than we would otherwise "objectively" believe: the likelihood of doomsday is higher than we thought.

What about the self-indication assumption (SIA): that makes the doomsday argument go away, right? Not at all! SIA has no effect on the number of observers expected in 2014, but increases the expected number of observers in 2013. Thus we still expect the number of observers in 2014 to be lower than we otherwise thought. There's an SIA doomsday too!
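Both claims can be checked on a toy example built on hypothetical assumptions. Each box independently holds either 100 or 1000 observers, with equal prior probability. SSA weights each world by the chance that a randomly chosen observer finds themselves in the 2013 box. SIA also weights by the total number of observers, so the totals cancel and the weight becomes proportional to the size of the 2013 box.

```python
from itertools import product

# Hypothetical toy prior: each box independently holds 100 or 1000 observers,
# each option with probability 1/2.
SIZES = (100, 1000)
worlds = list(product(SIZES, SIZES))  # (n2013, n2014)
prior = {w: 1 / len(worlds) for w in worlds}


def posterior(weight):
    """Reweight the prior by weight(n2013, n2014) and normalise."""
    raw = {w: prior[w] * weight(*w) for w in worlds}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}


def expect(dist, index):
    """Expected size of box `index` (0 = 2013, 1 = 2014) under dist."""
    return sum(p * w[index] for w, p in dist.items())


# SSA: probability that a randomly sampled observer is in the 2013 box.
ssa = posterior(lambda n13, n14: n13 / (n13 + n14))
# SIA: weight by total observers, then sample within the world as SSA does;
# the totals cancel, leaving a weight proportional to the 2013 box's size.
sia = posterior(lambda n13, n14: (n13 + n14) * n13 / (n13 + n14))

for name, dist in [("prior", prior), ("SSA", ssa), ("SIA", sia)]:
    print(f"{name}: E[N2013]={expect(dist, 0):.1f}, E[N2014]={expect(dist, 1):.1f}")
```

In this setup SSA pushes the expected size of the 2014 box below its prior value of 550. SIA leaves it at 550 but raises the expected size of the 2013 box, so 2014 still looks small relative to 2013.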

## Enter causality

What's going on? SIA was supposed to defeat the doomsday argument! What happens is that I've implicitly cheated: by naming the boxes "2013" and "2014", I've heavily implied that these "boxes" figuratively correspond to two subsequent years. But then I've treated them as independent for SIA, like two literal distinct boxes.

## Update on establishment of Cambridge’s Centre for Study of Existential Risk

40 12 August 2013 04:11PM
Cambridge’s high-profile launch of the Centre for Study of Existential Risk last November received a lot of attention on LessWrong, and a number of people have been enquiring as to what’s happened since. This post is meant to give a little explanation and update of what’s been going on.

Motivated by a common concern over human activity-related risks to humanity, Lord Martin Rees, Professor Huw Price, and Jaan Tallinn founded the Centre for Study of Existential Risk last year. However, this announcement was made before the establishment of a physical research centre or the securing of long-term funding. The last 9 months have been focused on turning an important idea into a reality.

Following the announcement in November, Professor Price contacted us at the Future of Humanity Institute regarding the possibility of collaboration on joint academic funding opportunities; the aim being both to raise the funds for CSER’s research programmes and to support joint work by the FHI and CSER’s researchers on anthropogenic existential risk. We submitted our first grant application in January to the European Research Council – an ambitious project to create “A New Science of Existential Risk” that, if successful, would provide enough funding for CSER’s first research programme - a sizeable programme that will run for five years.
We’ve been successful in the first and second rounds, and we will hear a final round decision at the end of the year. It was also an opportunity for us to get some additional leading academics onto the project – Sir Partha Dasgupta, Professor of Economics at Cambridge and an expert in social choice theory, sustainability and intergenerational ethics, is a co-PI (along with Huw Price, Martin Rees and Nick Bostrom). In addition, a number of prominent academics concerned about technology-related risk – including Stephen Hawking, David Spiegelhalter, George Church and David Chalmers – have joined our advisory board.

The FHI regards establishment of CSER as of the highest priority for a number of reasons including:

1) The value of the research the Centre will engage in
2) The reputational boost to the field of Existential Risk gained by the establishment of a high-profile research centre in Cambridge.
3) The impact on policy and public perception that academic heavy-hitters like Rees and Price can have

Therefore we’ve been working with CSER behind the scenes over the last 9 months. Progress has been a little slow until now – Huw, Martin and Jaan are fully committed to this project, but due to their other responsibilities aren’t in a position to work full-time on it yet.

However, we’re now in a position to make CSER’s establishment official. Cambridge’s new Centre for Research in the Arts, Social Sciences and Humanities (CRASSH) will host CSER and provide logistical support. I’ll be acting manager of CSER’s activities over the coming 6-12 months, under the guidance of Huw, Martin and Jaan. A generous seed funding donation from Jaan Tallinn is funding CSER’s establishment and these activities – which will include a lecture series, workshops, public outreach, and staff time on grant-writing and fundraising. It’ll also provide a buyout of a fraction of my time from FHI (providing funds for us to hire part-time staff to offload some of the FHI workload and help with some of the CSER work).

We’ve been lucky to get a lot of support from the academic and existential risk community for the CSER centre. In addition to CRASSH, Cambridge’s Centre for Science and Policy will provide support in making policy-relevant links, and may co-host and co-publicise events. Luke Muehlhauser, MIRI’s Executive Director, has been very supportive and has provided valuable advice, and has generously offered to direct some of MIRI’s volunteer support towards CSER tasks. We also expect to get valuable support from the growing community around FHI.

From where I’m sitting, CSER’s successful launch is looking very promising. The timeline on our research programmes, however, is still a little more uncertain. If we’re successful with the European Research Council, we can expect to be hiring a full research team next spring. If not, it may take a little longer, but we’re exploring a number of different opportunities in parallel and are feeling confident. The support of the existential risk community continues to be invaluable.

Thanks,

Seán Ó hÉigeartaigh
Academic Manager, Future of Humanity Institute
Acting Academic Manager, Cambridge Centre for Study of Existential Risk.

## Comparative and absolute advantage in AI

18 16 July 2013 09:52AM

The theory of comparative advantage says that you should trade with people, even if they are worse than you at everything (ie even if you have an absolute advantage). Some have seen this idea as a reason to trust powerful AIs.

For instance, suppose you can make a hamburger by using 10 000 joules of energy. You can also make a cat video for the same cost. The AI, on the other hand, can make hamburgers for 5 joules each and cat videos for 20.

Then you both can gain from trade. Instead of making a hamburger, make a cat video instead, and trade it for two hamburgers. You've got two hamburgers for 10 000 joules of your own effort (instead of 20 000), and the AI has got a cat video for 10 joules of its own effort (instead of 20). So you both want to trade, and everything is fine and beautiful and many cat videos and hamburgers will be made.

Except... though the AI would prefer to trade with you rather than not trade with you, it would much, much prefer to dispossess you of your resources and use them itself. With the energy you wasted on a single cat video, it could have produced 500 of them! If it values these videos, then it is desperate to take over your stuff. Its absolute advantage makes this too tempting.
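The arithmetic of the example can be checked directly. The energy costs below are the ones given above; the script only recomputes the trade gains and the 500-video figure.

```python
# Energy costs (joules) from the example above.
HUMAN_COST = {"hamburger": 10_000, "cat_video": 10_000}
AI_COST = {"hamburger": 5, "cat_video": 20}

# Trade: the human makes one cat video and swaps it for two of the AI's hamburgers.
human_with_trade = HUMAN_COST["cat_video"]         # joules the human spends for two hamburgers
human_without_trade = 2 * HUMAN_COST["hamburger"]  # joules to make two hamburgers directly
ai_with_trade = 2 * AI_COST["hamburger"]           # joules the AI spends for one cat video
ai_without_trade = AI_COST["cat_video"]            # joules to make one cat video directly

# Dispossession: cat videos the AI could make with the energy the human
# spends on a single cat video.
videos_from_seized_energy = HUMAN_COST["cat_video"] // AI_COST["cat_video"]

print(human_with_trade, human_without_trade)
print(ai_with_trade, ai_without_trade)
print(videos_from_seized_energy)
```

Both sides halve their cost by trading. Seizing the human's energy, though, multiplies the AI's output for that energy 500-fold, and that gap is what makes dispossession so tempting.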

Only if its motivation is properly structured, or if it expected to lose more, over the course of history, by trying to grab your stuff, would it desist. Assuming you could make a hundred cat videos a day, and the whole history of the universe would only run for that one day, the AI would try and grab your stuff even if it thought it would only have one chance in fifty thousand of succeeding. As the history of the universe lengthens, or the AI becomes more efficient, it would be willing to rebel at even more ridiculous odds.

So if you already have guarantees in place to protect yourself, then comparative advantage will make the AI trade with you. But if you don't, comparative advantage and trade don't provide any extra security. The resources you waste are just too valuable to the AI.

EDIT: For those who wonder how this compares to trade between nations: it's extremely rare for any nation to have absolute advantages everywhere (especially ones this extreme). If you invade another nation, most of their value is in their infrastructure and their population: it takes time and effort to rebuild and co-opt these. Most nations don't/can't think long term (it could arguably be in US interests over the next ten million years to start invading everyone, but "the US" is not a single entity, and doesn't think in terms of "itself" in ten million years), would get damaged in a war, and are risk averse. And don't forget the importance of diplomatic culture and public opinion: even if it was in the US's interests to invade the UK, say, "it" would have great difficulty convincing its elites and its population to go along with this.

## Caught in the glare of two anthropic shadows

17 04 July 2013 07:54PM

This article consists of original new research, so it would not get published on Wikipedia!

The previous post introduced the concept of the anthropic shadow: the fact that certain large and devastating disasters cannot be observed in the historical record, because if they had happened, we wouldn't be around to observe them. This absence forms an “anthropic shadow”.

But that was the result for a single category of disasters. What would happen if we consider two independent classes of disasters? Would we see a double shadow, or would one ‘overshadow’ the other?

To answer that question, we’re going to have to analyse the anthropic shadow in more detail, and see that there are two separate components to it:

• The first is the standard effect: humanity cannot have developed a technological civilization if there were large catastrophes in the recent past.
• The second effect is the lineage effect: humanity cannot have developed a technological civilization if there was another technological civilization in the recent past that survived to today (or at least, we couldn't have developed the way we did).

To illustrate the difference between the two, consider the following model. Segment time into arbitrary “eras”. In a given era, a large disaster may hit with probability q, and a small disaster may independently hit with probability q (hence with probability q², there will be both a large and a small disaster). A small disaster will prevent a technological civilization from developing during that era; a large one will prevent such a civilization from developing in that era or the next one.

If it is possible for a technological civilization to develop (no disaster of either size that era, no large one in the preceding era, and no previous civilization), then one will do so with probability p. We will assume p is constant: our model will only span a time frame where p is unchanging (maybe the time period after the rise of big mammals?)
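The model is simple enough to compute exactly. Below is a minimal sketch, not the post's own method, that tracks one piece of state between eras: whether the previous era had a large disaster. A civilization can arise in an era only if that era has no disaster of either size, the previous era had no large disaster, and no civilization has arisen yet. Two assumptions are ours: the era before era 0 had a large disaster with probability q, and once a civilization arises it persists, so its probability mass is simply dropped. The function name is hypothetical.

```python
def first_civilization_distribution(p: float, q: float, n_eras: int) -> list[float]:
    """Probability that the first technological civilization arises in each era.

    Sketch of the two-disaster model: large and small disasters each hit an
    era independently with probability q; a large one blocks its own era and
    the next, a small one blocks only its own era.
    """
    # Assumption: the era before era 0 had a large disaster with probability q.
    no_large_prev = 1.0 - q  # mass: no civilization yet, no large disaster last era
    large_prev = q           # mass: no civilization yet, large disaster last era
    result = []
    for _ in range(n_eras):
        # Needs no large disaster last era, none this era, and no small one this era.
        arises = no_large_prev * (1 - q) * (1 - q) * p
        result.append(arises)
        still_waiting = no_large_prev + large_prev
        # A large disaster this era (probability q) blocks the next era too.
        next_large_prev = still_waiting * q
        # No large disaster this era, minus the mass where a civilization arose.
        next_no_large_prev = still_waiting * (1 - q) - arises
        no_large_prev, large_prev = next_no_large_prev, next_large_prev
    return result


dist = first_civilization_distribution(p=0.5, q=0.1, n_eras=5)
print(round(dist[0], 4))  # 0.3645, i.e. (1 - q)**3 * p
```

The first-era value is (1 - q)³p because all three blocking events must be absent; later eras are lower because some probability mass has already produced a civilization.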