It worries me a tad that nobody in the discussion group corrected what I consider to be the obvious basic inaccuracy of the model.
Success on FAI is not a magical result of a researcher caring about safety. The researcher who would have otherwise first created AGI does not gain the power to create FAI just by being concerned about it. They would have to develop a stably self-improving AI which learned an understandable goal system which actually did what they wanted. This could be a completely different set of design technologies than what would have gone into something unstable that improved itself by ad-hoc methods well enough to go FOOM and end the game. The researcher who would have otherwise created AGI might not be good enough to do this. The best you might be able to convince them to do would be to retire from the game. It's a lot harder to convince someone to abandon the incredibly good idea they're enthusiastic about, and start over from scratch or leave the game, than to persuade people to be "concerned about safety", which is really cheap (you just put on a look of grave concern).
If I thought all you had to do to win was to convince the otherwise-first cre...
The main advantage of convincing mainstream AI people that FAI is a problem worth worrying about appears to be not that you will have mainstream AI people thinking twice before they build their AGI, but that you will then have mainstream AI people working on FAI. More people working on a given problem seems to make it massively more likely that the problem will be solved.
If there are rigorous arguments that FAI is worth worrying about, and that there are interesting questions about which people could be doing useful incremental research, then convincing people who work in universities to start doing this research has to be such a massive win that it would take something pretty huge to outweigh it - there are a lot of very clever people working in universities, massively more than will ever work at SingInst, and they already have a huge network in place to give them money to think about the things they find interesting.
We spent an evening at last week's Rationality Minicamp... We came up with a concrete (albeit greatly oversimplified) model...
Just to be clear: this model was drafted by a couple of mini-camp participants, not by the workshop as a whole, and isn't advocated by the Singularity Institute. For example, when I do my own back-of-the-envelopes I don't expect nearly a 30% increase in existential safety from convincing 30% of AI researchers that risk matters. Among other things, this is because there's a distance between "realize risk matters" and "successfully avoid creating UFAI" (much less "create FAI"), since sanity and know-how also play roles in AI design; and partly because there are more players than just AI researchers.
Still, it is good to get explicit models out there where they can be critiqued -- I just want to avoid folks having the impression that this is SingInst's model, or that it was taught at minicamp.
I agree that there is a lot of room for more and better academic work on this topic to reduce existential risk (including other channels like more academic research into AI safety strategies, influence on other actors like large corporations and governments, etc), but as I said at the minicamp, I think the assumptions of this model systematically lead to overestimates of effectiveness of this channel (EDIT: and would lead to overestimates of other strategies as well, including the "FAI team in a basement" strategy as I mention in my comment below).
One of the primary reasons for concern about AI risk is the likelihood of tradeoffs between safety and speed of development. Commercial or military competition makes it plausible that quite extensive tradeoffs along these lines will be made, so that reckless (or self-deceived) projects are more likely to succeed first than more cautious ones. So the "random selection" assumption disproportionately favors safety.
The assumption that safety-conscious researchers always succeed in making any AI they produce safe is also fairly heroic and a substantial upward bias. There may be some cheap and simple safety measures that any ...
It's hard for me to imagine 100 good papers on the subject of AI safety (as opposed to say, FAI design). Once you have 10 good papers with variations of "AGI is dangerous, please be careful!", what can you say in the 11th one that you haven't already said? Also, 100 papers all carrying the same basic message, all funded by the same organization... that seems a bit surreal.
ETA: Sorry, I'm being overly skeptical and nitpicking. On reflection I think something like this probably is a good idea and should be pursued (unless money is a constraint and someone can come up with better use for it).
ETA2: If someone has done serious thinking about the feasibility of convincing a substantial fraction of AGI researchers about the need for safety, by "publishing X good quality papers", could they please explain their thoughts in more detail? (My mind keeps changing about whether this is feasible or not.)
It's hard for me to imagine 100 good papers on the subject of AI safety (as opposed to say, FAI design). Once you have 10 good papers with variations of "AGI is dangerous, please be careful!", what can you say in the 11th one that you haven't already said?
There's a lot to say at one layer's remove - things like stability analyses of particular strategies for implementing goal systems, general safety measures such as fake network interfaces, friendliness analyses of hypothetical programs, and so on. A paper can impart the idea that safety is important, without being directly about safety. (In fact, there's some reason to suspect that articles one layer removed may be better than articles that are directly about safety.)
No, for many reasons, including the following:
In general, there seems to have been substantial planning fallacy on the ease of getting skilled people to make progress on them via the Visiting Fellows program and other means. Versions of many of them have eventually come into being (as discussed below) but with great delays. And it seems that delivery of the planned reporting infrastructure failed badly. With respect to the individual papers:
Containing superintelligence led to this paper which was accepted for a subsequently-cancelled conference and is now seeking a venue, as well as (I believe) an accepted Singularity Hypothesis chapter by Daniel Dewey.
The WBE-AGI one has lagged, but is a submission to the JCS special issue on Chalmers' Singularity paper (by myself and Anders Sandberg), with presentations of the content at FHI, San Diego State University, and the AGI-11 workshop on the future of AI.
Collective Action Problems and AI Risk led to another Singularity Hypothesis submission.
AI risk philanthropy was taken on by an external author who never delivered, and subsequently had to be transferred to a different person who hasn't finished it yet.
There is an incarnation of the Singularity FAQ, and lukeprog, along with Anna Sala...
The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously.
A human alone can't build a superintelligence. So, companies and other organisations are what we should mostly be concerned with. Targeting the engineering talent with the message is probably the wrong approach - you mostly want the managers and directors, since they are more likely to be the ones who will dec...
Does anyone know of a historical example of a concerted effort to convince people in an academic discipline to pay attention to something, by funding a bunch of papers on or related to the topic?
If so, how well did it work?
As a result of this calculation, I will be thinking and writing about AI safety, attempting to convince others of its importance, and, in the moderately probable event that I become very rich, donating money to the SIAI so that they can pay others to do the same.
Surely the most existential-risk-reduction-per-buck at this point is not "thinking and writing about AI safety", but thinking up more strategies like it in order to possibly find even better ones? Shouldn't SIAI (or perhaps FHI, depending on the comparative advantage between them) fund...
Summary:
One big penalty that was discussed is the likelihood of another researcher having the key insi...
After reading through the post and all the comments I think the most important moral is that a simple quantitative model thought up by very smart people in a context emphasizing rationality and examined and found lacking in significant sources of error (to the point that one of these smart people is willing to post it to Less Wrong main) can still ultimately be off by many orders of magnitude.
(Not to say that drafting a simple quantitative model isn't a great starting point, but instead that when interpreting such models one should assume that the margin of error is really really big, especially when pondering implications of the model, especially especially when pondering implications for decision policies.)
The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously.
The "key insight" model seems deeply flawed. We know that the technical side of the problem involves performing inductive inference - which is a close cousin of stream compression. So, progress is very likely to look like progress with stream compression. Some low-hanging fruit - and then gradually diminishing returns. Rather like digging a big hole in the ground.
"Estimate a 10% current AI risk"... wait, where did that come from? You say "Let A be the probability that an AI will be created", but actually your A is the probability that an AI will be created which then goes on to wipe out humanity unless precautions are taken, but which will also fail to wipe out humanity if the proper precautions are taken.
Your estimate for that is a whopping 10%? Without any sort of substantiating argument??
...
Let's say I claim 0.000001% is a much more reasonable figure for this: what would be your rationale s...
Marginal taking-of-safety-seriously, as Eliezer points out, doesn't look good enough: you just delay the inevitable a little bit, if even that. On the other hand, establishing a widely-accepted consensus that AGI is as dangerous as A-bombs that blow up the whole universe might influence the field in more systematic ways (although it's unclear how, and achieving this goal doesn't look plausible).
Is there a body of knowledge about controlling self-modifying programs which could be used as a stepping stone to explaining what would be involved in FAI?
...if there were 100 good papers about it in the right journals;
Just one paper (AI safety or FAI design)...I will be very impressed. I will donate a minimum of $10 ($20 for a technical paper on FAI design) per peer-reviewed research paper per journal to the SIAI.
I doubt I'll have to donate even once within the next 50 years. But I would be happy to be proven wrong.
Much of the dispersion is caused by the lack of unrestricted funds (and lack of future funding guarantees). Since we don't have enough funding from private philanthropists, we have to chase academic funding pots, and that then forces us to do some work that is less relevant to the important problems we would rather be working on. It would be unfortunate if potential private funders then looked at the fact that we've done some less-relevant work as a reason not to give.
A high fraction. "A dollar's worth of research" is not a well-defined quantity - that is, the worth of the research produced by a dollar varies a lot depending on whom the dollar is given to. I like to think FHI is good at converting dollars into research. The kind of research I'd prefer to do with unrestricted funds at the moment probably coincides pretty well with what a person with SIAI-typical estimates would prefer, though what can be researched also depends on the capabilities and interests of the research staff one can recruit. (There are various tradeoffs here - e.g. hire a weaker researcher who has a long record of working in this area, or take a chance on a slightly stronger researcher and risk that she will do irrelevant work? Headhunt somebody who is already actively contributing to the area, or attempt to involve a new mind who would otherwise not have contributed? etc.)
There are also indirect effects, which might lead to the fraction being larger than one - for example, if discussions, conferences, and various kinds of influence encourage external researchers to enter the field. FHI does some of that, as does the SIAI.
The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously.
I contest this use of the term "safety". If your goal is for humanity to survive, say that your goal is for humanity to survive. Not to "promote safety".
"Safety" means avoiding certain bad outcomes. By using the word "safety", you're trying to sneak past us the assumption...
humanity survives if and only if that researcher is careful and takes safety seriously
Here's where I'd stick in the 10^-3 penalty. It's reasonable to assume that taking safety seriously will keep you safe from accidental leaks of toxic chemicals, deadly viruses, etc. because these are well-understood phenomena that pose a single, predictable risk. If you can keep the muriatic acid off your skin, it won't burn you. If you can keep the swine flu out of your lungs, it won't infect you.
A truly general AI, though, almost by definition, would be able to thin...
You focus on visibly HAL-like or Skynet-like AI - the sort of thing that AI researchers produce as demos. However, we have large, smart, durable, existing entities (businesses and other computer+human teams) that are continuously getting smarter (and entrenching themselves deeper into our society) by automating their existing business practices.
I don't advocate trying to stop business automation, or humans organizing themselves into better and better teams; I think that would be throwing the baby out with the bathwater. However, I do think "business ...
We spent an evening at last week's Rationality Minicamp brainstorming strategies for reducing existential risk from Unfriendly AI, and for estimating their marginal benefit-per-dollar. To summarize the issue briefly, there is a lot of research into artificial general intelligence (AGI) going on, but very few AI researchers take safety seriously; if someone succeeds in making an AGI, but they don't take safety seriously or they aren't careful enough, then it might become very powerful very quickly and be a threat to humanity. The best way to prevent this from happening is to promote a safety culture - that is, to convince as many artificial intelligence researchers as possible to think about safety so that if they make a breakthrough, they won't do something stupid.
We came up with a concrete (albeit greatly oversimplified) model which suggests that the marginal reduction in existential risk per dollar, when pursuing this strategy, is extremely high. The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously. In this model, the goal is to convince as many researchers as possible to take safety seriously. So the question is: how many researchers can we convince, per dollar? Some people are very easy to convince - some blog posts are enough. Those people are convinced already. Some people are very hard to convince - they won't take safety seriously unless someone who really cares about it will be their friend for years. In between, there are a lot of people who are currently unconvinced, but would be convinced if there were lots of good research papers about safety in machine learning and computer science journals, by lots of different authors.
Right now, those articles don't exist; we need to write them. And it turns out that neither the Singularity Institute nor any other organization has the resources - staff, expertise, and money to hire grad students - to produce very much research or to substantially alter the research culture. We are very far from the realm of diminishing returns. Let's make this model quantitative.
Let A be the probability that an AI will be created; let R be the fraction of researchers that would be convinced to take safety seriously if there were 100 good papers about it in the right journals; and let C be the cost of one really good research paper. Then the marginal reduction in existential risk per dollar is A*R/(100*C). The total cost of a grad student-year (including recruiting, management and other expenses) is about $100k. Estimate a 10% current AI risk, and estimate that 30% of researchers currently don't take safety seriously but would be convinced. That gives us a marginal existential risk reduction per dollar of 0.1*0.3/(100*100k) = 3*10^-9. Counting only the ~7 billion people alive today, and not any of the people who will be born in the future, this comes to a little over twenty expected lives saved per dollar.
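For anyone who wants to check the arithmetic or swap in their own numbers, here is a minimal Python sketch of the calculation. The function and variable names are my own labels, not part of the model, and it assumes one good paper costs one grad student-year, which is how the figures above are used.

```python
def risk_reduction_per_dollar(p_ai_risk, frac_convinced, papers_needed, cost_per_paper):
    """Marginal existential risk reduction per dollar: A * R / (papers * C)."""
    return p_ai_risk * frac_convinced / (papers_needed * cost_per_paper)


# Figures taken from the post; every one of them is a rough estimate.
A = 0.10            # estimated current AI risk
R = 0.30            # fraction of researchers who would be convinced
PAPERS = 100        # number of good papers assumed necessary
C = 100_000         # dollars per paper (assumed: one grad student-year)
POPULATION = 7e9    # people alive today, ignoring future generations

per_dollar = risk_reduction_per_dollar(A, R, PAPERS, C)
lives_per_dollar = per_dollar * POPULATION

print(f"risk reduction per dollar: {per_dollar:.1e}")
print(f"expected lives saved per dollar: {lives_per_dollar:.1f}")
```

With these inputs the risk reduction works out to 3*10^-9 per dollar, and multiplying by 7 billion gives about 21 expected lives per dollar.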
That's huge. Enormous. So enormous that I'm instantly suspicious of the model, actually, so let's take note of some of the things it leaves out. First, the "one researcher at random determines the fate of humanity" part glosses over the fact that research is done in groups; but it's not clear whether adding in this detail should make us adjust the estimate up or down. It ignores all the time we have between now and the creation of the first AI, during which a safety culture might arise without intervention; but it's also easier to influence the culture now, while the field is still young, rather than later. In order for promoting AI research safety to not be an extraordinarily good deal for philanthropists, there would have to be at least an additional 10^3 penalty somewhere, and I can't find one.
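To see how much a penalty of that size changes the picture, here is a rough Python sketch that divides the base estimate by a penalty factor and reports the implied cost per expected life. The penalty values in the loop are purely illustrative (only 10^3 comes from the paragraph above), and the function name is my own.

```python
# Base estimate from the model's stated inputs:
# 0.1 * 0.3 / (100 papers * $100k per paper) * 7 billion people, about 21.
BASE_LIVES_PER_DOLLAR = 0.1 * 0.3 / (100 * 100_000) * 7e9


def dollars_per_expected_life(penalty):
    """Cost per expected life if the model overestimates by a factor of `penalty`."""
    return penalty / BASE_LIVES_PER_DOLLAR


# Hypothetical penalty factors, chosen only to show the scaling.
for penalty in (1, 1e3, 1e6):
    cost = dollars_per_expected_life(penalty)
    print(f"penalty {penalty:g}: ${cost:,.2f} per expected life")
```

The relationship is linear, so each extra order of magnitude of penalty multiplies the cost per expected life by ten.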
As a result of this calculation, I will be thinking and writing about AI safety, attempting to convince others of its importance, and, in the moderately probable event that I become very rich, donating money to the SIAI so that they can pay others to do the same.