Safety Culture and the Marginal Effect of a Dollar

jimrandomh

9th Jun 2011

We spent an evening at last week's Rationality Minicamp brainstorming strategies for reducing existential risk from Unfriendly AI, and for estimating their marginal benefit-per-dollar. To summarize the issue briefly, there is a lot of research into artificial general intelligence (AGI) going on, but very few AI researchers take safety seriously; if someone succeeds in making an AGI, but they don't take safety seriously or they aren't careful enough, then it might become very powerful very quickly and be a threat to humanity. The best way to prevent this from happening is to promote a safety culture - that is, to convince as many artificial intelligence researchers as possible to think about safety so that if they make a breakthrough, they won't do something stupid.

We came up with a concrete (albeit greatly oversimplified) model which suggests that the marginal reduction in existential risk per dollar, when pursuing this strategy, is extremely high. The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously. In this model, the goal is to convince as many researchers as possible to take safety seriously. So the question is: how many researchers can we convince, per dollar? Some people are very easy to convince - some blog posts are enough. Those people are convinced already. Some people are very hard to convince - they won't take safety seriously unless someone who really cares about it will be their friend for years. In between, there are a lot of people who are currently unconvinced, but would be convinced if there were lots of good research papers about safety in machine learning and computer science journals, by lots of different authors.

Right now, those articles don't exist; we need to write them. And it turns out that neither the Singularity Institute nor any other organization has the resources - staff, expertise, and money to hire grad students - to produce very much research or to substantially alter the research culture. We are very far from the realm of diminishing returns. Let's make this model quantitative.

Let A be the probability that an AI will be created; let R be the fraction of researchers that would be convinced to take safety seriously if there were 100 good papers about it in the right journals; and let C be the cost of one really good research paper. Then the marginal reduction in existential risk per dollar is A*R/(100*C). The total cost of a grad student-year (including recruiting, management and other expenses) is about $100k; assume that one grad student-year yields one such paper, so C = $100k. Estimate a 10% current AI risk, and estimate that 30% of researchers currently don't take safety seriously but would be convinced. That gives us a marginal existential risk reduction per dollar of 0.1*0.3/(100*100k) = 3*10^-9. Counting only the ~7 billion people alive today, and not any of the people who will be born in the future, this comes to about 21 expected lives saved per dollar.
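The arithmetic can be checked with a short script. This is only a sketch of the model as stated: the function and variable names are mine, and it assumes one grad student-year produces one good paper, so that C is $100k.

```python
def risk_reduction_per_dollar(p_ai, frac_convincible, papers_needed, cost_per_paper):
    """Marginal reduction in existential risk per dollar: A * R / (N * C)."""
    return p_ai * frac_convincible / (papers_needed * cost_per_paper)


A = 0.10           # probability that an AI is created (the post's estimate)
R = 0.30           # fraction of researchers the papers would convince
N_PAPERS = 100     # number of good papers assumed to be needed
C = 100_000        # dollars per paper (assumption: one grad student-year each)
POPULATION = 7e9   # people alive today, ignoring future generations

per_dollar = risk_reduction_per_dollar(A, R, N_PAPERS, C)
lives_per_dollar = per_dollar * POPULATION

print(f"risk reduction per dollar: {per_dollar:.1e}")             # 3.0e-09
print(f"expected lives saved per dollar: {lives_per_dollar:.0f}")  # 21
```

Note that the denominator is the whole product 100*C, which is the total cost of the hundred papers; changing any single input scales the result linearly.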

That's huge. Enormous. So enormous that I'm instantly suspicious of the model, actually, so let's take note of some of the things it leaves out. First, the "one researcher at random determines the fate of humanity" part glosses over the fact that research is done in groups; but it's not clear whether adding in this detail should make us adjust the estimate up or down. It ignores all the time we have between now and the creation of the first AI, during which a safety culture might arise without intervention; but it's also easier to influence the culture now, while the field is still young, rather than later. In order for promoting AI research safety to not be an extraordinarily good deal for philanthropists, there would have to be at least an additional 10^3 penalty somewhere, and I can't find one.
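To get a feel for how much room that leaves, here is a rough sketch of how the cost per expected life saved changes as an unmodeled penalty factor grows. The inputs are the estimates given above; the `dollars_per_life` helper and the range of penalties are illustrative assumptions, not part of the original model.

```python
# Baseline from the stated estimates: A=0.1, R=0.3, 100 papers at $100k each,
# and ~7 billion people alive today.
BASE_LIVES_PER_DOLLAR = 0.1 * 0.3 / (100 * 100_000) * 7e9


def dollars_per_life(penalty):
    """Cost per expected life saved if the true effect is `penalty` times smaller."""
    return penalty / BASE_LIVES_PER_DOLLAR


for exponent in range(0, 5):
    penalty = 10 ** exponent
    cost = dollars_per_life(penalty)
    print(f"penalty 10^{exponent}: ${cost:,.2f} per expected life saved")
```

Under these assumptions, even a 10^3 penalty leaves the cost at roughly $48 per expected life saved.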

As a result of this calculation, I will be thinking and writing about AI safety, attempting to convince others of its importance, and, in the moderately probable event that I become very rich, donating money to the SIAI so that they can pay others to do the same.

110 Comments

jimrandomh (15y)

It is very easy to dispel any such doubts, all he would have to do is publish some technical paper that manages to survive peer-review, thereby substantiate his claims and prove that he is qualified.

You seem to have the idea that this is all about Eliezer Yudkowsky. In actual fact, he wasn't at the meeting where we came up with the model I described in this article, he's influential but doesn't control SIAI, and the existential risk issue is bigger than SIAI and a lot bigger than any one person. Most of the people involved think AI risk is important based on their own reasoning, not based on trusting Eliezer. Personally, I don't really care whether he's qualified, because I consider myself qualified enough to judge his arguments (or anonymous arguments) directly. What may be throwing you off is that he's extremely visible - he's the public face of SIAI to a lot of people - because he's a prolific writer, and because he optimizes his writing to get lots of people to read it.

Journals are actually very bad for getting read by non-specialists, and Eliezer has specialized his writing skill for presenting to smart laymen, rather than academics. Nevertheless, other authors have written and published papers about AI risk. The issue at hand right now is getting into prestigious machine learning and computer science journals, rather than philosophy journals, so that the right specialists will read them. That's much more difficult, because their editors think of them as having narrow topics that don't include philosophy or futurism.

XiXiDu (15y)

Most of the people involved think AI risk is important based on their own reasoning, not based on trusting Eliezer. Personally, I don't really care whether he's qualified, because I consider myself qualified enough to judge his arguments (or anonymous arguments) directly.

As someone who is still acquiring a basic education I have to rely on some amount of intuition and trust in peer-review. Here I give a lot of weight to actual, provable success, recognition, and substantial evidence in the form of a real world demonstration of intelligence and skill.

The...