
FAI Research Constraints and AGI Side Effects

14 JustinShovelain 03 June 2015 07:25PM

Ozzie Gooen and Justin Shovelain


Friendly artificial intelligence (FAI) researchers face at least two significant challenges. First, they must produce a large amount of FAI research in a short amount of time. Second, they must do so without producing enough artificial general intelligence (AGI) research to result in the creation of an unfriendly artificial intelligence (UFAI). We estimate the requirements of both of these challenges using two simple models.

Our first model describes a friendliness ratio and a leakage ratio for FAI research projects. These provide limits on the allowable amount of AGI knowledge produced per unit of FAI knowledge in order for a project to be net beneficial.

Our second model studies a hypothetical FAI venture, which is responsible for ensuring FAI creation. We estimate the necessary total FAI research per year from the venture and the leakage ratio of that research. This model demonstrates a trade-off between the speed of FAI research and the proportion of AGI research that can be revealed as part of it. If FAI research takes too long, then the acceptable leakage ratio may become so low that it would become nearly impossible to safely produce any new research.
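The leakage-ratio idea can be sketched as follows. This is a minimal illustration of my own, assuming the simplest reading of the model: each project produces some FAI knowledge and some AGI knowledge as a side effect, and is net beneficial only if AGI knowledge per unit of FAI knowledge stays below an allowed threshold.

```python
# Hypothetical sketch of the leakage-ratio test described above.
# "agi" and "fai" are units of AGI and FAI knowledge produced by a project;
# "max_ratio" is the project's allowable AGI-per-FAI threshold.
def leakage_ratio(agi: float, fai: float) -> float:
    return agi / fai

def net_beneficial(agi: float, fai: float, max_ratio: float) -> bool:
    return leakage_ratio(agi, fai) <= max_ratio

# A project leaking 1 unit of AGI knowledge per 10 units of FAI knowledge
# is acceptable under a 0.2 threshold, but not under a 0.05 one.
print(net_beneficial(1, 10, 0.2))   # True
print(net_beneficial(1, 10, 0.05))  # False
```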

continue reading »

New(ish) AI control ideas

24 Stuart_Armstrong 05 March 2015 05:03PM

EDIT: this post is no longer being maintained; it has been replaced by this new one.


I recently went on a two day intense solitary "AI control retreat", with the aim of generating new ideas for making safe AI. The "retreat" format wasn't really a success ("focused uninterrupted thought" was the main gain, not "two days of solitude" - it would have been more effective in three hour sessions), but I did manage to generate a lot of new ideas. These ideas will now go before the baying bloodthirsty audience (that's you, folks) to test them for viability.

A central thread running through them could be: if you want something, you have to define it, then code it, rather than assuming you can get it for free through some other approach.

To provide inspiration and direction to my thought process, I first listed all the easy responses that we generally give to most proposals for AI control. If someone comes up with a new/old brilliant idea for AI control, it can normally be dismissed by appealing to one of these responses:

  1. The AI is much smarter than us.
  2. It’s not well defined.
  3. The setup can be hacked.
    • By the agent.
    • By outsiders, including other AI.
    • Adding restrictions encourages the AI to hack them, not obey them.
  4. The agent will resist changes.
  5. Humans can be manipulated, hacked, or seduced.
  6. The design is not stable.
    • Under self-modification.
    • Under subagent creation.
  7. Unrestricted search is dangerous.
  8. The agent has, or will develop, dangerous goals.
continue reading »

Reduced impact AI: no back channels

13 Stuart_Armstrong 11 November 2013 02:55PM

A putative new idea for AI control; index here.

This post presents a further development of the reduced impact AI approach, bringing in some novel ideas and setups that allow us to accomplish more. It still isn't a complete approach - further development is needed, which I will do when I return to the concept - but may already allow certain types of otherwise dangerous AIs to be made safe. And this time, without needing to encase them in clouds of chaotic anti-matter!

Specifically, consider the following scenario. A comet is heading towards Earth, and it is generally agreed that a collision is suboptimal for everyone involved. Human governments have come together in peace and harmony to build a giant laser on the moon - this could be used to vaporise the approaching comet, except there isn't enough data to aim it precisely. A superintelligent AI programmed with a naive "save all humans" utility function is asked to furnish the coordinates to aim the laser. The AI is mobile and not contained in any serious way. Yet the AI furnishes the coordinates - and nothing else - and then turns itself off completely, not optimising anything else.

The rest of this post details an approach that might make that scenario possible. It is slightly complex: I haven't found a way of making it simpler. Most of the complication comes from attempts to precisely define the needed counterfactuals. We're trying to bring rigour to inherently un-sharp ideas, so some complexity is, alas, needed. I will try to lay out the ideas with as much clarity as possible - first the ideas to constrain the AI, then ideas as to how to get some useful work out of it anyway. Classical mechanics (general relativity) will be assumed throughout. As in a previous post, the approach will be illustrated by a drawing of unsurpassable elegance; the rest of the post will aim to clarify everything in the picture:

continue reading »

Domesticating reduced impact AIs

9 Stuart_Armstrong 14 February 2013 04:59PM

About a year ago, I posted several ideas for "reduced impact AI" (what Nick Bostrom calls "domesticity"). I think the most promising approach was the third one, which I pompously titled "The information in the evidence". In this post, I'll attempt to put together a (non-realistic) example of this, to see if it's solid enough to build on. I'll be highlighting assumptions I'm making about the AI; please point out any implicit assumption that I missed, and any other weaknesses of the setup. For the moment, I'm more interested in "this doesn't work" than "this can't be done in practice" or "this can't be usefully generalised".

EDIT: It wasn't clear here, but any paperclip constructed by the reduced impact AI would be destroyed in the explosion, and the AIs would not be observed during the process. How to get useful work out of the AI will be the next step, if this model holds up.

Intuitive idea

For a reduced impact AI, we want an AI that can accomplish something, say building a paperclip, without it going out of control and optimising the universe. We want the future to be roughly the same whether or not the AI was turned on. Hence the piece of information "the AI was turned on" is not particularly important - if we didn't know, we wouldn't go far wrong in our predictions.

To enforce this we'll equip the AI with a two-piece motivation: a utility function U (causing it to build paperclips) and a penalty function R (which penalises the AI if its actions have a large future 'impact'). The challenge is to have a setup and a definition of R that implements this intuitive idea.
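As a toy illustration (my own formalisation, not from the post), the two-piece motivation can be read as the agent choosing whichever action maximises utility minus penalty, so that a high-utility but high-impact action loses out:

```python
# Minimal sketch: the agent maximises U(a) - R(a). All names and
# numbers below are illustrative, not from the post.
def choose_action(actions, utility, penalty):
    return max(actions, key=lambda a: utility[a] - penalty[a])

actions = ["build_paperclip", "optimise_universe"]
utility = {"build_paperclip": 1.0, "optimise_universe": 100.0}
penalty = {"build_paperclip": 0.1, "optimise_universe": 1000.0}

print(choose_action(actions, utility, penalty))  # build_paperclip
```

The challenge the post identifies is precisely that writing down a penalty function R which scores "optimise_universe" as highly impactful, for every such action, is the hard part.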


Reduced impact

Can we hope to define reduced impact? Doesn't any change percolate across all future time? Yes, but not all changes are the same - especially where our ability to predict them is concerned. Imagine an AI, confined to a box. While inside the box, it will build a paperclip. Then, in one hour's time, an antimatter bomb inside the box will explode, obliterating everything inside. We would intuitively say that whether or not the paperclip was built would have very little impact afterwards. What we really mean is that we are unable to usefully use that information for future predictions. To make it even harder, surround the box with a large mass of antimatter gas, moving chaotically. In order to figure out the consequences of anything happening inside the box, we'd have to be able to unravel chaos: solve chaotic equations correctly and quantitatively, not just qualitatively. So our first assumption is that the AI can't unravel chaos.

continue reading »

Don't Build Fallout Shelters

27 katydee 07 January 2013 02:38PM

Related: Circular Altruism

One thing that many people misunderstand is the concept of personal versus societal safety. These concepts are often conflated despite the appropriate mindsets being quite different.

Simply put, personal safety is personal.

In other words, the appropriate actions to take for personal safety are whichever actions reduce your chance of being injured or killed within reasonable cost boundaries. These actions are largely based on situational factors because the elements of risk that two given people experience may be wildly disparate.

For instance, if you are currently a young computer programmer living in a typical American city, you may want to look at eating better, driving your car less often, and giving up unhealthy habits like smoking. However, if you are currently an infantryman about to deploy to Afghanistan, you may want to look at improving your reaction time, training your situational awareness, and practicing rifle shooting under stressful conditions.

One common mistake is to attempt to preserve personal safety for extreme circumstances such as nuclear wars. Some individuals invest sizeable amounts of money into fallout shelters, years worth of emergency supplies, etc.

While it is certainly true that a nuclear war would kill you or severely disrupt your life if it occurred, this is not necessarily a fully convincing argument in favor of building a fallout shelter. One has to consider the cost of building a fallout shelter, the chance that your fallout shelter will actually save you in the event of a nuclear war, and the odds of a nuclear war actually occurring.

Further, one must consider the quality of life reduction that one would likely experience in a post-nuclear war world. It's also important to remember that, in the long run, your survival is contingent on access to medicine and scientific progress. Future medical advances may even extend your lifespan very dramatically, and potentially provide very large amounts of utility. Unfortunately, full-scale nuclear war is very likely to impair medicine and science for quite some time, perhaps permanently.

Thus even if your fallout shelter succeeds, you will likely live a shorter and less pleasant life than you would otherwise. In the end, building a fallout shelter looks like an unwise investment unless you are extremely confident that a nuclear war will occur shortly-- and if you are, I want to see your data!

When taking personal precautionary measures, worrying about such catastrophes is generally silly, especially given the risks we all take on a regular basis-- risks that, in most cases, are much easier to avoid than nuclear wars. Societal disasters are generally extremely expensive for the individual to protect against, and carry a large amount of disutility even if protections succeed.

To make matters worse, if there's a nuclear war tomorrow and your house is hit directly, you'll be just as dead as if you fall off your bike and break your neck. Dying in a more dramatic fashion does not, generally speaking, produce more disutility than dying in a mundane fashion does. In other words, when optimizing for personal safety, focus on accidents, not nuclear wars; buy a bike helmet, not a fallout shelter.

The flip side to this, of course, is that if there is a full-scale nuclear war, hundreds of millions-- if not billions-- of people will die and society will be permanently disrupted. If you die in a bike accident tomorrow, perhaps a half dozen people will be killed at most. So when we focus on non-selfish actions, the big picture is far, far, far more important. If you can reduce the odds of a nuclear war by one one-thousandth of one percent, more lives will be saved on average than if you can prevent hundreds of fatal accidents.

When optimizing for overall safety, focus on the biggest possible threats that you can have an impact on. In other words, when dealing with societal-level risks, your projected impact will be much higher if you try to focus on protecting society instead of protecting yourself.

In the end, building fallout shelters is probably silly, but attempting to reduce the risk of nuclear war sure as hell isn't. And if you do end up worrying about whether a nuclear war is about to happen, remember that if you can reduce the risk of said war-- which might be as easy as making a movie-- your actions will have a much, much greater overall impact than building a shelter ever could.

How to avoid dying in a car crash

76 michaelcurzi 17 March 2012 07:44PM

Aside from cryonics and eating better, what else can we do to live long lives?

Using this tool, I looked up the risks of death for my demographic group. As a 15-24 year old male in the United States, the most likely cause of my death is a traffic accident; and so I’m taking steps to avoid that. Below I have included the results of my research as well as the actions I will take to implement my findings. Perhaps my research can help you as well.1

Before diving into the results, I will note that this data took me one hour to collect. It’s definitely not comprehensive, and I know that working together, we can do much better. So if you have other resources or data-backed recommendations on how to avoid dying in a traffic accident, leave a comment below and I’ll update this post.

General points

Changing your behavior can reduce your risk of death in a car crash. A 1985 report on British and American crash data discovered that “driver error, intoxication and other human factors contribute wholly or partly to about 93% of crashes.” Other drivers’ behavior matters too, of course, but you might as well optimize your own.2

Secondly, overconfidence appears to be a large factor in peoples’ thinking about traffic safety. A speaker for the National Highway Traffic Safety Administration (NHTSA) stated that “Ninety-five percent of crashes are caused by human error… but 75% of drivers say they're more careful than most other drivers.” Less extreme evidence for overconfidence about driving is presented here.

One possible cause for this was suggested by the Transport Research Laboratory, which explains that “...the feeling of being confident in more and more challenging situations is experienced as evidence of driving ability, and that 'proven' ability reinforces the feelings of confidence. Confidence feeds itself and grows unchecked until something happens – a near-miss or an accident.”

So if you’re tempted to use this post as an opportunity to feel superior to other drivers, remember: you’re probably overconfident too! Don’t just humbly confess your imperfections – change your behavior.

Top causes of accidents


Driver distraction is one of the largest causes of traffic accident deaths. The Director of Traffic Safety at the American Automobile Association stated that "The research tells us that somewhere between 25-50 percent of all motor vehicle crashes in this country really have driver distraction as their root cause." The NHTSA reports the number as 16%.

If we are to reduce distractions while driving, we ought to identify which distractors are the worst. One is cell phone use. My solution: Don’t make calls in the car, and turn off your phone’s sound so that you aren’t tempted.

I brainstormed other major distractors and thought of ways to reduce their distracting effects.

Distractor: Looking at directions on my phone as I drive

  • Solution: Download a great turn-by-turn navigation app (recommendations are welcome).
  • Solution: Buy a GPS.

Distractor: Texting, Facebook, slowing down to gawk at an accident, looking at scenery

  • Solution [For System 2]: Consciously accept that texting (Facebook, gawking, scenery) causes accidents.
  • Solution [For System 1]: Once a week, vividly and emotionally imagine texting (using Facebook, gawking at an accident) and then crashing & dying.
  • Solution: Turn off your phone’s sound while driving, so you won’t answer texts.

Distractor: Fatigue

  • Solution [For System 2]: Ask yourself if you’re tired before you plan to get in the car. Use Anki or a weekly review list to remember the association.
  • Solution [For System 1]: Once a week, vividly and emotionally imagine dozing off while driving and then dying.

Distractor: Other passengers

  • Solution: Develop an identity as someone who drives safely and thinks it’s low status to be distracting in the car. Achieve this by meditating on the commitment, writing a journal entry about it, using Anki, or saying it every day when you wake up in the morning.
  • Solution [In the moment]: Tell people to chill out while you’re driving. Mentally simulate doing this ahead of time, so you don’t hesitate to do it when it matters.

Distractor: Adjusting the radio

  • Solution: If avoiding using the car radio is unrealistic, minimize your interaction with it by only using the hotkey buttons rather than manually searching through channels.
  • Solution: If you’re constantly tempted to change the channel (like I am), buy an iPod cable so you can listen to your own music and set playlists that you like, so you won't constantly want to change the song.

A last interesting fact about distraction, from Wikipedia:

Recent research conducted by British scientists suggests that music can also have an effect [on driving]; classical music is considered to be calming, yet too much could relax the driver to a condition of distraction. On the other hand, hard rock may encourage the driver to step on the acceleration pedal, thus creating a potentially dangerous situation on the road.


The Roads and Traffic Authority of New South Wales claims that “speeding… is a factor in about 40 percent of road deaths.” Data from the NHTSA puts the number at 30%.

Speeding also increases the severity of crashes; “in a 60 km/h speed limit area, the risk of involvement in a casualty crash doubles with each 5 km/h increase in travelling speed above 60 km/h.”

Stop. Think about that for a second. I’ll convert it to the Imperial system for my fellow Americans: “in a [37.3 mph] speed limit area, the risk of involvement in a casualty crash doubles with each [3.1 mph] increase in travelling speed above [37.3 mph].” Remember that next time you drive a 'mere' 5 mph over the limit.
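Taking the quoted claim at face value, the implied relative risk grows exponentially with speed over the limit. A rough sketch (the doubling-per-5 km/h rule is the source's claim; extrapolating it across speeds is my assumption):

```python
# Relative casualty-crash risk at speed v (km/h) in a 60 km/h zone,
# assuming risk doubles with each 5 km/h over the limit.
def relative_risk(v_kmh: float, limit: float = 60.0, step: float = 5.0) -> float:
    return 2 ** ((v_kmh - limit) / step)

print(relative_risk(60))  # 1.0 (baseline)
print(relative_risk(65))  # 2.0
print(relative_risk(75))  # 8.0 -- 'only' 15 km/h over
```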

Equally shocking is this paragraph from the Freakonomics blog:

Kockelman et al. estimated that the difference between a crash on a 55 mph limit road and a crash on a 65 mph one means a 24 percent increase in the chances the accident will be fatal. Along with the higher incidence of crashes happening in the first place, a difference in limit between 55 and 65 adds up to a 28 percent increase in the overall fatality count.

Driving too slowly can be dangerous too. An NHTSA presentation cites two studies that found a U-shaped relationship between vehicle speed and crash incidence; thus “Crash rates were lowest for drivers traveling near the mean speed, and increased with deviations above and below the mean.”

However, driving fast is still far more dangerous than driving slowly. This relationship appears to be exponential, as you can see on the tenth slide of the presentation.

  • Solution: Watch this 30 second video for a vivid comparison of head-on crashes at 60 km/hr (37 mph) and 100 km/hr (60 mph). Imagine yourself in the car. Imagine your tearful friends and family. 
  • Solution: Develop an identity as someone who drives close to the speed limit, by meditating on the commitment, writing a journal entry about it, using Anki, or saying it every day when you wake up in the morning.

Driving conditions

Driving conditions are another source of driving risk.

One factor I discovered was the additional risk from driving at night. Nationwide, 49% of fatal crashes happen at night, with a fatality rate per mile of travel about three times as high as daytime hours. (Source)

  • Solution: make an explicit effort to avoid driving at night. Use Anki to remember this association.
  • Solution: Look at your schedule and see if you can change a recurring night-time drive to the daytime.

Berkeley research on 1.4 million fatal crashes found that “fatal crashes were 14% more likely to happen on the first snowy day of the season compared with subsequent ones.” The suggested hypothesis is that people take at least a day to recalibrate their driving behavior in light of new snow. 

  • Solution: make an explicit effort to avoid driving on the first snowy day after a sequence of non-snowy ones. Use Anki to remember this association.

Another valuable factoid: 77% of weather-related fatalities (and 75% of all crashes!) involve wet pavement.

Statistics are available for other weather-related issues, but the data I found wasn’t adjusted for the relative frequencies of various weather conditions. That’s problematic; it might be that fog, for example, is horrendously dangerous compared to ice or slush, but it’s rarer and thus kills fewer people. I’m interested in looking at appropriately adjusted statistics. 

Other considerations

  • Teen drivers are apparently way worse at not dying in cars than older people. So if you’re a teenager, take the outside view and accept that you (not just ‘other dumb teenagers’) may need to take particular care when driving. Relevant information about teen driving is available here.

  • Alcohol use appeared so often during my research that I didn’t even bother including stats about it. Likewise for wearing a seatbelt.

  • Since I’m not in the market for a car, I didn’t look into vehicle choice as a way to decrease personal existential risk. But I do expect this to be relevant to increasing driving safety.

  • “The most dangerous month, it turns out, is August, and Saturday the most dangerous day, according to the National Highway Traffic Safety Administration.” I couldn’t tell whether this was because of increased amount of driving or an increased rate of crashes.

  • This site recommends driving with your hands at 9 and 3 for increased control. The same site claims that “Most highway accidents occur in the left lane” because the other lanes have “more ‘escape routes’ should a problem suddenly arise that requires you to quickly change lanes”, but I found no citation for the claim.

  • Bad driver behavior appears to significantly increase the risk of death in an accident, so: don't ride in cars with people who drive badly or aggressively. I have a few friends with aggressive driving habits, and I’m planning to either a) tell them to drive more slowly when I’m in the car or b) stop riding in their cars.

Commenters' recommendations

I should note here that I have not personally verified anything posted below. Be sure to look at the original comment and do followup research before depending on these recommendations.

  • MartinB recommends taking a driving safety class every few years.

  • Dmytry suggests that bicycling may be good training for constantly keeping one's eyes on the road, though others argue that bicycling itself may be significantly more dangerous than driving anyway.

  • Various commenters suggested simply avoiding driving whenever possible. Living in a city with good public transportation is recommended.

  • David_Gerard recommends driving a bigger car with larger crumple zones (but not an SUV because they roll over). He also recommends avoiding motorcycles altogether and taking advanced driving courses.

  • Craig_Heldreth adds that everyone in the car should be buckled up, as even a single unbuckled passenger can collide with and kill other passengers in a crash. Even cargo as light as a laptop should be secured or put in the trunk.

  • JRMayne offers a list of recommendations that merit reading directly. DuncanS also offers a valuable list.

1All bolding in the data was added for emphasis by me.

2The report notes that “57% of crashes were due solely to driver factors, 27% to combined roadway and driver factors, 6% to combined vehicle and driver factors, 3% solely to roadway factors, 3% to combined roadway, driver, and vehicle factors, 2% solely to vehicle factors and 1% to combined roadway and vehicle factors.”
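As a quick consistency check on this breakdown, the categories involving driver factors do sum to the 93% figure quoted in the main text:

```python
# Summing the footnote's categories that involve driver factors.
driver_only = 57
roadway_and_driver = 27
vehicle_and_driver = 6
roadway_driver_and_vehicle = 3

driver_involved = (driver_only + roadway_and_driver
                   + vehicle_and_driver + roadway_driver_and_vehicle)
print(driver_involved)  # 93
```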

The mathematics of reduced impact: help needed

10 Stuart_Armstrong 16 February 2012 02:23PM

A putative new idea for AI control; index here.

Thanks for help from Paul Christiano

If clippy, the paper-clip maximising AI, goes out of control, it would fill the universe with paper clips (or with better and better ways of counting the paper-clips it already has). If I sit down to a game with Deep Blue, then I know little about what will happen in the game, but I know it will end with me losing.

When facing a (general or narrow) superintelligent AI, the most relevant piece of information is what the AI's goals are. That's the general problem: there is no such thing as 'reduced impact' for such an AI. It doesn't matter who the next president of the United States is, if an AI wants to tile the universe with little smiley faces. But reduced impact is something we would dearly want to have - it gives us time to correct errors, perfect security systems, maybe even bootstrap our way to friendly AI from a non-friendly initial design. The most obvious path to coding reduced impact is to build a satisficer rather than a maximiser - but that proved unlikely to work.

But that ruthless maximising aspect of AIs may give us a way of quantifying 'reduced impact' - and hence including it in AI design. The central point being:

"When facing a (non-reduced impact) superintelligent AI, the AI's motivation is the most important fact we know."

Hence, conversely:

"If an AI has reduced impact, then knowing its motivation isn't particularly important. And a counterfactual world where the AI didn't exist, would not be very different from the one in which it does."

In this post, I'll be presenting some potential paths to formalising this intuition into something computable, giving us a numerical measure of impact that can be included in the AI's motivation to push it towards reduced impact. I'm putting this post up mainly to get help: does anyone know of already developed mathematical or computational tools that can be used to put these approaches on a rigorous footing?

continue reading »

Safety Culture and the Marginal Effect of a Dollar

23 jimrandomh 09 June 2011 03:59AM

We spent an evening at last week's Rationality Minicamp brainstorming strategies for reducing existential risk from Unfriendly AI, and for estimating their marginal benefit-per-dollar. To summarize the issue briefly, there is a lot of research into artificial general intelligence (AGI) going on, but very few AI researchers take safety seriously; if someone succeeds in making an AGI, but they don't take safety seriously or they aren't careful enough, then it might become very powerful very quickly and be a threat to humanity. The best way to prevent this from happening is to promote a safety culture - that is, to convince as many artificial intelligence researchers as possible to think about safety so that if they make a breakthrough, they won't do something stupid.

We came up with a concrete (albeit greatly oversimplified) model which suggests that the marginal reduction in existential risk per dollar, when pursuing this strategy, is extremely high. The model is this: assume that if an AI is created, it's because one researcher, chosen at random from the pool of all researchers, has the key insight; and humanity survives if and only if that researcher is careful and takes safety seriously. In this model, the goal is to convince as many researchers as possible to take safety seriously. So the question is: how many researchers can we convince, per dollar? Some people are very easy to convince - some blog posts are enough. Those people are convinced already. Some people are very hard to convince - they won't take safety seriously unless someone who really cares about it will be their friend for years. In between, there are a lot of people who are currently unconvinced, but would be convinced if there were lots of good research papers about safety in machine learning and computer science journals, by lots of different authors.

Right now, those articles don't exist; we need to write them. And it turns out that neither the Singularity Institute nor any other organization has the resources - staff, expertise, and money to hire grad students - to produce very much research or to substantially alter the research culture. We are very far from the realm of diminishing returns. Let's make this model quantitative.

Let A be the probability that an AI will be created; let R be the fraction of researchers that would be convinced to take safety seriously if there were 100 good papers about it in the right journals; and let C be the cost of one really good research paper. Then the marginal reduction in existential risk per dollar is A*R/(100*C). The total cost of a grad student-year (including recruiting, management and other expenses) is about $100k. Estimate a 10% current AI risk, and estimate that 30% of researchers currently don't take safety seriously but would be convinced. That gives us a marginal existential risk reduction per dollar of 0.1*0.3/(100*100k) = 3*10^-9. Counting only the ~7 billion people alive today, and not any of the people who will be born in the future, this comes to about twenty expected lives saved per dollar.
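The arithmetic can be checked directly (all inputs are the model's own assumptions):

```python
# Back-of-envelope estimate from the model above.
A = 0.10          # probability an AGI is created
R = 0.30          # fraction of researchers convertible by ~100 good papers
C = 100_000       # dollars per good research paper (one grad-student year)
papers = 100
population = 7e9  # people alive today

risk_reduction_per_dollar = A * R / (papers * C)
lives_per_dollar = risk_reduction_per_dollar * population
print(risk_reduction_per_dollar)  # ~3e-09
print(lives_per_dollar)           # ~21
```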

That's huge. Enormous. So enormous that I'm instantly suspicious of the model, actually, so let's take note of some of the things it leaves out. First, the "one researcher at random determines the fate of humanity" part glosses over the fact that research is done in groups; but it's not clear whether adding in this detail should make us adjust the estimate up or down. It ignores all the time we have between now and the creation of the first AI, during which a safety culture might arise without intervention; but it's also easier to influence the culture now, while the field is still young, rather than later. In order for promoting AI research safety to not be an extraordinarily good deal for philanthropists, there would have to be at least an additional 10^3 penalty somewhere, and I can't find one.

As a result of this calculation, I will be thinking and writing about AI safety, attempting to convince others of its importance, and, in the moderately probable event that I become very rich, donating money to the SIAI so that they can pay others to do the same.

Advice for AI makers

7 Stuart_Armstrong 14 January 2010 11:32AM

A friend of mine is about to launch himself heavily into the realm of AI programming. The details of his approach aren't important; probabilities dictate that he is unlikely to score a major success. He's asked me for advice, however, on how to design a safe(r) AI. I've been pointing him in the right directions and sending him links to useful posts on this blog and the SIAI.

Do people here have any recommendations they'd like me to pass on? Hopefully, these may form the basis of a condensed 'warning pack' for other AI makers.

Addendum: Advice along the lines of "don't do it" is vital and good, but unlikely to be followed. Coding will nearly certainly happen; is there any way of making it less genocidally risky?