Domesticating reduced impact AIs

9 Stuart_Armstrong 14 February 2013 04:59PM

About a year ago, I posted several ideas for "reduced impact AI" (what Nick Bostrom calls "domesticity"). I think the most promising approach was the third one, which I pompously titled "The information in the evidence". In this post, I'll attempt to put together a (non-realistic) example of this, to see if it's solid enough to build on. I'll be highlighting assumptions I'm making about the AI; please point out any implicit assumption that I missed, and any other weaknesses of the setup. For the moment, I'm more interested in "this doesn't work" than "this can't be done in practice" or "this can't be usefully generalised".

EDIT: It wasn't clear here, but any paperclip constructed by the reduced impact AI would be destroyed in the explosion, and the AIs would not be observed during the process. How to get useful work out of the AI will be the next step, if this model holds up.

Intuitive idea

For a reduced impact AI, we want an AI that can accomplish something, say building a paperclip, without it going out of control and optimising the universe. We want the future to be roughly the same whether or not the AI was turned on. Hence the piece of information "the AI was turned on" is not particularly important - if we didn't know, we wouldn't go far wrong in our predictions.

To enforce this we'll equip the AI with a two-piece motivation: a utility function U (causing it to build paperclips) and a penalty function R (which penalises the AI if its actions have a large future 'impact'). The challenge is to have a setup and a definition of R that implements this intuitive idea.
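The two-piece motivation can be sketched as a toy score the agent maximises. This is purely illustrative: the function names, the numbers, and the additive weighting are my own assumptions, not anything specified in the post.

```python
# Toy sketch of a two-piece motivation: the agent scores each action by its
# utility U (paperclips built) minus a weighted penalty R for estimated
# future impact. All names and numbers here are illustrative assumptions.

def U(action):
    """Utility: paperclips the action is expected to build."""
    return action.get("paperclips", 0)

def R(action):
    """Penalty: a hypothetical measure of the action's future impact."""
    return action.get("impact", 0.0)

def score(action, penalty_weight=10.0):
    """The agent picks the action maximising U minus the weighted penalty."""
    return U(action) - penalty_weight * R(action)

actions = [
    {"name": "build one paperclip in the box", "paperclips": 1, "impact": 0.01},
    {"name": "convert factory to paperclips", "paperclips": 10**6, "impact": 10**9},
]
best = max(actions, key=score)
```

With any substantial penalty weight, the low-impact action wins even though the high-impact one builds vastly more paperclips; the hard part, as the post says, is defining R so that "impact" means the right thing.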

 

Reduced impact

Can we hope to define reduced impact? Doesn't any change percolate across all future time? Yes, but not all changes are the same - especially where our ability to predict them is concerned. Imagine an AI, confined to a box. While inside the box, it will build a paperclip. Then, in one hour's time, an antimatter bomb inside the box will explode, obliterating everything inside. We would intuitively say that whether or not the paperclip was built would have very little impact afterwards. What we really mean is that we are unable to make any useful future predictions from that information. To make it even harder, surround the box with a large mass of antimatter gas, moving chaotically. In order to figure out the consequences of anything happening inside the box, we'd have to be able to unravel chaos: solve chaotic equations correctly and quantitatively, not just qualitatively. So our first assumption is that the AI can't unravel chaos.

The mathematics of reduced impact: help needed

10 Stuart_Armstrong 16 February 2012 02:23PM

A putative new idea for AI control; index here.

Thanks to Paul Christiano for his help.

If Clippy, the paper-clip-maximising AI, goes out of control, it would fill the universe with paper clips (or with better and better ways of counting the paper clips it already has). If I sit down to a game with Deep Blue, then I know little about what will happen in the game, but I know it will end with me losing.

When facing a (general or narrow) superintelligent AI, the most relevant piece of information is what the AI's goals are. That's the general problem: there is no such thing as 'reduced impact' for such an AI. It doesn't matter who the next president of the United States is, if an AI wants to tile the universe with little smiley faces. But reduced impact is something we would dearly want to have - it gives us time to correct errors, perfect security systems, maybe even bootstrap our way to friendly AI from a non-friendly initial design. The most obvious path to coding reduced impact is to build a satisficer rather than a maximiser - but that proved unlikely to work.

But that ruthless maximising aspect of AIs may give us a way of quantifying 'reduced impact' - and hence including it in AI design. The central point being:

"When facing a (non-reduced impact) superintelligent AI, the AI's motivation is the most important fact we know."

Hence, conversely:

"If an AI has reduced impact, then knowing its motivation isn't particularly important. And a counterfactual world where the AI didn't exist, would not be very different from the one in which it does."

In this post, I'll be presenting some potential paths to formalising this intuition into something computable, giving us a numerical measure of impact that can be included in the AI's motivation to push it towards reduced impact. I'm putting this post up mainly to get help: does anyone know of already developed mathematical or computational tools that can be used to put these approaches on a rigorous footing?

SIAI - An Examination

143 BrandonReinhart 02 May 2011 07:08AM

12/13/2011 - A 2011 update with data from the 2010 fiscal year is in progress. Should be done by the end of the week or sooner.

 

Disclaimer

Notes

  • Images are now hosted on LessWrong.com.
  • The 2010 Form 990 data will be available later this month.
  • It is not my intent to propagate misinformation. Errors will be corrected as soon as they are identified.

Introduction

Acting on gwern's suggestion in his Girl Scout Cookie analysis, I decided to look at SIAI funding. After reading about the Visiting Fellows Program and more recently the Rationality Boot Camp, I decided that the SIAI might be something I would want to support. I am concerned about existential risk and grapple with its utility implications. I feel that I should do more.

I wrote on the mini-boot camp page a pledge that I would donate enough to send someone to rationality mini-boot camp. This seemed to me a small cost for the potential benefit. The SIAI might get better at building rationalists. It might build a rationalist who goes on to solve a problem. Should I donate more? I wasn’t sure. I read gwern’s article and realized that I could easily get more information to clarify my thinking.

So I downloaded the SIAI’s Form 990 annual IRS filings and started to write down notes in a spreadsheet. As I gathered data and compared it to my expectations and my goals, my beliefs changed. I now believe that donating to the SIAI is valuable. I cannot hide this belief in my writing. I simply have it.

My goal is not to convince you to donate to the SIAI. My goal is to provide you with information necessary for you to determine for yourself whether or not you should donate to the SIAI. Or, if not that, to provide you with some direction so that you can continue your investigation.

How to Save the World

73 Louie 01 December 2010 05:17PM

Most of us want to make the world a better place. But what should we do if we want to generate the most positive impact possible? It’s definitely not an easy problem. Lots of smart, talented people with the best of intentions have tried to end war, eliminate poverty, cure disease, stop hunger, prevent animal suffering, and save the environment. As you may have noticed, we’re still working on all of those. So the track record of people trying to permanently solve the world's biggest problems isn’t that spectacular. This isn’t just a “look to your left, look to your right, one of you won’t be here next year”-kind of thing, this is more like “behold the trail of dead and dying who line the path before you, and despair”. So how can you make your attempt to save the world turn out significantly better than the generations of others who've tried this already?

It turns out there actually are a number of things we can do to substantially increase our odds of doing the most good. Here's a brief summary of some of the most crucial considerations that one needs to take into account when soberly approaching the task of doing the most good possible (aka "saving the world").

1. Patch your moral intuition (with math!) - Human moral intuition is really useful. But it tends to fail us at precisely the wrong times -- like when a problem gets too big [“millions of people dying? *yawn*”] or when it involves uncertainty [“you can only save 60% of them? call me when you can save everyone!”]. Unfortunately, these happen to be the defining characteristics of the world’s most difficult problems. Think about it. If your standard moral intuition were enough to confront the world’s biggest challenges, they wouldn’t be the world’s biggest challenges anymore... they’d be “those problems we solved already cause they were natural for us to understand”. If you’re trying to do things that have never been done before, use all the tools available to you. That means setting aside your emotional numbness by using math to feel what your moral intuition can’t. You can also do better by acquainting yourself with some of the more common human biases. It turns out your brain isn't always right. Yes, even your brain. So knowing the ways in which it systematically gets things wrong is a good way to avoid making the most obvious errors when setting out to help save the world.

2. Identify a cause with lots of leverage - It’s noble to try and save the world, but it’s ineffective and unrealistic to try and do it all on your own. So let’s start out by joining forces with an established organization that’s already working on what you care about. Seriously, unless you’re already ridiculously rich + brilliant or ludicrously influential, going solo or further fragmenting the philanthropic world by creating US-Charity#1,238,202 is almost certainly a mistake. Now that we’re all working together here, let's keep in mind that only a few charitable organizations are truly great investments -- and the vast majority just aren’t. So maximize your leverage by investing your time and money into supporting the best non-profits with the largest expected pay-offs.

Imperfect Levers

6 blogospheroid 17 November 2010 07:12PM

Related to : Lost Purposes, The importance of Goodhart's Law, Homo Hypocritus, SIAI's scary idea, Value Deathism

Summary : Whenever human beings seek to achieve goals far beyond their individual ability, they use leverage of some kind or another. Creating organizations to achieve goals is a very powerful source of leverage. However, due to their nature, organizations are imperfect levers, and their primary purpose is often lost. The inertia of present forms and processes dominates beyond its useful period. The present system of the world has many such imperfect organizations in power, and any of them developing near-general intelligence without significant redesign of their utility function can be a source of existential risk/values risk.

Against picking up pennies

-1 [deleted] 13 December 2009 06:07AM

The eternally curious Tailsteak has written about how he always picks up pennies off the sidewalk. He's run a cost-benefit analysis and determined that it's better on average to pick up a penny than to pass it by. His mistake lies nowhere in the analysis itself; it's pretty much correct. His mistake is performing the analysis in the first place.

Pennies, you see, are easily the subject of scope insensitivity. When we come across a penny, we don't think, "Hey, that's something worth 0.05% of what I wish I had come across. I could buy a 25th of a gumball, a mouthful of an unhealthy carbonated beverage, a couple of seconds of footage on DVD, or enough gasoline to go a tenth of a mile." We think, "Hey, that's money," and we grab it.

The thing is, it's difficult to comprehend how little a penny is worth—we don't really have a separate concept for "mild happiness for a couple of seconds"—and we're likely to take risks that far outweigh the benefits. We don't think of bending over to pick up a penny as being a risky endeavour, but it's a penny. How much risk does it take to outweigh a penny? Surely the risk of "something unforeseen" easily does the job. Are you 99.999999% sure that picking up that penny won't kill you? You need a reason for every 9 (if you're ambivalent between using seven 9s and using nine 9s, you should use seven; the number of 9s is never arbitrary), and by the time you come up with eight reasons to pick up the penny, you'll have wasted several cents' worth of time. If you can reduce the probability of harm that far, I applaud you.
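The "how many 9s do you need?" question can be made concrete as a break-even calculation. The $10 million figure for a statistical life below is my own illustrative assumption, not something the post commits to:

```python
# How sure must you be that picking up a penny won't kill you before the
# expected value is positive? Assumes a "value of a statistical life" of
# $10 million -- an illustrative figure, not from the post.
import math

penny = 0.01                 # dollars gained
value_of_life = 10_000_000   # dollars (assumed)

# Break-even probability of death: penny = p * value_of_life
break_even_p = penny / value_of_life   # one in a billion

# Nines of certainty needed: P(survive) >= 1 - 1e-9, i.e. 99.9999999%
nines = -math.log10(break_even_p)
```

Under this assumption the risk of death must be below one in a billion before the penny is worth it, i.e. nine 9s of certainty, which is roughly the range the post is gesturing at.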

Of course, penny-grabbing doesn't have to involve actual pennies. Suppose that President Kodos of the Unified States of Somewhere (population 300 million) uses the word "idiot" in an important speech, causing the average citizen to scowl and ponder for one minute. Now, if a penny can buy you five seconds of happiness, and scowling and pondering brings the same amount of unhappiness, then that's twelve cents for every citizen, or 36 million dollars, of damage that Kodos just caused. Arguably, that's the value of a couple of human lives. As you can see, Kodos' decisions are extremely important. In this case, penny-grabbing would consist of anything less than trading precious seconds for precious human lives—if Kodos finds that he can save one life simply by going a few minutes out of his way, he should ignore it. (Photo ops and personal apologies are out of the question.) But keep in mind, of course, that avoiding saving someone's life because you have something better to do isn't rational unless you actually plan to do something better.
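The Kodos figures check out arithmetically; here is the calculation spelled out, using only the numbers given in the paragraph above:

```python
# Reproducing the Kodos arithmetic: if a penny buys five seconds of
# happiness, one minute of scowling per citizen costs twelve cents each,
# and across 300 million citizens that adds up to $36 million.

population = 300_000_000
scowl_seconds = 60           # one minute of scowling per citizen
seconds_per_penny = 5        # a penny buys five seconds of happiness

cents_per_citizen = scowl_seconds // seconds_per_penny   # 12 cents
total_dollars = population * cents_per_citizen / 100     # $36 million
```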

We're in danger. I must tell the others...

3 AllanCrossman 13 October 2009 11:06PM

... Oh, no! I've been shot!

— C3PO

A strange sort of paralysis can occur when risk-averse people (like me) decide that we're going to play it safe. We imagine the worst thing that could happen if we go ahead with our slightly risky plan, and this stops us from carrying it out.

One possible way of overcoming such paralysis is to remind yourself just how much danger you're actually in.

Humanity could be mutilated by nuclear war, biotechnology disasters, societal meltdown, environmental collapse, oppressive governments, disagreeable AI, or other horrors. On an individual level, anybody's life could turn sour for more mundane reasons, from disease to bereavement to divorce to unemployment to depression. The terrifying scenarios depend on your values, and differ from person to person. Those here who hope to live forever may die of old age, only for cryonics to turn out not to work.

There must be some number X which is the probability of Really Bad Things happening to you. X is probably not a tiny figure, but instead significantly above zero, which encourages you to go ahead with whatever slightly risky plan you were contemplating, as long as it only nudges X upwards a little.

Admittedly, this tactic seems like a cheap hack that relies on an error in human reasoning - is nudging your danger level from .2 to .201 actually more acceptable than nudging it from 0 to .001? Perhaps not. Needless to say, a real rationalist ought to ignore all this and take the action with the highest expected value.
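The point about the two nudges can be made explicit: under expected value they cost exactly the same, whatever the baseline. The harm figure below is an arbitrary placeholder of my own:

```python
# The two "nudges" above raise the probability of disaster by the same
# absolute amount (0.001), so under expected value they are equally
# (un)acceptable -- the baseline risk is irrelevant.

harm = 1_000_000   # arbitrary badness of the Really Bad Thing (assumed)

expected_cost_from_high_baseline = (0.201 - 0.200) * harm   # baseline 0.2
expected_cost_from_zero_baseline = (0.001 - 0.000) * harm   # baseline 0
```

That the two costs come out identical is precisely why treating the first nudge as more acceptable than the second is an error in human reasoning rather than a result of the expected-value calculation.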

The mind-killer

23 ciphergoth 02 May 2009 04:49PM

Can we talk about changing the world? Or saving the world?

I think few here would give an estimate higher than 95% for the probability that humanity will survive the next 100 years; plenty might put a figure less than 50% on it. So if you place any non-negligible value on future generations whose existence is threatened, reducing existential risk has to be the best possible contribution to humanity you are in a position to make. Given that existential risk is also one of the major themes of Overcoming Bias and of Eliezer's work, it's striking that we don't talk about it more here.

One reason, of course, was the bar until yesterday on talking about artificial general intelligence; another factor is the many who state plainly that they are not concerned about their contribution to humanity. But I think a third is that many of the things we might do to address existential risk, or other issues of concern to all humanity, get us into politics, and we've all had too much of a certain kind of argument about politics online that gets into a stale rehashing of talking points and point-scoring.

If we here can't do better than that, then this whole rationality discussion we've been having comes to no more than how we can best get out of bed in the morning, solve a puzzle set by a powerful superintelligence in the afternoon, and get laid in the evening. How can we use what we discuss here to be able to talk about politics without spiralling down the plughole?

I think it will help in several ways that we are largely a community of materialists and expected utility consequentialists. For a start, we are freed from the concept of "deserving" that dogs political arguments on inequality, on human rights, on criminal sentencing and so many other issues; while I can imagine a consequentialism that valued the "deserving" more than the "undeserving", I don't get the impression that's a popular position among materialists because of the Phineas Gage problem. We need not ask whether the rich deserve their wealth, or who is ultimately to blame for a thing; every question must come down only to what decision will maximize utility.

For example, framed this way, inequality of wealth is neither justice nor injustice. The consequentialist defence of the market recognises that because of the diminishing marginal utility of wealth, today's unequal distribution of wealth has a cost in utility compared to the same wealth divided equally, a cost that we could in principle measure given a wealth/utility curve, and goes on to argue that the total extra output resulting from this inequality more than pays for it.

However, I'm more confident of the need to talk about this question than I am of my own answers. There's very little we can do about existential risk that doesn't have to do with changing the decisions made by public servants, businesses, and/or large numbers of people, and all of these activities get us straight into the world of politics, as well as the world of going out and changing minds. There has to be a way for rationalists to talk about it and actually make a difference. Before we start to talk about specific ideas to do with what one does in order to change or save the world, what traps can we defuse in advance?
