Reduced impact AI: no back channels

13 Stuart_Armstrong 11 November 2013 02:55PM

A putative new idea for AI control; index here.

This post presents a further development of the reduced impact AI approach, bringing in some novel ideas and setups that allow us to accomplish more. It still isn't a complete approach - further development is needed, which I will do when I return to the concept - but it may already allow certain types of otherwise dangerous AIs to be made safe. And this time, without needing to encase them in clouds of chaotic anti-matter!

Specifically, consider the following scenario. A comet is heading towards Earth, and it is generally agreed that a collision is suboptimal for everyone involved. Human governments have come together in peace and harmony to build a giant laser on the moon - this could be used to vaporise the approaching comet, except there isn't enough data to aim it precisely. A superintelligent AI programmed with a naive "save all humans" utility function is asked to furnish the coordinates to aim the laser. The AI is mobile and not contained in any serious way. Yet the AI furnishes the coordinates - and nothing else - and then turns itself off completely, not optimising anything else.

The rest of this post details an approach that might make that scenario possible. It is slightly complex: I haven't found a way of making it simpler. Most of the complication comes from attempts to precisely define the needed counterfactuals. We're trying to bring rigour to inherently un-sharp ideas, so some complexity is, alas, needed. I will try to lay out the ideas with as much clarity as possible - first the ideas to constrain the AI, then ideas as to how to get some useful work out of it anyway. Classical mechanics (general relativity) will be assumed throughout. As in a previous post, the approach will be illustrated by a drawing of unsurpassable elegance; the rest of the post will aim to clarify everything in the picture:

continue reading »

Domesticating reduced impact AIs

9 Stuart_Armstrong 14 February 2013 04:59PM

About a year ago, I posted several ideas for "reduced impact AI" (what Nick Bostrom calls "domesticity"). I think the most promising approach was the third one, which I pompously titled "The information in the evidence". In this post, I'll attempt to put together a (non-realistic) example of this, to see if it's solid enough to build on. I'll be highlighting assumptions I'm making about the AI; please point out any implicit assumption that I missed, and any other weaknesses of the setup. For the moment, I'm more interested in "this doesn't work" than "this can't be done in practice" or "this can't be usefully generalised".

EDIT: It wasn't clear here, but any paperclip constructed by the reduced impact AI would be destroyed in the explosion, and the AIs would not be observed during the process. How to get useful work out of the AI will be the next step, if this model holds up.

Intuitive idea

For a reduced impact AI, we want an AI that can accomplish something, say building a paperclip, without it going out of control and optimising the universe. We want the future to be roughly the same whether or not the AI was turned on. Hence the piece of information "the AI was turned on" is not particularly important - if we didn't know, we wouldn't go far wrong in our predictions.

To enforce this we'll equip the AI with a two-piece motivation: a utility function U (causing it to build paperclips) and a penalty function R (which penalises the AI if its actions have a large future 'impact'). The challenge is to have a setup and a definition of R that implements this intuitive idea.
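As a rough illustration of the two-piece motivation, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the excerpt does not say how U and R are combined (the sketch simply subtracts a weighted R from U), and the `Action` type, the toy actions and their numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A hypothetical candidate action with toy, made-up numbers."""
    name: str
    paperclips: float  # stand-in for what the utility function U rewards
    impact: float      # stand-in for what the penalty function R measures


def utility(action: Action) -> float:
    # U: rewards building paperclips (toy definition).
    return action.paperclips


def penalty(action: Action) -> float:
    # R: penalises a large future 'impact' (toy definition).
    return action.impact


def choose(actions, penalty_weight: float = 1.0) -> Action:
    # Assumption: the two pieces are combined as U - weight * R.
    return max(actions, key=lambda a: utility(a) - penalty_weight * penalty(a))


ACTIONS = [
    Action("do_nothing", paperclips=0.0, impact=0.0),
    Action("build_one_paperclip", paperclips=1.0, impact=0.1),
    Action("tile_universe", paperclips=1e6, impact=1e9),
]

if __name__ == "__main__":
    print(choose(ACTIONS).name)
    print(choose(ACTIONS, penalty_weight=0.0).name)
```

With the penalty switched off (`penalty_weight=0.0`) the toy agent prefers the high-impact action; with it switched on, it settles for the single paperclip. The hard part, as the post says, is a definition of R that actually tracks impact, which this sketch does not attempt.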

 

Reduced impact

Can we hope to define reduced impact? Doesn't any change percolate across all future time? Yes, but not all changes are the same - especially where our ability to predict them is concerned. Imagine an AI, confined to a box. While inside the box, it will build a paperclip. Then, in one hour's time, an antimatter bomb inside the box will explode, obliterating everything inside. We would intuitively say that whether or not the paperclip was built would have very little impact afterwards. What we really mean is that we are unable to usefully use that information for future predictions. To make it even harder, surround the box by a large mass of antimatter gas, moving chaotically. In order to figure out the consequences of anything happening inside the box, we'd have to be able to unravel chaos: solve chaotic equations correctly and quantitatively, not just qualitatively. So our first assumption is that the AI can't unravel chaos.

continue reading »

The mathematics of reduced impact: help needed

10 Stuart_Armstrong 16 February 2012 02:23PM

A putative new idea for AI control; index here.

Thanks for help from Paul Christiano

If Clippy, the paper-clip maximising AI, goes out of control, it would fill the universe with paper clips (or with better and better ways of counting the paper-clips it already has). If I sit down to a game with Deep Blue, then I know little about what will happen in the game, but I know it will end with me losing.

When facing a (general or narrow) superintelligent AI, the most relevant piece of information is what the AI's goals are. That's the general problem: there is no such thing as 'reduced impact' for such an AI. It doesn't matter who the next president of the United States is, if an AI wants to tile the universe with little smiley faces. But reduced impact is something we would dearly want to have - it gives us time to correct errors, perfect security systems, maybe even bootstrap our way to friendly AI from a non-friendly initial design. The most obvious path to coding reduced impact is to build a satisficer rather than a maximiser - but that proved unlikely to work.

But that ruthless maximising aspect of AIs may give us a way of quantifying 'reduced impact' - and hence including it in AI design. The central point being:

"When facing a (non-reduced impact) superintelligent AI, the AI's motivation is the most important fact we know."

Hence, conversely:

"If an AI has reduced impact, then knowing its motivation isn't particularly important. And a counterfactual world where the AI didn't exist, would not be very different from the one in which it does."

In this post, I'll be presenting some potential paths to formalising this intuition into something computable, giving us a numerical measure of impact that can be included in the AI's motivation to push it towards reduced impact. I'm putting this post up mainly to get help: does anyone know of already developed mathematical or computational tools that can be used to put these approaches on a rigorous footing?
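To make "a numerical measure of impact" concrete, here is one possible toy sketch in Python. It is not the formalisation the post goes on to propose: the choice of total variation distance, the outcome labels and all the probabilities are assumptions picked purely for illustration. It compares a predicted distribution over futures in the counterfactual world without the AI against the distribution in the world with it.

```python
def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two discrete distributions.

    0.0 means the predictions are identical; 1.0 means they do not overlap.
    """
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)


# Hypothetical predictions about the future (all numbers are made up).
WITHOUT_AI = {"business_as_usual": 0.9, "other": 0.1}
WITH_REDUCED_IMPACT_AI = {"business_as_usual": 0.89, "other": 0.11}
WITH_MAXIMISER = {"universe_of_paperclips": 1.0}

if __name__ == "__main__":
    print(total_variation(WITHOUT_AI, WITH_REDUCED_IMPACT_AI))
    print(total_variation(WITHOUT_AI, WITH_MAXIMISER))
```

In this toy setup, knowing that the reduced impact AI exists barely changes the forecast, while knowing that the maximiser exists changes it completely, which is the intuition behind the two quoted statements above.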

continue reading »

Completeness, incompleteness, and what it all means: first versus second order logic

45 Stuart_Armstrong 16 January 2012 05:38PM

First order arithmetic is incomplete. Except that it's also complete. Second order arithmetic is more expressive - except when it's not - and is also incomplete and also complete, except when it means something different. Oh, and full second order logic might not really be a logic at all. But then, first order logic has no idea what the reals and natural numbers are, especially when it tries to talk about them.

That was about the state of my confusion, and I set out to try and clear it up. Here I'll try and share an understanding of what is really going on with first and second order logic and why they differ so radically. It will be deliberately informal, so I won't be distinguishing between functions, predicates and subsets, and will be using little notation. It'll be exactly what I wish someone had told me before I started looking into the whole field. 

Meaningful Models

An old man starts talking to you about addition, subtraction and multiplication, and how they interact. You assume he is talking about the integers; it turns out he means the rational numbers. The integers and the rationals are both models of addition, subtraction and multiplication, in that they obey all the properties that the old man set out. But notice that, though he had the rationals in mind, he didn't mention them at all; he just listed the properties, and the rational numbers turned out, very non-coincidentally, to obey them.

These models are generally taken to give meaning to the abstract symbols in the axioms - to give semantics to the syntax. In this view, "for all x,y xy=yx" is a series of elegant squiggles, but once we have the model of the integers (or the rationals) in mind, we realise that this means that multiplication is commutative.
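The point that one list of properties can have several models can be sketched in a few lines of Python. The particular properties checked below are an assumed stand-in for whatever the old man listed, and checking a small finite sample is only a sanity check, not a proof that the properties hold for every integer or rational.

```python
from fractions import Fraction
from itertools import product


def satisfies_properties(sample) -> bool:
    """Check a few arithmetic properties on every triple from a finite sample."""
    for x, y, z in product(sample, repeat=3):
        if x * y != y * x:                 # multiplication is commutative
            return False
        if x + y != y + x:                 # addition is commutative
            return False
        if x * (y + z) != x * y + x * z:   # multiplication distributes over addition
            return False
        if (x - y) + y != x:               # subtraction undoes addition
            return False
    return True


# Two candidate models: a few integers and a few rationals.
INTEGERS = list(range(-3, 4))
RATIONALS = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]

if __name__ == "__main__":
    print(satisfies_properties(INTEGERS))
    print(satisfies_properties(RATIONALS))
```

The function never mentions integers or rationals; it only states properties. Both samples pass, which is the sense in which the same syntax admits more than one semantics.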

continue reading »

Recommended Reading for Friendly AI Research

26 Vladimir_Nesov 09 October 2010 01:46PM

This post enumerates texts that I consider (potentially) useful training for making progress on Friendly AI/decision theory/metaethics.

continue reading »

The two meanings of mathematical terms

-2 JamesCole 15 June 2009 02:30PM

[edit: sorry, the formatting of links and italics in this is all screwy. I've tried editing both the rich-text and the HTML and either way it looks ok while I'm editing it but the formatted terms either come out with no surrounding spaces or two surrounding spaces]

In the latest Rationality Quotes thread, CronoDAS quoted Paul Graham:

It would not be a bad definition of math to call it the study of terms that have precise meanings.

Sort of. I started writing this as a reply to that comment, but it grew into a post.

We've all heard the story of epicycles and how, before Copernicus came along, the movement of the stars and planets was explained by the idea of them being attached to rotating epicycles, some of which were embedded within other, larger rotating epicycles (I'm simplifying the terminology a little here).

As we now know, the epicycles theory was completely wrong. The stars and planets were not at the distances from Earth posited by the theory, or of the size presumed by it, nor were they moving about on some giant clockwork structure of rings.

In the theory of epicycles the terms had precise mathematical meanings. The problem was that what the terms were meant to represent in reality was wrong. The theory involved applied mathematical statements, and in any such statements the terms don’t just have their mathematical meaning -- what the equations say about them -- they also have an ‘external’ meaning concerning what they’re supposed to represent in or about reality.

Let's consider these two types of meanings. The mathematical, or ‘internal’, meaning of a statement like ‘1 + 1 = 2’ is very precise. ‘1 + 1’ is defined as ‘2’, so ‘1 + 1 = 2’ is pretty much the pre-eminent fact or truth. This is why mathematical truth is usually given such an exalted place. So far so good with saying that mathematics is the study of terms with precise meanings.

But what if ‘1 + 1 = 2’ happens to be used to describe something in reality? Each of the terms will then take on a second meaning -- concerning what they are meant to be representing in reality. This meaning lies outside the mathematical theory, and there is no guarantee that it is accurate.

The problem with saying that mathematics is the study of terms with precise meanings is that it’s all too easy to take this as trivially true, because the terms obviously have a precise mathematical sense. It’s easy to overlook the other type of meaning, to think there is just the meaning of the term, and that there is just the question of the precision of their meanings. This is why we get people saying "numbers don’t lie".

‘Precise’ is a synonym for "accurate" and "exact" and it is characterized by "perfect conformity to fact or truth" (according to WordNet). So when someone says that mathematics is the study of terms with precise meanings, we have a tendency to take it as meaning it’s the study of things that are accurate and true. The problem with that is, mathematical precision clearly does not guarantee the precision -- the accuracy or truth -- of applied mathematical statements, which need to conform with reality.

There are quite subtle ways of falling into this trap of confusing the two meanings. A believer in epicycles would likely have thought that the theory must have been correct because it gave mathematically correct answers. And it actually did. Epicycles actually did precisely calculate the positions of the stars and planets (not absolutely perfectly, but in principle the theory could have been adjusted to give perfectly precise results). If the mathematics was right, how could it be wrong?

But what the theory was actually calculating was not the movement of galactic clockwork machinery and stars and planets embedded within it, but the movement of points of light (corresponding to the real stars and planets) as those points of light moved across the sky. Those positions were right, but the theory had it conceptualised all wrong.

Which raises the question: does it really matter if the conceptualisation is wrong, as long as the numbers are right? Isn’t instrumental correctness all that really matters? We might think so, but this is not true. How would Pluto’s existence have been predicted under an epicycles conceptualisation? How would we have thought about space travel under such a conceptualisation?

The moral is, when we're looking at mathematical statements, numbers are representations, and representations can lie.
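The claim that epicycles gave the right positions under the wrong conceptualisation can be sketched numerically. The Python below is a toy, not the historical Ptolemaic system: it assumes circular, coplanar orbits, represents positions as complex numbers, and uses made-up radii and angular speeds. Under those assumptions, the planet's position as seen from Earth in a sun-centred picture is exactly the sum of two circular motions, which is what a deferent plus one epicycle computes.

```python
import cmath
import math

# Toy orbital parameters (assumed values, arbitrary units).
R_EARTH, W_EARTH = 1.0, 2 * math.pi            # radius, angular speed
R_PLANET, W_PLANET = 1.52, 2 * math.pi / 1.88


def heliocentric_apparent(t: float) -> complex:
    """Sun-centred picture: both bodies circle the sun; return planet minus Earth."""
    earth = R_EARTH * cmath.exp(1j * W_EARTH * t)
    planet = R_PLANET * cmath.exp(1j * W_PLANET * t)
    return planet - earth


def epicycle_apparent(t: float) -> complex:
    """Earth-centred picture: a large circle (deferent) carrying a small one (epicycle)."""
    deferent = R_PLANET * cmath.exp(1j * W_PLANET * t)
    # Same radius and speed as Earth's orbit, shifted by half a turn.
    epicycle = R_EARTH * cmath.exp(1j * (W_EARTH * t + math.pi))
    return deferent + epicycle


if __name__ == "__main__":
    times = [0.0, 0.5, 1.0, 2.5, 7.0]
    worst = max(abs(heliocentric_apparent(t) - epicycle_apparent(t)) for t in times)
    print("largest disagreement:", worst)
```

The two functions agree up to floating-point rounding, yet they describe very different physical pictures. The numbers alone cannot tell you which picture, if either, matches reality.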



If you're interested in knowing more about epicycles and how that theory was overthrown by the Copernican one, Thomas Kuhn's quite readable The Copernican Revolution is an excellent resource.

 
