Alexandros comments on Why not just write failsafe rules into the superintelligent machine? - Less Wrong

Post author: lukeprog 08 March 2011 09:07PM




Comment author: Yvain 09 March 2011 06:49:17PM * 4 points

What is the difference between "a rule" and "what it wants"?

I'm interpreting this as the same question you wrote below as "What is the difference between a constraint and what is optimized?". Dave gave one example but a slightly different metaphor comes to my mind.

Imagine an amoral businessman in a country that takes half his earnings as tax. The businessman wants to maximize money, but faces the constraint that half his earnings are taken as tax. So in order to achieve his goal of maximizing money, he sets up some legally permissible deal with a foreign tax shelter, or funnels his earnings through holding corporations, or the like, to avoid taxes. Doing this is the natural result of his money-maximization goal, and it still satisfies the "pay taxes" constraint.

Contrast this to a second, more patriotic businessman who loved paying taxes because it helped his country, and so didn't bother setting up tax shelters at all.

The first businessman has the motive "maximize money" and the constraint "pay taxes"; the second businessman has the motive "maximize money and pay taxes".

From the viewpoint of the government, the first businessman is an unFriendly agent with a constraint, and the second businessman is a Friendly agent.
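The difference between the two businessmen can be sketched as a toy optimization problem. Everything here is invented for illustration: the strategy names, earnings, tax bills, and the weight in the "friendly" objective are all arbitrary choices, not anything from the original discussion.

```python
# Toy model of the two businessmen; all strategies and payoffs are
# invented for illustration. Each strategy yields gross earnings and
# the tax actually paid.
strategies = {
    "pay_full_tax": {"earnings": 100, "taxes": 50},  # no shelter: 50% tax
    "tax_shelter":  {"earnings": 100, "taxes": 5},   # legal loophole
}

def money_kept(s):
    return s["earnings"] - s["taxes"]

def constrained_agent():
    """Maximize money kept, subject only to 'some tax must be paid'."""
    legal = {name: s for name, s in strategies.items() if s["taxes"] > 0}
    return max(legal, key=lambda name: money_kept(legal[name]))

def friendly_agent():
    """Paying taxes is part of the objective itself (the weight 2 is arbitrary)."""
    return max(strategies,
               key=lambda name: money_kept(strategies[name])
                                + 2 * strategies[name]["taxes"])

print(constrained_agent())  # 'tax_shelter' -- satisfies the constraint minimally
print(friendly_agent())     # 'pay_full_tax' -- no incentive to dodge
```

The point of the sketch: both agents "pay taxes", but only for the second agent does increasing tax paid increase utility, so only the second has no incentive to search for loopholes.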

Does that help answer your question?

Comment author: Alexandros 09 March 2011 07:06:16PM 0 points

What's stopping us from adding 'maintain constraints' to the agent's motive?

Comment author: CuSithBell 09 March 2011 07:18:07PM 0 points

I agree (with this question) - what makes us so sure that "maximize paperclips" is the part of the utility function that the optimizer will really value? Couldn't it symmetrically decide that "maximize paperclips" is a constraint on "try not to murder everyone"?

Comment author: Yvain 09 March 2011 08:03:50PM * 5 points

Asking what it really values is anthropomorphic. It's not coming up with loopholes around the "don't murder people" constraint because it doesn't really value it, or because the paperclip part is its "real" motive.

It will probably come up with loopholes around the "maximize paperclips" constraint too - for example, if "paperclip" is defined by something paperclip-shaped, it will probably create atomic-scale nanoclips because these are easier to build than full-scale human-sized ones, much to the annoyance of the office-supply company that built it.

But paperclips are pretty simple. Add a few extra constraints and you can probably specify "paperclip" to a degree that makes them useful for office supplies.

Human values are really complex. "Don't murder" doesn't capture human values at all - if Clippy encases us in carbonite so that we're still technically alive but not around to interfere with paperclip production, ve has fulfilled the "don't murder" imperative, but we would count this as a fail. This is not Clippy's "fault" for deliberately trying to "get around" the anti-murder constraint, it's our "fault" for telling ver "don't murder" when we really meant "don't do anything bad".

Building a genuine "respect" and "love" for the "don't murder" constraint in Clippy wouldn't help an iota against the carbonite scenario, because that's not murder and we forgot to tell ver there should be a constraint against that too.

So you might ask: okay, but surely there are a finite number of constraints that capture what we want. Just build an AI with a thousand or ten thousand constraints, "don't murder", "don't encase people in carbonite", "don't eat puppies", etc., make sure the list is exhaustive and that'll do it.

The first objection is that we might miss something. If the ancient Romans had made such a list, they might have forgotten "Don't release damaging radiation that gives us cancer." They certainly would have missed "Don't enslave people", because they were still enslaving people themselves - but this would mean it would be impossible to update the Roman AI for moral progress a few centuries down the line.

The second objection is that human morality isn't just a system of constraints. Even if we could tell Clippy "Limit your activities to the Andromeda Galaxy and send us the finished clips" (which I think would still be dangerous), any more interesting AI that is going to interact with and help humans needs to realize that sometimes it is okay to engage in prohibited actions if they serve greater goals (for example, it can disable a crazed gunman to prevent a massacre, even though disabling people is usually verboten).

So to actually capture all possible constraints, and to capture the situations in which those constraints can and can't be relaxed, we need to program all human values in. In that case we can just tell Clippy "Make paperclips in a way that doesn't cause what we would classify as a horrifying catastrophe" and ve'll say "Okay!" and not give us any trouble.

Comment author: lessdazed 02 July 2011 02:11:37PM 0 points

They certainly would have missed "Don't enslave people", because they were still enslaving people themselves - but this would mean it would be impossible to update the Roman AI for moral progress a few centuries down the line.

Historical note: the Romans had laws against enslaving the free-born, and also allowed manumission.

Comment author: CuSithBell 09 March 2011 08:29:37PM 0 points

Thanks, this all makes sense and I agree. Asking what it "really" values was intentionally anthropomorphic, as I was asking about what "it will want to work around constraints" really meant in practical terms, a claim which I believe was made by others.

I'm totally on board with "we can't express our actual desires with a finite list of constraints", just wasn't with "an AI will circumvent constraints for kicks".

I guess there's a subtlety to it - if you assign: "you get 1 utilon per paperclip that exists, and you are permitted to manufacture 10 paperclips per day", then we'll get problematic side effects as described elsewhere. If you assign "you get 1 utilon per paperclip that you manufacture, up to a maximum of 10 paperclips/utilons per day" or something along those lines, I'm not convinced that any sort of "circumvention" behavior would occur (though the AI would probably wipe out all life to ensure that nothing could adversely affect its future paperclip production capabilities, so the distinction is somewhat academic).
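The subtlety between the two utility assignments can be made concrete with a toy sketch. The function names and the cap of 10 come from the comment's own example; the specific numbers plugged in below are invented for illustration.

```python
# Toy sketch of the two utility assignments from the comment above.

def utility_per_existing_clip(clips_in_world):
    """1 utilon per paperclip that exists: utility grows without bound,
    so converting everything into paperclips dominates any bounded plan."""
    return clips_in_world

def utility_capped_manufacture(clips_made_today):
    """1 utilon per clip the agent itself makes, capped at 10 per day:
    utility saturates, so making more than 10 clips buys nothing extra."""
    return min(clips_made_today, 10)

# Under the first assignment, tiling the world with clips strictly
# dominates making a modest number:
print(utility_per_existing_clip(10**20) > utility_per_existing_clip(10))

# Under the second, a vast clip factory scores no better than a small one:
print(utility_capped_manufacture(10**20) == utility_capped_manufacture(10))
```

As the comment notes, the bounded version removes the direct incentive to over-produce, but not the instrumental incentive to secure future production, so the fix is only partial.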

In any case, thanks for the detailed reply :)