The Friendly AI Game

bentarm

At the recent London meet-up someone (I'm afraid I can't remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn't give two hoots about what happens outside that area. Cipergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a "fun game" out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.

Here's the game: reply to this post with proposed utility functions, stated as formally or, at least, as accurately as you can manage; follow-up comments explain why a super-human intelligence built with that particular utility function would do things that turn out to be hideously undesirable.
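As a toy illustration of one round, take the geographically limited proposal from the meet-up. The sketch below assumes a made-up world model in which the utility function counts only the "awesomeness" inside the small home region, and a brute-force maximizer chooses how much of the outside world to convert into resources. Every name (`World`, `local_utility`, `apply_plan`, `best_plan`) and every number is hypothetical, chosen only to show the failure mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    inside_awesomeness: int  # value of the small region the AI cares about
    outside_intact: int      # units of the rest of the universe left untouched


def local_utility(world: World) -> int:
    """Proposed utility function: only the home region counts."""
    return world.inside_awesomeness


def apply_plan(world: World, units_converted: int) -> World:
    """Assumed dynamics: each outside unit converted adds 1 awesomeness inside."""
    return World(
        inside_awesomeness=world.inside_awesomeness + units_converted,
        outside_intact=world.outside_intact - units_converted,
    )


def best_plan(world: World) -> int:
    """Brute-force maximizer: try every amount of conversion, keep the best."""
    return max(
        range(world.outside_intact + 1),
        key=lambda units: local_utility(apply_plan(world, units)),
    )


start = World(inside_awesomeness=10, outside_intact=1000)
units = best_plan(start)
end = apply_plan(start, units)
print(units, end)
```

Because nothing outside the region appears in `local_utility`, the maximizer in this toy model converts every outside unit. A follow-up comment in the game would point out exactly this kind of consequence.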

There are three reasons I suggest playing this game. In descending order of importance, they are:

1. It sounds like fun.
2. It might help to convince people that the Friendly AI problem is hard(*).
3. We might actually come up with something that's better than anything anyone's thought of before, or something where the proof of Friendliness is within grasp - the solutions to difficult mathematical problems often look obvious in hindsight, and it surely can't hurt to try.

DISCLAIMER (probably unnecessary, given the audience) - I think it is unlikely that anyone will manage to come up with a formally stated utility function for which none of us can figure out a way in which it could go hideously wrong. However, if they do so, this does NOT constitute a proof of Friendliness and I 100% do not endorse any attempt to implement an AI with said utility function.

(*) I'm slightly worried that it might have the opposite effect: people may build more and more complicated conjunctions of desires to overcome the objections we've already seen, and start to think the problem comes down to nothing more than writing a long list of special cases. On balance, though, I think that's likely to have less of an effect than just seeing how naive suggestions for Friendliness can be hideously broken.

I believe the idea is that the AI will need to calculate the CEV, not the programmers (or it's not CEV). And the AI will have a whole lot more statistical data to calculate the CEV of humanity than the CEV of individual contributors.

The programmers want the AI to calculate CEV because they expect CEV to be something they will like. We can't calculate CEV ourselves, but that doesn't mean we don't know any of CEV's (expected) properties.

However, we might be wrong about what CEV will turn out to be like, and we may come to regret pre-committing to CEV. That's why I think we should prefer CEV, because we can predict it better.

So you want hard-coded compromises that oppose and override what these people would collectively prefer to do if they were more intelligent, more competent and more self-aware?

What I meant was that they might oppose and override some of the input to the CEV from the rest of humanity.

However, it might also be a good idea to override some of your own CEV results, because we don't know in advance what the CEV will be. We define the desired result as "the best possible extrapolation", but our implementation may produce something different. It's very dangerous to precommit the whole future universe to something you don't yet know at the moment of precommitment (my point number 1). So, you'd want to include overrides about things you're certain should not be in the CEV.

Do you believe that fundamentalist religion would exist if fundamentalist religionists believed that their religion was false, and were also completely self-aware?

This is a misleading question.

If you are certain that the CEV will decide against fundamentalist religion, you should not oppose precommitting the AI to oppose fundamentalist religion, because you're certain this won't change the outcome. If you don't want to include this modification to the AI, that means you 1) accept there is a possibility of religion being part of the CEV, and 2) want to precommit to living with that religion if it is part of the CEV.
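The same argument can be spelled out as a truth table. This is only a sketch under a deliberately crude assumption: the outcome is modeled as a single boolean, and the function name `religion_in_outcome` is hypothetical.

```python
from itertools import product


def religion_in_outcome(cev_includes_religion: bool, override_installed: bool) -> bool:
    """Toy model: religion survives only if CEV includes it and nothing overrides it."""
    return cev_includes_religion and not override_installed


# Enumerate all four combinations of (what CEV says, whether we precommit).
for cev_includes, override in product([False, True], repeat=2):
    print(cev_includes, override, religion_in_outcome(cev_includes, override))
```

In this model the override changes nothing when CEV already excludes religion, and it matters only in the rows where CEV includes it, which is the case the objector claims cannot happen.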

Why do you think a CEV (which essentially means what people would want if they were as intelligent as the AI) would support a dangerous meme?

Maybe intelligent people like dangerous memes. I don't know, because I'm not yet that intelligent. I do know though that having high intelligence doesn't imply anything about goals or morals.

Broadly, this question is similar to "why do you think this brilliant AI-genie might misinterpret our request to alleviate world hunger?"

I don't think that the 9999 first contributors get to vote on whether they'll accept a donation from the 10,000th one.

Why not? If they're controlling the project at that point, they can make that decision.

And unless you believe these 10,000 people can create and defend their own country BEFORE the AI gets created, I'd urge not being vocal about them excluding everyone else, when developments in AI become close enough that the whole world starts paying serious attention.

I'm not being vocal about any actual group I may know of that is working on AI :-)

I might still want to be vocal about my approach, and might want any competing groups to adopt it. I don't have good probability estimates on this, but it might be the case that I would prefer CEV to CEV.

That's why CEV of humanity is far better than CEV of the contributors.

Why are you certain of this? At the very least it depends on who the person contributing money is.

"Humanity" includes a huge variety of different people. Depending on the CEV it may also include an even wider variety of people who lived in the past and counterfactuals who might live in the future. And the CEV, as far as I know, is vastly underspecified right now - we don't even have a good conceptual test that would tell us if a given scenario is a probable outcome of CEV, let alone a generative way to calculate that outcome.

Saying that the CEV "will best please everyone" is just handwaving this aside. Precommitting the whole future lightcone to the result of a process we don't know in advance is very dangerous, and very scary. It might be the best possible compromise between all humans, but it is not the case that all humans have equal input into the behavior of the first AI. I have not seen any good arguments claiming that implementing CEV is a better strategy than just trying to build the first AI before anyone else and then making it implement a narrow CEV.

Suppose that the first AI is fully general, and can do anything you ask of it. What reason is there for its builders, whoever they are, to ask it to implement the CEV of all humanity rather than their own CEV?

In an idealized form, I agree with you.

That is, if I really take the CEV idea seriously as proposed, there simply is no way I can prefer CEV(me + X) to CEV(me)... if it turns out that I would, if I knew enough and thought about it carefully enough and "grew" enough and etc., care about other people's preferences (either in and of themselves, as in "I hadn't thought of that but now that you point it out I want that too", or by reference to their owners, as in "I don't care about that but if you do then fine let's have that too,"...
