lukeprog comments on Why safe Oracle AI is easier than safe general AI, in a nutshell - Less Wrong

0 Post author: Stuart_Armstrong 03 December 2011 12:33PM




Comment author: lukeprog 03 December 2011 06:58:28PM *  6 points [-]

CEV is an attempt to route around the problem you illustrate here, but it might be impossible. Oracle AI might also be impossible. But, well, you know how I feel about doing the impossible. When it comes to saving the world, all we can do is try. Both routes are worth pursuing, and I like your new paper on Oracle AI.

EDIT: Stuart, I suspect you're getting downvoted because you only repeated a point against which many arguments have already been given, instead of replying to those counter-arguments with something new.

Comment author: Stuart_Armstrong 04 December 2011 11:06:15AM 2 points [-]

The problem with CEV can be phrased by extending the metaphor: in a CEV built from both Hitler and Gandhi, the areas in which their values differ are not pinned down by the final output. So attitudes to Jews and violence, for instance, will be unpredictable in that CEV (so we should model them now as essentially random).

> Stuart, I suspect you're getting downvoted because you only repeated a point against which many arguments have already been given, instead of replying to those counter-arguments with something new.

It's interesting. Normally my experience is that metaphorical posts get higher votes than technical ones; nor could I have predicted the votes from reading the comments. Ah well; at least it seems to have generated discussion.

Comment author: lukeprog 04 December 2011 08:37:47PM 1 point [-]

> The problem with CEV can be phrased by extending the metaphor: a CEV built from both Hitler and Gandhi means that the areas in which their values differ are not relevant to the final output. So attitudes to Jews and violence, for instance, will be unpredictable in that CEV (so we should model them now as essentially random).

That's not how I understand CEV. But, the theory is in its infancy and underspecified, so it currently admits of many variants.

Comment author: Stuart_Armstrong 06 December 2011 07:18:36PM *  1 point [-]

Hum... If we got the combined CEV of two people, one of whom thought violence was ennobling and one of whom thought it was degrading, would you expect either or both of:

a) their combined CEV would be the same as if we had started with two people both indifferent to violence

b) their combined CEV would be biased in a particular direction that we can know ahead of time

Comment author: lukeprog 06 December 2011 07:37:46PM 0 points [-]

The idea is that their extrapolated volitions would plausibly not contain such conflicts, though it's not clear yet whether we can know what that would be ahead of time. Nor is it clear whether their combined CEV would be the same as the combined CEV of two people indifferent to violence.

Comment author: Stuart_Armstrong 07 December 2011 11:11:35AM 0 points [-]

So, to my ears, it sounds like we don't have much of an idea at all where the CEV would end up - which means that it most likely ends up somewhere bad, since most random places are bad.

Comment author: Manfred 07 December 2011 02:22:33PM 1 point [-]

Well, if it captures the key parts of what you want, you can know it will turn out fine even if you're extremely ignorant about what exactly the result will be.

Comment author: Stuart_Armstrong 07 December 2011 05:41:21PM 1 point [-]

> if it captures the key parts of what you want

Yes, as the Spartans answered Alexander the Great's father when he said, "You are advised to submit without further delay, for if I bring my army into your land, I will destroy your farms, slay your people, and raze your city.":

"If".

Comment author: Manfred 07 December 2011 07:02:40PM 0 points [-]

Yup. So, perhaps, focus on that "if."

Comment author: vallinder 07 December 2011 01:50:38PM 1 point [-]

Shouldn't we be able to rule out at least some classes of scenarios? For instance, paperclip maximization seems like an unlikely CEV output.

Comment author: Stuart_Armstrong 07 December 2011 05:40:30PM 2 points [-]

Most likely we can rule out most scenarios that all humans agree are bad. So better than Clippy, probably.

But we really need a better model of what CEV does! Then we can start to talk sensibly about it.

Comment author: [deleted] 17 October 2013 05:29:54PM 0 points [-]

> which means that it most likely ends up somewhere bad, since most random places are bad.

I don't think that follows, at all. CEV isn't a random walk. It will at the very least end up at a subset of human values. Maybe you meant something different here by the word 'bad'?