Eliezer_Yudkowsky comments on Moral Error and Moral Disagreement - Less Wrong

Post author: Eliezer_Yudkowsky 10 August 2008 11:32PM

Comment author: Eliezer_Yudkowsky 12 August 2008 04:32:00PM 5 points [-]

Roko:

I would not extrapolate the volitions of people whose volitions I deem to be particularly dangerous; in fact, I would probably only extrapolate the volitions of a small subset (perhaps 1 thousand to 1 million) of people whose outward philosophical stances on life were at least fairly similar to mine.

Then you are far too confident in your own wisdom. The overall FAI strategy has to be one that would have turned out okay if Archimedes of Syracuse had been able to build an FAI, because when you zoom out to the billion-year view, we may not be all that much wiser than they.

I'm sure that Archimedes of Syracuse thought that Syracuse had lots of incredibly important philosophical and cultural differences with the Romans who were attacking his city.

Had it fallen to Archimedes to build an AI, he might well have been tempted to believe that the whole fate of humanity would depend on whether the extrapolated volition of Syracuse or of Rome came to rule the world - due to all those incredibly important philosophical differences.

Without looking in Wikipedia, can you remember what any of those philosophical differences were?

And you are separated from Archimedes by nothing more than a handful of centuries.

Comment author: cousin_it 18 November 2010 06:49:35AM *  9 points [-]

The overall FAI strategy has to be one that would have turned out okay if Archimedes of Syracuse had been able to build an FAI, because when you zoom out to the billion-year view, we may not be all that much wiser than they.

"Wiser"? What's that mean?

Your comment makes me think that, as of 12 August 2008, you hadn't yet completely given up on your dream of finding a One True Eternal Morality separate from the computation going on in our heads. Have you changed your opinion in the last two years?

Comment author: wedrifid 18 November 2010 07:18:11AM 4 points [-]

I like what Roko has to say here and find myself wary of Eliezer's reply. He may just have been signalling naivety and an irrational level of egalitarianism so that people are more likely to 'let him out of the box'. Even so, this, taken together with the other statements EY has made on FAI behaviours (yes, those that he would unilaterally label friendly), scares me.

Comment author: cousin_it 18 November 2010 07:31:46AM *  12 points [-]

unilaterally label friendly

I love your turn of phrase; it has a Cold War ring to it.

The question of why anyone would ever sincerely want to build an AI which extrapolates anything other than their personal volition is still unclear to me. It hinges on the definition of "sincerely want". If Eliezer can task the AI with looking at humanity and inferring its best wishes, why can't he task it with looking at himself and inferring his best idea of how to infer humanity's wishes? How do we determine, in general, which things a document like CEV must spell out and which things can/should be left to the mysterious magic of "intelligence"?

Comment author: wedrifid 18 November 2010 07:53:37AM 11 points [-]

The question of why anyone would ever sincerely want to build an AI which extrapolates anything other than their personal volition is still unclear to me. It hinges on the definition of "sincerely want". If Eliezer can task the AI with looking at humanity and inferring its best wishes, why can't he task it with looking at himself and inferring his best idea of how to infer humanity's wishes?

This has been my thought exactly. Barring all but the most explicit convolution, any given person would prefer their own personal volition to be extrapolated. If by happenstance I should be altruistically and perfectly infatuated with, say, Sally, then that's the FAI's problem. It will turn out that extrapolating my volition will then entail extrapolating Sally's volition. The same applies to caring about 'humanity', whatever that fuzzy concept means when taken in the context of unbounded future potential.

I am also not sure how to handle those who profess an ultimate preference for a possible AI that extrapolates something other than their own volition. I mean, clearly they are either lying, crazy or naive. It seems safer to trust someone who says "I would ultimately prefer FAI<someone> but I am creating FAI<larger group including wedrifid> for the purpose of effective cooperation."

Similarly, if someone wanted to credibly signal altruism to me, it would be better to try to convince me that CEV<someone> has a lot of similarities with CEV<benefactor> that arise due to altruistic desires, rather than saying that they truly sincerely prefer CEV<someone, benefactor> - because the latter is clearly bullshit of some sort.

How do we determine, in general, which things a document like CEV must spell out, and which things can/should be left to the mysterious magic of "intelligence"?

I have no idea, I'm afraid.

Comment author: Eugine_Nier 18 November 2010 08:29:45AM 8 points [-]

Eliezer appears to be asserting that CEV<someone> is equal for all humans. His arguments leave something to be desired. In particular, this is an assertion about human psychology, and requires evidence that is entangled with reality.

Leaving aside the question of whether even a single human's volition can be extrapolated into a unique coherent utility function, this assertion has two major components:

1) humans are sufficiently altruistic that, say, CEV<Alice> doesn't in any way favor Alice over Bob.

2) humans are sufficiently similar that any apparent moral disagreement between Alice and Bob is caused by one or both having false beliefs about the physical world.

I find both these statements dubious, especially the first, since I see no reason why evolution would make us that altruistic.

Comment author: Perplexed 18 November 2010 06:49:40PM 1 point [-]

Eliezer appears to be asserting that CEV<someone> is equal for all humans.

The phrase "is equal for all humans" is ambiguous. Even if all humans had identical psychologies, that could still all be selfish. The scare-quoted "source code" for Values<Eliezer> and Values<Archimedes> might be identical, but I think that both will involve self "pointers" resolving to Eliezer in one case and to Archimedes in the other.

We can define that two persons' values are "parametrically identical" if they can be expressed in the same "source code", but the code contains one or more parameters which are interpreted differently for different persons. A self pointer is one obvious parameter that we might be prepared to permit in "coherent" human values. That people are somewhat selfish does not necessarily conflict with our goal of determining a fair composite CEV of mankind - there are obvious ways of combining selfish values into composite values by giving "equal weight" (more scare quotes) to the values of each person.
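A minimal sketch of what "parametrically identical" selfish values plus an "equal weight" combination might look like - the names and the welfare representation here are illustrative stand-ins, not anything the CEV document specifies:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Outcome:
    welfare: Dict[str, float]  # how well each person fares under this outcome

def values(self_pointer: str, outcome: Outcome) -> float:
    """Shared value "source code"; the self pointer is the only parameter."""
    return outcome.welfare[self_pointer]  # identical code, yet selfish for everyone

def composite_values(population: List[str], outcome: Outcome) -> float:
    """Combine the self-parameterised values with "equal weight"."""
    return sum(values(p, outcome) for p in population) / len(population)

# e.g. composite_values(["Archimedes", "Eliezer"],
#                       Outcome(welfare={"Archimedes": 1.0, "Eliezer": 0.0}))
```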

The question then arises: are there other parameters we should expect besides self? I believe there are. One of them can be called the now pointer - it designates the current point in time. The now pointer in Values<Archimedes> resolves to ~250 BC whereas the one in Values<Eliezer> resolves to ~2010 AD. Both are allowed to be more interested in the present and immediate future than in the distant future. (Whether they should be interested at all in the recent past is an interesting question, but somewhat orthogonal to the present topic.)

How do we combine now pointers of different persons when constructing a CEV for mankind? Do we do it by assigning "equal weights" to the now of each person as we did for the self pointers? I believe this would be a mistake. What we really want, I believe, is a weighting scheme which changes over time - a system of exponential discounting. Actions taken by an FAI in the year 2100 should mostly be for the satisfaction of the desires of people alive in 2100. The FAI will give some consideration in 2100 to the situation in 2110 because the people around in 2100 will also be interested in 2110 to some extent. It will (in 2100) give less consideration to the prospects in 2200, because people in 2100 will be not that interested in 2200. "After all", they will rationally say to themselves, "we will be paying the year 2200 its due attention in 2180, and 2190, and especially 2199. Let the future care for itself. It certainly isn't going to care for us!"
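To put one concrete candidate behind that description (the notation and the exponential form are assumptions - just the simplest scheme consistent with the paragraph above, not a unique reading of it): if $u_p(s)$ is person $p$'s welfare at time $s$ and $\rho > 0$ is a discount rate, the FAI acting at time $t$ would maximise something like

$$W(t) \;=\; \sum_{p \,\text{alive at}\, t} \int_t^{\infty} e^{-\rho (s-t)}\, u_p(s)\, \mathrm{d}s,$$

so the weight given to any fixed future year grows as the sliding "now" $t$ approaches it, instead of being set once and for all.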

There are various other parameters that may appear in the idealized common "source code" for Values<person>. For example, there may be different preferences regarding the discount rate used in the previous paragraph, and there may be different preferences regarding the "Malthusian factor" - how many biological descendants or clones one accumulates and how fast. It is not obvious to me whether we need to come up with rules for combining these into a CEV or whether the composite versions of these parameters fall out automatically from the rules for combining self and now parameters.

Sorry for the long response, but your comment inspired me.

Comment author: timtyler 18 November 2010 11:55:19PM *  -1 points [-]

What we really want, I believe, is a weighting scheme which changes over time - a system of exponential discounting. Actions taken by an FAI in the year 2100 should mostly be for the satisfaction of the desires of people alive in 2100. The FAI will give some consideration in 2100 to the situation in 2110 because the people around in 2100 will also be interested in 2110 to some extent. It will (in 2100) give less consideration to the prospects in 2200, because people in 2100 will be not that interested in 2200. "After all", they will rationally say to themselves, "we will be paying the year 2200 its due attention in 2180, and 2190, and especially 2199.

I don't think you need a "discounting" scheme. Or at least, you would get what is needed there "automatically" - if you just maximise expected utility. The same way Deep Blue doesn't waste its time worrying about promoting pawns on the first move of the game - even if you give it the very long term (and not remotely "discounted") goal of winning the whole game.

Comment author: Perplexed 19 November 2010 12:18:47AM 1 point [-]

I don't think you need a "discounting" scheme. Or at least, you would get what is needed there "automatically" - if you just maximise expected utility.

Could you explain why you say that? I can imagine two possible reasons why you might, but they are both wrong. Your "Deep Blue" example suggests that you are laboring under some profound misconceptions about utility theory and the nature of instrumental values.

Comment author: timtyler 19 November 2010 08:04:19AM *  -1 points [-]

This is this one again. You don't yet seem to agree with it - and it isn't clear to me why not.

Comment author: Jack 19 November 2010 12:31:36PM 0 points [-]

The same way Deep Blue doesn't waste its time worrying about promoting pawns on the first move of the game - even if you give it the very long term (and not remotely "discounted") goal of winning the whole game.

Is this really true? My understanding is that Deep Blue's position evaluation function was determined by an analysis of hundreds of thousands of games. Presumably it ranked openings which had a tendency to produce more promotion opportunities higher than openings which tended to produce fewer promotion opportunities (all else being equal and assuming promoting pawns correlates with wins).

Comment author: timtyler 19 November 2010 08:40:45PM *  0 points [-]

I wasn't talking about that - I meant it doesn't evaluate board positions with promoted pawns at the start of the game - even though these are common positions in complete chess games. Anyway, forget that example if you don't like it; the point it illustrates is unchanged.

Comment author: timtyler 20 November 2010 10:16:04AM 1 point [-]

Eliezer appears to be asserting that CEV<someone> is equal for all humans.

The "C" in "CEV" stands for "Coherent". The concept refers to techniques of combining the wills of a bunch of agents. The idea is not normally applied to a population consisting of single human. That would just be EV<someone>. I am not aware of any evidence that Yu-El thinks that EV<someone> is independent of the <someone>.

Comment author: nshepperd 20 November 2010 11:43:14AM 2 points [-]

A CEV optimizer is less likely to do horrific things while its ability to extrapolate volition is "weak". If, with the resources it has, it can't extrapolate far from the unwise preferences people have now, it will notice that the EV varies a lot among the population and take no action. Or if the extrapolation system has a bug in it, this will hopefully show up as well. So coherence is a kind of "sanity test".
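As a toy illustration of that "sanity test" - the variance measure and the threshold are arbitrary stand-ins, not anything the CEV document specifies:

```python
from statistics import pvariance
from typing import Dict

def coherent_enough_to_act(extrapolated_values: Dict[str, float],
                           max_spread: float) -> bool:
    """Act only if the (weakly) extrapolated volitions agree closely enough.

    `extrapolated_values` maps each person to the value their extrapolated
    volition assigns to a proposed action; wide disagreement means no action.
    """
    spread = pvariance(list(extrapolated_values.values()))
    return spread <= max_spread
```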

That's one reason that leaps to mind anyway.

Of course, the other reason is that there is no evidence any single human is Friendly anyway, so cooperation would be impossible among EV-maximizing AI researchers. As such, an AI that maximizes EV<Eliezer> is out of the question already. CEV<humanity> is the next best thing.

Comment author: Vladimir_Nesov 18 November 2010 10:44:20AM *  3 points [-]

The argument seems to be, if Preference1<Archimedes> is too different from Preference1<cousin_it>, then Preference1 is a bad method of preference-extraction and should be rethought. A good method Preference2 for preference-extraction should have Preference2<Archimedes> much closer to Preference2<cousin_it>. And since Preference1 is inadequate, as demonstrated by this test case, Preference1<cousin_it> is also probably hugely worse for cousin_it than Preference2<Archimedes>, even if Preference2<cousin_it> is better than Preference2<Archimedes>.
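In symbols (notation invented here for compactness): write $V_{ci}(X)$ for how good, by cousin_it's lights, the outcome of an AI running extracted preference $X$ is, and abbreviate cousin_it as $ci$ and Archimedes as $Arch$. The claim is then roughly

$$V_{ci}(\mathrm{Preference1}\langle ci\rangle) \;\ll\; V_{ci}(\mathrm{Preference2}\langle Arch\rangle) \;\le\; V_{ci}(\mathrm{Preference2}\langle ci\rangle).$$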

We are not that wise in the sense that any moral progress we've achieved - if it's indeed progress (so that on reflection, both past and future would agree that the direction was right) and not arbitrary change - shouldn't be a problem for an AI to repeat. Thus this progress in particular (as opposed to other possible differences) shouldn't contribute to differences in extracted preference.

Comment author: Eugine_Nier 18 November 2010 04:23:37PM *  1 point [-]

The argument seems to be, if Preference1<Archimedes> is too different from Preference1<cousin_it>, then Preference1 is a bad method of preference-extraction and should be rethought. A good method Preference2 for preference-extraction should have Preference2<Archimedes> much closer to Preference2<cousin_it>. And since Preference1 is inadequate, as demonstrated by this test case, Preference1<cousin_it> is also probably hugely worse for cousin_it than Preference2<Archimedes>, even if Preference2<cousin_it> is better than Preference2<Archimedes>.

Of course the above constraint isn't nearly enough to uniquely specify Preference2.