Lumifer comments on Open Thread Feb 22 - Feb 28, 2016 - Less Wrong

5 Post author: Elo 21 February 2016 09:14PM


Comment author: Lumifer 23 February 2016 09:05:15PM 1 point [-]

That's the reason EY came up with the concept of CEV -- Coherent Extrapolated Volition.

Comment author: halcyon 23 February 2016 10:49:42PM *  2 points [-]

The SEP says that preferences cannot be aggregated without additional constraints on how the aggregation is to be done, and the end result changes depending on things like the order of aggregation, so these additional constraints take on the quality of arbitrariness. How does CEV get around that problem?

Comment author: Kaj_Sotala 26 February 2016 05:39:02PM *  1 point [-]

From the CEV paper:

Different classes of satisfactory initial definitions may fall into different self-consistent attractors for optimal definitions of volition. Or they may all converge to essentially the same endpoint. A CEV might survey the “space” of initial dynamics and self-consistent final dynamics, looking to see if one alternative obviously stands out as best; extrapolating the opinions humane philosophers might have of that space. But if there are multiple, self-consistent, satisficing endpoints, each of them optimal under their own criterion—okay. Whatever. As long as we end up in a Nice Place to Live.

And yes, the programmers’ choices may have a huge impact on the ultimate destiny of the human species. Or a bird, chirping in the programmers’ window. Or a science fiction novel, or a few lines spoken by a character in an anime, or a webcomic. Life is chaotic, small things have large effects. So it goes.

Which you could sum up as "CEV doesn't get around that problem, it treats it as irrelevant - the point isn't to find a particular good solution that's unique and totally non-arbitrary, it's just to find even one of the good solutions. If arbitrary reasons shift us from Good World #4 to Good World #36, who cares as long as they both really are good worlds".

Comment author: halcyon 28 February 2016 03:21:21PM *  1 point [-]

The real difficulty is that when you combine two sets of preferences, each of which make sense on their own, you get a set of preferences that makes no sense whatsoever: http://plato.stanford.edu/entries/economics/#5.2 https://www.google.com/search?q=site%3Aplato.stanford.edu+social+choice&ie=utf-8&oe=utf-8

There is no easy way to resolve this problem. Nor is there any known method that takes such an inconsistent set of preferences as input and outputs a consistent set that either contributing party would recognize as furthering any of their original goals. Wherever unanimous agreement is lacking, arbitrary tie-breaking decisions would be required so often that, in practice, there would be a large component of arbitrariness every single time CEV tries to arrive at a uniform set of preferences by extrapolating the volitions of multiple agents into the future.
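The aggregation problem being pointed at here can be seen in the classic Condorcet cycle. A minimal Python sketch (an illustrative textbook example, not anything taken from the linked SEP entries): three voters each hold a perfectly transitive ranking, yet pairwise majority vote over those rankings is intransitive, so no consistent aggregate ranking exists.

```python
# Each voter ranks the options best to worst; every individual ranking
# is fully transitive on its own.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    votes = sum(1 for r in rankings if r.index(x) < r.index(y))
    return votes > len(rankings) / 2

# Pairwise majority vote yields A > B and B > C, but also C > A: the
# aggregate "preference" is a cycle, even though no individual's is.
cycle = [majority_prefers(x, y) for x, y in [("A", "B"), ("B", "C"), ("C", "A")]]
assert all(cycle)
```

Any rule for breaking such a cycle (dropping a voter, fixing an order of comparison, etc.) imports exactly the kind of extra constraint the SEP describes as arbitrary.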

This doesn't mean the problem is unresolvable, just that it's an AI problem in its own right. But given these difficulties, wouldn't it be better to pick whichever Nice Place to Live is the safest to reach instead of bothering with CEV? I say this because I'm not sure a Nice Place to Live can be defined in terms of CEV, that is, as any CEV-approved output. Because of the preference aggregation problem, I'm not certain that a world that is provably CEV-abiding also provably avoids flagrant immorality: two moral frameworks aggregated by a non-smart algorithm might give rise to an immoral framework, so I'm not sure the essence of the problem is resolved just by CEV as explained in the paper.

Comment author: halcyon 28 February 2016 08:43:52PM *  0 points [-]

Although what if we told each party to submit goals rather than non-goal preferences? If the AI has access to a model specifying which actions lead to which consequences, it can search for the actions that maximize the number of goals fulfilled regardless of which party submitted them, or perhaps take a Rawlsian approach and maximize the goal fulfillment of whichever party would have the fewest goals fulfilled under that sequence of actions, etc. That seems very imaginable to me. You can then have heuristics that constrain the search space and so on. The parties can also submit non-goal preferences in addition to goals, if they have any.
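The Rawlsian variant described above can be sketched concretely. In the following toy Python example (all party names, goals, and actions are hypothetical, and the "world model" is just a lookup table), the AI picks the action whose worst-off party has the most goals fulfilled, breaking ties by total fulfillment:

```python
# Each party submits a set of goals; the world model maps candidate
# actions to the set of goals that action would fulfill.
party_goals = {
    "party1": {"g1", "g2"},
    "party2": {"g2", "g3", "g4"},
}
action_outcomes = {
    "act_a": {"g1", "g3"},
    "act_b": {"g2"},
    "act_c": {"g1", "g2", "g4"},
}

def score(action):
    """Rawlsian score: (goals fulfilled for the worst-off party, total)."""
    fulfilled = [len(goals & action_outcomes[action])
                 for goals in party_goals.values()]
    return (min(fulfilled), sum(fulfilled))

# Tuples compare lexicographically, so max() prefers a higher minimum
# first and only then a higher total.
best = max(action_outcomes, key=score)
```

Here "act_c" wins because it fulfills two goals for each party, whereas the other actions leave some party with only one. This sidesteps ranking-aggregation cycles by scoring outcomes directly, at the cost of treating all goals as equally weighted, which is itself a contestable choice.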

In that light, it seems to me that the problem was inferring goals from a set of preferences that were not purely non-goal preferences, but were actually presented with some unspecified goals in mind. E.g., one party wanted chocolate, but said "I want to go to the store" instead. If that was the source of the original problem, then we can see why we might need an AI to solve it, since it calls for some lightweight mind reading. Of course, a CEV-implementing AI would have to be a mind reader anyway, since we don't really know what our goals ultimately are given everything we could know about reality.

This still does not guarantee basic morality, but parties should at least recognize some of their ultimate goals in the end result. They might still grumble about the result not being exactly what they wanted, but we can at least scold them for lacking a spirit of compromise.

All this presupposes that enough of our actions can be reduced to ultimate goals that can be discovered, and I don't think this process guarantees we will be satisfied with the results. For example, this might erode personal freedom to an unpleasant degree. If we would choose to live in some world X if we were wiser and nicer than we are, then it doesn't necessarily follow that X is a Nice Place to Live as we are now. Changing ourselves to reach that level of niceness and wisdom might require unacceptably extensive modifications to our actual selves.

Comment author: Kaj_Sotala 01 March 2016 07:55:57AM 1 point [-]

My recent paper touches upon preference aggregation a bit in section 8, BTW, though it's mostly focused on the question of figuring out a single individual's values. (Not sure how relevant that is for your comments, but thought maybe a little.)

Comment author: halcyon 05 March 2016 09:58:10PM 0 points [-]

Thanks, I'll look into it.

(And all my ranting still didn't address the fundamental difficulty: There is no rational way to choose from among different projections of values held by multiple agents, projections such as Rawlsianism and utilitarianism.)

Comment author: ChristianKl 24 February 2016 12:46:30AM 1 point [-]

I think that's on the list of MIRI open research problems.

Comment author: halcyon 24 February 2016 02:24:17PM 0 points [-]

Interesting. In that case, would you say an AI that provably implements CEV's replacement is, for that reason, provably Friendly? That is, AIs implementing CEV's replacement form an analytical subset of Friendly AIs? What is the current replacement for CEV anyway? Having some technical material would be even better. If it's open to the public, then I'd like to understand how EY proposes to install a general framework similar to CEV at the "initial dynamic" stage that can predictably generate a provably Friendly AI without explicitly modeling the target of its Friendliness.

Comment author: Kaj_Sotala 26 February 2016 05:45:29PM 1 point [-]

What is the current replacement for CEV anyway?

There isn't really one as far as I know; "The Value Learning Problem" discusses some of the questions involved, but seems to mostly be at the point of defining the problem rather than trying to answer it. (This seems appropriate to me; trying to answer the problem at this point seems premature.)

Comment author: halcyon 28 February 2016 03:32:58PM 1 point [-]

Thanks. That makes sense to me.

Comment author: ChristianKl 24 February 2016 02:38:55PM 1 point [-]

Interesting. In that case, would you say an AI that provably implements CEV's replacement is, for that reason, provably Friendly?

I think that's MIRI's usage of the term Friendly.

If it's open to the public, then I'd like to understand how EY proposes to install a general framework similar to CEV at the "initial dynamic" stage

He's not proposing a mechanism as far as I know. That's another open problem.

Comment author: Gunnar_Zarncke 24 February 2016 09:30:25PM -1 points [-]

See MIRI's research for details.