TheOtherDave comments on The Urgent Meta-Ethics of Friendly Artificial Intelligence - Less Wrong

Post author: lukeprog 01 February 2011 02:15PM


Comment author: lukeprog 01 February 2011 11:04:08PM 9 points

I don't yet have much of an opinion on what the best way to do it is; I'm just saying it needs doing. We need more brains on the problem. Eliezer's meta-ethics is, I think, far from obviously correct. Moving toward normative ethics, CEV is also not obviously the correct solution for Friendly AI, though it is a good research proposal. The fate of the galaxy cannot rest on Eliezer's moral philosophy alone.

We need critically-minded people to say, "I don't think that's right, and here are four arguments why." And then Eliezer can argue back, or change his position. And then the others can argue back, or change their positions. This is standard procedure for solving difficult problems, but as of yet I haven't seen much published dialectic like this in trying to figure out the normative foundations for the Friendly AI project.

Let me give you an explicit example. CEV takes extrapolated human values as the source of an AI's eventually-constructed utility function. Is that the right way to go about things, or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function, whether or not they happen to be reasons for action arising from the brains of a particular species of primate on planet Earth? What if there are five other intelligent species in the galaxy whose interests will not at all be served when our Friendly AI takes over the galaxy? Is that really the right thing to do? How would we go about answering questions like that?

Comment author: TheOtherDave 02 February 2011 12:23:39AM 3 points

Judging from his posts and comments here, I conclude that EY is less interested in dialectic than in laying out his arguments so that other people can learn from them and build on them. So I wouldn't expect critically-minded people to necessarily trigger such a dialectic.

That said, perhaps that's an artifact of discussion happening with a self-selected crowd of Internet denizens... that can exhaust anybody. So perhaps a different result would emerge if a different group of critically-minded people, people EY sees as peers, got involved. The Hanson/Yudkowsky debate about FOOMing had more of a dialectic structure, for example.

With respect to your example, the discussion here might be a starting point, btw. The discussions here and here and here might also be salient.

Incidentally: the anticipated relationship between what humans want, what various subsets of humans want, and what various supersets including humans want, is one of the first questions I asked when I encountered the CEV notion.

I haven't gotten an explicit answer, but it does seem (based on other posts/discussions) that on EY's view a nonhuman intelligent species valuing something isn't something that should motivate our behavior at all, one way or another. We might prefer to satisfy that species' preferences, or we might not, but either way what should be motivating our behavior on EY's view is our preferences, not theirs. What matters on this view is what matters to humans; what doesn't matter to humans doesn't matter.

I'm not sure if I buy that, but satisfying "all the reasons for action that exist" does seem to be a step in the wrong direction.

Comment author: lukeprog 02 February 2011 01:17:52AM 0 points

TheOtherDave,

Thanks for the links! I don't claim that "satisfying all the reasons for action that exist" is the solution; I listed it as an example alternative to Eliezer's theory. Do you have a preferred solution?

Comment author: TheOtherDave 02 February 2011 02:42:56AM 1 point

Not really.

Rolling back to fundamentals: reducing questions about right actions to questions about likely and preferred results seems reasonable. So does treating the likely results of an action as an empirical question. So does approaching an individual's interests empirically, and as distinct from their beliefs about their interests, assuming they have any. The latter also allows for taking into account the interests of non-sapient and non-sentient individuals, which seems like a worthwhile goal.

Extrapolating a group's collective interests from the individual interests of its members is still unpleasantly mysterious to me, except in the fortuitous special case where individual interests happen to align neatly. Treating this as an optimization problem with multiple weighted goals is the best approach I know of, but I'm not happy with it; it has lots of problems I don't know how to resolve.
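The "optimization problem with multiple weighted goals" mentioned above can be sketched as simple scalarization: give each individual's interests a weight and pick the option maximizing the weighted sum. The individuals, utility functions, and weights below are illustrative assumptions, not anything from this thread; the hard problems TheOtherDave alludes to live precisely in choosing those weights and utilities, which this sketch takes as given.

```python
# Hypothetical sketch: aggregating conflicting individual interests as a
# weighted optimization. Utilities and weights are made-up illustrations.

def aggregate_utility(option, utilities, weights):
    """Weighted sum of each individual's utility for a given option."""
    return sum(w * u(option) for u, w in zip(utilities, weights))

def choose(options, utilities, weights):
    """Return the option that maximizes aggregate utility."""
    return max(options, key=lambda o: aggregate_utility(o, utilities, weights))

# Two individuals whose interests don't align neatly:
alice = lambda o: {"park": 3, "mall": 0}[o]
bob = lambda o: {"park": 1, "mall": 2}[o]

best = choose(["park", "mall"], [alice, bob], weights=[0.5, 0.5])
# → "park" (aggregate 2.0 beats the mall's 1.0)
```

Note that the answer flips if the weights shift (e.g., weights=[0.2, 0.8] favors the mall), which is exactly the unresolved part: nothing in the formalism says where the weights come from.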

Much to my chagrin, some method for doing this seems necessary if we are to account for individual interests in groups whose members aren't peers (e.g., children, infants, fetuses, animals, sufferers of various impairments, minority groups, etc., etc., etc.), which seems good to address.

It's also at least useful for addressing groups of peers whose interests don't neatly align... though I'm more sanguine about marketplace competition as an alternative way of addressing that.

Something like this may also turn out to be critical for fully accounting for even an individual human's interests, if it turns out that the interests of the various sub-agents of a typical human don't align neatly, which seems plausible.

Accounting for the probable interests of probable entities (e.g., aliens) I'm even more uncertain about. I don't discount them a priori, but without a clearer understanding of what such an accounting would actually look like, I really don't know what to say about them. I guess if we have grounds for reliably estimating the probability of a particular interest being had by a particular entity, then it's just a subset of the general weighting problem, but... I dunno.

I reject accounting for the posited interests of counterfactual entities, although I can see where the line between that and probabilistic entities as above is hard to specify.

Does that answer your question?