eli_sennesh comments on An overall schema for the friendly AI problems: self-referential convergence criteria - Less Wrong

17 Post author: Stuart_Armstrong 13 July 2015 03:34PM




Comment author: Kaj_Sotala 15 July 2015 06:59:46AM 10 points

A thought I've been mulling over lately, derived from a reinforcement-learning view of values and somewhat inspired by Nate's recent post on resting in motion: value convergence seems to suggest a static endpoint, some set of "ultimate values" that we'll eventually reach and then hold ever after. But so far no society has ever reached such a point, and if our values are an adaptation to our environment (including the society and culture we live in), then as long as we keep evolving and developing, our values will keep changing and evolving with us, without any meaningful endpoint.

There will always (given our current understanding of physics) be only a finite amount of resources available, and unless we either all merge into one enormous hivemind or get turned into paperclips, there will likely be various agents with differing preferences on what exactly to do with those resources. As the population keeps changing and evolving, the various agents will keep acquiring new kinds of values, and society will keep rearranging itself to a new compromise between all those different values. (See: the whole history of the human species so far.)
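The picture of society repeatedly re-settling on a compromise between agents' differing values can be made concrete as a toy model. This is only a sketch, with every name, weight, and resource hypothetical: finite resources get split in proportion to each agent's preference weights, and when the preferences drift, the same procedure yields a different compromise.

```python
# Toy sketch (all agents, goods, and weights hypothetical): a finite
# resource pool divided as a weighted compromise among agents whose
# values keep changing over time.

def compromise_allocation(preferences, total_resources):
    """Split each finite good in proportion to the agents' preference
    weights for it -- one crude notion of a social compromise."""
    allocation = {}
    for good, total in total_resources.items():
        weights = {agent: prefs.get(good, 0.0)
                   for agent, prefs in preferences.items()}
        weight_sum = sum(weights.values()) or 1.0
        allocation[good] = {agent: total * w / weight_sum
                            for agent, w in weights.items()}
    return allocation

# Generation 1: the agents value "art" and "industry" differently.
prefs_t1 = {"alice": {"art": 3.0, "industry": 1.0},
            "bob":   {"art": 1.0, "industry": 3.0}}

# Generation 2: values have drifted, so society recomputes its compromise.
prefs_t2 = {"alice": {"art": 1.0, "industry": 1.0},
            "bob":   {"art": 3.0, "industry": 1.0}}

resources = {"art": 100.0, "industry": 100.0}
alloc1 = compromise_allocation(prefs_t1, resources)
alloc2 = compromise_allocation(prefs_t2, resources)
```

The point of the sketch is only that there is no final allocation: the "answer" is a function of the current population's values, and it changes whenever they do.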

Possibly we shouldn't so much try to figure out what we'd prefer the final state to look like, but rather what we'd prefer the overall process to look like.

(The bias towards trying to figure out a convergent end-result for morality might have come from LW's historical tendency to talk and think in terms of utility functions, which implicitly assume a static and unchanging set of preferences, glossing over the fact that human preferences keep constantly changing.)
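The contrast between a fixed utility function and drifting preferences can be shown in a few lines. This is a hypothetical illustration, not anything from the post: the outcomes and weights are made up, but they show how a time-indexed weight lets the ranking of outcomes flip, which a single static utility function cannot represent.

```python
# Toy sketch (all outcomes and weights hypothetical): a static utility
# function vs. preferences that drift over time.

def fixed_utility(outcome):
    """A static utility function: the same preference weights forever."""
    return 2.0 * outcome["leisure"] + 1.0 * outcome["wealth"]

def drifting_utility(outcome, t, drift=0.1):
    """The weight on leisure decays with time t, so the ranking of two
    outcomes can flip -- something no single fixed function can model."""
    return (2.0 - drift * t) * outcome["leisure"] + 1.0 * outcome["wealth"]

a = {"leisure": 3.0, "wealth": 1.0}
b = {"leisure": 1.0, "wealth": 4.0}

# Under the static function, a is preferred to b at every moment...
assert fixed_utility(a) > fixed_utility(b)

# ...but under drifting preferences, the ranking flips as t grows.
assert drifting_utility(a, t=0) > drifting_utility(b, t=0)
assert drifting_utility(a, t=15) < drifting_utility(b, t=15)
```

Modeling an agent with `fixed_utility` alone would miss the preference reversal entirely, which is the gloss the parenthetical above is pointing at.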

Comment author: [deleted] 18 July 2015 07:26:55PM 0 points

Possibly we shouldn't so much try to figure out what we'd prefer the final state to look like, but rather what we'd prefer the overall process to look like.

Well, the general Good Idea in that model is that events and actions shouldn't be optimized to drift faster or more discontinuously than people's valuations of them can keep up, so that the society existing at any given time is more or less getting what it wants while also evolving towards something else.

Of course, a compromise between the differing "values" (scare quotes because I don't think the moral-philosophy usage of the word points at anything real) of society's citizens is still a vast improvement on "a few people dominate everyone else and impose their own desires by force and indoctrination", which is what we still have to a great extent.