Stuart_Armstrong comments on An overall schema for the friendly AI problems: self-referential convergence criteria - Less Wrong

17 Post author: Stuart_Armstrong 13 July 2015 03:34PM




Comment author: Pentashagon 25 July 2015 03:39:12AM 1 point

"The problem is that un-self-consistent morality is unstable under general self improvement"

Even self-consistent morality is unstable if general self improvement allows for removal of values, even if removal is only a practical side effect of ignoring a value because it is more expensive to satisfy than other values. E.g. we (Westerners) generally no longer value honoring our ancestors (at least not many of them), even though it is a fairly independent value and roughly consistent with our other values. It is expensive to honor ancestors, and ancestors don't demand that we continue to maintain that value, so it receives less attention. We also put less value on the older definition of honor (as a thing to be defended and fought for and maintained at the expense of convenience) that earlier centuries had, despite its general consistency with other values for honesty, trustworthiness, social status, etc. I think this is probably for the same reason; it's expensive to maintain honor and most other values can be satisfied without it. In general, if U(more_satisfaction_of_value1) > U(more_satisfaction_of_value2) then maximization should tend to ignore value2 regardless of its consistency. If U(make_values_self_consistent_value) > U(satisfying_any_other_value) then the obvious solution is to drop the other values and be done.
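A toy sketch of that last point, not a model of any real agent: assume a fixed budget of effort, constant (linear) marginal utilities, and a maximizer that spends each unit wherever it buys the most utility. The function name, the value names and the numbers are all made up for illustration.

```python
def greedy_allocation(budget, marginal_utility):
    """Spend each unit of effort on the value with the highest marginal utility.

    `marginal_utility` maps a value's name to the (assumed constant) utility
    gained per unit of effort spent on it. Both are hypothetical.
    """
    spent = {name: 0 for name in marginal_utility}
    for _ in range(budget):
        # With constant marginal utilities the same value wins every round.
        best = max(marginal_utility, key=marginal_utility.get)
        spent[best] += 1
    return spent


# Hypothetical numbers: honoring ancestors is consistent with the other
# values, but each unit of effort spent on it yields less utility.
values = {"other_values": 1.0, "honor_ancestors": 0.4}
allocation = greedy_allocation(10, values)
print(allocation)
```

Under these assumptions the cheaper value absorbs the whole budget and "honor_ancestors" gets nothing, even though nothing about it is inconsistent. With diminishing returns on each value the split would be less extreme, so the linear case is the worst case rather than the typical one.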

A sort of opposite approach is "make reality consistent with these pre-existing values" which involves finding a domain in reality state space under which existing values are self-consistent, and then trying to mold reality into that domain. The risk (unless you're a negative utilitarian) is that the domain is null. Finding the largest domain consistent with all values would make life more complex and interesting, so that would probably be a safe value. If domains form disjoint sets of reality with no continuous physical transitions between them then one would have to choose one physically continuous sub-domain and stick with it forever (or figure out how to switch the entire universe from one set to another). One could also start with preexisting values and compute a possible world where the values are self-consistent, then simulate it.
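To make "finding a domain in reality state space" concrete, here is a minimal sketch, assuming a world state is just a tuple of booleans and each value is a predicate over states. All names and the tiny state space are hypothetical; real state spaces are obviously not enumerable like this.

```python
from itertools import product


def consistent_domain(states, values):
    """Return the states in which every value is satisfied at once."""
    return [s for s in states if all(value(s) for value in values)]


# Hypothetical two-feature world: each state is a pair (a, b) of booleans.
states = list(product([False, True], repeat=2))

# Two compatible values: "a holds" and "if a holds then b holds".
values = [
    lambda s: s[0],
    lambda s: (not s[0]) or s[1],
]
domain = consistent_domain(states, values)

# Adding a third value, "b does not hold", leaves no state satisfying all three.
contradictory = values + [lambda s: not s[1]]
null_domain = consistent_domain(states, contradictory)

print(domain, null_domain)
```

The second call is the risk mentioned above: once the values jointly contradict each other, the domain is null and there is nothing to mold reality into.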

Comment author: Stuart_Armstrong 27 July 2015 04:02:02PM 2 points

"It is expensive to honor ancestors, and ancestors don't demand that we continue to maintain that value, so it receives less attention."

That's something different: a human trait that makes us want to avoid expensive commitments while paying them lip service. A self-consistent system would not have this trait. It would keep "honor ancestors" among its values, and would act on it or not depending on the cost and the interaction with other moral values.

If you want to look at even self-consistent systems being unstable, I suggest looking at social situations, where other entities reward value-change. Or a no-free-lunch result of the type "This powerful being will not trade with agents having value V."