Stuart_Armstrong comments on An overall schema for the friendly AI problems: self-referential convergence criteria - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
That's something different - a human trait that makes us want to avoid expensive commitments while paying them lip service. A self-consistent system would not have this trait; it would keep "honor ancestors" among its values, and act on it (or not) depending on the cost and on its interaction with other moral values.
If you want to see even self-consistent systems being unstable, I suggest looking at social situations, where other entities reward value change. Or consider a no-free-lunch result of the type "this powerful being will not trade with agents having value V."