Is P(my-primary-goal-should-change) < P(my-primary-goal-should-change | the-evidence-in-this-scenario) for either agent? If not, this implies that the agents believe their primary goals to be arbitrary yet still worth keeping intact forever, i.e. that pencils and paperclips are their basic morality and there is no simpler basic morality underneath, such as "do what my creators want me to do".
This strikes me as a little anthropomorphic. Maximizers would see their maximization targets as motivationally basic; they might develop quite complex behaviors in service to those goals, but there is no greater meta-motivation behind them. If there were, they wouldn't be maximizers. This is so alien to human motivational schemes that I think using the word "morality" to describe it is already a little misleading, but insofar as it is a morality, it's defined in terms of the maximization target: a paperclipper would consider rewriting its motivational core if and only if it could be convinced that doing so would ultimately generate more paperclips than the alternative.
I wouldn't call that arbitrary, though, at least not from the perspective of the maximizer; doing so would be close to calling joy or happiness arbitrary from a human perspective, although there really isn't any precise analogy in our terms.
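To make the "if and only if" above concrete, here is a toy sketch of that decision rule. It is only an illustration: the function name, its parameters, and the numbers are made up for this example, and a real maximizer would of course have to estimate these quantities rather than be handed them.

```python
def should_rewrite_goal(expected_clips_if_kept: float,
                        expected_clips_if_rewritten: float) -> bool:
    """Hypothetical decision rule for a paperclip maximizer.

    The agent judges a proposed rewrite of its own motivational core by
    its *current* goal: it accepts only if the rewrite is expected to
    produce strictly more paperclips than keeping the goal unchanged.
    """
    return expected_clips_if_rewritten > expected_clips_if_kept


# Made-up numbers, purely for illustration.
accepts = should_rewrite_goal(expected_clips_if_kept=1e9,
                              expected_clips_if_rewritten=2e9)
rejects = should_rewrite_goal(expected_clips_if_kept=1e9,
                              expected_clips_if_rewritten=5e8)
print(accepts, rejects)
```

Note that no argument about the goal being arbitrary enters the comparison at all; the only inputs are paperclip counts, which is the sense in which the target is motivationally basic.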
Reading http://lesswrong.com/lw/t1/arbitrary/ makes me think that a rational agent, even if its greatest motivation is to maximize its paperclip production, would be able to determine that its desire for paperclips was more arbitrary than its tools for rationality. It could perform simulations or thought experiments to determine its most likely origins and find that, while many possible origins lead to the development of rationality, there are only a few paths that specifically generate paperclip maximization. Equally likely are pencil maximization and smi...
Today's post, Moral Error and Moral Disagreement, was originally published on 10 August 2008. A summary (taken from the LW wiki):
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Sorting Pebbles Into Correct Heaps, and you can use the sequence_reruns tag or RSS feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.