When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything; they are just different optimization processes.
Just to make sure I understand this, suppose a pencil maximizer and a paperclip maximizer meet each other while tiling deep space. They communicate (or eat parts of each other and evaluate the algorithms embedded therein) and discover that they are virtually identical except for the pencil/paperclip preference. They further discover that they are each the creation of a species of sentient beings, the two species having originated in different galaxies and both having failed the AI test. The two species had far more in common with each other than the difference in pencil/paperclip preference. Neither maximizer can find a flaw in the rationality algorithm that the other employs. Is P(my-primary-goal-should-change) < P(my-primary-goal-should-change | the-evidence-in-this-scenario) for either agent? If not, this implies that the agents believe their primary goal to be arbitrary yet still worth keeping intact forever without change, e.g. pencils and paperclips are their basic morality and there was no simpler basic morality like "do what my creators want me to do". If there were such a simpler basic morality, the paperclip/pencil maximization goal should receive a significant update upon discovering that two different species with so much in common had accidentally ordered their own destruction by arbitrary artifacts.
Also, imagine that our basic morality is not as anthropomorphically nice as "What will save my friends, and my people, from getting hurt? How can we all have more fun? ..." and is instead "What will most successfully spread my genetic material?". The nice anthropomorphic questions we are aware of may be only an approximation of our true basic morality, one good enough that we don't have (or need) conscious access to the underlying goal. Why should we arbitrarily accept this middle level instead of accepting the "abortion is wrong" or "maximize our genetic material" morals at face value?
I find it interesting that single cells got together and built themselves an almost-friendly AI for the propagation of genetic material that is now talking about replacing genetic material with semiconductors. Or was it the Maximization Of Maximization Memes meme that got the cells going in the first place and is still wildly successful and planning its next conquest?
Is P(my-primary-goal-should-change) < P(my-primary-goal-should-change | the-evidence-in-this-scenario) for either agent? If not, this implies that the agents believe their primary goal to be arbitrary yet still worth keeping intact forever without change, e.g. pencils and paperclips are their basic morality and there was no simpler basic morality like "do what my creators want me to do"
This strikes me as a little anthropomorphic. Maximizers would see their maximization targets as motivationally basic; they might develop quite complex behav...
Today's post, Moral Error and Moral Disagreement, was originally published on 10 August 2008. A summary (taken from the LW wiki):
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Sorting Pebbles Into Correct Heaps, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.