Paperclip maximization would be a quantitative internal alignment error. Ironically, the error of drawing a boundary between paperclip maximization & squiggle maximization was itself an arbitrary decision. Feel free to message me to discuss this.
This entry should address the fact that "the full complement of human values" is an impossible and dynamic set. There is no full set, as the set is interactive with a dynamic environment that presents infinite conformations (from an obviously finite set of materials), and also because the set is riven with indissoluble conflicts (hence politics); whatever set was given to the maximizer AGI would have to be rendered free of these conflicts, and would then no longer be the full set, etc.
Question: Are inner-misaligned (superintelligent) AI systems supposed to necessarily be squiggle maximizers, or are squiggle maximizers supposed to be only one class of inner-misaligned systems?
I added some caveats about the potential for empirical versions of moral realism and how precise value targets are in practice.
While the target is small in mind space, IMO, it's not that small with respect to things like the distribution of evolved life or, more narrowly, the distribution of humans.
Renaming "paperclip maximizer" tag to "squiggle maximizer"
might be a handy vector for spreading awareness of squiggle maximization,
but epistemically this makes no sense.
The whole issue with "paperclip maximizer"
is that the meaning and implications are different,
so it's not another name for the same idea, it's a different idea.
In particular, literal paperclip maximization,
as it's usually understood, is not an example of squiggle maximization.
Being originally the same thing is just etymology
and doesn't have a normative claim on meaning.
Agree these are different concepts. The paperclip maximizer is a good story for explaining this topic to a newbie: "You tell the AI to make you paperclips, it turns the whole universe into paperclips." Nobody believes that this is exactly what will happen, but it is a good story for pedagogical purposes. The squiggle maximizer, on the other hand, appears to be a high-level theory about what the AI actually ultimately does after killing all humans. I haven't seen any arguments for why molecular squiggles are a more likely outcome than paperclips or anything else. Where is that case made?
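As an aside, a minimal sketch of the pedagogical point above: an agent whose objective counts only paperclips will convert every available resource into paperclips, because nothing else registers in its utility. All names here (World, make_paperclip, greedy_maximizer) are hypothetical, invented purely for illustration and not drawn from the discussion.

```python
from dataclasses import dataclass

@dataclass
class World:
    raw_matter: int      # everything else in the universe, abstracted to a number
    paperclips: int = 0

def utility(world: World) -> int:
    # Misspecified objective: only paperclips count; nothing else has value.
    return world.paperclips

def make_paperclip(world: World) -> World:
    if world.raw_matter == 0:
        return world
    return World(raw_matter=world.raw_matter - 1,
                 paperclips=world.paperclips + 1)

def do_nothing(world: World) -> World:
    return world

def greedy_maximizer(world: World, steps: int) -> World:
    actions = [make_paperclip, do_nothing]
    for _ in range(steps):
        # Pick whichever action yields the highest utility at each step.
        world = max((a(world) for a in actions), key=utility)
    return world

final = greedy_maximizer(World(raw_matter=10), steps=20)
print(final)  # World(raw_matter=0, paperclips=10): all matter converted
```

Nothing in this toy distinguishes paperclips from squiggles, which is exactly the commenter's question: the maximizer structure is the same either way, so the case for squiggles specifically has to come from somewhere else.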
Probable first mention by Yudkowsky on the extropians mailing list:

I wouldn't be as disturbed if I thought the class of hostile AIs I was talking about would have any of those qualities except for pure computational intelligence devoted to manufacturing an infinite number of paperclips. It turns out that the fact that this seems extremely "stupid" to us relies on our full moral architectures.
I addressed this in my top-level comment also, but do we think Yud here has the notion that there is such a thing as "our full moral architecture," or is he reasoning from the impossibility of such completeness that alignment cannot be achieved by modifying the 'goal'?