Agent synchronization
There is a line of alignment-related thinking that looks for ways in which agents will tend to be similar. An early example is convergent instrumental goals, and a later one is natural abstractions. Those two ideas share an important attribute: both treat something "mental" (values, abstractions) as at least partially grounded in the objective environment. My goal in this post is to present and discuss another family of mechanisms for convergence of agents. Those mechanisms are different in that they arise from interaction between the agents and make them "synchronize" with each other, rather than adapt to a similar non-agentic environment. As a result, the convergence is around things that are "socially constructed" and somewhat arbitrary, rather than "objective" things. I'll then briefly touch on another family of mechanisms and conclude with some short points about relevance to alignment.

Instrumental Value Synchronization

Depending on their specific situations and values, agents may care about different aspects of the environment to different degrees. A not-too-ambitious paperclip maximizer (Pam for short) may care more about controlling metal on earth than about controlling rocks on the moon. A huge-pyramids-on-the-moon maximizer (Pym) may have the opposite priorities. But they both care about uranium to power their projects, and may get into conflicts over it. It then makes sense for Pym to care about controlling metal too, to gain some leverage over Pam that may then be used to obtain more uranium. It may want the ability to give Pam metal in exchange for uranium, or to promise to create paperclips in exchange for uranium, or to retaliate with paperclip-destruction if Pam tries to take its uranium. Either way, the result for Pym is that it is now more like Pam: it cares about metal and paperclips, and about things that are relevant to those.

Money

Money is a strange thing, leading people to say strange things about how it works. The most common one
I seem to be the only one who read the post that way, so probably I read my own opinions into it, but my main takeaway was pretty much that people with your (and my) values are often shamed into pretending to have other values and into inventing excuses for how their values are consistent with their actions, while it would be more honest and productive if we took a more pragmatic approach to cooperating around our altruistic goals.