It may matter that if paperclip maximizer A is considering whether to help paperclip maximizer B, B will only want A to take paperclip-maximizing actions. Cognitively sophisticated paperclip maximizers want everybody to want to maximize paperclips over all else. There is no obvious way in which an action could count as helpful-to-B unless that action also maximizes paperclips; any other axis matters to B only inasmuch as it affects paperclips. A real paperclip maximizer will, with no internal conflict whatever, sacrifice its own existence if that act will maximize paperclips. The two paperclip maximizers have identical goals that are completely external to their own experiences (although they will react to their experiences of paperclips, what they want are real paperclips, not paperclip experiences). Most real agents aren't quite like that.
Perhaps an intuition pump is appropriate at this point, explicating what I mean by the verb "assist".
Alfonse, the paperclip maximizer, decides that the best way to maximize paperclips is to conquer the world. In pursuit of the subgoal of conquering the world, Alfonse transforms itself into an army. After a fierce battle with some non-paperclip-ists, an instance of Alfonse considers whether to bind a different, badly injured Alfonse's wound.
Binding the wound will cost some time and calories, resources which might be used in other ways. However, if...
In the post Outlawing Anthropics, there was a brief and intriguing scrap of reasoning, which used the principle of reflective inconsistency, which so far as I know is unique to this community:
This post expands upon and attempts to formalize that reasoning, in hopes of developing a logical framework for reasoning about reflective inconsistency.
In diagramming and analyzing this, I encountered a difficulty. There are probably many ways to resolve it, but in resolving it, I basically changed the argument. You might have reasonably chosen a different resolution. Anyway, I'll explain the difficulty and where I ended up.
The difficulty is the text "...maximizes your expectation of pleasant experience over future selves." How would you compute an expectation of pleasant experience? It ought to depend intensely on the situation. For example, a flat future, with no opportunity to influence my experience or that of my sibs for better or worse, would argue that caring for sibs has exactly the same expectation as not-caring. Alternatively, if a mad Randian were experimenting on me and rewarding selfishness, not-caring for my sibs might well yield more pleasant experiences than caring. Also, I don't know how to compute with experiences: Total Utility, Average Utility, Rawlsian Minimum Utility, some sort of multiobjective optimization? Finally, I don't know how to compute with future selves. For example, imagine some sort of bicameral cognitive architecture, where two individuals have exactly the same percepts (and therefore choose exactly the same actions). Should I count that as one future self or two?
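To make the aggregation worry concrete, here is a minimal Python sketch. The utilities and the number of future selves under each policy are invented for illustration and come from no real model. It only shows that the choice of aggregation rule, and the way selves are counted, can flip which policy looks better.

```python
from statistics import mean

# Hypothetical utilities, one entry per future self, under each policy.
# Both the values and the number of selves are assumptions for illustration.
outcomes = {
    "caring": [5.0, 5.0, 5.0, 5.0],
    "not_caring": [9.0, 9.0, 0.0],
}

# Three candidate ways of "computing with experiences".
aggregators = {
    "total": sum,
    "average": mean,
    "rawlsian_min": min,
}

def preferred_policy(aggregate):
    """Return the policy whose aggregated utility is highest."""
    return max(outcomes, key=lambda policy: aggregate(outcomes[policy]))

rankings = {name: preferred_policy(fn) for name, fn in aggregators.items()}
for name, policy in rankings.items():
    print(f"{name}: prefers {policy}")
```

With these made-up numbers, total and Rawlsian-minimum utility prefer caring while average utility prefers not-caring. Counting the bicameral pair as one self or two would change the lists, and so possibly the ranking.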
To resolve this, I replace EY's reason with an argument from analogy, like so:
Here is the same argument again, "expanded". Remember, the primary reason to expand it is not readability - the expanded version is certainly less readable - it is as a step towards a generally applicable scheme for reasoning using the principle of reflective inconsistency.
At first glance, the mechanism of natural selection seems to explain selfish but not unselfish behavior. However, the structure of the EEA (the environment of evolutionary adaptedness) seems to have offered sufficient opportunities for kin to recognize kin with low-enough uncertainty and assist (with small-enough price to the helper and large-enough benefit to the helped) that unselfish entities do outcompete purely selfish ones. Note that the policy of selfishness is sufficiently simple that it was almost certainly tried many times. We believe that unselfishness is still a winning strategy in the present environment, and will continue to be a winning strategy in the future.
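The three conditions in the parenthetical (low uncertainty, small price, large benefit) resemble Hamilton's rule from kin-selection theory, which this post does not itself invoke. As a hedged sketch, assuming that rule applies and that recognition uncertainty simply scales relatedness, the trade-off might be written as follows. The function name and all parameter values are hypothetical.

```python
def helping_pays(p_kin, relatedness, benefit, cost):
    """Hamilton-style check (an assumption, not the post's own formalization).

    p_kin:       probability that the recipient really is kin
    relatedness: coefficient of relatedness if the recipient is kin
    benefit:     fitness benefit to the helped individual
    cost:        fitness cost to the helper
    """
    return p_kin * relatedness * benefit > cost

# Confident recognition, cheap help, large benefit: 0.9 * 0.5 * 10.0 = 4.5 > 1.0
print(helping_pays(p_kin=0.9, relatedness=0.5, benefit=10.0, cost=1.0))

# Poor recognition and a small benefit: 0.2 * 0.5 * 2.0 = 0.2, not > 1.0
print(helping_pays(p_kin=0.2, relatedness=0.5, benefit=2.0, cost=1.0))
```

The point is only qualitative: helping wins when recognition is reliable enough and the benefit-to-cost ratio is high enough.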
The two policies, caring about sibs or not-caring, do in fact behave differently in the EEA, and so they are incompatible - we cannot behave according to both policies at once. Also, since caring about sibs outcompetes not-caring in the EEA, if a not-caring agent, X, were selecting a proxy (or "future self") to compete in an EEA-tournament for utilons (or paperclips), X would pick a caring agent as proxy. The policy of not-caring would choose to delegate to an incompatible policy. This is what "reflectively inconsistent" means. Given a particular situation S1, one can always construct another situation S2 where the choices available in S2 correspond to policies to send as proxies into S1. One might understand the new situation as having an extra "self-modification" or "precommitment" choice point at the beginning. If a policy chooses an incompatible policy as its proxy, then that policy is "reflectively inconsistent" on that situation. Therefore, not-caring is reflectively inconsistent on the EEA.
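The S1/S2 construction can be sketched in a few lines of Python. This is a toy illustration, not the Automath formalization: the payoff numbers are invented, and the only assumption that matters is that caring earns more than not-caring in S1. All names are hypothetical.

```python
# S1: the original situation. Payoffs (utilons or paperclips) are invented;
# the only assumption that matters is caring > not_caring.
S1_PAYOFFS = {"caring": 3.0, "not_caring": 1.0}

# What each policy actually does in S1 when it meets an injured sib.
S1_BEHAVIOUR = {"caring": "help", "not_caring": "ignore"}

def best_proxy(payoffs):
    """S2: the extra choice point. Pick the policy to send into S1.

    Every chooser here wants the same thing (maximum payoff), so the
    choice does not depend on which policy is doing the choosing.
    """
    return max(payoffs, key=payoffs.get)

def compatible(policy_a, policy_b, behaviour):
    """Two policies are compatible if they behave identically in S1."""
    return behaviour[policy_a] == behaviour[policy_b]

def reflectively_inconsistent(policy, payoffs, behaviour):
    """A policy is reflectively inconsistent on S1 if, given the S2 choice,
    it would delegate to a policy that is incompatible with itself."""
    return not compatible(policy, best_proxy(payoffs), behaviour)

for policy in S1_PAYOFFS:
    verdict = reflectively_inconsistent(policy, S1_PAYOFFS, S1_BEHAVIOUR)
    print(f"{policy}: reflectively inconsistent = {verdict}")
```

Under these assumed payoffs, not-caring delegates to caring and so is reflectively inconsistent, while caring delegates to itself and is not.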
The last step to the conclusion is less interesting than the part about reflective inconsistency. The conclusion is something like: "Other things being equal, prefer caring about sibs to not-caring".
Enough handwaving - to the code! My (crude) formalization is written in Automath, and to check my proof, the command (on GNU/Linux) is something like: