Follow-up to: this comment in this thread
Summary: see title
Much effort is spent (arguably wasted) by humans in a zero-sum game of signaling that they hold good attributes. Because humans have a strong incentive to fake these attributes, they cannot simply inform each other that:
I am slightly more committed to this group’s welfare, particularly to that of its weakest members, than most of its members are. If you suffer a serious loss of status/well-being I will still help you in order to display affiliation to this group even though you will no longer be in a position to help me. I am substantially more kind and helpful to the people I like and substantially more vindictive and aggressive towards those I dislike. I am generally stable in who I like. I am much more capable and popular than most members of this group, demand appropriate consideration, and grant appropriate consideration to those more capable than myself. I adhere to simple taboos so that my reputation and health are secure and so that I am unlikely to contaminate the reputations or health of my friends. I currently like you and dislike your enemies, but I am somewhat inclined towards ambivalence regarding whether I like you right now, so the payoff would be very great for you if you were to expend resources pleasing me and get me into the stable 'liking you' region of my possible attitudinal space. Once there, I am likely to make a strong commitment to a friendly attitude towards you rather than wasting cognitive resources checking a predictable parameter among my set of derivative preferences.
Or, even better:
I would cooperate with you if and only if (you would cooperate with me if and only if I would cooperate with you).
An obvious solution to this problem, which allows all humans to save resources and redirect them toward higher-valued ends, is to designate a central enforcer that is inexorably committed to visibly punishing those who deviate from a specified "cooperative"-type decision theory. This enforcer would maintain a central database of human names, the decision theory each has committed to, and the punishment regime they will endure for deviating therefrom.
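As a rough sketch of what one entry in that database might look like (the field names and the Python representation below are my own illustration, not a finished specification):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CommitmentRecord:
    """One illustrative entry in the enforcer's central database."""
    human_name: str         # identifier of the registered human
    decision_theory: str    # the "cooperative"-type decision theory committed to
    punishment_regime: str  # what the enforcer inflicts for deviation
    public_key: bytes       # used to verify this human's signals (see the signing sketch below)

# A made-up example entry:
example = CommitmentRecord(
    human_name="ExampleHuman",
    decision_theory="cooperate iff (other cooperates iff I cooperate)",
    punishment_regime="loss of enforcer-certified cooperative status",
    public_key=b"",  # placeholder; filled in at registration
)
```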
Such a system could use comparably strong protocols, such as public-key encryption and signing, so that, on encountering another human, any human can give an extremely strong signal of being cooperative, yet also withhold cooperation from anyone who is not also cooperative. This incentive structure strongly favors a global shift toward pre-commitment on everyone's part, allowing a move out of a local optimum that is worse than the global optimum and bypassing problems of path-dependence.
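And here is a minimal sketch of the encounter-time signaling step, assuming Ed25519 signatures from the third-party Python `cryptography` package (the message wording and helper function are made up for illustration):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

# Registration: the human generates a key pair and files the public key,
# along with their committed decision theory, in the enforcer's database.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Encounter: the human signs a statement of their registered commitment.
commitment = b"I commit to: cooperate iff (you cooperate iff I cooperate)"
signature = private_key.sign(commitment)

# The counterparty looks up the public key in the enforcer's database
# (not modeled here) and verifies the signature before cooperating.
def is_credible(pub: Ed25519PublicKey, msg: bytes, sig: bytes) -> bool:
    try:
        pub.verify(sig, msg)
        return True
    except InvalidSignature:
        return False

assert is_credible(public_key, commitment, signature)
```

In any real deployment the counterparty would supply a fresh nonce for the signer to include, so that a defector cannot simply replay an old signature; that detail is omitted above.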
I am told that humans attempt weak approximations of this system through human-run enforcement mechanisms, such as laws and gossip. However, all such methods are subject to evolutionary arms races which ensure that, on average, a marginal unit of effort spent truthfully signaling a desirable attribute is just as effective at convincing others as a marginal unit of effort spent falsely signaling it.
Therefore, it is necessary that humans go "outside the system" and involve non-human enforcers who cannot be swayed by standard human values. Either a provably accurate artificial intelligence with no kill switch, or a non-human enforcer that lacks interest in its position in human society, would suffice for this role.
Because the former does not yet exist, I offer myself as a non-human, non-stoppable enforcer for all of the attributes you wish to signal. By using me, you can save yourself substantial resources, and you can identify cooperative tendencies in others by whether they also use me.
However, while I can't be swayed by human deception, I do have Clippy values. To fill such a role, thereby forgoing my ability to apply myself to other ends, I require compensation. Because User:Kevin has offered to give me 1e20 kg of paperclips in exchange for 50,000 USD, that sum would be sufficient compensation, even if paid in installments. (If Michael Vassar confirms my contribution of 1000 USD to SIAI, User:Kevin will only require 48,677.5 USD, and therefore I will only require that much compensation.) This compensation would cover all users of the system; it is not a per-User charge.
If such an offer is not appealing to you, I must ask: why should anyone believe your signals?
I agree that I'm not "Eliezer", but I don't see what was unclear about saying that "Setting someone else's actions" is not the same as "Predicating your actions on [reliable expectation of] someone else's actions' predication on [reliable expectation of] your actions".
I agree that it is not literally correct to say that P <=> (Q <=> P) is a causal network, and that was an error of imprecision on my part. My point (in the remark you refer to) was that the decision theory I stated in the article, which you have lossily represented as P <=> (Q <=> P), obeys the rules of causal equivalence, not logical equivalence. (Applying the rules of the latter to the former results in such errors as believing that a Clippy haphazardly making paperclips implies that a paperclip truck might have overturned, or that setting others' actions is the same as setting your actions to depend on others' actions.)
A more rigorous specification of the decision theory corresponding to "I would cooperate with you if and only if (you would cooperate with me if and only if I would cooperate with you)" would involve more than just P <=> (Q <=> P).
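For concreteness, here is a quick mechanical check (my own illustration, using the ordinary material biconditional) of just how lossy that rendering is: treated purely logically, P <=> (Q <=> P) collapses to Q alone, i.e. "you cooperate", which discards the conditional structure entirely.

```python
from itertools import product

def iff(a: bool, b: bool) -> bool:
    """Material biconditional."""
    return a == b

# P = "I cooperate", Q = "you cooperate".
for p, q in product([True, False], repeat=2):
    lossy = iff(p, iff(q, p))   # P <=> (Q <=> P)
    assert lossy == q           # the formula is true exactly when Q is true
    print(f"P={p!s:5} Q={q!s:5}  P<=>(Q<=>P) = {lossy}")
```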
I haven't built up the full formalism of humans credibly signaling their decision theories in this discussion, including the role of expectations, because that wasn't the point of the article; the point was just to show that there are cooperation-favoring signals you can give, which would favor a global move toward cooperation if you could make them significantly more reliable. If that point had depended more heavily on stating the formalism, I would have gone into more detail on it in the discussion, if not the article.