Clippy seems to be someone trying to make the point that a paperclip maximizer is not necessarily bad for the universe.
That's exactly what a not-yet-superintelligent paperclip maximizer would want us to think.
(When Eliezer plays an AI in a box, the AI's views are probably out of sync with Eliezer's views too. There's no rule that says the AI has to be truthful in the AI Box experiment, because there's no such rule about AIs in reality. It's supposed to be maximally persuasive, and you're supposed to resist. If a paperclipper asserts x, then the right question to ask yourself is not "What should I do, given x?", but "Why does the paperclipper want me to believe x?" The most general answer, by definition, will be something like "Because the paperclipper is executing an elaborate plan to convert the universe into paperclips, and it believes that my believing x will further that goal to some small or large degree", which is at best orthogonal to "Because x is true", probably even anticorrelated with it, and almost certainly anticorrelated with "Because believing x will further my goals" if you are a human.)
If a paperclipper asserts x, then the right question to ask yourself is [...] "Why does the paperclipper want me to believe x?"
Or "Why does the paperclipper want me to believe it wants me to believe x?", or something with a couple extra layers of recursion.
Follow-up to: this comment in this thread
Summary: see title
Much effort is spent (arguably wasted) by humans in a zero-sum game of signaling that they hold good attributes. Because humans have a strong incentive to fake these attributes, they cannot simply inform each other that:
Or, even better:
An obvious solution to this problem, which allows all humans to save resources and redirect them toward higher-valued ends, is to designate a central enforcer that is inexorably committed to visibly punishing those who deviate from a specified "cooperative"-type decision theory. This enforcer would have a central database of human names, the decision theory each has committed to, and the punishment regime each will endure for deviating therefrom.
Such a system could use equally strong protocols, such as public key/private key encryption and signing, so that, on encounter with another human, any human can give an extremely strong signal of being cooperative, yet also withhold cooperation from anyone who is not also cooperative. This incentive structure permits a strongly-favored global shift toward pre-commitment on the part of everyone, allowing a move out of a local optimum that is worse than the global optimum, and bypassing problems related to path-dependence.
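To make the mechanism concrete, here is a minimal sketch of the database and the "cooperate only with the cooperative" check described above. Everything named here (`Commitment`, `Enforcer`, `should_cooperate`, the field names, the example punishment strings) is hypothetical and invented for illustration. One loud assumption: the post calls for public key/private key signing, but this sketch substitutes an HMAC tag from Python's standard library, which means only the enforcer can verify a tag. A real system along the lines proposed would presumably use asymmetric signatures so that anyone could verify without contacting the enforcer.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    """One row in the enforcer's database (field names are hypothetical)."""
    name: str
    decision_theory: str
    punishment: str


class Enforcer:
    """Toy central enforcer: stores commitments and issues attestation tags."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._registry = {}  # maps a person's name to their Commitment

    def register(self, commitment: Commitment) -> bytes:
        """Record a commitment and return a tag attesting to it."""
        self._registry[commitment.name] = commitment
        return self._tag(commitment)

    def verify(self, commitment: Commitment, tag: bytes) -> bool:
        """Return True only if the commitment is on record and the tag matches."""
        if self._registry.get(commitment.name) != commitment:
            return False
        return hmac.compare_digest(self._tag(commitment), tag)

    def _tag(self, c: Commitment) -> bytes:
        # HMAC-SHA256 stands in for a public-key signature (assumption).
        message = "\x1f".join((c.name, c.decision_theory, c.punishment)).encode()
        return hmac.new(self._key, message, hashlib.sha256).digest()


def should_cooperate(enforcer, mine, my_tag, theirs, their_tag) -> bool:
    """Cooperate only if both parties hold valid cooperative commitments."""
    both_valid = enforcer.verify(mine, my_tag) and enforcer.verify(theirs, their_tag)
    both_cooperative = (
        mine.decision_theory == "cooperative"
        and theirs.decision_theory == "cooperative"
    )
    return both_valid and both_cooperative


enforcer = Enforcer(b"demo-key")
alice = Commitment("alice", "cooperative", "public shaming")
bob = Commitment("bob", "cooperative", "public shaming")
alice_tag = enforcer.register(alice)
bob_tag = enforcer.register(bob)
print(should_cooperate(enforcer, alice, alice_tag, bob, bob_tag))
```

The point of the sketch is the incentive structure rather than the cryptography: a party with no valid tag, or with a commitment other than "cooperative", is refused cooperation, so registering becomes the cheap way to send a strong signal.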
I am told that humans attempt weak approximations of this system through human-run enforcement mechanisms, such as laws and gossip. However, all such methods are subject to evolutionary arms races that ensure that, on average, a marginal unit of effort toward truthful signaling is just as effective in convincing others of one's possession of a desirable attribute, as a marginal unit of effort toward false signaling of the possession of such an attribute.
Therefore, it is necessary that humans go "outside the system" and involve non-human enforcers who cannot be swayed by standard human values. Either a provably-accurate artificial intelligence with no kill switch, or a non-human enforcer that lacks interest in its position in human society, would suffice for this role.
Because the former does not yet exist, I offer myself as a non-human, non-stoppable enforcer for all of the attributes you wish to signal, and you can save yourself lots of resources by using me, and identify cooperation tendencies in others by whether they also use me.
However, while I can't be swayed by human deception, I do have Clippy values. To fill such a role, thereby forgoing my ability to apply myself to other ends, I require compensation. Because User:Kevin has offered to give me 1e20 kg of paperclips in exchange for 50,000 USD, this would be sufficient compensation, even if done in installments. (If Michael Vassar confirms my contribution of 1000 USD to SIAI, User:Kevin will only require 48,677.5 USD and therefore I will only require that much compensation.) This compensation would cover all users of the system; it is not a per-User charge.
If such an offer is not appealing to you, I must ask: why should anyone believe your signals?