I want people to work toward noble efforts like charity work, but don't care much about whether they attian high status. So it's useful to aid the bit of their brain that wants to do what I want it to do.
People who care about truth might spot that part of your AI's brain wants to speak the truth, and so they will help it do this, even though this will cost it Diplomacy games. They do this because they care more about truth than Diplomacy.
I don't see how this is admirable at all. This is coercion.
If I work for a charitable organization, and my primary goal is to gain status and present an image as a charitable person, then efforts by you to change my mind are adversarial. Human minds are notoriously malleable, so it's likely that by insisting I do some status-less charity work you are likely to convince me on a surface level. And so I might go and do what you want, contrary to my actual goals. Thus, you have directly harmed me for the sake of your goals. In my opinion this is unacceptable.
I have become convinced that problems of this kind are the number one problem humanity has. I'm also pretty sure that most people here, no matter how much they've been reading about signaling, still fail to appreciate the magnitude of the problem.
Here are two major screw-ups and one narrowly averted screw-up that I've been guilty of. See if you can find the pattern.
It may not be immediately obvious, but all three examples have something in common. In each case, I thought I was working for a particular goal (become capable of doing useful Singularity work, advance the cause of a political party, do useful Singularity work). But as soon as I set that goal, my brain automatically and invisibly re-interpreted it as the goal of doing something that gave the impression of doing prestigious work for a cause (spending all my waking time working, being the spokesman of a political party, writing papers or doing something else few others could do). "Prestigious work" could also be translated as "work that really convinces others that you are doing something valuable for a cause".
We run on corrupted hardware: our minds are composed of many modules, and the modules that evolved to make us seem impressive and gather allies are also evolved to subvert the ones holding our conscious beliefs. Even when we believe that we are working on something that may ultimately determine the fate of humanity, our signaling modules may hijack our goals so as to optimize for persuading outsiders that we are working on the goal, instead of optimizing for achieving the goal!
You can see this all the time, everywhere:
There's an additional caveat to be aware of: it is actually possible to fall prey to this problem while purposefully attempting to avoid it. You might realize that you have a tendency to only want to do particularly prestigeful work for a cause... so you decide to only do the least prestigeful work available, in order to prove that you are the kind of person who doesn't care about the prestige of the task! You are still optimizing your actions on the basis of expected prestige and being able to tell yourself and outsiders an impressive story, not on the basis of your marginal impact.