There's a common thread that runs through a lot of odd human behavior that I've recognized:
- People often accept surface explanations of their own and others' habits when the more nefarious explanations would reflect badly on them.
- The media we make for ourselves presents people as far more willing to go out of their way to defy incentives and help others than they actually are, even when you account for storytelling conveniences.
- People tend to trust that organizations like hospitals, nonprofits, and state bureaucracies will self-organize toward pursuing their nominal goals so long as they claim to be doing so, even if those organizations lack strong internal incentives to do so.
- People are quick to argue, without much evidence, that those involved in terrible atrocities were or are anomalously evil, rather than representative examples of how much average people respect the lives of strangers.
- People are shocked by, and often go into outright denial about, the purpose and effective output of major human institutions. Someone had to write an entire book about how education isn't about learning before people started to notice that it isn't. And plenty of people still don't!
To summarize: people are really charitable. They're charitable about the people they know, and the people they don't know. They're charitable about experts, institutions, and the society in which they live. Even people who pride themselves on being independent thinkers seem to take for granted that their hospitals or schools are run by people who just want to make life better for them. When they do snap out of these delusions, it seems to take a lot of intellectual effort, and a lot of explicit thinking about incentives, that is unnecessary for them in other contexts.
The bias is not distributed equally. In my experience, there's a connection between people's niceness and their proclivity for extending unwarranted trust to others.
My old high school Theology teacher, Mr. Portman, was the nicest person I've ever met. The students took advantage of him, like the rest of the nice teachers, correctly inferring that such teachers were less likely to stick up for themselves. One year he ran a charity drive by selling conflict-free chocolate bars he had bought with his own money, intending to donate the profits to anti-slavery charities. He was such an honest soul that he let kids in his class take them on a verbal promise that they'd pay him later. Even at the upscale high school I went to, they almost never did.
I think it's a generally accepted observation about kind people that honor and naivete go hand in hand. There are lots of folk explanations for this tendency; for example, a lot of people say that virtuous people generalize from one example and assume others are "like them".
Unfortunately, none of these explanations accounts for an additional fact of my experience: the bias seems to apply only to nice people, not to mean people. It's much rarer that I encounter someone so cynical about others' motivations that they start avoiding trustworthy people. If the problem were that nice people are generalizing from their internal experiences, why is it that even the self-declared psychopaths I meet seem ~basically correctly calibrated about how likely others are to mess with them?
To answer this, I think it's helpful to view the situation through the lens of game theory, as a toy model. Imagine people like Mr. Portman as running around implementing certain algorithms in one of those Prisoner's Dilemma tournaments.
Most people are not running CooperateBot or DefectBot in the general sense. They're running something between FairBot (which cooperates with you if it can establish that you'll cooperate with it) and PrudentBot (which cooperates only if it can establish both that you'll cooperate with it and that you'd defect against a sucker like DefectBot). And to run these algorithms in the real world, you naturally need to make probabilistic assessments about the behavior of other people.
In theory, any combination of FairBots and PrudentBots cooperates. Given good line of sight into one another's decision procedures, they would all trade swimmingly.
In practice, in a world full of PrudentBots, you want to present as a FairBot, regardless of what you actually are. Why? Because simple algorithms are easier to verify, and tit-for-tat is the simplest possible algorithm that still receives good treatment. Trading with a PrudentBot is doable, but riskier. You'll get fewer trading opportunities that way, because a would-be trading partner needs to convey something more specific than "I will cooperate": they need to make you believe "I will cooperate with you iff you cooperate with me".
On the other hand, if almost everyone around you is already a FairBot, the simplest and most effective identity becomes CooperateBot, not FairBot. In FairBotLand, cooperating with everyone just works, and provides a killer logfile. Sure, you may get taken advantage of once in a while, but depending on your environment that might be an acceptable price if it means the FairBots can clock you as trustworthy more often.
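To make the toy model concrete, here's a minimal iterated-tournament sketch in Python. It uses tit-for-tat as a crude stand-in for FairBot (the real FairBot from the open-source game theory literature reasons about proofs, which doesn't fit in a few lines), and the payoff matrix and round count are my own illustrative assumptions, not from any particular tournament.

```python
# Toy iterated Prisoner's Dilemma with standard illustrative payoffs.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def cooperate_bot(my_hist, their_hist):
    return "C"

def defect_bot(my_hist, their_hist):
    return "D"

def tit_for_tat(my_hist, their_hist):
    # FairBot-ish: open with cooperation, then mirror the last move.
    return their_hist[-1] if their_hist else "C"

def play(a, b, rounds=100):
    """Run two strategies against each other; return total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        hist_a.append(move_a); hist_b.append(move_b)
        score_a += pay_a; score_b += pay_b
    return score_a, score_b

# In "FairBotLand", a CooperateBot does exactly as well as the natives:
print(play(tit_for_tat, tit_for_tat))    # → (300, 300)
print(play(cooperate_bot, tit_for_tat))  # → (300, 300)
# A DefectBot exploits only the first round, then gets punished forever:
print(play(defect_bot, tit_for_tat))     # → (104, 99)
```

The point of the sketch: against a population of mirror-strategies, unconditional cooperation costs you nothing in payoff terms, and it is a strictly simpler identity to verify from the outside.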
So assuming you lived in a relatively nice environment, and wanted to be known as a simple, clean trading partner, how would you actually convey either of these things? Not everyone has your log. You could just say "I follow the golden rule" or "I give people the benefit of the doubt" - but you might be lying.
Well, most people are indeed running something like tit-for-tat, and treat people they like far better than people they don't. So one nice adaptation for assuring others you'll be kind is having a pro-human cognitive bias: specifically, one that suggests a positive view of how people treat one another. In this frame, unnecessary charitability is a costly signal of friendliness: it demonstrates that you can be fooled, but also exposes you to more trading opportunities. It's a trust exercise.
I think this analysis also explains another detail: why a lot of virtue signaling seems so "misplaced". When most people I know think of virtue signaling, they're not usually imagining direct acts of charity, like donating to the AMF or saving children drowning in ponds. Sometimes people still call that stuff virtue signaling, but to my mind it's not the central example. What I imagine when I think of virtue signaling is dramatic, public displays of compassion toward people who either don't deserve it or can't reciprocate. For a long time I couldn't understand why people's attempts to display "virtue" were so ineffective at actually improving society.
But it makes a lot more sense if the point of the adaptation is to signal friendliness, not necessarily to show you're "net-positive" in an abstract EA sense. What an act like Martha McKay's shows is not just that the person cares about others in general, but also that they are dramatically optimistic about human nature, and unlikely to take advantage of you if you decide to interact with them.
To be clear, people like Mr. Portman or Ms. McKay are actually nice. They're generally prosocial people. When you're doing character analysis of others, you should take into account that cynicism is a bad sign. But you can imagine a lot of left-right squabbling over criminal justice reform as the left accusing the right of being unscrupulous and evil, and the right accusing the left of misunderstanding human nature. Both accusations are true: the left, being staffed with more empathetic people, is more prone to a humans-are-wonderful bias and thus more willing to entertain bizarre policies like police abolition. The right, being less sympathetic, genuinely doesn't care much about the participants in the criminal justice system, but is also less likely to adopt naive restorative-justice positions for social reasons.
When it comes to this particular bias, I think there's a balance to be struck. Insofar as you have to pretend people are nicer than they are in order to be kind to them, I think you should do that. But your impact will be better if you at least notice when that's what you're doing, and try to keep it from bleeding into your policy analysis.
By the way, if we take the game theory and logic here at all seriously, there's a corollary of Löb's Theorem: if you defect given proof that your counterparty will defect, and the other party defects given proof that you will, then you both will, logically, defect against each other, with no choice in the matter. (And if you additionally declare that you cooperate given proof that your partner will cooperate, you've just declared a logical contradiction.)
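For the curious, here is a sketch of that derivation in provability logic, writing $\Box$ for "provably" and $D_A$, $D_B$ for "A defects" and "B defects". The notation and axiom names are mine; the result follows the standard Löbian argument.

```latex
% Assumptions (stated as provable policies, so they may be boxed):
%   (1) \Box D_B \to D_A    -- A defects given proof that B defects
%   (2) \Box D_A \to D_B    -- B defects given proof that A defects
\begin{align*}
\Box(\Box D_A \to D_B) &\;\vdash\; \Box\Box D_A \to \Box D_B
  && \text{distribute $\Box$ over (2)}\\
\Box D_A \to \Box\Box D_A &\;\vdash\; \Box D_A \to \Box D_B
  && \text{axiom 4 of GL, chained with the above}\\
\Box D_B \to D_A &\;\vdash\; \Box D_A \to D_A
  && \text{chain with (1)}\\
\Box(\Box D_A \to D_A) &\;\vdash\; \Box D_A
  && \text{L\"ob's theorem}\\
\Box D_A &\;\vdash\; D_A \text{ and } D_B
  && \text{by $\Box D_A \to D_A$ and (2)}
\end{align*}
```

So two mutually defensive policies prove each other into defection with no further input: the "proof that the other will defect" materializes from the structure of the policies themselves.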
If I had to pack this result into a "wise" phrase, I'd put it this way:
Good is not a universally valid response to Evil. Evil is not a universally valid response to Evil either. Seek that which will bring about a Good equilibrium.