It's maybe fun to debate about whether they had mens rea, and the courts might care about the mens rea after it all blows up, but from our perspective, the main question is what behaviors they’re likely to engage in, and there turn out to be many really bad behaviors that don’t require malice at all.
I agree this is the main question, but I think it's bad to dismiss the relevance of mens rea entirely. Knowing what's going on with someone when they cause harm is important for knowing how best to respond, both for the specific case at hand and the strategy for preventing more harm from other people going forward.
I used to race bicycles with a guy who did some extremely unsportsmanlike things, of the sort that gave him an advantage relative to others. After a particularly bad incident (he accepted a drink of water from a rider on another team, then threw the bottle, along with half the water, into a ditch), he was severely penalized and nearly kicked off the team, but the guy whose job was to make that decision was so utterly flabbergasted by his behavior that he decided to talk to him first. As far as I can tell, he was very confused about the norms and didn't realize how badly he'...
I suggest minting a new word, for people who have the effects of malicious behavior, whether it's intentional or not.
Why only malicious behavior? It seems like the relevant idea is more general: oftentimes we care about what outcomes a pattern of behavior looks optimized to achieve in the world, not about the person's conscious subjective verbal narrative. (Separately from whether we think those outcomes are good or bad.)
Previously, I had suggested "algorithmic" intent, as contrasted to "conscious" intent. Claims about algorithmic intent correspond to predictions about how the behavior responds to interventions. Mistakes that don't repeat themselves when corrected are probably "honest mistakes." "Mistakes" that resist correction, that systematically steer the future in a way that benefits the actor, are probably algorithmically intentional.
Sorry, not load-bearing; I think "steering the future" was the important part of that sentence.
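To make the intervention test concrete, here is a minimal toy sketch (my own illustration, not anything from the comments above; the agents and all rates are made-up parameters): an "honest mistake" pattern responds to correction, while an "algorithmically intentional" pattern persists because the errors benefit the actor.

```python
# Toy sketch only: "algorithmic intent" operationalized as a prediction
# about how behavior responds to an intervention (here, a correction).
# All error rates are invented for illustration.
import random

random.seed(0)

def mistakes(rate_before_correction, rate_after_correction, n=1000):
    """Count observed 'mistakes' before and after a correction is issued."""
    before = sum(random.random() < rate_before_correction for _ in range(n))
    after = sum(random.random() < rate_after_correction for _ in range(n))
    return before, after

# Honest-mistake pattern: the error rate drops once the mistake is pointed out.
honest = mistakes(rate_before_correction=0.10, rate_after_correction=0.01)

# Algorithmically-intentional pattern: the "mistakes" resist correction,
# because they systematically benefit the actor.
persistent = mistakes(rate_before_correction=0.10, rate_after_correction=0.09)

print("honest:    ", honest)      # roughly (100, 10): correction works
print("persistent:", persistent)  # roughly (100, 90): correction bounces off
```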
Although in the case of tantrums, I think the game-theoretic logic is pretty clear: if I predictably make a fuss when I don't get my way, then people who don't want me to make a fuss are more likely to let me get my way (to a point). The fact that tantrums don't benefit the actor when they happen isn't itself enough to show that they're not being used to successfully extort concessions that make them happen less often. If it doesn't work in the modern workplace, it probably worked in the environment of evolutionary adaptedness.
Sometimes, also, tantrums work in the training distribution of childhood but don't work in the deployment environment of professional work.
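As a toy illustration of the game-theoretic logic two comments up (my own sketch; the payoffs and probabilities are made-up parameters), here is the extortion arithmetic from the counterpart's side: if enduring a predictable fuss costs more in expectation than conceding, conceding becomes the best response.

```python
# Toy model only: tantrums as extortion in a repeated interaction.
# All numbers are invented for illustration.
TANTRUM_COST_TO_OTHERS = 5.0   # cost the counterpart bears when a fuss is made
CONCESSION_COST = 2.0          # cost of just letting the actor get their way
TANTRUM_PROB_IF_REFUSED = 0.9  # how predictably the actor makes a fuss

def expected_cost(concede: bool) -> float:
    """Counterpart's expected cost for conceding vs. refusing."""
    if concede:
        return CONCESSION_COST
    return TANTRUM_PROB_IF_REFUSED * TANTRUM_COST_TO_OTHERS

print("conceding:", expected_cost(True))    # 2.0
print("refusing: ", expected_cost(False))   # 4.5
# Since 2.0 < 4.5, the counterpart's best response is to concede: the
# predictable fuss extorts the concession, even though each tantrum,
# when it actually happens, costs the actor too.
```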
I suggest minting a new word, for people who have the effects of malicious behavior, whether it’s intentional or not.
I've long used "destructive" for that.
Labeling (in particular) catastrophically incompetent people "maleficent" sounds malevolent. While the concern might be valid in theory, this label has connotations that probably don't help with the whole concept's inherent practical risks of witch hunts and reigns of terror.
Also, the apparent Chesterton-Schelling fences my intuition is loudly hallucinating at this post say to stop before instituting a habit of using such a classification. Immediately-decision-relevant concepts are autonomous superweapons: controversial norms that resist attempts at keeping their boundaries in reasonable/intended places.
My stance is "the more we promote awareness of the psychological landscape around destructive patterns of behavior, the better." This isn't necessarily at odds with what you're saying because "the psychological landscape" is a descriptive thing, whereas your objection to Nate's proposal is that it seeks to be "immediately-decision-relevant," i.e., that it's normative (or comes with direct normative implications).
So, maybe I'd agree that "maleficent" might be slightly too simplistic a classification (because we may want to draw action-relevant boundaries in different places depending on the context – e.g., different situations call for different degrees of risk tolerance for false positives vs. false negatives).
That said, I think there's an important message in Nate's post and (if I had to choose one or the other) I'm more concerned about people not internalizing that message than about it potentially feeding ammunition to witch hunts. (After all, someone who internalizes Nate's message will probably become more concerned about the possibility of witch hunts – if only explicitly-badly-intentioned people instigated witch hunts or added fuel to the fires, history would look very different.)
"maleficient" might be slightly too simplistic of a classification
There is an interesting phenomenon around culture wars where a crazy number of concepts gets generated to describe the contested territory with mind-boggling nuance. I have a hunch that this is not just expertise signaling, but actually useful for dissolving the conceptual superweapons in a sea of distinctions. This divests the original contentious immediately-decision-relevant concept of its special role that gives it power, by replacing it with a hundred slightly-decision-relevant distinctions, none of which has significant power.
A disagreement that was disputing the placement of a definition's boundaries becomes a disagreement about decision procedures, phrased in terms of many unchanging and uncontroversial definitions that cover all the contested territory in detail. After the dispute is over, most of the technical distinctions can once again be discarded.
Humans care about this stuff enough to bake it into their legal codes.
It's mostly Western culture that does this. There's a lot of variation in how much cultures care about bad intentions.
IANAL, but I believe that the doctrine of mens rea is different from what is suggested here, and the difference has application to the larger context.
Mens rea is simply the intention to do the actus reus, the illegal act. If, for example, a company director puts their signature to a set of false accounts, knowing they are false, then there is mens rea. It will cut no ice in court for them to profess "my goodness, I didn't know that was illegal!", or "oh, but surely that wasn't really fraud", or "but it was for a vital cause!"
What matters is that they did the thing, intending to do the thing.
I suggest minting a new word, for people who have the effects of malicious behavior
I thought that "toxic" was the usual word these days.
This might be a bit outside the scope of this post, but it would probably help if there were a way to positively respond to someone who was earnestly messing up in this manner before they cause a huge fiasco. If there's a legitimate belief that they're trying to do better and act in good faith, then what can be done to actually empower them to change in a positive direction? That's, of course, only if they actually want to change. If they're keeping themselves in a state that causes harm because it benefits them, while insisting it's fine, well, to steal a Sith's turn of phrase: airlocked.
One hypothesis I have for why people care so much about some distinction like this is that humans have social/mental modes for dealing with people who are explicitly malicious towards them, who are explicitly faking cordiality in attempts to extract some resource. And these are pretty different from their modes of dealing with someone who's merely being reckless or foolish. So they care a lot about the mental state behind the act.
[...]
On this theory, most people who are in effect trying to exploit resources from your community won't be explicitly malicious, not even in the privacy of their own minds. (Perhaps because the content of one’s own mind is just not all that private; humans are in fact pretty good at inferring intent from a bunch of subtle signals.) Someone who could be exploiting your community will often act so as to exploit your community, while internally telling themselves lots of stories where what they're doing is justified and fine.
I note that while I find both paragraphs individually reasonable [and I find myself nodding along to them], there seems to be a soft contradiction between them that needs explanation.
Namely, why is human (whether genetic or cultural) e...
Copied text from a Facebook post that feels related (separating intent from result):
...In Duncan-culture, there are more mistakes you're allowed to make, up-front, with something like "no fault."
e.g. the punch bug thing—if you're in a context where lots of people play punch bug, then you're not MORALLY CULPABLE if you slug somebody on the shoulder and then they say "Ouch, I don't like that, do not do that."
(You're morally culpable if you do it again, after their clear boundary, but Duncan-culture has more wiggle room for first-trespasses.)
However, Duncan-culture is MORE strict about something like ...
"I hurt people! But it's okay, I patched the dynamic that led to the hurt. But then I hurt other people! But it's okay, because I isolated and fixed that set of mistakes, too. But then I hurt other people! But it's okay, because I isolated and fixed that set of mistakes, too. But then I hurt other people! But it's okay..."
In Duncan-culture, you can get away with about two rounds of that. On the third screwup, pretty much everybody joins in to say "no. Stop. You are clearly just capable of inventing new mistakes every time. Cease this iterative process."
And if you don't—if you keep going,
Curated.
In some sense, I knew all this 10 years ago when I first started community-organizing and running into problems with various flavors of deception, manipulation, and people-hurting-each-other.
But, I definitely struggled to defend my communities against people who didn't quite match my preconception of what "a person I would need to defend against" looked like. My sympathy and empathy for some people made me more hesitant to enforce my boundaries.
I don't know that I'm thrilled with "malefactor" or "maleficence" as words (they seem too similar to "malicious", and I don't think they convey the right set of things), but I very much agree with the distinction being useful.
Interpersonal abuse (e.g. parental, partner, etc.) has a similar issue. People like to talk as if the abuser is twirling their mustache in their abuse-scheme. And while this is occasionally the case, I claim that MOST abuse is perpetrated by people with a certain level of good intent. They may truly love their partner and be the only one who is there for them when they need it, BUT they lack the requisite skills to be in a healthy relationship.
Sadly this is often due to a mental illness, or a history of trauma, or not getting to practice these skills growing ...
I don't have any terminological suggestions that I love
Following on my prior comment, the actual legal terms used for the (oxymoronic) "purposeless and unknowing mens rea" might provide an opening for the legal-social technologies to provide wisdom on operationalizing these ideas - "negligent" at first, and "reckless" when it's reached a tipping point.
This is an important distinction, otherwise you risk getting into unproductive discussions about someone's intent instead of focusing on whether a person's patterns are compatible with your or your group/community's needs.
It doesn't matter if someone was negligent or malicious: if they are bad at reading your nonverbal cues and you are bad at explicitly saying no to boundary crossing behaviors, you are incompatible and that is reason enough to end the relationship. It doesn't matter if someone is trying their best: if their best is still disruptive to your...
When dealing with someone who's doing something bad, and it's not clear whether they're conscious of it or not, one tactic is to tell them about it and see how they respond. (It is the most obviously prosocial approach.) Ideally, this will either fix the situation or lead towards establishing that they are, at the very least, reprehensibly negligent, and then you can treat them as malicious. (In principle, the difference between a malicious person and one who accidentally behaves badly is that, if both of them come to understand that thei...
(As an example, various crimes legally require mens rea, lit. “guilty mind”, in order to be criminal. Humans care about this stuff enough to bake it into their legal codes.)
Even the law of mental states follows the advice in this post: U.S. law commonly breaks down the 'guilty mind' into at least four categories (purposely, knowingly, recklessly, and negligently), which, in the absence of a confession, all basically work by observing the defendant's patterns of behaviour. There may be some more operational ideas in the legal treatment of reckless and negligent behaviour.
I know this post was chronologically first, but since I read them out of order my reaction was "wow, this post is sure using some of the notions from the Waluigi Effect mega-post, but for humans instead of chatbots"! In particular, they're both pointing at the notion that an agent (human or AI chatbot) can be in something like a superposition between good actor and bad actor unlike the naive two-tone picture of morality one often gets from children's books.
At the time, I remarked to some friends that it felt weird that this was being presented as a new insight to this audience in 2023 rather than already being local conventional wisdom.[1] (Compare "Bad Intent Is a Disposition, Not a Feeling" (2017) or "Algorithmic Intent" (2020).) Better late than never!
The "status" line at the top does characterize it as partially "common wisdom", but it's currently #14 in the 2023 Review 1000+ karma voting, suggesting novelty to the audience. ↩︎
Presenting the same ideas differently is pro-social and worthwhile, and can help things land with those for whom other presentations didn't.
After a recent article in the NY Times, I realized that it's a perfect analogy. The smartest people, when motivated by money, get so high that they venture into unsafe territory. They kinda know it's unsafe, but even internally it doesn't feel like crossing the red line.
It's not even about strength of character: when incentives are aligned 99:1 against your biology, you can try to work against it, but you most probably stand no chance.
It takes enormous willpower to quit smoking precisely because the risks are invisible and so "small". It's not only you ha...
Strong upvote. A corollary here is that a really important part of being a “good person” is being able to tell when you’re rationalizing your behavior or otherwise deceiving yourself into thinking you’re doing good. The default is that people are quite bad at this but, as you said, don’t have explicitly bad intentions, which leads to a lot of people who are at some level morally decent acting in very morally bad ways.
Very excited for there to be definitely no differences between stereotypical malefactors and actual malefactors; no differences between stereotypical maleficence and actual maleficence; very excited for there to be no gameable cultural impressions about what makes a person a probable malefactor
... Not to imply that any gaming that would take place would be intentional, of course.
...This isn’t to say no coordination happens. I expect a little coordination happens openly, through prosocial slogans, just to overcome free rider problems. Remember Trivers’ theory
I agree with this very intensely. I strongly regret unilaterally promoting the CFAR Handbook on various groups on Facebook; I thought that it was critical to minimize the number of AI safety and adjacent people using Facebook and that spreading the CFAR handbook was the best way to do that, and I mistakenly believed that CFAR was bad at marketing their material instead of choosing not to in order to avoid overcomplicating things. I had no way of knowing about the long list of consequences for CFAR for spreading their research in the wrong places, and CFAR ...
I have used this dichotomy 5-100 times during the last few years. I am glad it was brought to my attention.
It does seem worth having a term here! +4 for pointing it out and the attempt.
I gesture at a similar model here: https://www.lesswrong.com/posts/XPwEptSSFRCnfHqFk/zoe-curzi-s-experience-with-leverage-research?commentId=EM5TKrdsLLgBK78Qz
Here is a fictional, but otherwise practical example: the attempted rape that sets in motion the action of "Thelma and Louise". Here on YouTube. Notice what Harlan says at 0:50: "I'm not gonna hurt you".
How does he experience his intentions at that moment? At the moment after Thelma slaps him and he beats her?
Does it matter?
focusing less on intent and more on patterns of harm
In a general context, though, understanding intent will help to solve the issue fundamentally. There might be two general reasons behind harmful behaviors: 1. not knowing that the behavior causes harm, or not knowing how to avoid causing it (i.e., being uneducated or ignorant about the behavior); 2. knowing it causes harm and deciding to do it anyway. There might be more nuances, but these are probably the two high-level categories. Knowing what the intent is helps to create strategies to address the issue: 1. more education? 2. more punishments/legal actions?
What people need to get is that Lying is the weaker subset of Deception. It's the type you can easily call out and retaliate against.
Which is why we evolved to have strong instinctive reactions to it.
Yeah, we don't know if the people who sent the Boy Who Had Cried Wolf to guard the sheep were stupid or evil. But we do know they committed murder.
What material policy changes are being advocated for, here? I am having trouble imagining how this won't turn into a witch-hunt.
Harmful people often lack explicit malicious intent.
I was having a discussion with ChatGPT where it also claimed to believe the same thing as this. I asked it to explain why it thinks this. Its reasoning was that well-intentioned people often make mistakes, and that malign actors do not always succeed in their aims. I'll say!
I disagree completely with the idea that well-intentioned people can actually cause any harm, but even if you presume that they could, it isn't clear to me how malign actors being unable to succeed in their aims is enough to balance o...
As a former EA, I basically agree with this, and I definitely agree that we should start shifting to a norm that focuses on punishing bad actions, rather than trying to infer the actor's mental state.
On SBF, I think a large part of the issue is that he was working in an industry, cryptocurrency, that basically has fraud as its bedrock. There was nothing real about crypto, so the collapse of FTX was basically inevitable.
Even if you accept that all cryptocurrency is valueless, it is possible to operate a crypto-related firm that does what it says it does or one that doesn't.
For example, if two crypto exchanges accept Bitcoin deposits and say they will keep the Bitcoin in a safe vault for their customers, and then one of them keeps the Bitcoin in the vault while the other takes it to cover its founder's personal expenses/an affiliated firm's losses, I think it is fair to say that the second of these has committed fraud and the first has not, regardless of whether Bitcoin has anything 'real' about it or whether it disappears into a puff of smoke tomorrow.
Status: some mix of common wisdom (that bears repeating in our particular context), and another deeper point that I mostly failed to communicate.
Short version
Harmful people often lack explicit malicious intent. It’s worth deploying your social or community defenses against them anyway. I recommend focusing less on intent and more on patterns of harm.
(Credit to my explicit articulation of this idea goes in large part to Aella, and also in part to Oliver Habryka.)
Long version
A few times now, I have been part of a community reeling from apparent bad behavior from one of its own. In the two most dramatic cases, the communities seemed pretty split on the question of whether the actor had ill intent.
A recent and very public case was that of Sam Bankman-Fried, where many seem interested in the question of Sam's mental state vis-à-vis EA. (I recall seeing this in the responses to Kelsey's interview, but haven't done the virtuous thing of digging up links.)
It seems to me that local theories of Sam's mental state cluster along lines very roughly like (these are phrased somewhat hyperbolically):
One hypothesis I have for why people care so much about some distinction like this is that humans have social/mental modes for dealing with people who are explicitly malicious towards them, who are explicitly faking cordiality in attempts to extract some resource. And these are pretty different from their modes of dealing with someone who's merely being reckless or foolish. So they care a lot about the mental state behind the act.
(As an example, various crimes legally require mens rea, lit. “guilty mind”, in order to be criminal. Humans care about this stuff enough to bake it into their legal codes.)
A third theory of Sam’s mental state that I have—that I credit in part to Oliver Habryka—is that reality just doesn’t cleanly classify into either maliciousness or negligence.
On this theory, most people who are in effect trying to exploit resources from your community won't be explicitly malicious, not even in the privacy of their own minds. (Perhaps because the content of one’s own mind is just not all that private; humans are in fact pretty good at inferring intent from a bunch of subtle signals.) Someone who could be exploiting your community will often act so as to exploit your community, while internally telling themselves lots of stories where what they're doing is justified and fine.
Those stories might include significant cognitive distortion, delusion, recklessness, and/or negligence, and some perfectly reasonable explanations that just don't quite fit together with the other perfectly reasonable explanations they have in other contexts. They might be aware of some of their flaws, and explicitly acknowledge those flaws as things they have to work on. They might be legitimately internally motivated by good intent, even as they wander down the incentive landscape towards the resources you can provide them. They can sub- or semi-consciously mold their inner workings in ways that avoid tripping your malice-detectors, while still managing to exploit you.
And, well, there’s mild versions of the above paragraph that apply to almost everyone, and I’m not sure how to sharpen it. (Who among us doesn’t subconsciously follow incentives, and live under the influence of some self-serving blind spots?)
But in the cases that dramatically blow up, the warp was strong enough to create a variety of advance warning signs that are obvious to hindsight. But also, yeah, it’s a matter of degree. I don’t think there’s a big qualitative divide, that would be stark and apparent if you could listen in on private thoughts.
People do sometimes encounter adversaries who are explicitly malicious towards them. (For a particularly stark example, consider an enemy spy during wartime.) Spies and traitors and turncoats are real phenomena. Sometimes, the person you're interacting with really is treating you as a device that they're trying to extract information or money from; explicit conscious thoughts about this are really what you'd hear if you could read their mind.
I also think that that's not what most of the bad actors in a given community are going to look like. It's easy, and perhaps comfortable, to say "they were just exploiting this community for access to young vulnerable partners" or "they were just exploiting this community for the purpose of reputation laundering" or whatever. But in real life, I bet that if you read their mind, the answer would be far messier, and look much more like they were making various good-faith efforts to live by the values that your community professes.
I think it's important to acknowledge that fact, and build community processes that can deal with bad actors anyway. (Which is a point that I attribute in large part to Aella.)
There's an analogy between the point I'm making here, and the one that Scott Alexander makes in The Media Very Rarely Lies. Occasionally the media will literally fabricate stories, but usually not.
If our model is that there's a clear divide between people who are literally fabricating and people who are "merely" twisting words and bending truths, and that we mostly just have to worry about the former, then we'll miss most of the harm done. (And we’re likely to end up applying a double standard to misleading reporting done by our allies vs. our enemies, since we’re more inclined to ascribe bad intentions to our enemies.)
There's some temptation to claim that the truth-benders have crossed the bright red line into "lying", so that we can deploy the stronger mental defenses that we use against "liars".
But... that's not quite right; they aren't usually crossing that bright red line, and the places where they do cross that line aren’t necessarily the places where they’re misleading people the most. If you tell people to look out for the bright red line then you'll fail to sensitize them to the actual dangers that they're likely to face. The correct response is to start deploying stronger defenses against people who merely bend the truth.
(Despite the fact that lots of people bend the truth sometimes, like when their mom asks them if they’ve stopped dating blue-eyed people yet while implicitly threatening to feel a bunch of emotional pain if they haven’t, and they technically aren’t dating anyone right now (but of course they’d still date blue-eyed people given the opportunity) so they say “yes”. Which still counts as bending the truth! And differs only by a matter of degree! But which does not deserve a strong community response!)
(Though people do sometimes just make shit up, as is a separate harsh lesson.)
I think there's something similar going on with community bad actors. It's tempting to imagine that the local bad actors crossed bright red lines, and somehow hid that fact from everybody along the way; that they were mustache-twirling villains who were intentionally exploiting you while cackling about it in the depths of their mind. If that were true, it would activate a bunch of psychological and social defense mechanisms that communities often try to use to guard against bad actors.
But... historically, I think our bad actors didn't cross those bright red lines in a convenient fashion. And I think we need to be deploying the stronger community defenses anyway.
I don't really know how to do that (without causing a bunch of collateral damage from false positives, while not even necessarily averting false negatives much). But I hereby make a bid for focusing less on whether somebody is intentionally malicious.
I suggest minting a new word, for people who have the effects of malicious behavior, whether it's intentional or not. People who, if you step back and look at them, seem to leave a trail of misery in their wake, or a history of recklessness, or a pattern of negligence.
It's maybe fun to debate about whether they had mens rea, and the courts might care about the mens rea after it all blows up, but from our perspective, the main question is what behaviors they’re likely to engage in, and there turn out to be many really bad behaviors that don’t require malice at all.
I don't have any terminological suggestions that I love. My top idea so far is to repurpose the old word "malefactor" for someone who has a pattern of ill effects, regardless of their intent. (This in contrast with "enemy", which implies explicit ill intent.)
And for lack of a better word, I’ll suggest the word “maleficence” to describe the not-necessarily-malevolent mental state of a malefactor.
I think we should basically treat discussions about whether someone is malicious as recreation (when they do not explicitly have documentation of being a literal spy/traitor/etc., nor identify as an enemy), and I think that maleficence is what matters when deploying community (or personal) defense mechanisms.