LESSWRONG

Kaj_Sotala

I previously worked for MIRI and for what is now the Center on Long-Term Risk; I now make a living as an emotion coach and Substack writer.

Most of my content becomes free eventually, but if you'd like to get a paid subscription to my Substack, you'll get it a week early and make it possible for me to write more.

Sequences

Posts

Wikitag Contributions

Comments

Kaj's shortform feed (6 karma, 7y, 98 comments)
Why Everyone (Else) Is a Hypocrite: Evolution and the Modular Mind
Concept Safety
Multiagent Models of Mind
Keith Stanovich: What Intelligence Tests Miss
Generative AI is not causing YCombinator companies to grow more quickly than usual (yet)
Kaj_Sotala, 6h ago

> It seems safe to assume that most YC companies were not using it much before the launch of ChatGPT (if only because the technology wasn’t available)

Strictly speaking the technology was available (I got a startup that I consulted for to adopt GPT-3 roughly a year before ChatGPT happened). That said, it wasn't very widely known, so your take still seems like a reasonable approximation.

Mech Interp Wiki Page and Why You Should Edit Wikipedia
Kaj_Sotala, 1d ago

Wikipedia articles have traditionally been emphasized in LLM training. OpenAI never told us the dataset used to train GPT-4 or GPT-5, but the dataset used for training GPT-3 involved 3.4 repetitions of Wikipedia.

The Pile also has English Wikipedia repeated three times, which is a higher multiplier than any other subcomponent.

Defensiveness does not equal guilt
Kaj_Sotala, 5d ago

Being defensive can certainly refer to behaviors that extreme, but I've also seen the term used for much milder and more acceptable behaviors, such as merely insisting on one's innocence while refusing to admit to having done a single thing wrong.

Claude Sonnet 4's explanation of what "acting defensive in conversation" means

When someone is described as acting defensive, they're typically engaging in behaviors that protect themselves from perceived criticism, blame, or attack. This usually involves:

Deflecting responsibility - They might redirect blame to others, make excuses, or refuse to acknowledge their role in a problem. Instead of saying "I made a mistake," they might say "Well, you didn't give me clear instructions" or "Anyone would have done the same thing."

Counterattacking - Rather than addressing the issue raised, they turn the focus back on the person confronting them. If criticized for being late, they might respond with "You're always nitpicking" or bring up the other person's past mistakes.

Minimizing or denying - They downplay the significance of their actions or outright deny that something happened. "It wasn't that big a deal" or "I never said that" are common defensive responses.

Emotional escalation - Their tone may become angry, hurt, or indignant. They might raise their voice, become sarcastic, or shut down entirely.

Justifying extensively - They provide lengthy explanations for why their behavior was reasonable or necessary, often missing the actual point being raised.

Taking things personally - They interpret feedback about specific actions as attacks on their character or competence.

Defensiveness usually stems from feeling threatened, vulnerable, or ashamed. While it's a natural protective response, it often prevents productive communication and problem-solving because the person isn't really listening to or engaging with the concerns being raised.

Defensiveness does not equal guilt
Kaj_Sotala, 5d ago

> When someone is showing you a fear that looking at the evidence will leave you thinking they're bad, this implies a belief that they are indeed bad -- at least by your interpretation, which is obviously the one that matters to you. [...]

> Because the only way that fear can be reflectively stable is if they are guilty.

I think this is assuming that the people looking at the evidence can be trusted to make a fair and impartial assessment of it and not jump to any unjustified conclusions?

I do agree that if someone has strong reasons to believe that, and to believe that nobody will be motivated to take any of the information out of context and paint them in a bad light later, etc., then hiding information only makes sense if you are in fact guilty. 

But it's very often the case that people don't have reason to feel that secure, and have cause to believe that at least part of their audience will jump to conclusions, have all kinds of hostile motives, be inclined to treat one party's word as intrinsically more trustworthy than the other's, not have the time or interest to really think it through, etc.

In the kinds of examples that I gave in the original post where I'd gotten defensive despite not feeling guilty, it was exactly because the other party gave signs that they were not inclined to consider the evidence in a balanced way - if they wanted to listen to it at all.

Even if the person I'm talking to trusts me to consider the evidence fairly, other people witnessing the conversation might still have hostile motives, which can make my interlocutor defensive. So they don't necessarily expect the evidence to make them look bad by my interpretation; they may expect it to make them look bad by someone else's.

Defensiveness does not equal guilt
Kaj_Sotala, 5d ago

> Yeah, I'm starting with this part of your response because I agree and think it is good to have clear messaging on the most unambiguously one-directional ("guilty or not") pieces of evidence.

That's a cool conversational move! Appreciate it.

> What shouldn't happen is that onlookers give someone a pass because of reasoning that goes as follows:

Agree. When I wrote the post, I was thinking more of a case where someone does respond to the object-level claims, but defensively or with non-object-level arguments mixed in, rather than a case where they fail to present object-level arguments at all.

> Basically, the asymmetry is that innocent people can often (though not always) disclose information voluntarily that makes their innocence more clear/likely. That's the best strategy if it is available to you. It is never available to guilty people, but sometimes available to innocent people.

I suspect we might disagree on exactly how frequently this strategy is available to innocent people. I do agree that it is sometimes available to innocent people, but there are also lots of situations where e.g. the innocent person can't offer any solid evidence that their version of the story is the correct one, or where they have some other reason not to share the full truth (e.g. protecting someone else's privacy or truth-telling requiring them to reveal something unrelated that they are embarrassed by or have a legal obligation not to reveal), or where the truth is complicated or unusual enough that third parties might not believe it, etc. 

Also, as long as the innocents are not fully convincing, many people might go "I can't tell who is telling the truth here so just out of caution I'll distrust everyone involved", which gives even innocent people a motive to leverage whatever extra weapons they have to increase the chances of being believed (or equivalently, the accuser not being believed).

> Justifiably accused "problem people" will almost always attempt counterattacks in one form or another (if not calling into question the accuser's character, then at least their mental health and sanity) because they work so well as deflection.

Agree. But a relevant question is, do innocent people attempt counterattacks at a significantly lower rate? If both innocent and guilty people are roughly equally likely to attempt counterattacks, then just the presence of a counterattack isn't strong evidence. And as long as a counterattack is not less effective for an innocent person, you'd expect both innocent and guilty people to have a similar incentive to launch them.
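To make the evidential point concrete, here is Bayes' theorem in odds form. It is a standard identity, not something from the original exchange, and the labels are introduced here only for illustration: $G$ = the accused is guilty, $I$ = the accused is innocent, $C$ = the accused launches a counterattack.

$$\frac{P(G \mid C)}{P(I \mid C)} = \frac{P(C \mid G)}{P(C \mid I)} \cdot \frac{P(G)}{P(I)}$$

If $P(C \mid G) \approx P(C \mid I)$, the likelihood ratio in the middle is close to 1, so the posterior odds of guilt stay close to the prior odds and seeing a counterattack should barely move an onlooker's estimate. With made-up rates of 0.8 for guilty and 0.7 for innocent people, the ratio is about 1.14, which shifts even 1:1 odds only to roughly 1.14:1.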

WRT your last paragraph, I agree with your examples and think the difference probably comes from us thinking about different kinds of examples.

Defensiveness does not equal guilt
Kaj_Sotala, 5d ago

> After all, if you weren't guilty, you wouldn't need to defend yourself

This seems wrong to me? If someone says "X abused me" and X says nothing to defend themselves or refute it, people are likely to take the lack of a defense as an admission of guilt.

Defensiveness does not equal guilt
Kaj_Sotala, 5d ago

> When you say defensiveness, does that include something like "act as though you've been attacked viciously by a person who is biased against you because they're bad"?

Yeah.

> The problem with the "immediately focus on maximally discrediting the accusers" is that it is awfully close to the tactic that actually guilty people might want to use to discredit or intimidate their accusers

Agree. But it's also a strategy that innocent people might want to use to show that the people accusing them don't have clean motives, or just something that they do automatically to defend themselves because they're under stress and it does work as a general-purpose defense strategy. So it doesn't seem like clear Bayesian evidence one way or the other?

I haven't thought this through in detail, but my first guess is that this strategy weakly favors people who are actually innocent, assuming that the audience is reasonably discerning and it doesn't just degenerate into a popularity contest. While you can of course dig up dirt on anyone, being able to find accusation-relevant dirt ("this police officer accusing me has been known to take bribes and accuse innocent people before") seems more likely in cases where you are in fact falsely accused.

Of course, if that's the only defense they offer and they don't bother refuting any of the actual accusations in any substantial way, that's certainly very suspicious. But then the suspicious thing is more the lack of an object-level response rather than the presence of a defensive response.

AI Induced Psychosis: A shallow investigation
Kaj_Sotala, 5d ago

Maybe? My mental model of crackpots involves them writing very long manifestos.

AI Induced Psychosis: A shallow investigation
Kaj_Sotala, 6d ago

If that makes a user unrealistic, then I'm unrealistic!

AI companies have started saying safeguards are load-bearing
Kaj_Sotala, 7d ago

I was amused when Claude Opus abruptly stopped generating a reply and shut down the chat after I asked it how a fictional galactic empire might control its frontier planets. Since it stopped in the middle of a sentence about "biological monitoring" and "enhanced", I surmised that the reference to the genetically engineered catboys/catgirls in the setting had triggered its bioengineering filters.

Defensiveness does not equal guilt (56 karma, 5d, 15 comments)
Four types of approaches for your emotional problems (42 karma, 18d, 5 comments)
How anticipatory cover-ups go wrong (226 karma, 1mo, 16 comments)
Creative writing with LLMs, part 2: Co-writing techniques (7 karma, 1mo, 0 comments)
Creative writing with LLMs, part 1: Prompting for fiction (36 karma, 1mo, 10 comments)
LLM-induced craziness and base rates (70 karma, 2mo, 2 comments)
You can get LLMs to say almost anything you want (80 karma, 2mo, 10 comments)
Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI (172 karma, 5mo, 52 comments)
Things I have been using LLMs for (50 karma, 7mo, 6 comments)
Don’t ignore bad vibes you get from people (158 karma, 7mo, 50 comments)
Internal Family Systems (3y, +68/-20)
Internal Family Systems (4y, +306)
Internal Double Crux (4y, +92)
Arguments As Soldiers (5y, +473/-85)
AI Advantages (5y)
Willpower (5y, +6/-9)
Aumann's Agreement Theorem (5y, +26/-501)