Zack_M_Davis - LessWrong

[Meta] New moderation tools and moderation guidelines

My guess is something like more than half of the authors to this site who have posted more than 10 posts that you commented on, about you, in particular. Eliezer, Scott Alexander, Jacob Falkovich, Elizabeth Van Nostrand, me, dozens of others.

You are making false claims. Two of these claims about the views of specific individuals are clearly contradicted by those individuals' own statements, as I exhibit below.

I reached out to Scott Alexander via Discord on 11 July 2025 to ask if he had "any specific feelings about Said Achmiz and whether he should be allowed to post on Less Wrong". Alexander issued this statement:

I have no direct opinion on him. I have heard his name as someone who's very confrontational, and I agree that this can make a website less pleasant, but I can't remember having any personal experience.

Separately, as I mentioned to you in our meeting of 26 June 2025, in a public comment of 9 October 2018, Jacob Falkovich wrote (bolding added):

Said, I have seen a lot of your comments on LW, on my posts and the posts of others. They are, by my standards, high on criticism and low on niceness. I personally formed an impression of you as disagreeable. Even though I have argued myself that LW should optimize for honesty over niceness, still the impression of you disagreeableness was colored negatively in my mind.

But now that you've stated that you're disagreeable on purpose, the negative effect flipped entirely to become positive. Instead of you being disagreeable by accident, it's intentional. I like diversity, and I support people who are on a mission to bring a new flavor to the community. Knowing this also makes it easier to take criticism from you - it's not that you hate me or what I write, it's just that you don't care if someone thinks you hate them and their writing. The Bayesian update in the two cases is very different!

If Anyone Builds It, Everyone Dies: Advertisement design competition

Zack_M_Davis3d42

If misalignment of LLM-like AI due to contamination from pretraining data is an issue, it would be better and more feasible to solve that by AI companies figuring out how to (e.g.) appropriately filter the pretraining data, rather than everyone else in the world self-censoring their discussions about how the future might go. (Superintelligence might not be an LLM, after all!) See the "Potential Mitigations" section in Alex Turner's post on the topic.

Generalized Hangriness: A Standard Rationalist Stance Toward Emotions

[+]Zack_M_Davis4d-22-13

Zetetic explanation

Zack_M_Davis5d20

Thanks, I see it now. I think the insult has more to do with the relationship-terminating aspect (the part where Alice sets a lifetime birthday budget, or the moderator says he's changing his stance towards allegedly low-interpretive-effort comments going forward) than the mere tracking of time costs. When my busy friend says she can only talk on the phone for twenty minutes (and that affects what I want to talk about with our limited time), it doesn't feel insulting because the budget is just for that call, not our entire relationship.

Effort is a cost, not a benefit.

I'm not convinced Ben was making that mistake (which I expect him to also be attuned to noticing, because he wrote about it two years earlier): I read it as, given unusual but not praiseworthy-in-itself effort expenditure, it makes sense to flag it.

[Meta] New moderation tools and moderation guidelines

Zack_M_Davis7d71

I'd rather judge on the standard of whether the outcome is good, rather than exclusively on the rules of behavior.

A key reason to favor behavioral rules over trying to directly optimize outcomes (even granting that enforcement can't be completely mechanized and there will always be some nonzero element of human judgement) is that act consequentialism doesn't interact well with game theory, particularly when one of the consequences involved is people's feelings.

If the popular kids in the cool kids' club don't like Goldstein and your only goal is to make sure that the popular kids feel comfortable, then clearly your optimal policy is to kick Goldstein out of the club. But if you have some other goal that you're trying to pursue with the club that the popular kids and Goldstein both have a stake in, then I think you do have to try to evaluate whether Goldstein "did anything wrong", rather than just checking that everyone feels comfortable. Just ensuring that everyone feels comfortable at all costs, without regard to the reasons why people feel uncomfortable or any notion that some reasons aren't legitimate grounds for intervention, amounts to relinquishing all control to anyone who feels uncomfortable when someone else doesn't behave exactly how they want.

Something I appreciate about the existing user ban functionality is that it is a rule-based mechanism. I have been persuaded by Achmiz and Dai's arguments that it's bad for our collective understanding that user bans prevent criticism, but at least it's a procedurally "fair" kind of badness that I can tolerate, not completely arbitrary tyranny. The impartiality really helps. Do you really want to throw away that scrap of legitimacy in the name of optimizing outcomes even harder? Why?

I think just trying to 'follow the rules' might not succeed at making everyone feel comfortable interacting with you

But I'm not trying to make everyone feel comfortable interacting with me. I'm trying to achieve shared maps that reflect the territory.

A big part of the reason some of my recent comments in this thread appeal to an inability or justified disinclination to convincingly pretend to not be judgmental is because your boss seems to disregard with prejudice Achmiz's denials that his comments are "intended to make people feel judged". In response to that, I'm "biting the bullet": saying, okay, let's grant that a commenter is judging someone; to what lengths must they go to conceal that, in order to prevent others from predictably feeling judged, given that people aren't idiots and can read subtext?

I think there's something much more fundamental at stake here, which is that an intellectual forum that's being held hostage to people's feelings is intrinsically hampered and can't be at the forefront of advancing the art of human rationality. If my post claims X, and a commenter says, "No, that's wrong, actually not-X because Y", it would be a non-sequitur for me to reply, "I'd prefer you engage with what I wrote with more curiosity and kindness." Curiosity and kindness are just not logically relevant to the claim! (If I think the commenter has misconstrued what I wrote, I could just say that.) It needs to be possible to discuss ideas without getting tone-policed to death. Once you start playing this game of litigating feelings and feelings about other people's feelings, there's no end to it. The only stable Schelling point that doesn't immediately dissolve into endless total war is to have rules and for everyone to take responsibility for their own feelings within the rules.

I don't think this is an unrealistic superhumanly high standard. As you've noticed, I am myself a pretty emotional person and tend to wear my heart on my sleeve. There are definitely times as recently as, um, yesterday, when I procrastinate checking this website because I'm scared that someone will have said something that will make me upset. In that sense, I think I do have some empathy for people who say that bad comments make them less likely to use the website. It's just that, ultimately, I think that my sensitivity and vulnerability is my problem. Censoring voices that other people are interested in hearing would be making it everyone else's problem.

[Meta] New moderation tools and moderation guidelines

Zack_M_Davis7d40

Was definitely not going to make an argument from authority, just trying to understand your world view.

Right. Sorry, I think I uncharitably interpreted "Do you think others agree?" as an implied "Who are you to disagree with others?", but you've earned more charity than that. (Or if it's odd to speak of "earning" charity, say that I unjustly misinterpreted it.)

the argument that persuaded me initially especially doesn't need to be good

Right. I tried to cover this earlier when I said "(a cleaned-up refinement of) my thought process" (emphasis added). When I wrote about eschewing "line[s] of reasoning other than the one that persuades me", it's persuades in the present tense because what matters is the justifactory structure of the belief, not the humdrum causal history.

you said 'thought', which maybe I should have criticized because it's not a thought. How annoyed you are by something isn't an intellectual position, it's a feeling. It's influenced by beliefs about the thing, but also by unrelated things

There's probably a crux somewhere near here. Your formulation of #4 seems bad because, indeed, my emotions shouldn't be directly relevant to an intellectual discussion of some topic. But I don't think that gives you license to say, "Ah, if emotions aren't relevant, therefore no harm is done by rewriting your comments to be nicer," because, as I've said, I think the nicewashing does end up distorting the content. The feelings are downstream of the beliefs and can't be changed arbitrarily.

Unnatural Categories Are Optimized for Deception

Zack_M_Davis7d20

Excellent question, thanks for commenting! (And for your patience.) The part of the original post that that Tweet is summarizing are the paragraphs after "Suppose I'm selling you some number of gold and silver bars [...]". As you observe, whether it's "lying" to use a category label depends on what the label is construed to "canonically" mean. The idea here is that, as far as signal processing goes, there's an isomorphism between "lying p% of the time" with respect to the maximally-informative categories, and choosing different categories that conceal information. So if the new categories aren't deceptive because the receiver knows about them, is lying therefore not deceptive if the speaker already has it "priced in" that the sender lies this-and-such proportion of the time? I discuss this problem further in "Maybe Lying Can't Exist?!" and "Comment on 'Deception is Cooperation'".

Zetetic explanation

Zack_M_Davis7d30

Sorry, I still don't think I get it. (I could guess, but I probably wouldn't get it right.) In the footnote of the comment you link, Pace said he was allocating 2 hours to the thread, and in a later comment, he said he'd spent 30 minutes so far. A little unconventional, but seems OK to me? (Everyone faces the problem of how to budget their limited time and attention; being transparent about how you're doing it shouldn't make it worse.)

Zetetic explanation

Zack_M_Davis7d30

I don't get it. What's insulting about someone disclosing how much time they spent writing something?

[Meta] New moderation tools and moderation guidelines

Zack_M_Davis9d11-17

and there should not reliably be retribution or counter-punishment by other commenters for them moderating in that way.

Great, so all you need to do is make a rule specifying what speech constitutes "retribution" or "counterpunishment" that you want to censor on those grounds.

Maybe the rule could be something like, "No complaining about being banned by a specific user (but commenting on your own shortform strictly about the substance of a post that you've been banned from does not itself constitute complaining about the ban)" or "No arguing against the existence on the user ban feature except in designated moderation threads (which get algorithmically deprioritized in the new Feed)."

It's your website! You have all the hard power! You can use the hard power to make the rules you want, and then the users of the website have a clear choice to either obey the rules or be banned from the site. Fine.

What I find hard to understand is why the mod team seems to think it's good for them to try to shape culture by means other than clear and explicit rules that could be neutrally enforced. Telling people to "stop optimizing in a fairly deep way" is not a rule because of how vague and potentially all-encompassing it is. Telling people to avoid "mak[ing] people feel judged or not" is not a rule because I don't have control over how other people feel.

"Don't tell people 'I'm judging you about X'" is a rule. I can do that.

What I can't do is convincingly pretend to be a person with a completely different personality such that people who are smart about subtext can't even guess from subtle details of my writing style that I might privately be judging them.

I mean, maybe I could if I tried very hard? But I have too much self-respect to try. If the mod team wants to force temperamentally judgemental people to convincingly pretend to be non-judgemental, that seems really crazy.

I know, the mods didn't say "We want temperamentally judgemental people to convincingly pretend to have a completely different personality" in those words; rather, Habryka said he wanted to "avoid a passive aggressive culture tak[ing] hold". I just don't see what the difference is supposed to be in practice.

LESSWRONG
LW

Posts

Wikitag Contributions

Comments

Posts

Wikitag Contributions

Comments