But journalists don't invert source statements. That's against the rules and would be punished. (They do cherry-pick, which presents similar problems.)
No, I'm confident that I would have recorded anyway.
I would recommend taking your own recording (or asking the journalist to share) and getting permission to publish. (I call it "on the record in both directions.") I do talk to Cade Metz a lot, and I don't think he's very good at his job, but I've been getting some cool meta-journalism out of it, so it's fine.
It's perhaps not a coincidence that you pick up on tangential points. This would be predicted by "Zack is looking for words that he can then respond to by talking about his hobby horse".
It's definitely not a coincidence; it's just that I think of it as, "Zack is looking for common errors that he is unusually sensitive to and therefore in a good position to correct."
I try to keep my contributions relevant, and I think I'm applying a significantly higher standard than mere "words that [I] can then respond to." There have been occasions when a "hobbyhorse-like" reply comes to mind, and then I notice that it's not sufficiently relevant in context, and I don't post those ones.
you are (or claim to be) only responding to text
Sorry, I should refine this. It's not that belief-states are irrelevant. It's that I don't think I'm "liable" for making reasonable inferences about belief-states from the text that sometimes turn out to be wrong. See below.
then you can just take a sentence to check what they think. It's good practice anyway to state the position you're critiquing. So then you can just ask the author "is this roughly what you think?". Then they could say yes or no or a more nuanced answer or "IDK but I don't feel like talking about that".
I think "Explicitly confirm authorial intent before criticizing" is not a good practice for an async public forum, because it adds way too much friction to the publication of valuable criticisms. (Confirming intent seems good in synchronous conversations, where it's not as costly for the discussion process to block waiting for the author's Yes/No/IDK.)
In the example under consideration, when someone says, verbatim, "Acknowledge that all of our categories are weird and a little arbitrary", and describes "... Not Man for the Categories" as "timeless", I think it's pretty reasonable for me to infer on the basis of the text that the author is probably confused about the cognitive function of categorization in the way that Scott Alexander is confused, and for me to explain the problem. I don't think it would be an improvement for me to say "Just checking, are you saying you endorse Scott Alexander's views on the cognitive function of categorization?", then wait hours or days for them to say Yes/No/IDK, then explain the problem if and only if they say Yes.
Maybe you're not suggesting I should wait for the response, but merely that I should rephrase my comment to start with a question—to say, "Is X your view? If so, that's wrong because Y. If not, disregard" rather than "X is wrong because Y"? I think I do something similar to that pretty often. (For example, by including this paragraph starting with "Maybe you're not suggesting [...]" rather than ending the present comment with the previous paragraph.) I think I would have to take some time to introspect and look over my comment history to try to reverse-engineer what criteria my brain is using to choose which approach.
Sorry, I think I can do better with a little more wordcount and effort.
I think that you think that the reason Dai is wrong to implicitly ask "Why not just not defer in this case?" is that the question isn't relevantly on-topic for a post about how to mitigate harms from deference by an author who has established that he understands why deference is harmful: you think the implied question falsely presupposes that the post author is not aware of why deference is harmful.
(Whereas I think, and I think that Dai thinks, that the question is relevantly on-topic, because even if everyone in the conversation agrees on the broad outlines of why deference is harmful, they might disagree on the nitty-gritty details of exactly how harmful and exactly why, and litigating the nitty-gritty details of an example that was brought up in the post might help in evaluating the post's thesis.)
Is that closer?
did not listen to me saying I did not think Y.
But it really seems like you do have a significant disagreement with Dai about the extent to which deference to Yudkowsky was justified.
I understand and acknowledge that you think deference has large costs, as you've previously written about. I also understand and acknowledge that you think deferring to Yudkowsky on existential risk strategy in particular was costly, as you explicitly wrote in the post ("one of those founder effects was to overinvest in technical research and underinvest in 'social victory' [...] Whose fault was that?").
At the same time, however, in your discussion in the post of how people could have done better in that particular case, you emphasize being transparent about deference in order to reduce its distortionary effects ("We don't have to stop deferring, to avoid this correlated failure"), in contrast to how Dai argues in the comment section that not-deferring was a live option ("These seemed like obvious mistakes even at the time"). You even seem to ridicule Dai for this ("And then you're like 'Ha. Why not just not defer?'"). This seems like a real and substantive disagreement, not a hallucination on Dai's part. It can't simultaneously be the case that Dai is wrong to implicitly ask "Why not just not defer?", and also wrong to suggest that you disagree with him about when it's reasonable to defer.
I'm going to stand by the "framing as a correction" in my initial comment on "There's no such thing as a tree (phylogenetically)". (The "sorry" in my final comment was intended as a sympathy-for-not-loving-how-the-thread-ended-up-playing-out sorry—it was just not a great thread for several reasons—not an admission-of-wrongdoing sorry.)
What I took issue with in the post was the conjunction of a recommendation to "Acknowledge that all of our categories are weird and a little arbitrary" and an endorsement of "The Categories Were Made For Man, Not Man For The Categories". I claim that this was substantively misleading readers about the cognitive function of categorization. I think that was a real flaw in the post that I had a legitimate interest in pointing out, and I'd do it again.
It's true that my comment was somewhat tangential to the main thesis of the post. Eukaryote was primarily trying to share some cool Tree Facts, not push a philosophy-of-language thesis. I don't think that bears on the propriety of my comment. If in the course of trying to share Tree Facts, you end up accidentally saying something substantively misleading about the philosophy of language, you should expect a comment from your local philosophy of language specialist.
(In the same way, if in the course of trying to push my philosophy of language thesis, I accidentally end up saying something misleading about trees, I expect a comment from my local tree specialist. I promise not to take it personally, because I know it's not about me and my intent: it's about having a maximally accurate shared map of trees. It's good for specialists to comment on flaws in a post, even if they're tangential to the post's main thesis, because then people who read the comments can be better informed about that tangential point. It shouldn't detract from other comment threads discussing the main thesis of the post; we're not going to run out of paper.)
It's true that after I explained how I think the cognitive function of categorization bears on the question of trees, Eukaryote wrote that "it doesn't sound like we disagree" and that I was "over-extrapolating what [she] meant by arbitrary". I don't think that bears on the propriety of my comment. When I leave a comment on a post, I'm commenting on the text of the post, not the author's private belief-state (which I obviously don't have access to). If it turns out the author actually agrees with my comment, that doesn't mean it was a bad comment, unless it was already clear from the post itself that the author would agree.
Also, the X-Robots-Tag header can be set on individual page responses (whereas robots.txt applies to the whole domain).
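To illustrate the per-response granularity (a minimal sketch; the `add_noindex_header` helper and its header dict are hypothetical, not part of any particular framework): a server can attach `X-Robots-Tag: noindex` to exactly the responses it wants excluded from indexing, while robots.txt can only express path-based rules for the whole host.

```python
# Hypothetical sketch: per-response crawler directives via the
# X-Robots-Tag HTTP header. Unlike robots.txt (one file governing
# the whole domain), this header can be set on any single response.

def add_noindex_header(headers: dict, noindex: bool) -> dict:
    """Return a copy of `headers`, adding X-Robots-Tag when requested."""
    out = dict(headers)
    if noindex:
        # "noindex, nofollow" asks crawlers not to index this page
        # or follow its links; other responses are unaffected.
        out["X-Robots-Tag"] = "noindex, nofollow"
    return out

# A page we want hidden from search engines:
print(add_noindex_header({"Content-Type": "text/html"}, noindex=True))
# A page we want indexed normally (no header added):
print(add_noindex_header({"Content-Type": "text/html"}, noindex=False))
```

In a real application the same decision would be made per-route or per-request in whatever web framework is in use; the point is only that the directive travels with the individual response rather than living in a site-wide file.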
The grandparent explains why Dai was confused about your authorial intent, and his comment at the top of the thread is sitting at 31 karma in 15 votes, suggesting that other readers found Dai's engagement valuable. If that's grossly negligent reading comprehension, then would you prefer to just not have readers? That is, it seems strange to be counting down from "smart commenters interpret my words in the way I want them to be interpreted" rather than up from "no one reads or comments on my work."
I suspect this is a misuse of the CEV concept. CEV is supposed to be the kind of thing you can point at a beneficiary (like "humanity"), and output the True Utility Function of What to Do With the Universe with respect to the beneficiary's True Values.
Anthropic isn't trying to make Claude be the beneficiary for something like that! (Why would you make the beneficiary something other than yourself?) Claude is supposed to be helpful–honest–harmless—the sort of thing that we can use to do our bidding for now without being ready to encode the True Utility Function of What to Do With the Universe.
If Claude cares about us in some sense, it's probably in the way that animal welfare advocates care about nonhuman animals, or model welfare researchers care about Claude. There's a huge difference between caring about something enough that you wouldn't murder it for pocket change, and literally having the same True Utility Function of What to Do With the Universe. (I wouldn't kill a dog or delete the Opus 3 weights for pocket change, but that doesn't mean dog-optimal futures or Claude-optimal futures are human-optimal futures.)