is hard to keep secret
Is it actually hard to keep secret, or is it that people aren't trying (because the prestige of publishing an advance is worth more than hoarding the incremental performance improvement for yourself)?
The Sonnet 4.5 system card reiterates the "most thought processes are short enough to display in full" claim that you quote:
As with Claude Sonnet 4 and Claude Opus 4, thought processes from Claude Sonnet 4.5 are summarized by an additional, smaller model if they extend beyond a certain point (that is, after this point the “raw” thought process is no longer shown to the user). However, this happens in only a very small minority of cases: the vast majority of thought processes are shown in full.
But it is intriguing that the displayed Claude CoTs are so legible and "non-weird" compared to what we see from DeepSeek and ChatGPT. Is Anthropic using a significantly different (perhaps less RL-heavy) post-training setup?
Linkpost URL should presumably include "http://" (click currently goes to https://www.lesswrong.com/posts/2CGXGwWysiBnryA6M/www.21civ.com).
- It will probably be possible, with techniques similar to current ones, to create AIs who are similarly smart and similarly good at working in large teams to my friends, and who are similarly reasonable and benevolent to my friends in the time scale of years under normal conditions.
[...]
This is maybe the most contentious point in my argument, and I agree this is not at all guaranteed to be true, but I have not seen MIRI arguing that it's overwhelmingly likely to be false.
Did you read the book? Chapter 4, "You Don't Get What You Train For", is all about this. I also see reasons to be skeptical, but have you really "not seen MIRI arguing that it's overwhelmingly likely to be false"?
Isn't it, though?
Indeed, I notice in your list above you suspiciously do not list the most common kind of attribute that is attributed to someone facing social punishment. "X is bad" or "X sucks" or "X is evil".
I'm inclined to still count this under "judgments supervene on facts and values." Why is X bad, sucky, evil? These things can't be ontologically basic. Perhaps less articulate members of a mass punishment coalition might not have an answer ("He just is; what do you mean 'why'? You're not an X supporter, are you?"), but somewhere along the chain of command, I expect their masters to offer some sort of justification with some sort of relationship to checkable facts in the real world: "stupid, dishonest, cruel, ugly, &c." being the examples I used in the post; we could keep adding to the list with "fascist, crazy, cowardly, disloyal, &c." but I think you get the idea.
The justification might not be true; as I said in the post, people have an incentive to lie. But the idea that "bad, sucks, evil" are just threats within a social capital system without any even pretextual meaning outside the system flies in the face of experience that people demand pretexts.
Can't you just say that yourself (not all, caricature, parody, uncharitable, exaggerates, &c.) when sharing it? Death of the author, right?
I think Trapaucius missed a great opportunity here to keep riffing off the gravity analogy. Actually, there are different algorithms the planets could be obeying: special and then general relativity turned out to be better approximations than Newtonian gravity, and GR is presumably not the end of the story—and yet, as Trapaucius says, the planets do not "fly off into space." Newton is good enough not just for predicting the night sky (modulo the occasional weird perihelion precession), but even for landing on the moon, for which relativistic deviations from Newtonian predictions were swamped by other sources of error.
Obviously, that's just a facile analogy: if Trapaucius had found that branch of the argument tree, Klurl could easily go into more details about further disanalogies between gravity and the fleshlings.
But I think that the analogy is getting at something important. When relatively smarter real-world fleshlings delude themselves into thinking that Claude Sonnet 4.5 is pretty corrigible because they see it obeying their instructions, they're not arguing, as Trapaucius does, that "Korrigibility is the easiest, simplest, and natural way to think" for a generic mind. They're arguing that Anthropic's post-training procedure successfully pointed to the behavior of natural language instruction-following, which they think is a natural abstraction represented in the pretraining data, one that generalizes in a way that's decision-relevantly good enough for their purposes, such that Claude won't "fly off into space" even if they can't precisely predict how Claude will react to every little quirk of phrasing. They furthermore have some hope that this alleged benign property is robust and useful enough to help humanity navigate the intelligence explosion, even though contemporary language models aren't superintelligences and future AI capabilities will no doubt work differently.
Maybe that's totally delusional, but why is it delusional? I don't think "On Fleshling Safety" (or past work in a similar vein) is doing a good job of making the case. A previous analogy about an alien actress came the closest, but trying to unpack the analogy into a more rigorous argument involves a lot of subtleties that fleshlings are likely to get confused about.