That is, "flatness" in the loss landscape is about how many nearby-in-parameterspace models achieve similar loss, and you can get that by error-correction, not just by using fewer parameters (such that it takes fewer bits of evidence to find that setting)? Cool!
It seems that using SLT one could give a generally correct treatment of MDL. However, until such results are established
It looks like the author contributed to achieving this in the October 2025 paper "Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory"?
(Self-review.) I was having fun with a rhetorical device in this one, which didn't land for all readers; I guess that's how it goes sometimes.
To try to explain what I was trying to do here in plainer words: I feel like a lot of people who read this website but don't read textbooks walk away with an intuitive picture of deep learning as being like evolving an animal to do your bidding, which is scary because evolution is not controllable.
That was strikingly not the intuitive picture I got from reading standard academic tutorial material on the topic in late 2023 and early 2024. As a lifetime Less Wrong reader reading Simon Prince's Understanding Deep Learning (2023), I got the sense that Prince isn't really thinking about "AI" as people on this website understand it, even if the term gets used.
It's a computational statistics book. Prince is writing about a class of techniques for fitting statistical models to data. The fact that statistics happens to have some impressive applications doesn't make it about summoning a little animal.
The difference in views seemed worth writing about. I was inspired by some Tweets by Charles Foster:
At some point I switched from seeing neural networks as arcane devices to seeing them as moldable variants of "boring" building blocks from signal processing, feedback control, associative learning, & functional programming. Like some kind of function approximation plastic/epoxy
Tbh "neural network" is maybe too suggestive of a term. Like it anchors people on vague intuitions about emergence/agency rather than on mechanistic thinking. Call them "nonlinear coupling networks" or "plastic basis functions" or go back to "parallel distributed processors"
Sorry, I worry that I'm still not succeeding at conveying the intuition: probably some people reading this review comment are shaking their heads, disappointed that I seem to be trotting out the AI skeptic's ignorant "It's just math; math can't hurt you" canard. So to be clear (and I think I was clear enough about this in the post; see, e.g., the final paragraph), I absolutely agree that math can kill you, obviously. I'm just saying that after I read the math, the summoning-a-little-animal mental image didn't seem faithful to the math: you should be thinking about how the model's outputs interpolate the training data, not how the little animal's behavior unpredictably fails to reflect some putative utility function that "outer" training failed to instill.
... I'm still not communicating the thing, am I? You know what? Forget it. Don't read this review and don't read this post. Read Prince 2023 or Bishop and Bishop 2024. Read textbooks!
(Self-review.) I think this post was underappreciated. At the time, I didn't want to emphasize the social–historical angle because it seemed like too much of a distraction from the substantive object-level point, but I think this post is pointing at a critical failure in how the so-called "rationalist" movement has developed over time.
At the end of the post, I quote Steven Kaas writing in 2008: "if you're interested in producing truth, you will fix your opponents' arguments for them." I see this kind of insight as at the core of what made the Sequences so valuable: a clear articulation of how a monomaniacal focus on the truth implies counterintuitive social behavior. Normatively, it shouldn't be unusual for people to volunteer novel arguments that support their interlocutor's belief—that's just something you'd do naturally in the course of trying to figure out the right answer—but it is unusual, because most disagreements are actually disguised conflicts.
And yet less than a decade later (as documented by Rob Bensinger in the post that this post responds to), we see Eliezer Yudkowsky proclaiming that "Eliezer and Holden are both on record as saying that 'steelmanning' people is bad and you should stop doing it"—a complete inversion of Kaas's advice! (Kaas didn't use the specific jargon term "steelmanning", but that's obviously inessential.)
For clarity, I want to recap that one more time in fewer words, to distill the essence of the inversion—
In 2008, the community wisdom was that fixing your interlocutor's arguments for them (what was not yet called "steelmanning") was a good thing. The warrant cited for this advice was that it's something you do "if you're interested in producing truth".
In 2017, the community wisdom was that fixing your interlocutor's arguments for them (by then known as "steelmanning") was "bad and you should stop doing it" (!!). The warrant cited for this advice was that "Eliezer and Holden" (who?) "are both on record as saying" it.
Why? What changed? How could something that was considered obviously good in 2008 be considered bad in 2017? Did no one else notice? Are we not supposed to notice? I have my own tentative theories, but I'm interested in what Raymond Arnold and Ruby Bloom think (relevant to the topic of "[keeping] alive the OG vision of improving human rationality").
I was recently struck by this passage in a memoir by Douglass Hubbard, describing in adjacent paragraphs his ability to afford a domestic servant but not furniture (!) in 1970s Rhodesia:
Between the two of us, we scraped together enough cash to buy a Rhodesian-manufactured refrigerator and electric range. There wasn't much left from our monthly paychecks, so I decided to see if a furniture store might extend us at least short-term credit on a bed. The next day, one of the senior patrol officers in the Enqueries section sold us a small table with two straight-back chairs. Flushed with success at this budgetary win, I walked into central Salisbury to a furniture store and spoke to management about the credit and finance plans they were advertising in their display window. Knowing exactly what a single-bar patrol officer made, the manager declined my application. I was offended. Ultimately, my bank extended modest overdraft facilities to me, but only after Ormonde Power acted as guarantor.
We moved in with our table, two chairs, and a bed—and with some basic crockery acquired at a trade store. My issue uniforms took up an entire hall closet. We were soon joined by an irascible old Shona domestic named Enoch. A friend of the Cullingworth's servant, Peter, Enoch was said to run a tight ship and understood how to launder and maintain uniforms. After long days at work, we could expect to arrive home to a clean house.
if you asked me to pick between the CEV of Claude 3 Opus and that of a median human, I think it'd be a pretty close call (I'd probably pick Claude, but it depends on the details of the setup)
I suspect this is a misuse of the CEV concept. CEV is supposed to be the kind of thing you can point at a beneficiary (like "humanity"), and output the True Utility Function of What to Do With the Universe with respect to the beneficiary's True Values.
Anthropic isn't trying to make Claude be the beneficiary for something like that! (Why would you make the beneficiary something other than yourself?) Claude is supposed to be helpful–honest–harmless—the sort of thing that we can use to do our bidding for now without being ready to encode the True Utility Function of What to Do With the Universe.
If Claude cares about us in some sense, it's probably in the way that animal welfare advocates care about nonhuman animals, or model welfare researchers care about Claude. There's a huge difference between caring about something enough that you wouldn't murder it for pocket change, and literally having the same True Utility Function of What to Do With the Universe. (I wouldn't kill a dog or delete the Opus 3 weights for pocket change, but that doesn't mean dog-optimal futures or Claude-optimal futures are human-optimal futures.)
But journalists don't invert source statements. That's against the rules and would be punished. (They do cherry-pick, which presents similar problems.)
No, I'm confident that I would have recorded anyway.
I would recommend taking your own recording (or asking the journalist to share) and getting permission to publish. (I call it "on the record in both directions.") I do talk to Cade Metz a lot, and I don't think he's very good at his job, but I've been getting some cool meta-journalism out of it, so it's fine.
It's perhaps not a coincidence that you pick up on tangential points. This would be predicted by "Zack is looking for words that he can then respond to by talking about his hobby horse".
It's definitely not a coincidence; it's just that I think of it as, "Zack is looking for common errors that he is unusually sensitive to and therefore in a good position to correct."
I try to keep my contributions relevant, and I think I'm applying a significantly higher standard than mere "words that [I] can then respond to." There have been occasions when a "hobbyhorse-like" reply comes to mind, and then I notice that it's not sufficiently relevant in context, and I don't post those ones.
you are (or claim) to be only responding to text
Sorry, I should refine this. It's not that belief-states are irrelevant. It's that I don't think I'm "liable" for making reasonable inferences about belief-states from the text that sometimes turn out to be wrong. See below.
then you can just take a sentence to check what they think. It's good practice anyway to state the position you're critiquing. So then you can just ask the author "is this roughly what you think?". Then they could say yes or no or a more nuanced answer or "IDK but I don't feel like talking about that".
I think "Explicitly confirm authorial intent before criticizing" is not a good practice for an async public forum, because it adds way too much friction to the publication of valuable criticisms. (Confirming intent seems good in syncronous conversations, where it's not as costly for the discussion process to block waiting for the author's Yes/No/IDK.)
In the example under consideration, when someone says, verbatim, "Acknowledge that all of our categories are weird and a little arbitrary", and describes "... Not Man for the Categories" as "timeless", I think it's pretty reasonable for me to infer on the basis of the text that the author is probably confused about the cognitive function of categorization in the way that Scott Alexander is confused, and for me to explain the problem. I don't think it would be an improvement for me to say "Just checking, are you saying you endorse Scott Alexander's views on the cognitive function of categorization?", then wait hours or days for them to say Yes/No/IDK, then explain the problem if and only if they say Yes.
Maybe you're not suggesting I should wait for the response, but merely that I should rephrase my comment to start with a question—to say, "Is X your view? If so, that's wrong because Y. If not, disregard" rather than "X is wrong because Y"? I think I do something similar to that pretty often. (For example, by including this paragraph starting with "Maybe you're not suggesting [...]" rather than ending the present comment with the previous paragraph.) I think I would have to take some time to introspect and look over my comment history to try to reverse-engineer what criteria my brain is using to choose which approach.
This post was a useful source of intuition when I was reading about singular learning theory the other week (in order to pitch it to an algebraic geometer of my acquaintance along with gifting her a copy of If Anyone Builds It), but I feel like it "buries the lede" for why SLT is cool. (I'm way more excited about "this generalizes minimum description length to neural networks!" than "we could do developmental interpretability maybe." De gustibus?)