I'm writing a book about epistemology. It's about The Problem of the Criterion, why it's important, and what it has to tell us about how we approach knowing the truth.
I've also written a lot about AI safety. Some of the more interesting stuff can be found at the site of my currently-dormant AI safety org, PAISRI.
AI is much more impressive but not much more useful. They improved on many things they were explicitly optimised for (coding,
I feel like this dramatically understates what progress feels like for programmers.
It's hard to overstate what a big deal 2025 was. Like if in 2024 my gestalt was "damn, AI is scary, good thing it hallucinates so much that it can't do much yet", in 2025 it was "holy shit, AI is scary useful!". AI really started to make big strides in usefulness in Feb/March of 2025 and it's just kept going.
I think the trailing indicators tell a different story, though. What they miss is that we're rapidly building products at lower unit operating costs that are going to start generating compounding returns soon. It's hard for me to justify this beyond saying I know what I and my friends are working on and things are gonna keep accelerating in 2026 because of it.
The experience of writing code is also dramatically transformed. A year ago, if I wanted some code to do something, it mostly meant I was going to sit at the keyboard and write code in my editor. Now it means sitting at my keyboard, writing a prompt, getting some code out, running it, iterating a couple times, and calling it a day, all while writing minimal code myself. It's basically the equivalent of going from writing assembly to a modern language like JavaScript in a single year, something that actually took us 40 years.
This will be missing from the stats, but there was some real reader fatigue on my end. It was a lot to keep up with, and I'm extremely glad we're back to a lower daily volume of posts!
I continue to be excited about this class of approaches. Explaining why roughly amounts to giving an argument for why I think self-other overlap is relevant to normative reasoning, so I will sketch that argument here:
But this sketch is more easily explained than realized. We don't know exactly how humans come to care about others, so we don't know how to instrument this in AI. We also know that human care for others is not perfect because evil exists (in that humans sometimes intentionally violate norms with the intent to harm others), so just getting AI that cares is not clearly a full solution to alignment. But, to the extent that humans are aligned, it seems to be because they care about what others care about, and this research is an important step in the direction of building AI that cares about other agents, like humans.
I continue to really like this post. I hear people referencing the concept of "hostile telepaths" in conversation sometimes, and they've done it enough that I forgot it came from this post! It's a useful handle for a concept that can be especially difficult to deal with for the type of person who is likely to read LessWrong: they themselves lack strong, detailed models of how others think, so while they can feel the existence of hostile telepaths, they lack a theory of what's going on (or at least did until this post explained it).
Similar in usefulness to Aella's writing about frame control.
I like this post, but I'm not sure how well it's aged. I don't hear people talking about being "triggered" so much anymore, and while I like this post's general point, its framing seems less centrally useful now that we are less inundated with "trigger warnings". Maybe that's because this post captured something of a moment, as people began to realize that protecting themselves from being "triggered" was not necessarily a good thing, and we've as a culture naturally shifted away from trying to protect everyone from anything bad ever happening to them. So while I like it, I have a hard time recommending it for inclusion.
I wish this had actually been posted with the full text on LessWrong, but I stand by its core claim that "boundaries" are a confused category as commonly understood. I don't have much to add other than that Chris did an excellent job of explicating the issues with the usual understanding of "boundaries" and of reforming it into a clearer understanding of how people relate and why "boundaries" can often lead to relational dysfunction.
This post still stands out to me as making an important and straightforward point about the observer dependence of knowledge that is still, in my view, underappreciated (enough so that I wrote a book about it and related epistemological ideas!). I continue to think this is quite important for understanding AI, and in particular for addressing interpretability concerns as they relate to safety: lacking a general theory of why and how generalization happens, we risk mistakes in building aligned AIs if they categorize the world in unusual ways that we don't anticipate or understand.
While I lack the neuroscience knowledge to directly assess how correct the model in this post is, I do have a lot of meditation experience (7k+ hours), and what I can say is that it generally comports with my experiences of what happens when I meditate. I see this post as an important step towards developing a general theory of how meditation works, and while it's not a complete theory, it makes useful progress in exploring these ideas so that we may later develop a complete one.
I like this post because it delivers a simple, clear explanation of why seemingly prosaic AI risks might be just as dangerous as more interesting ones. It's not a complicated argument to make, but the metaphor of the rock and the knife cuts through many possible objections by showing that sometimes simple things really are just as dangerous as more advanced ones. And with the proof by example in hand, we have to work harder to argue that prosaic AI is not that dangerous.