I'm writing a book about epistemology. It's about The Problem of the Criterion, why it's important, and what it has to tell us about how we approach knowing the truth.
I've also written a lot about AI safety. Some of the more interesting stuff can be found at the site of my currently-dormant AI safety org, PAISRI.
I continue to be excited about this class of approaches. Explaining why amounts roughly to giving an argument for why I think self-other overlap is relevant to normative reasoning, so I'll sketch that argument here:
But this sketch is easier to explain than to realize. We don't know exactly how humans come to care about others, so we don't know how to instrument this in AI. We also know that human care for others is not perfect, because evil exists (in that humans sometimes intentionally violate norms with the intent to harm others), so just getting AI that cares is not clearly a full solution to alignment. But, to the extent that humans are aligned, it seems to be because they care about what others care about, and this research is an important step in the direction of building AI that cares about other agents, such as humans.
I continue to really like this post. I hear people referencing the concept of "hostile telepaths" in conversation sometimes, and they've done it enough that I forgot it came from this post! It's a useful handle for a concept that can be especially difficult for the type of person likely to read LessWrong to deal with: because they themselves lack strong, detailed models of how others think, they can feel the existence of hostile telepaths but lack a theory of what's going on (or at least did until this post explained it).
Similar in usefulness to Aella's writing about frame control.
I like this post, but I'm not sure how well it's aged. I don't hear people talking about being "triggered" so much anymore, and while I like this post's general point, its framing seems less centrally useful now that we are less inundated with "trigger warnings". Maybe that's because this post represents something of a moment, as people began to realize that protecting themselves from being "triggered" was not necessarily a good thing, and we've as a culture naturally shifted away from protecting everyone from anything bad ever happening to them. So while I like it, I have a hard time recommending it for inclusion.
I wish this had actually been posted with the full text on LessWrong, but I stand by its core claim that "boundaries" are a confused category as commonly understood. I don't have much to add other than that Chris did an excellent job of explicating the issues with the usual understanding of "boundaries" and of helping to build a clearer understanding of how people relate and why "boundaries" can often lead to relational dysfunction.
This post still stands out to me as making an important and straightforward point about the observer dependence of knowledge that is still, in my view, underappreciated (enough so that I wrote a book about it and related epistemological ideas!). I continue to think this is quite important for understanding AI, and in particular for addressing interpretability concerns as they relate to safety: lacking a general theory of why and how generalization happens, we risk mistakes in building aligned AIs if they categorize the world in unusual ways that we don't anticipate or understand.
While I lack the neuroscience knowledge to directly assess how correct the model in this post is, I do have a lot of meditation experience (7k+ hours), and what I can say is that it generally comports with my experiences of what happens when I meditate. I see this post as an important step towards developing a general theory of how meditation works, and while it's not a complete theory, it makes useful progress in exploring these ideas so that we may later develop a complete one.
I like this post because it delivers a simple, clear explanation for why seemingly prosaic AI risks might be just as dangerous as more interesting ones. It's not a complicated argument to make, but the metaphor of the rock and the knife cuts through many possible objections by showing that sometimes simple things really are just as dangerous as more advanced ones. And with that proof by example in hand, we have to work harder to argue that prosaic AI is not that dangerous.
It's perhaps overdetermined that I liked this post; it's about how some of my favorite topics are connected: rationality, religion, and predictive processing.
That said, I think it's good at explaining those connections in a way that's clear to readers (much clearer than I'm often able to achieve). My recollection is that the combination of this post and talking with zhukeepa at LessOnline was enough to push me in the direction of later advocating that rationalists should be more religious.
This post gets filed in my brain under "the world contains a surprising amount of detail", with an attached note stating "and those details are connected in surprising ways because the details of the world are densely distributed in concept space".
It's one thing to understand, in theory, that the world is deeply interconnected. It's another to actually see those connections and feel how unexpected they often are. This post is useful evidence, part of a bundle of similar facts that build up a picture of how the world fits together, and I wish there were more posts like it.
This will be missing from the stats, but there was some real reader fatigue on my end. It was a lot to keep up with, and I'm extremely glad we're back to a lower daily volume of posts!