A lot of “red line” talk assumed that a capability shows up, everyone notices, and something changes. We keep seeing the opposite: the capability arrives, and we get an argument about definitions after deployment, well after it should be clear that we’re over the line. We’ve Already Crossed The Lines! Karl...
The modern world is incredibly insecure along a wide variety of dimensions - because, usually, it’s not a problem. No one is trying to exploit the security of your email server most of the time, so it’s fine if it is unpatched. No one is trying to hack the internet alarm clock...
There’s a strong argument that humans should stop trying to build more capable AI systems, or at least slow down progress. The risks are plausibly large but unclear, and we’d prefer not to die. But the roadmaps of the companies pursuing these systems envision increasingly agentic AI systems taking over...
In the previous two posts (first, second), we laid out our take on AI alignment, which draws on conservative philosophy and the political tradition of Agonistic Democracy. We also suggested an approach to AI alignment in which the conflicts between multiple agents lead to an AI system that has...
In our previous post, we outlined a view that we disagree with, one that serves as a central assumption in current discussions of AI alignment, and suggested that it might be useful to push in a different direction, which we started to outline. Here, we’ll point out that we think alignment...
Worldbuilding is critical for understanding the world and how the future could go - but it’s also useful for understanding counterfactuals better. With that in mind, when people talk about counterfactuals in AI development, they seem to assume that safety would always have been a focus. That is, there’s a...
Current plans for AI alignment (examples) come from a narrow, implicitly filtered, and often (intellectually, politically, and socially) liberal standpoint. This makes sense, as the vast majority of the community, and hence of alignment researchers, hold those views. We, the authors of this post, belong to the minority of AI Alignment...