Epistemic status: Reasonably confident in the basic mechanism.
Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along.
Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it.
I want to propose that while ideas are sometimes genuinely that repetitive, there's often a sneakier mechanism at play. I call it a Conceptual Rounding Error – what happens when our mind's necessary compression goes a bit too far.
Too much compression
A Conceptual Rounding Error occurs when we encounter a new mental model or idea that's partially—but not fully—overlapping with a familiar one, and our mind helpfully simplifies it down to the familiar. Just as numerical rounding smooths away precision, conceptual rounding quietly erases crucial differences. Let's start with an example.
No, This Isn't The Old Demons Story Again
Possibly the worst case is a cluster of overlapping but distinct models of what may be the problem inside AIs: "mesa-optimizers," "optimization demons," "subagents," and "inner alignment."
- Mesa-optimizers: Learned optimizers that arise during AI training, because the easiest solution to a problem is often to find an optimizer.
- Optimization demons: Optimizers that arise in dynamical systems under strong optimization pressure.
- Subagents: Distinct internal agent-like parts with possibly conflicting objectives.
- Inner alignment: Unclear. I would prefer it to mean a general concern with aligning an AI's internal structure.
At different points in time, different frames were in the spotlight, while others suffered from being rounded to the locally salient ones.
Compressing these into one of the frames leads to serious misunderstandings. For example, it seems fairly natural for an AI system to start out with partially conflicting subagents, or for such conflicts to arise from learning internal conflicts from humans. Rounding this to 'mesa-optimizers' makes it very difficult to reason about, and vice versa.
My guess is that, due to conceptual rounding errors, only a minority of alignment researchers hold coherent versions of all the overlapping frames.
The Compression Trade-off
Our brains are constantly performing a delicate balancing act. On one side, we have the crushing complexity of reality. On the other, we have our limited cognitive resources and the need to actually get things done. Compression isn't just useful – it's necessary for thought.
Keeping distinct but overlapping ideas separate is costly: they consume mental resources, so our minds naturally compress. In a Conceptual Rounding Error, the bits thrown away are precisely the ones we could not afford to lose.
An overlapping model of what may be going on is Recognition Overshoot: our brains constantly try to map new things onto familiar patterns, and in some cases the differences are simply discarded.
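To make the mechanism concrete, here is a minimal toy sketch (my own illustration, not part of any of the frames above), which caricatures concepts as feature vectors: "rounding" maps a new idea to the nearest stored prototype and discards the residual, so two genuinely different ideas can collapse into the same familiar label. All labels and coordinates below are made up for the example.

```python
# Toy sketch: familiar concepts as stored prototypes (hypothetical coordinates).
familiar = {
    "mesa-optimizers": (1.0, 0.0, 0.0),
    "subagents":       (0.0, 1.0, 0.0),
    "inner alignment": (0.0, 0.0, 1.0),
}

def round_to_familiar(idea):
    """Snap a new idea to the nearest stored concept; return the label and the discarded residual."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    label, prototype = min(familiar.items(), key=lambda kv: dist(idea, kv[1]))
    residual = tuple(round(x - y, 2) for x, y in zip(idea, prototype))
    return label, residual

# Two genuinely different new ideas...
idea_a = (0.9, 0.4, 0.1)
idea_b = (0.7, 0.1, 0.4)

for idea in (idea_a, idea_b):
    label, residual = round_to_familiar(idea)
    print(idea, "->", label, "| discarded:", residual)
# Both collapse to "mesa-optimizers"; the residuals – the actual differences – are lost.
```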
More of this
This isn't an isolated case; it's a fairly common source of confusion and error.
- On LessWrong, almost any idea from representational alignment or convergent abstractions risks getting rounded off to Natural Abstractions
- Instrumental convergence vs. Power-seeking
- Embedded agency vs. Embodied cognition vs. Situated agents
- Various stories about recursive feedback loops vs. Intelligence explosion
This was just AI safety, just LessWrong.
What Can We Do?
The solution isn't to never round off concepts – that way lies madness and an inability to actually think about anything.
My default suggestion is to increase metacognitive awareness. When you notice the feeling 'Ah, I know this—it's the old idea again,' you can use it as a trigger for deeper reconsideration: 'I might be conceptually rounding right now. What's getting lost?'
Once recognized, avoiding Conceptual Rounding Errors seems tractable:
- Explicitly Tag Differences: When encountering new ideas, consciously articulate their differences from what you already know. Force yourself to put distinctions into words.
- Active Decompression: Revisit compressed concepts and expand them back into their nuanced forms.
- Distinct Predictions: Remember edge cases where the models differ.
One surprisingly effective technique I've found is simple visualization. Draw the ideas. Put them on paper. Our visual system often maintains distinctions our verbal system compresses away.
When It Matters
Sometimes precision doesn't matter much. If you're deciding where to get lunch, rounding "restaurant with good Southeast Asian fusion cuisine" to "Thai place" probably won't ruin your day.
But in domains where we're trying to think carefully about complex systems – like AI alignment – these small distinctions can compound into major confusions.
That said, I should note: if you're reading this and thinking "Isn't this just another paraphrase of Bucket Errors or Fallacies of Compression?" – well, maybe. But maybe it's worth checking what distinctions you might be rounding away.