Epistemic status: Reasonably confident in the basic mechanism.

Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along.

Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it.

I want to propose that while ideas are sometimes genuinely that repetitive, there's often a sneakier mechanism at play. I call these Conceptual Rounding Errors – moments when our mind's necessary compression goes a bit too far.

Too Much Compression

A Conceptual Rounding Error occurs when we encounter a new mental model or idea that's partially—but not fully—overlapping with a familiar one, and our mind helpfully simplifies it down to the familiar. Just as numerical rounding smooths away precision, conceptual rounding quietly erases crucial differences. Let's start with an example.

No, This Isn't The Old Demons Story Again

Possibly the worst case is a cluster of overlapping but distinct models of what might go wrong inside AI systems: "mesa-optimizers," "optimization demons," "subagents," and "inner alignment."

  • Mesa-optimizers: Learned optimizers that arise within AI training, because the easiest solution to a problem is often to find an optimizer.
  • Optimization demons: Optimizers that arise in dynamical systems under strong optimization pressures.
  • Subagents: Distinct internal agent-like parts with possibly conflicting objectives.
  • Inner alignment: Unclear. I would prefer it to mean a general concern with aligning an AI's internal structure.

At different points in time, different frames were in the spotlight, while others suffered from being rounded to the locally salient ones.

Compressing these into one of the frames leads to serious misunderstandings. For example, it seems fairly natural for an AI system to start out with partially conflicting subagents, or for such conflicts to arise from learning humans' internal conflicts. Rounding this to 'mesa-optimizers' makes it very difficult to reason about—and vice versa.

My guess is that, due to conceptual rounding errors, only a minority of alignment researchers hold coherent versions of all the overlapping frames.

The Compression Trade-off

Our brains are constantly performing a delicate balancing act. On one side, we have the crushing complexity of reality. On the other, we have our limited cognitive resources and the need to actually get things done. Compression isn't just useful – it's necessary for thought.

Keeping distinct but overlapping ideas separate is costly: it consumes mental resources, so our minds naturally compress. In a Conceptual Rounding Error, the bits thrown away were too valuable to lose.

An overlapping model of what may be going on is Recognition Overshoot—our brains constantly try to map new things onto familiar patterns. In some cases, the differences are discarded.

More of This

This isn't an isolated phenomenon but a fairly common source of confusion and error.

  • On LessWrong, almost any idea from representational alignment or convergent abstractions risks getting rounded off to Natural Abstractions
  • Instrumental convergence vs. Power-seeking
  • Embedded agency vs. Embodied cognition vs. Situated agents
  • Various stories about recursive feedback loops vs. Intelligence explosion

This was just AI safety, just LessWrong.

What Can We Do?

The solution isn't to never round off concepts – that way lies madness and an inability to actually think about anything. 

My default suggestion is to increase metacognitive awareness. When you notice the feeling 'Ah, I know this—it's the old idea again,' you can use it as a trigger for deeper reconsideration: 'I might be conceptually rounding right now. What's getting lost?'

Once recognized, avoiding Conceptual Rounding Errors seems tractable:

  • Explicitly Tag Differences: When encountering new ideas, consciously articulate their differences from what you already know. Force yourself to put distinctions into words.
  • Active Decompression: Revisit compressed concepts and expand them back into their nuanced forms.
  • Distinct Predictions: Remember edge cases where the models differ.

One surprisingly effective technique I've found is simple visualization. Draw the ideas. Put them on paper. Our visual system often maintains distinctions our verbal system compresses away.

When It Matters

Sometimes precision doesn't matter much. If you're deciding where to get lunch, rounding "restaurant with good Southeast Asian fusion cuisine" to "Thai place" probably won't ruin your day.

But in domains where we're trying to think carefully about complex systems – like AI alignment – these small distinctions can compound into major confusions.

That said, I should note: if you're reading this and thinking "Isn't this just another paraphrase of Bucket Errors or Fallacies of Compression?" – well, maybe. But maybe it's worth checking what distinctions you might be rounding away.

10 comments

I find that for me, and I get the vibe that for many others as well, there's often a slight sense of moral superiority happening when conceptual rounding happens. Like "aha, I'm better than you for knowing more and realizing that your supposedly novel idea has already been done". 

If I notice myself having that slight smug feeling, it's a tip-off that I'm probably rounding off because some part of me wants to feel superior, not because the rounding is necessarily correct.

I may feel smug if the "novel idea" is basically a worse version of an existing one, but there are more interesting possibilities to probe for.

  1. The novel idea is a meaningful extension/generalization of an existing concept. E.g., Riemann --> Lebesgue integration
  2. The novel idea is equivalent to an existing concept but formulated differently. E.g., Newton and Leibniz versions of calculus.
  3. The novel idea is a more detailed explanation of an existing concept. E.g., chemical bonding --> molecular orbital theory.

Less likely to be rounded away:

  1. The novel idea overlaps with existing concepts but is neither a subset nor an extension. E.g., General Relativity and Quantum Mechanics.
  2. The novel idea applies existing concepts to a new domain. E.g., applying information theory to DNA.
  3. The novel idea synthesizes multiple existing concepts into a greater whole. E.g., Darwinian evolution as a combination of Malthusian population dynamics and natural variation.
  4. The novel idea provides a unifying framework for previously disconnected concepts. E.g., Maxwell's equations unifying electricity, magnetism, and optics.

Nearly all conceptual rounding errors will not be anything as grand as the extreme examples I gave, but often there is still something worth examining.

I think there's something to be said for the straightforward counterargument here: conceptual rounding off performs the valuable service of lumping together ideas that are not that different, and that only seem importantly different to their authors/supporters, rather than being actually meaningfully different ideas.

An idea should either be precisely defined enough that it's clear (once the precise definition is known) why it can't be rounded off; or it's a vague idea, in which case it either needs to become more precise to avoid being rounded, or it is inherently vague, and then there can't be much harm from rounding, because it already wasn't clear where its boundaries were in concept space.

So... there surely are things like (overlapping, likely non-exhaustive):

  • Memetic Darwinian anarchy - concepts proliferating without control, trying to carve out for themselves new niches in the noosphere or grab parts of real estate belonging to incumbent concepts.
  • Memetic warfare - individuals, groups, egregores, trying to control the narrative by describing the same thing in the language of your own ideology, yadda yadda.
  • Independent invention of the same idea - in which case it's usually given different names (but also, plausibly, since some people may grow attached to their concepts of choice, they might latch onto trivial/superficial differences and amplify them, so that one or more instances of this multiply independently invented concept is now morphed into something other than what it "should be").
  • Memetic rent seeking - because introducing a new catchy concept might marginally bump up your h-index.

So, as usual, the law of equal and opposite advice applies.

Still, the thing Jan describes is real and often a big problem.

I also think I somewhat disagree with this:

An idea should either be precisely defined enough that it's clear (once the precise definition is known) why it can't be rounded off; or it's a vague idea, in which case it either needs to become more precise to avoid being rounded, or it is inherently vague, and then there can't be much harm from rounding, because it already wasn't clear where its boundaries were in concept space.

Meanings are often subtle, intuited but not fully grasped, in which case a (premature) attempt to explicitize them risks collapsing their reference to the important thing they are pointing at. Many important concepts are not precisely defined. Many are best sorta-defined ostensively: "examples of X include A, B, C, D, and E; I'm not sure what makes all of them instances of X, maybe it's that they share the properties Y and Z ... or at least my best guess is that Y and Z are important parts of X and I'm pretty sure that X is a Thing™".

Eliezer has a post (I couldn't find it at the moment) where he noticed that the probabilities he gave were inconsistent. He asks something like, "Would I really not behave as if God existed if I believed that P(Christianity)=1e-5?" and then, "Oh well, too bad, but I don't know which way to fix it, and fixing it either way risks losing important information, so I'm deciding to live with this lack of consistency for now."

The seemingly small differences might matter hugely. See the long debate over what caused scurvy and how to prevent/cure it. 
 

When the Royal Navy changed from using Sicilian lemons to West Indian limes, cases of scurvy reappeared. The limes were thought to be more acidic and it was therefore assumed that they would be more effective at treating scurvy. However, limes actually contain much less vitamin C and were consequently much less effective.

Furthermore, fresh fruit was substituted with lime juice that had often been exposed to either air or copper piping. This resulted in at least a partial removal of vitamin C from the juice, thus reducing its effectiveness.

The discovery that fresh meat was able to cure scurvy was another reason why people no longer treated the condition with fresh fruit. This discovery led to the belief that perhaps scurvy was not caused by a dietary problem at all. Instead, it was thought to be the result of a bacterial infection from tainted meat. In fact, the healing properties of fresh meat come from the high levels of vitamin C it contains.

Finally, the arrival of steam shipping substantially reduced the amount of time people spent at sea, therefore the difficulties in carrying enough fresh produce were reduced. This decreased the risk of scurvy so that less effective treatments, such as lime juice, proved effective enough to deal with the condition most of the time.

TBC, the scurvy story was very complicated. I'm not actually sure the above summary covers it completely accurately, though it illustrates some of the factors. Conceptual lumping was a real impediment to figuring out what was going on!

 


 

Link to the source of the quote?

A google of the first paragraph takes you quickly to https://www.bluesci.co.uk/posts/forgotten-knowledge

Why do you think “rounding errors” occur?

  • I expect cached thoughts to often look from the outside similar to “rounding errors”: someone didn’t listen to some actual argument, because they pattern-matched it to something else they already have an opinion on/answer to.
  • The proposed mitigations shouldn’t really work. E.g., with explicitly tagging differences, if you “round off” an idea you hear to something you already know, you won’t feel it’s new and won’t do the proposed system-2 motions. Maybe a thing to do instead is, when encountering seemingly already-known ideas, to check whether what you’re told is indeed the idea you know.

Also, I’m not convinced by the examples.

  • On LessWrong, almost any idea from representational alignment or convergent abstractions risks getting rounded off to Natural Abstractions
  • Instrumental convergence vs. Power-seeking
  • Embedded agency vs. Embodied cognition vs. Situated agents
  • Various stories about recursive feedback loops vs. Intelligence explosion

I’ve only noticed something akin to the last one. It’s not very clear in what sense people would round off instrumental convergence to power-seeking (and are there examples where severe power-seeking was rounded off to instrumental convergence in an invalid way?), or “embodied cognition” to embedded agency.

Would appreciate links if you have any!

Different framings of mathematically equivalent paradigms can vary widely in how productive they are in practice. I'm really not a big fan of rounding.

Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along.

Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it.

 

Some questions regarding these contexts:

-Is it true that you can deduce that "not much progress" is being made? In (pure) maths, it is sometimes very useful to be able to connect two points of view/notions (e.g. (co)homological theories, to name the most obvious example that comes to mind).

-What is the goal of such interactions? Is it truly to point out relevant related work? To dismiss other people's ideas for {political/tribal/ego-related} motives? Other?

As for possible fixes:

-Maintain a collective {compendium/graph/whatever data structure is relevant} of important concepts, with precise enough definitions, and comparison information (examples and/or theoretical arguments) between similar, but not identical, ideas. (A rough sketch of what one entry in such a structure might look like is given below.)

Or rather: acknowledging that the AI Safety community(ies) is/are terrible at coordination, devise a way of combining/merging such {compendiums/graphs/whatever}, for it is unlikely that only one emerges...
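
To make the compendium/graph suggestion above concrete, here is a minimal sketch in Python of what one entry, and a naive merge of two independently maintained compendiums, might look like. This is an illustrative assumption, not an existing tool or a worked-out proposal: every name and field below is hypothetical.

```python
# Hypothetical sketch only: a toy "concept compendium" of the kind suggested
# above. All names, fields, and the merge logic are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    definition: str                                # as precise as the concept allows
    examples: list = field(default_factory=list)   # ostensive anchors: "X includes A, B, C"


@dataclass
class Distinction:
    """An edge between two similar-but-not-identical concepts."""
    a: str
    b: str
    differences: list                              # where the concepts come apart
    distinct_predictions: list = field(default_factory=list)  # edge cases where the frames disagree


@dataclass
class Compendium:
    concepts: dict = field(default_factory=dict)       # name -> Concept
    distinctions: list = field(default_factory=list)   # Distinction edges

    def merge(self, other):
        """Naively combine two independently maintained compendiums.

        The hard part (deduplicating concepts that circulate under different
        names, reconciling conflicting definitions) is deliberately left out;
        this just keeps both views side by side.
        """
        merged = Compendium(dict(self.concepts), list(self.distinctions))
        for name, concept in other.concepts.items():
            merged.concepts.setdefault(name, concept)
        merged.distinctions.extend(other.distinctions)
        return merged


# Example entry: recording why two of the post's frames should not be rounded together.
c = Compendium()
c.concepts["mesa-optimizers"] = Concept(
    "mesa-optimizers", "Learned optimizers that arise within AI training.")
c.concepts["subagents"] = Concept(
    "subagents", "Distinct internal agent-like parts with possibly conflicting objectives.")
c.distinctions.append(Distinction(
    "mesa-optimizers", "subagents",
    differences=["a single learned optimizer vs. several conflicting internal parts"]))
```

The merge step is where the coordination problem bites: deciding whether two entries are "the same concept under different names" is exactly the kind of rounding judgment the post warns about, so any such merge would need the comparison information, not just the names.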
