Math That Clicks: Look for Two-Way Correspondences

TurnTrout

Andrew Critch made this explicit to me.

Suppose you want to formalize "information". What does it mean to "gain information about something"?

You know about probability distributions, and you remember that Shannon came up with information theory, but you aren't allowed to look that up, because you're stranded on a desert island, and your "rescuers" won't take you to safety until you can formalize this.

Don't you hate it when this happens?

Anyways, you're starving, and so you think:

Suppose I'm trying to figure out the bias on a coin someone is flipping for me. I start with a uniform prior over $θ \in [0, 1]$, and observe the process for a while. As I learn more, I become more confident in the true hypothesis,^[1] and so my credence increases.

So maybe, I gain information about a hypothesis when the Bayesian update increases the hypothesis's probability!

You sanity-check your guess by checking this against remembered instances of information gain: memorizing facts in history class, a friend telling you their address, and so on.

Your stomach rumbles. This guess should be good enough. You tell your would-be rescuers your final answer...

🚨 No! Wrong! 🚨 That's not how you should discover new math. True, you formalized a guess that was prompted by examples of the thing-in-question (information gain) – there was a one-way correspondence from intuitive concept $⟹$ math. That's not enough.

Recovering the intuition from the math

Don't think for a second that having math representing your thoughts means you've necessarily made progress – for the kind of problems I'm thinking about right now, the math has to sing with the elegance of the philosophical insight you're formalizing.

~ How I Do Research

I claim that you should create math such that the math suggests the intuitive concept it's formalizing. If you obfuscate any suggestive notation and show the math to someone who already knows the intuitive concept, would the concept jump back out at them?

Above, information gain was formalized as increase in credence; roughly,

$\text{Info-gain}(\text{hypothesis } h, \text{observation } o) := P(h \mid o) - P(h).$

Assuming we already know about probability and arithmetic and function notation, what does the obfuscated version bring to mind?

$\text{met32vfs}(\text{e1ht}, \text{jt3n}) := P(\text{e1ht} \mid \text{jt3n}) - P(\text{e1ht}).$

When this value is high, you're just becoming more confident in something. What if you gain information that makes you confused?

What if you thought Newtonian mechanics was The Truth, and then Mercury's perihelion shows up with a super-wack precession? Are you gonna say you've "lost information about the Newtonian hypothesis" with a straight face? Uh, awkward...
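To make the awkwardness concrete, here's a minimal sketch of the "change in credence" quantity for a two-hypothesis version of the Mercury story. The prior and likelihoods are made-up illustrative numbers, and the function names `posterior` and `credence_change` are mine, not standard ones.

```python
def posterior(prior, lik_h, lik_not_h):
    """Bayes' rule for a hypothesis h against its negation.

    prior     -- P(h)
    lik_h     -- P(o | h)
    lik_not_h -- P(o | not h)
    """
    evidence = prior * lik_h + (1 - prior) * lik_not_h
    return prior * lik_h / evidence


def credence_change(prior, lik_h, lik_not_h):
    """The guessed formalization: P(h | o) - P(h)."""
    return posterior(prior, lik_h, lik_not_h) - prior


# Hypothetical numbers: you start out 95% sure of the Newtonian hypothesis,
# and the anomalous precession is very unlikely under it.
prior_newton = 0.95
change = credence_change(prior_newton, lik_h=0.01, lik_not_h=0.9)

# The observation is highly informative, yet the quantity comes out negative.
print(f"P(h | o) - P(h) = {change:.3f}")
```

Under these assumed numbers the guessed "information gain" is negative, even though you clearly learned something important about the Newtonian hypothesis.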

Heuristic: show that only your quantity achieves obviously desirable properties

In the case of information, the winning approach is to "formalize for-sure properties that the quantity should obey", like

The more sure you are something will go a certain way, the less you expect to learn from it, and
You can't have "negative" information, and
Totally predictable events give you 0 information, and
Information is additive across independent events.
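As a quick spot check, here's a sketch that tests these four properties numerically, assuming the candidate quantity is surprisal, $\log_2(1/p)$ bits for an event of probability $p$. It only checks a few hand-picked values; it doesn't prove that surprisal is the only function meeting the desiderata.

```python
import math


def surprisal(p):
    """Surprisal in bits of an event with probability p, for 0 < p <= 1."""
    return math.log2(1 / p)


# 1. The more sure you are of an event, the less you learn from seeing it.
assert surprisal(0.9) < surprisal(0.5) < surprisal(0.1)

# 2. No negative information.
assert all(surprisal(p) >= 0 for p in (0.01, 0.25, 0.5, 0.99, 1.0))

# 3. Totally predictable events give 0 information.
assert surprisal(1.0) == 0

# 4. Additivity across independent events: P(A and B) = P(A) * P(B).
p_a, p_b = 0.5, 0.25
assert math.isclose(surprisal(p_a * p_b), surprisal(p_a) + surprisal(p_b))
```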

By proving that only your function meets the desiderata, this decomposes "does this formalize the right thing?" into "do we really need the desiderata, and are they correctly formalized?". Decomposition is crucial here in a Debate sense: disagreement can be localized to smaller pieces and simpler claims.

You can't always pull this off, but when you can, it's awesome.

Probability theory: Cox's theorem
Utility theory: VNM utility theorem
Special relativity: given^[2] "observers can only measure relative (not absolute) velocity" and "all observers agree that light has a fixed speed $c$ in vacuum", special relativity follows logically.

Heuristic: compound concepts

If you have a really solid concept, like the expected-value operator $E[\cdot]$, and another really solid concept, like "surprisal $-\log p$ at seeing an event of probability $p$", you can sometimes build another intuitively correct concept: expected surprisal, or entropy: $E_{x \sim P(X)}[-\log P(x)]$. Just make sure the types match up!
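Here's a minimal sketch of that composition for a discrete distribution, written so the code reads as "expected value of surprisal". Representing a distribution as a dict from outcomes to probabilities is my choice for illustration, and the coin distributions are hypothetical.

```python
import math


def surprisal(p):
    """Surprisal in bits of an event with probability p, for 0 < p <= 1."""
    return math.log2(1 / p)


def entropy(dist):
    """Expected surprisal: sum over outcomes of P(x) * surprisal(P(x)).

    Zero-probability outcomes are skipped; they contribute nothing
    to the expectation.
    """
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


fair_coin = {"H": 0.5, "T": 0.5}
biased_coin = {"H": 0.9, "T": 0.1}
two_headed_coin = {"H": 1.0, "T": 0.0}

for name, dist in [("fair", fair_coin),
                   ("biased", biased_coin),
                   ("two-headed", two_headed_coin)]:
    print(f"{name}: {entropy(dist):.3f} bits")
```

The types line up: `surprisal` maps a probability to a number, and the expectation averages that number over outcomes drawn from the same distribution.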

More examples

Sometimes, you can't mathematically prove the math is right, given desiderata, but you can still estimate how well the math lets a fresh-eyed reader infer the intuitive concept. Here are some examples which mathematically suggest their English names.

Distance metrics obey basic properties for quantifying exactly how "close" two points are.
Contractions bring points closer together.
Derivatives capture instantaneous rate of change.
Stochastic processes capture how a system evolves over time.
Value functions capture the expected value of following a policy.
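To illustrate one of these, here's a small sketch of a contraction on the real line. The map `f` and the factor `gamma = 0.5` are hypothetical choices for illustration: `f` shrinks every distance by exactly that factor, and repeated application pulls any starting point toward the single fixed point $x = 2$.

```python
def f(x):
    """Hypothetical contraction on the reals with factor gamma = 0.5."""
    return 0.5 * x + 1.0


def iterate(func, x0, steps):
    """Apply func to x0 repeatedly, `steps` times."""
    x = x0
    for _ in range(steps):
        x = func(x)
    return x


gamma = 0.5

# Distances shrink by at least the factor gamma.
x, y = 3.0, 7.0
assert abs(f(x) - f(y)) <= gamma * abs(x - y)

# Very different starting points end up near the same fixed point.
limit_a = iterate(f, 10.0, 60)
limit_b = iterate(f, -100.0, 60)
print(limit_a, limit_b)
```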

Why might the second correspondence be good?

By itself, "intuition $⟹$ math" can anchor you on partial answers and amplify confirmation bias; in my fictional example above, counterexamples to "information = change in credence" were not considered.

Here's my advice: Whenever possible, clarify your assumed desiderata. Whenever possible, prove your construction is basically the only way to get all the desiderata. Whenever possible, build the math out of existing correct pieces of math which correspond to the right building blocks (expected + surprisal = expected surprisal).

If something feels wrong, pay attention, and then mentally reward yourself for noticing the wrongness. When you're done, give the math a succinct, clear name.

Make the math as obviously correct as possible. In my personal and observed experience, this advice is not followed by default, even though you'd obviously agree with it.

Strive for clearly correct math, informative names, and unambiguously specified assumptions and desiderata. Strive for a two-way correspondence: the concept should jump out of the math.

[1] Assuming they actually are flipping a coin with a stationary bias (a Bernoulli process).
[2] And some basic background assumptions, like "the laws of physics are spatially, temporally, and rotationally invariant", etc.

DanielFilan (5y ago):

Contractions bring points closer together.

I'm actually really bothered by this one because that's not what a contraction mapping is. A contraction mapping isn't just something that brings points closer together. It's a mapping where there's some factor $γ < 1$ such that for any pair of points, their distance gets multiplied by a factor of at most $γ$. So, if your function brings all points closer together but e.g. the distance between points 1 and 2 gets multiplied by 0.9, the distance between 2 and 3 gets multiplied by 0.99, the distance between 3 and 4 gets multiplied by 0.999, etc, then that's called a "short map" or a "metric map", not a contraction, and the contraction mapping theorem fails to hold (counterexample left to the reader's imagination).

TurnTrout (5y ago):

Wikipedia says if $γ \leq 1$, it's a "non-expansive map". But yes, contraction maps have some Lipschitz constant $γ$ that enforces the behavior you describe. However, notice we still have the math $⟹$ "intuitive contraction" here, so it has the reverse-direction correspondence. Intriguingly, we're missing part of "intuitive contraction = bring things closer together" for the $γ \leq 1$ case, as you point out, so we don't have the forward direction fulfilled.

I guess it has a bunch of names: the link at the top of the Wikipedia page is on the words "non-expansive map", at the bottom it's "short map", and the Wikipedia page for the thing itself is titled "metric map" and also lists the name "weak contraction". So strange that this simple definition would be so little-used and often-named!

Just like how open maps turn out to be way less useful in topology than continuous maps.
