Contractions bring points closer together.
I'm actually really bothered by this one, because that's not what a contraction mapping is. A contraction mapping isn't just something that brings points closer together; it's a mapping for which there is some factor k < 1 such that, for any pair of points, their distance gets multiplied by a factor of at most k. So if your function brings all points closer together but, e.g., the distance between points 1 and 2 gets multiplied by 0.9, the distance between 2 and 3 gets multiplied by 0.99, the distance between 3 and 4 gets multiplied by 0.999, etc., then that's called a "short map" or a "metric map", not a contraction, and the contraction mapping theorem fails to hold (counterexample left to the reader's imagination).
Wikipedia says that if the factor (call it k) only satisfies k ≤ 1, it's a "non-expansive map". But yes, contraction maps have some Lipschitz constant that enforces the behavior you describe. However, notice we still have math ⟹ "intuitive contraction" here, so it has the reverse-direction correspondence. Intriguingly, we're missing part of "intuitive contraction = bring things closer together" ⟹ math for the k = 1 case, as you point out, so we don't have the forward direction fulfilled.
I guess it has a bunch of names: the link at the top of the Wikipedia page is on the words "non-expansive map", at the bottom it's "short map", and the Wikipedia page for the thing itself is titled "metric map" and also lists the name "weak contraction". So strange that this simple definition would be so little-used and often-named!
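To make the contraction vs. short-map distinction in this thread concrete, here is a minimal sketch of one standard counterexample (my choice of example, not necessarily the one the commenter had in mind): f(x) = x + 1/x on [1, ∞). It strictly shrinks every distance, but the shrink ratios creep up toward 1, so no single factor k < 1 works for all pairs. The names `f` and `shrink_ratio` are made up for illustration.

```python
def f(x: float) -> float:
    """Map [1, inf) into itself; shrinks every distance, but is not a contraction."""
    return x + 1.0 / x


def shrink_ratio(x: float, y: float) -> float:
    """Factor by which f multiplies the distance between x and y (requires x != y)."""
    return abs(f(x) - f(y)) / abs(x - y)


# Pairs of neighbouring points, further and further from the origin.
pairs = [(1.0, 2.0), (10.0, 11.0), (100.0, 101.0), (1000.0, 1001.0)]
ratios = [shrink_ratio(x, y) for x, y in pairs]

for (x, y), r in zip(pairs, ratios):
    print(f"d(f({x}), f({y})) / d({x}, {y}) = {r:.6f}")

# Every ratio is below 1, yet the ratios approach 1, so no uniform k < 1 exists.
# There is also no fixed point: f(x) - x = 1/x is positive for every x >= 1.
```

Algebraically the ratio is 1 − 1/(xy), which is always below 1 on this domain but gets arbitrarily close to it. Since f(x) = x would require 1/x = 0, the conclusion of the contraction mapping theorem (a fixed point exists) fails here even though the space is complete.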
Andrew Critch made this explicit to me.
Suppose you want to formalize "information". What does it mean to "gain information about something"?
You know about probability distributions, and you remember that Shannon came up with information theory, but you aren't allowed to look that up, because you're stranded on a desert island, and your "rescuers" won't take you to safety until you can formalize this.
Don't you hate it when this happens?
Anyways, you're starving, and so you think:
You sanity-check your guess against remembered instances of information gain: memorizing facts in history class, a friend telling you their address, and so on.
Your stomach rumbles. This guess should be good enough. You tell your would-be rescuers your final answer...
🚨 No! Wrong! 🚨 That's not how you should discover new math. True, you formalized a guess that was prompted by examples of the thing-in-question (information gain) – there was a one-way correspondence from intuitive concept ⟹ math. That's not enough.
Recovering the intuition from the math
I claim that you should create math such that the math suggests the intuitive concept it's formalizing. If you obfuscate any suggestive notation and show the math to someone who already knows the intuitive concept, would the concept jump back out at them?
Above, information gain was formalized as increase in credence; roughly,
Info-gain(hypothesis, observation):=P(h∣o)−P(h).
Assuming we already know about probability and arithmetic and function notation, what does the obfuscated version bring to mind?
met32vfs(e1ht, jt3n):=P(e1ht∣jt3n)−P(e1ht).
When this value is high, you're just becoming more confident in something. What if you gain information that makes you confused?
What if you thought Newtonian mechanics was The Truth, and then Mercury's perihelion showed up with a super-wack precession? Are you gonna say you've "lost information about the Newtonian hypothesis" with a straight face? Uh, awkward...
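Here is a small sketch of how the "change in credence" candidate behaves under a Bayesian update. The prior and likelihoods are invented for illustration (they are not from any real analysis of Mercury's orbit), and `posterior` and `credence_change` are hypothetical helper names.

```python
def posterior(prior: float, lik_if_true: float, lik_if_false: float) -> float:
    """P(h | o) by Bayes' rule for a binary hypothesis h."""
    evidence = lik_if_true * prior + lik_if_false * (1.0 - prior)
    return lik_if_true * prior / evidence


def credence_change(prior: float, lik_if_true: float, lik_if_false: float) -> float:
    """The candidate 'Info-gain' from the text: P(h | o) - P(h)."""
    return posterior(prior, lik_if_true, lik_if_false) - prior


# An observation that h predicted well: credence goes up.
confirming = credence_change(prior=0.5, lik_if_true=0.9, lik_if_false=0.1)

# A surprising observation under a hypothesis you were confident in
# (illustrative numbers only): credence drops sharply.
disconfirming = credence_change(prior=0.9, lik_if_true=0.01, lik_if_false=0.5)

print(f"confirming observation:    {confirming:+.3f}")
print(f"disconfirming observation: {disconfirming:+.3f}")
```

The second value is negative. Under the candidate definition, the observer would have to say they "lost information" from an observation that clearly taught them a lot, which is the awkwardness described above.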
Heuristic: show that only your quantity achieves obviously desirable properties
In the case of information, the winner is "formalize for-sure properties that the quantity should obey", like
By proving that only your function meets the desiderata, this decomposes "does this formalize the right thing?" into "do we really need the desiderata, and are they correctly formalized?". Decomposition is crucial here in a Debate sense: disagreement can be localized to smaller pieces and simpler claims.
You can't always pull this off, but when you can, it's awesome.
Heuristic: compound concepts
If you have a really solid concept, like the expected-value operator E[⋅], and another really solid concept, like "surprisal −log p at seeing an event of probability p", you can sometimes build another intuitively correct concept: expected surprisal, or entropy, E_{x∼P(X)}[−log P(x)]. Just make sure the types match up!
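The composition can be sketched directly in code. This is a minimal illustration for finite distributions, assuming base-2 logarithms (so the units are bits); the function names and the dict-based representation of a distribution are my own choices.

```python
import math
from typing import Callable, Dict, Hashable


def surprisal(p: float) -> float:
    """Surprisal, in bits, at seeing an event of probability p (0 < p <= 1)."""
    return -math.log2(p)


def expectation(dist: Dict[Hashable, float], g: Callable[[Hashable], float]) -> float:
    """E_{x ~ dist}[g(x)] for a finite distribution given as outcome -> probability."""
    # Zero-probability outcomes contribute nothing, so skip them.
    return sum(p * g(x) for x, p in dist.items() if p > 0)


def entropy(dist: Dict[Hashable, float]) -> float:
    """Expected surprisal: plug the surprisal of each outcome into the expectation."""
    return expectation(dist, lambda x: surprisal(dist[x]))


fair_coin = {"heads": 0.5, "tails": 0.5}
biased_coin = {"heads": 0.9, "tails": 0.1}

print(entropy(fair_coin))    # 1.0 bit
print(entropy(biased_coin))  # less than 1 bit
```

The type check is visible in the signatures: `expectation` wants a function from outcomes to numbers, and "the surprisal of outcome x" is exactly such a function, so the two pieces compose without any extra glue.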
More examples
Sometimes, you can't mathematically prove the math is right, given desiderata, but you can still estimate how well the math lets a fresh-eyed reader infer the intuitive concept. Here are some examples which mathematically suggest their English names.
Why might the second correspondence be good?
By itself, "intuition ⟹ math" can anchor you on partial answers and amplify confirmation bias; in my fictional example above, counterexamples to "information = change in credence" were not considered.
Here's my advice: Whenever possible, clarify your assumed desiderata. Whenever possible, prove your construction is basically the only way to get all the desiderata. Whenever possible, build the math out of existing correct pieces of math which correspond to the right building blocks (expected + surprisal = expected surprisal).
If something feels wrong, pay attention, and then mentally reward yourself for noticing the wrongness. When you're done, give the math a succinct, clear name.
Make the math as obviously correct as possible. In my personal and observed experience, this advice is not followed by default, even though you'd obviously agree with it.
Strive for clearly correct math, informative names, and unambiguously specified assumptions and desiderata. Strive for a two-way correspondence: the concept should jump out of the math.
Assuming they actually are flipping a coin with a stationary bias (a Bernoulli process).
And some basic background assumptions, like "the laws of physics are spatially, temporally, and rotationally invariant", etc.