You might enjoy
https://www.ams.org/journals/bull/2004-41-03/S0273-0979-04-01026-2/S0273-0979-04-01026-2.pdf
which explains the role that the resulting problem (representing homology class of manifolds by submanifolds/cobordisms) played in inspiring the work of René Thom on cobordism, stable homotopy theory, singularity theory...
Here are two more closely related results in the same circle of ideas. The first one gives a description (a kind of fusion of Dold-Thom and Eilenberg-Steenrod) of homology purely internal to homotopy theory, and the second explains how homological algebra falls out of infinity-category theory:
All the frames you are mentioning are good for intuition. I would say the deepest one is 4., and that everything falls into place cleanly once you formulate things in the language of infinity-category theory (at the price of a lot of technicalities to establish the "right" language). For example,
Which formal properties of the KL-divergence do the proofs of your result use? It could be useful to make them all explicit to help generalize to other divergences or metrics between probability distributions.
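For instance (my own list of standard candidates, not properties claimed by the result itself), one could check the proofs against: nonnegativity, the chain rule

$$\mathrm{KL}(P_{XY}\|Q_{XY}) \;=\; \mathrm{KL}(P_X\|Q_X) \;+\; \mathbb{E}_{x\sim P_X}\,\mathrm{KL}(P_{Y|X=x}\|Q_{Y|X=x}),$$

the data-processing inequality, joint convexity in $(P,Q)$, and Pinsker's inequality $\|P-Q\|_{\mathrm{TV}}\le\sqrt{\mathrm{KL}(P\|Q)/2}$. Only some of these survive for general $f$-divergences or metrics (e.g. the chain rule is quite specific to KL), which is exactly why making the dependencies explicit would help.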
Well, I can certainly empathize with the feeling that compromising on a core part of your identity is threatening ;-)
More seriously, what you are describing as empathy seems to be asking the question:
"What if my mind was transported into their bodies?"
rather than
"What if I was (like) them, including all the relevant psychological and emotional factors?"
The latter question should lead to feelings of disgust iff the target experiences feelings of disgust.
Of course, empathy is all the more difficult when the person you are trying to empathize with is ...
Historically, commutative algebra came out of algebraic number theory, and the rings involved - Z, Z_p, number rings, p-adic local rings... - are all (in the modern terminology) Dedekind domains.
Dedekind domains are not always principal, and this was the reason why mathematicians started studying ideals in the first place. However, the structure of finitely generated modules over Dedekind domains is still essentially determined by ideals (or rather fractional ideals), reflecting to some degree the fact that their geometry is simple (1-dim regular Noetherian domains).
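For reference, the structure theorem being alluded to (a standard fact about Dedekind domains $R$): a finitely generated module of rank $r\ge 1$ decomposes as

$$M \;\cong\; R^{\,r-1} \;\oplus\; I \;\oplus\; \bigoplus_{i=1}^{k} R/\mathfrak{p}_i^{e_i},$$

where $I$ is a fractional ideal whose class in the ideal class group is an invariant of $M$, and the $\mathfrak{p}_i$ are primes. So a module is controlled by a rank, an ideal class, and finitely many prime powers - all ideal-theoretic data.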
This could explain why there was a period where ring theory developed around ideals but the need for modules was not yet clarified?
Modules are just much more flexible than ideals. Two major advantages:
BTW the geometric perspective might sound abstract (and setting it up rigorously definitely is!) but it is in many ways more concrete than the purely algebraic one. For instance, a quasicoherent sheaf is in first approximation a collection of vector spaces (over varying "residue fields") glued together in a nice way over the topological space Spec(R), and this clarifies a lot how and when questions about modules can be reduced to ordinary linear algebra over fields.
Some of my favourite topics in pure mathematics! Two quick general remarks:
There is another interesting connection between computation and bounded treewidth: the control flow graphs of programs written in languages "without goto instructions" have uniformly bounded treewidth (e.g. <7 for goto-free C programs). This is due to Thorup (1998):
https://www.sciencedirect.com/science/article/pii/S0890540197926973
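As a tiny illustration (a hypothetical mini-CFG for a single while loop; note that networkx's min-degree heuristic returns an upper bound on the treewidth, not necessarily the exact value):

```python
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

# Hypothetical control-flow graph of a single goto-free while loop:
# entry -> cond -> body -> incr -> cond (back edge), cond -> exit -> end.
cfg = nx.Graph([
    ("entry", "cond"),
    ("cond", "body"),   # test true: enter the loop body
    ("body", "incr"),
    ("incr", "cond"),   # back edge of the loop
    ("cond", "exit"),   # test false: leave the loop
    ("exit", "end"),
])

# Min-degree elimination heuristic: returns an upper bound on the treewidth
# together with a witnessing tree decomposition.
width, decomposition = treewidth_min_degree(cfg)
print(width)  # 2 here - comfortably below Thorup's bound for structured programs
```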
Combined with graph algorithms for bounded-treewidth graphs, this has apparently been used in the analysis of compiler optimization and program verification problems; see the recent reference:
Nice!
I would add the following, which is implicit in the presentation: this phenomenon of real representations is not specific to finite groups. Real irreducible representations of a group are always neatly divided into three types: real, complex or quaternionic. This is [Schur's lemma](https://ncatlab.org/nlab/show/Schur%27s+lemma#statement) together with the fact that the finite-dimensional real division algebras are exactly R, C and the quaternions H.
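For finite groups there is even a one-line way to compute the type of a complex irreducible character $\chi$, the Frobenius-Schur indicator:

$$\nu(\chi) \;=\; \frac{1}{|G|}\sum_{g\in G}\chi(g^2) \;\in\; \{1,0,-1\},$$

with $\nu=1$ for real type, $\nu=0$ for complex type, and $\nu=-1$ for quaternionic type.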
(Should ML interpretability people care about infinite groups to begin with - unlike mathematicians, who ...
On 1., you should consider that, for people who don't know much about QFT and its relationship with SFT (like, say, me 18 months ago), it is not at all obvious that QFT can be applied beyond quantum systems!
In my case, the first time I read about "QFT for deep learning" I dismissed it automatically because I assumed it would involve some far-fetched analogies with quantum mechanics.
but in fact you can also understand the theory on a fine-grained level near an impurity by a more careful form of renormalization, where you view the nearest several impurities as discrete sources and only coarse-grain far-away impurities as statistical noise.
Where could I read about this?
Thanks a lot for writing this! Some clarifying questions:
For sufficiently nice regular, 1-dimensional Bayesian models, Edgeworth-type asymptotic expansions for the Bayesian posterior have been derived in
https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-41/issue-3/Asymptotic-Expansions-Associated-with-Posterior-Distributions/10.1214/aoms/1177696963.full
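For orientation (this is the classical i.i.d. expansion, not the posterior version derived in that paper): under Cramér-type regularity, for a sum $S_n$ of i.i.d. variables with mean $\mu$, variance $\sigma^2$ and third cumulant $\kappa_3$,

$$P\!\left(\frac{S_n-n\mu}{\sigma\sqrt{n}}\le x\right) \;=\; \Phi(x) \;-\; \varphi(x)\,\frac{\kappa_3}{6\sigma^3\sqrt{n}}\,(x^2-1) \;+\; O(n^{-1}).$$

The posterior expansions have the same polynomial-times-$\varphi$ structure, with the correction terms built from the model rather than from i.i.d. cumulants.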
Q: How can I use LaTeX in these comments? I tried to follow https://www.lesswrong.com/tag/guide-to-the-lesswrong-editor#LaTeX but it does not seem to render.
Here is the simplest case I know, which is a sum of dependent identically distributed variables. In physical terms, it is about the magnetisation of the 1d Curie-Weiss (=mean-field Ising) model. I follow the notation of the paper https://arxiv.org/abs/1409.2849 for ease of reference, this is roughly Theorem 8 + Theorem 10:
Let $M_n=\sum_{i=1}^n \sigma(i)$ be the sum of n dependent Bernoulli rando...
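For concreteness, here is a small numerical sketch (my own illustration, not from the paper): the exact law of $M_n$ under the standard Curie-Weiss Gibbs weights, showing the anomalous variance growth at the critical temperature $\beta=1$.

```python
import math

def magnetization_pmf(n, beta):
    """Exact law of M_n = sum of n +/-1 spins under the Curie-Weiss
    measure P(sigma) proportional to exp(beta * M_n^2 / (2n))."""
    weights = {}
    for k in range(n + 1):                       # k spins equal to -1
        m = n - 2 * k
        weights[m] = math.comb(n, k) * math.exp(beta * m * m / (2 * n))
    z = sum(weights.values())
    return {m: w / z for m, w in weights.items()}

def variance(pmf):
    mean = sum(m * p for m, p in pmf.items())
    return sum((m - mean) ** 2 * p for m, p in pmf.items())

n = 200
print(variance(magnetization_pmf(n, 0.5)) / n)   # subcritical: O(1), CLT scale
print(variance(magnetization_pmf(n, 1.0)) / n)   # critical: grows like n^{1/2}
```

At $\beta<1$ one sees the usual $\mathrm{Var}(M_n)\sim cn$, while at $\beta=1$ the variance grows like $n^{3/2}$, matching the non-Gaussian $n^{3/4}$ fluctuation scale of the theorem.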
I mentioned samples and expectations for the TLBP because it seems possible (and suggested by the role of degeneracies in SLT) that different samples can correspond to qualitatively different degradations of the model. Cartoon picture: besides the robust circuit X of interest, there are "fragile" circuits A and B, and most samples at a given loss scale degrade either A or B but not both.
I agree that there is no strong reason to overindex on the Watanabe temperature, which is derived from an idealised situation: global Bayesian inference, degeneracies exactly at the optimal parameters, "relatively finite variance", etc. The scale you propose seems quite natural but I will let LLC-practitioners comment on that.
Is the following a fair summary of the thread ~up to "Natural degradation" from the SLT perspective?
A closely related perspective on fluctuations of sequences of random variables has been studied recently in pure probability theory under the name of "mod-Gaussian convergence" (and more generally "mod-phi convergence"). Mod-Gaussian convergence of a sequence of RVs or random vectors is just the right amount of control over the characteristic functions - or in a useful variant, the whole complex Laplace transforms - to imply a clean description of the fluctuations at various scales (CLT, Edgeworth expansion, "normality zone", local CLT, moderate deviations...)
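For the record, the definition in the Laplace-transform variant: a sequence $(X_n)$ converges mod-Gaussian with parameters $t_n\to\infty$ and limiting function $\Psi$ if

$$e^{-t_n z^2/2}\;\mathbb{E}\!\left[e^{z X_n}\right] \;\longrightarrow\; \Psi(z)$$

locally uniformly on a domain of $\mathbb{C}$. Informally, $X_n$ is a Gaussian of variance $t_n$ "times" a residue $\Psi$, and $\Psi$ is precisely what encodes the corrections to Gaussian behaviour at the various scales.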
Very nice!
Conversely, it may be possible to identify practical situations where some of these aphorisms are sub-optimal, which could help point out the limitations of applying AIT to real agents?