You might enjoy
https://www.ams.org/journals/bull/2004-41-03/S0273-0979-04-01026-2/S0273-0979-04-01026-2.pdf
which explains the role that the resulting problem (representing homology class of manifolds by submanifolds/cobordisms) played in inspiring the work of René Thom on cobordism, stable homotopy theory, singularity theory...
Here are two more closely related results in the same circle of ideas. The first one gives a description (a kind of fusion of Dold-Thom and Eilenberg-Steenrod) of homology purely internal to homotopy theory, and the second explains how homological algebra falls out of infinity-category theory:
All the frames you are mentioning are good for intuition. I would say the deepest one is 4., and that everything falls into place cleanly once you formulate things in the language of infinity-category theory (at the price of a lot of technicalities to establish the "right" language). For example,
Which formal properties of the KL-divergence do the proofs of your result use? It could be useful to make them all explicit to help generalize to other divergences or metrics between probability distributions.
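For instance (my own list of standard candidates, not properties claimed by the result itself), one could check the proofs against: nonnegativity, the chain rule

$$\mathrm{KL}(P_{XY}\|Q_{XY}) \;=\; \mathrm{KL}(P_X\|Q_X) \;+\; \mathbb{E}_{x\sim P_X}\,\mathrm{KL}(P_{Y|X=x}\|Q_{Y|X=x}),$$

the data-processing inequality, joint convexity in $(P,Q)$, and Pinsker's inequality $\|P-Q\|_{\mathrm{TV}}\le\sqrt{\mathrm{KL}(P\|Q)/2}$. Only some of these survive for general $f$-divergences or metrics (e.g. the chain rule is quite specific to KL), which is exactly why making the dependencies explicit would help.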
Well, I can certainly empathize with the feeling that compromising on a core part of your identity is threatening ;-)
More seriously, what you are describing as empathy seems to be asking the question:
"What if my mind was transported into their bodies?"
rather than
"What if I was (like) them, including all the relevant psychological and emotional factors?"
The latter question should lead to feelings of disgust iff the target experiences feelings of disgust.
Of course, empathy is all the more difficult when the person you are trying to empathize with is ...
Historically, commutative algebra came out of algebraic number theory, and the rings involved - Z, Z_p, number rings, p-adic local rings... - are all (in the modern terminology) Dedekind domains.
Dedekind domains are not always principal, and this was the reason why mathematicians started studying ideals in the first place. However, the structure of finitely generated modules over Dedekind domains is still essentially determined by ideals (or rather fractional ideals), reflecting to some degree the fact that their geometry is simple (1-dim regular Noetherian domains).
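For reference, the structure theorem being alluded to (a standard fact about Dedekind domains $R$): a finitely generated module of rank $r\ge 1$ decomposes as

$$M \;\cong\; R^{\,r-1} \;\oplus\; I \;\oplus\; \bigoplus_{i=1}^{k} R/\mathfrak{p}_i^{e_i},$$

where $I$ is a fractional ideal whose class in the ideal class group is an invariant of $M$, and the $\mathfrak{p}_i$ are primes. So a module is controlled by a rank, an ideal class, and finitely many prime powers - all ideal-theoretic data.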
This could explain why there was a period where ring theory developed around ideals but the need for modules was not yet clarified?
Modules are just much more flexible than ideals. Two major advantages:
BTW the geometric perspective might sound abstract (and setting it up rigorously definitely is!) but it is in many ways more concrete than the purely algebraic one. For instance, a quasicoherent sheaf is in first approximation a collection of vector spaces (over varying "residue fields") glued together in a nice way over the topological space Spec(R), and this clarifies a lot how and when questions about modules can be reduced to ordinary linear algebra over fields.
Some of my favourite topics in pure mathematics! Two quick general remarks:
There is another interesting connection between computation and bounded treewidth: the control flow graphs of programs written in languages "without goto instructions" have uniformly bounded treewidth (e.g. <7 for goto-free C programs). This is due to Thorup (1998):
https://www.sciencedirect.com/science/article/pii/S0890540197926973
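As a tiny illustration (a hypothetical mini-CFG for a single while loop; note that networkx's min-degree heuristic returns an upper bound on the treewidth, not necessarily the exact value):

```python
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

# Hypothetical control-flow graph of a single goto-free while loop:
# entry -> cond -> body -> incr -> cond (back edge), cond -> exit -> end.
cfg = nx.Graph([
    ("entry", "cond"),
    ("cond", "body"),   # test true: enter the loop body
    ("body", "incr"),
    ("incr", "cond"),   # back edge of the loop
    ("cond", "exit"),   # test false: leave the loop
    ("exit", "end"),
])

# Min-degree elimination heuristic: returns an upper bound on the treewidth
# together with a witnessing tree decomposition.
width, decomposition = treewidth_min_degree(cfg)
print(width)  # 2 here - comfortably below Thorup's bound for structured programs
```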
Combined with graph algorithms for bounded-treewidth graphs, this has apparently been used in the analysis of compiler optimization and program verification problems; see the recent reference:
Nice!
I would add the following, which is implicit in the presentation: this phenomenon of real representations is not specific to finite groups. Real irreducible representations of a group are always neatly divided into three types: real, complex or quaternionic. This is [Schur's lemma](https://ncatlab.org/nlab/show/Schur%27s+lemma#statement) together with the fact that the finite-dimensional real division algebras are exactly R, C and the quaternions H.
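For finite groups there is even a one-line way to compute the type of a complex irreducible character $\chi$, the Frobenius-Schur indicator:

$$\nu(\chi) \;=\; \frac{1}{|G|}\sum_{g\in G}\chi(g^2) \;\in\; \{1,0,-1\},$$

with $\nu=1$ for real type, $\nu=0$ for complex type, and $\nu=-1$ for quaternionic type.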
(Should ML interpretability people care about infinite groups to begin with - unlike mathematicians, who ...
On 1., you should consider that, for people who don't know much about QFT and its relationship with SFT (like, say, me 18 months ago), it is not at all obvious that QFT can be applied beyond quantum systems!
In my case, the first time I read about "QFT for deep learning" I dismissed it automatically because I assumed it would involve some far-fetched analogies with quantum mechanics.
but in fact you can also understand the theory on a fine-grained level near an impurity by a more careful form of renormalization, where you view the nearest several impurities as discrete sources and only coarse-grain far-away impurities as statistical noise.
Where could I read about this?
Thanks a lot for writing this! Some clarifying questions:
For sufficiently nice regular, 1-dimensional Bayesian models, Edgeworth-type asymptotic expansions for the Bayesian posterior have been derived in
https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-41/issue-3/Asymptotic-Expansions-Associated-with-Posterior-Distributions/10.1214/aoms/1177696963.full
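For orientation (this is the classical i.i.d. expansion, not the posterior version derived in that paper): under Cramér-type regularity, for a sum $S_n$ of i.i.d. variables with mean $\mu$, variance $\sigma^2$ and third cumulant $\kappa_3$,

$$P\!\left(\frac{S_n-n\mu}{\sigma\sqrt{n}}\le x\right) \;=\; \Phi(x) \;-\; \varphi(x)\,\frac{\kappa_3}{6\sigma^3\sqrt{n}}\,(x^2-1) \;+\; O(n^{-1}).$$

The posterior expansions have the same polynomial-times-$\varphi$ structure, with the correction terms built from the model rather than from i.i.d. cumulants.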
Q: How can I use LaTeX in these comments? I tried to follow https://www.lesswrong.com/tag/guide-to-the-lesswrong-editor#LaTeX but it does not seem to render.
Here is the simplest case I know, which is a sum of dependent identically distributed variables. In physical terms, it is about the magnetisation of the 1d Curie-Weiss (=mean-field Ising) model. I follow the notation of the paper https://arxiv.org/abs/1409.2849 for ease of reference, this is roughly Theorem 8 + Theorem 10:
Let $M_n=\sum_{i=1}^n \sigma(i)$ be the sum of n dependent Bernoulli rando...
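For concreteness, here is a small numerical sketch (my own illustration, not from the paper): the exact law of $M_n$ under the standard Curie-Weiss Gibbs weights, showing the anomalous variance growth at the critical temperature $\beta=1$.

```python
import math

def magnetization_pmf(n, beta):
    """Exact law of M_n = sum of n +/-1 spins under the Curie-Weiss
    measure P(sigma) proportional to exp(beta * M_n^2 / (2n))."""
    weights = {}
    for k in range(n + 1):                       # k spins equal to -1
        m = n - 2 * k
        weights[m] = math.comb(n, k) * math.exp(beta * m * m / (2 * n))
    z = sum(weights.values())
    return {m: w / z for m, w in weights.items()}

def variance(pmf):
    mean = sum(m * p for m, p in pmf.items())
    return sum((m - mean) ** 2 * p for m, p in pmf.items())

n = 200
print(variance(magnetization_pmf(n, 0.5)) / n)   # subcritical: O(1), CLT scale
print(variance(magnetization_pmf(n, 1.0)) / n)   # critical: grows like n^{1/2}
```

At $\beta<1$ one sees the usual $\mathrm{Var}(M_n)\sim cn$, while at $\beta=1$ the variance grows like $n^{3/2}$, matching the non-Gaussian $n^{3/4}$ fluctuation scale of the theorem.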
I mentioned samples and expectations for the TLBP because it seems possible (and suggested by the role of degeneracies in SLT) that different samples can correspond to qualitatively different degradations of the model. Cartoon picture: besides the robust circuit X of interest, there are "fragile" circuits A and B, and most samples at a given loss scale degrade either A or B but not both.
I agree that there is no strong reason to overindex on the Watanabe temperature, which is derived from an idealised situation: global Bayesian inference, degeneracies exactly at the optimal parameters, "relatively finite variance", etc. The scale you propose seems quite natural but I will let LLC-practitioners comment on that.
Is the following a fair summary of the thread ~up to "Natural degradation" from the SLT perspective?
A closely related perspective on fluctuations of sequences of random variables has been studied recently in pure probability theory under the name of "mod-Gaussian convergence" (and more generally "mod-phi convergence"). Mod-Gaussian convergence of a sequence of RVs or random vectors is just the right amount of control over the characteristic functions - or in a useful variant, the whole complex Laplace transforms - to imply a clean description of the fluctuations at various scales (CLT, Edgeworth expansion, "normality zone", local CLT, moderate deviations...)
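For the record, the definition in the Laplace-transform variant: a sequence $(X_n)$ converges mod-Gaussian with parameters $t_n\to\infty$ and limiting function $\Psi$ if

$$e^{-t_n z^2/2}\;\mathbb{E}\!\left[e^{z X_n}\right] \;\longrightarrow\; \Psi(z)$$

locally uniformly on a domain of $\mathbb{C}$. Informally, $X_n$ is a Gaussian of variance $t_n$ "times" a residue $\Psi$, and $\Psi$ is precisely what encodes the corrections to Gaussian behaviour at the various scales.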
Very nice!
Conversely, it may be possible to identify practical situations where some of these aphorisms are sub-optimal, which could help point out the limitations of applying AIT to real agents?