All of Guillaume Corlouer's Comments + Replies

Another perspective would be to look at the activations of an autoregressive deep learning model, e.g. a transformer, during inference as a stochastic process: the collection of activations $(X_t)_t$ at some layer, viewed as random variables indexed by time $t$, where $t$ is the token position.

One could, for example, look at the mutual information between the history $X_{\le t}$ and the future $X_{>t}$ of the activations, or look at (conditional) mutual information between the past and future of subprocesses of $X$ (note: transfer entropy can be a... (read more)
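As a rough illustration of this framing, here is a minimal sketch of estimating the mutual information between the history and future of activations at one layer, assuming joint Gaussianity (so it is only a proxy for the true mutual information) and using a hypothetical `get_layer_activations` helper:

```python
import numpy as np

def gaussian_mi(past, future, eps=1e-6):
    """Gaussian estimate of I(past; future) in nats.

    past:   (n_samples, d_past)   flattened history  X_{<=t}
    future: (n_samples, d_future) flattened future   X_{>t}
    Assumes the activations are jointly Gaussian, so this is only a
    rough proxy for the true mutual information of the process.
    """
    def logdet_cov(Z):
        C = np.cov(Z, rowvar=False) + eps * np.eye(Z.shape[1])
        _, logdet = np.linalg.slogdet(C)
        return logdet

    joint = np.concatenate([past, future], axis=1)
    return 0.5 * (logdet_cov(past) + logdet_cov(future) - logdet_cov(joint))

# Hypothetical usage: activations of shape (n_prompts, T, d) at one layer,
# split at token position t into history and future.
# acts = get_layer_activations(model, prompts, layer=6)   # hypothetical helper
# t = 8
# past, future = acts[:, :t, :], acts[:, t:, :]
# print(gaussian_mi(past.reshape(len(acts), -1), future.reshape(len(acts), -1)))
```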

An interesting analogy, closer to ML, would be to look at neuroscience. It's an older field than ML, and the physics perspective there has been fairly productive, even though it has not yet succeeded at providing a grand unified theory of cognition. Some examples: 

  • Using methods from electric circuits to explain neurons (Hodgkin-Huxley model, cable theory)
  • Dynamical systems to explain phenomena like synchronization in neuronal oscillations (e.g. the Kuramoto model; see the sketch after this list)
  • Ising models to model some collective behaviour of neurons
  • Information theory is commonly used
... (read more)
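To make the Kuramoto example above concrete, here is a minimal simulation sketch (all parameter values are illustrative); the order parameter r(t) approaching 1 corresponds to the oscillators synchronizing:

```python
import numpy as np

def kuramoto(n=100, K=2.0, steps=2000, dt=0.01, seed=0):
    """Euler simulation of the Kuramoto model:
    dtheta_i/dt = omega_i + (K/n) * sum_j sin(theta_j - theta_i).
    Returns the order parameter r(t) in [0, 1]; r -> 1 means synchronization."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, n)          # natural frequencies
    theta = rng.uniform(0, 2 * np.pi, n)     # initial phases
    r_hist = []
    for _ in range(steps):
        # element (i, j) of the matrix below is sin(theta_j - theta_i)
        coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        theta = theta + dt * (omega + (K / n) * coupling)
        r_hist.append(np.abs(np.exp(1j * theta).mean()))
    return np.array(r_hist)

# print(kuramoto()[-1])   # close to 1 for sufficiently large coupling K
```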
2Lauren Greenspan
Thanks for the recommendation! The pathways of scientific progress here seem very interesting (for example: physics -> neuro -> AI -> ... v. physics -> AI -> neuro -> ...), particularly if we think about feeding back between experimental and theoretical support to build up a general understanding. Physics is really good at fitting theories together in a mosaic -- at a large scale you have a nice picture of the universe, and the tiles (theories) fit together but aren't part of the same continuous picture, allowing for some separation between different regimes of validity. It's not a perfect analogy, but it says something about physics' ability to split the difference between reductionism and emergence. It would be nice to have a similar picture in neuroscience (and AI), though this might be more difficult.

Interesting! Perhaps one way to not be fooled by such situations could be to use a non-parametric statistical test. For example, we could apply permutation testing: by resampling the data to break its correlation structure and performing PCA on each permuted dataset, we can form a null distribution of eigenvalues. Then, by comparing the eigenvalues from the original data to this null distribution, we could assess whether the observed structure is unlikely under randomness. Specifically, we’d look at how extreme each original eigenvalue is relative to those... (read more)
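A minimal sketch of the kind of permutation test described above (function name and the number of permutations are illustrative):

```python
import numpy as np

def pca_eigenvalue_permutation_test(X, n_perm=1000, seed=0):
    """Compare observed covariance eigenvalues to a permutation null.

    Each column is permuted independently, which keeps the marginals but
    breaks the correlation structure. Returns the observed (sorted)
    eigenvalues and, for each component, the fraction of permutations
    whose eigenvalue is at least as large (a per-component p-value)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    observed = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]

    null = np.empty((n_perm, X.shape[1]))
    for i in range(n_perm):
        Xp = np.column_stack([rng.permutation(Xc[:, j]) for j in range(X.shape[1])])
        null[i] = np.sort(np.linalg.eigvalsh(np.cov(Xp, rowvar=False)))[::-1]

    p_values = (null >= observed).mean(axis=0)
    return observed, p_values
```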

Right, I got confused because I thought your problem was about trying to define a measure of optimisation power - for example one analogous to the Yudkowsky measure - that refers to a utility function while being invariant under scaling and translation, but this is different from asking

"what fraction of the default expected utility comes from outcomes at least as good as this one?’"

What about defining the optimisation power of an outcome $x'$ as a measure of the set of outcomes that have utility greater than the utility of $x'$?

Let $S_U(x') = \{x : U(x) > U(x')\}$ be the set of outcomes with utility greater than $U(x')$ according to the utility function $U$.

The set $S_U(x')$ is invariant under translation and positive rescaling of the utility function $U$, and we define the optimisation power of the outcome $x'$ according to the utility function $U$ as a measure of this set.

This does not suffer from compar... (read more)
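As a toy numerical check of the invariance claim (with the empirical fraction of better outcomes standing in for a measure of the set; not necessarily the exact definition intended in the truncated comment above):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=1_000)     # utilities of sampled outcomes (toy default distribution)
u_prime = U[42]                # utility U(x') of the outcome x' under consideration

better = U > u_prime                                # the set S_U(x') as a boolean mask
a, b = 3.7, -2.0                                    # positive rescaling and translation
better_affine = (a * U + b) > (a * u_prime + b)     # the same set under a*U + b

assert np.array_equal(better, better_affine)        # S_U(x') is unchanged
print("measure of S_U(x') under the sample:", better.mean())
```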

2Alexander Gietelink Oldenziel
Yeah this is the expectation of the Yudkowsky measure I think?

Nice recommendations! In addition to brain enthusiasts being useful for empirical work, there are also theoretical tools from systems neuroscience that could be useful for AI safety. One area in particular would be interpretability: if we want to model a network at various levels of "emergence", recent developments in information decomposition and multivariate information theory that move beyond pairwise interactions in a neural network might be very useful. Also see recent publications modelling synergistic information and dynamical independence to pe... (read more)
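As a toy illustration of why moving beyond pairwise interactions matters (not one of the methods from the cited publications), here is a small numpy sketch computing a "whole minus sum" synergy for an XOR relationship, where neither source alone is informative about the target but the pair is:

```python
import numpy as np
from itertools import product

def mi(joint):
    """Mutual information (bits) from a 2-D joint probability table."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return (joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum()

# XOR example: Y = X1 xor X2, with X1 and X2 uniform and independent.
states = list(product([0, 1], repeat=2))
joint_x1_y = np.zeros((2, 2))
joint_x2_y = np.zeros((2, 2))
joint_x12_y = np.zeros((4, 2))
for i, (x1, x2) in enumerate(states):
    y = x1 ^ x2
    joint_x1_y[x1, y] += 0.25
    joint_x2_y[x2, y] += 0.25
    joint_x12_y[i, y] += 0.25

print(mi(joint_x1_y), mi(joint_x2_y), mi(joint_x12_y))    # ~0.0, 0.0, 1.0 bits
# "whole minus sum" synergy: information only visible beyond pairwise terms
print(mi(joint_x12_y) - mi(joint_x1_y) - mi(joint_x2_y))  # ~1.0 bit
```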

2Jan
Great points, thanks for the comment! :) I agree that there are potentially some very low-hanging fruits. I could even imagine that some of these methods work better in artificial networks than in biological networks (less noise, more controlled environment). But I believe one of the major bottlenecks might be that the weights and activations of an artificial neural network are just so difficult to access? Putting the weights and activations of a large model like GPT-3 under the microscope requires impressive hardware (running forward passes, storing the activations, transforming everything into a useful form, ...) and then there are so many parameters to look at.  Giving researchers structured access to the model via a research API could solve a lot of those difficulties and appears like something that totally should exist (although there is of course the danger of accelerating progress on the capabilities side also).

Thanks for all the useful links! I'm also always happy to receive more feedback.

I agree that the sense in which I use metaethics in this post is different from what academic philosophers usually call metaethics. I have the impression that metaethics, in the academic sense, and metaphilosophy are somehow related. Studying what morality itself is, how to select ethical theories, and what process underlies ethical reasoning do not seem like independent questions. For example, if moral nihilism is more plausible, then it seems less likely that there is some meaningful... (read more)

Sure, I'm happy to read/discuss your ideas about this topic.

I am not sure what computer-aided analysis means here, but one possibility could be to have formal ethical theories and prove theorems inside their formal frameworks. This raises questions about the sort of formal framework one could use to 'prove theorems' about ethics in a meaningful way.

1Teerth Aloke
Up to this point, I have heard the idea of an axiomatic system for ethics several times, but no suggestion of what such axioms could be. I meant computer-aided analysis in the sense of an automated theorem checker searching for contradictions in the system.
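As a very small illustration of what an automated check for contradictions could look like (using the Z3 solver on a toy propositional "axiom system", not a serious ethical theory):

```python
from z3 import Bool, Implies, Not, Solver, unsat

# Toy propositional variables about a single act (purely illustrative).
obligatory = Bool("obligatory")
permissible = Bool("permissible")
forbidden = Bool("forbidden")

s = Solver()
s.add(Implies(obligatory, permissible))       # axiom 1: what is obligatory is permissible
s.add(Implies(forbidden, Not(permissible)))   # axiom 2: what is forbidden is not permissible
s.add(obligatory)                             # stipulation: the act is obligatory...
s.add(forbidden)                              # ...and also forbidden

print(s.check() == unsat)   # True: the axioms plus the stipulations are contradictory
```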