Neuroscientist turned Interpretability Researcher. Starting Simplex, an AI Safety Research Org.
This sounds right to me, but importantly it also matters what you are trying to understand (and thus compress). For AI safety, the thing we should be interested in is not the weights directly, but the behavior of the neural network. The behavior (the input-output mapping) is realized through a series of activations. Activations are realized through applying weights to inputs in particular ways. Weights are realized by setting up an optimization problem with a network architecture and training data. One could try compressing at any one of those levels, and of course they are all related, and in some sense if you know the earlier layer of abstraction you know the later one. But in another sense, they are fundamentally different, in exactly how quickly you can retrieve the specific piece of information, in this case the one we are interested in - which is the behavior. If I give you the training data, the network architecture, and the optimization algorithm, it still takes a lot of work to retrieve the behavior.
Thus, the story you gave about how accessibility matters also explains layers of abstraction, and how they relate to understanding.
Another example of this is a dynamical system. The differential equation governing it is quite compact: $\dot{x}=f(x)$. But the set of possible trajectories can be quite complicated to describe, and to get them one has to essentially do all the annoying work of integrating the equation! Note that this has implications for the compositionality of such systems: while one can compose two differential equations by e.g. adding in some cross term, the behaviors (read: trajectories) of the composite system do not compose, and so one is forced to integrate the new system from scratch!
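To make the non-compositionality point concrete, here is a minimal sketch. The specific systems ($\dot{x}=-x$, $\dot{y}=-y$), the cross term $k\,y$, and the forward-Euler integrator are all illustrative assumptions of mine, not anything canonical. Composing the equations is a one-line change, but the endpoint of the isolated $x$ trajectory tells you nothing about where $x$ ends up in the coupled system; you have to integrate again.

```python
def integrate(f, state, dt=0.001, steps=1000):
    """Forward-Euler integration of d(state)/dt = f(state) up to t = dt * steps."""
    for _ in range(steps):
        deriv = f(state)
        state = [s + dt * d for s, d in zip(state, deriv)]
    return state

# Two independent 1-D systems (hypothetical examples): x' = -x and y' = -y.
def f_x(s):
    return [-s[0]]

def f_y(s):
    return [-s[0]]

# Composing the *equations* is easy: stack them and add a cross term k*y.
def f_coupled(s, k=2.0):
    x, y = s
    return [-x + k * y, -y]

# Trajectories of the parts, integrated separately to t = 1.
x_end = integrate(f_x, [1.0])[0]
y_end = integrate(f_y, [1.0])[0]

# Trajectory of the composite: no shortcut from x_end and y_end,
# the coupled system has to be integrated from scratch.
coupled_end = integrate(f_coupled, [1.0, 1.0])

print("isolated x(1):", round(x_end, 4))
print("coupled  x(1):", round(coupled_end[0], 4))
```

With these choices the coupled $x$ ends up roughly three times larger than the isolated $x$ at $t=1$, even though the governing equations differ by a single added term.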
Now, if we want to understand the behavior of the dynamical system, what should we be trying to compress? How would our understanding look different if we compress the governing equations vs. the trajectories?
Yes, I'm thinking of that line of work. I actually think the first few paragraphs of this paper do a better job of getting at the vibes I want (and I should emphasize these are vibes that I have, not any kind of formal understanding). So here's my try at a cached explanation of the concept of amortized inference I'm trying to evoke:
A lot of problems are really hard, and the algorithmic/reasoning path from the question to the answer is many steps long. But it seems that in some cases humans are much faster than that (perhaps by admitting some error, but even so, they are both fast and quite good at the task). The idea is that in these settings a human brain is performing amortized inference - because it has seen similar examples of the input/output relation of the task before, it can use that direct mapping as a kind of bootstrap for the new task at hand, saving a lot of inference time.
Now that I've typed that out, it feels maybe similar to your stuff about heuristics?
Big caveat here: it's quite possible I'm misunderstanding amortized inference (maybe @jessicata can help here?), as well as reaching with the connection to your work.
I've been trying to get my head around how to theoretically think about scaling test time compute, CoT, reasoning, etc. One frame that keeps on popping into my head is that these methods are a type of un-amortization.
In a more standard inference amortization setup one would e.g. train directly on question/answer pairs without the explicit reasoning path between the question and answer. In that way we pay an up-front cost during training to learn a "shortcut" between question and answers, and then we can use that pre-paid shortcut during inference. And we call that amortized inference.
In the current techniques for using test time compute we do the opposite - we pay costs during inference in order to explicitly capture the path between question and answer.
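A toy caricature of the two regimes, to pin down what I mean. Everything here is an assumption chosen for illustration: the task (counting Collatz steps), the function names, and especially the use of a lookup table as the "amortized" model, which, unlike a trained network, does not generalize at all beyond the pairs it was given.

```python
def collatz_path(n):
    """Explicit reasoning path: every intermediate value from n down to 1."""
    path = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        path.append(n)
    return path

def answer_with_test_time_compute(n):
    """Un-amortized: pay at inference by walking the whole path."""
    return len(collatz_path(n)) - 1

# "Training": pay the cost up front, keep only (question, answer) pairs
# and throw the intermediate paths away.
TRAIN_QUESTIONS = range(1, 1001)
amortized_table = {q: answer_with_test_time_compute(q) for q in TRAIN_QUESTIONS}

def answer_amortized(n):
    """Amortized: inference is a single lookup; None if n was never seen."""
    return amortized_table.get(n)

print(answer_amortized(27))               # cheap, pre-paid shortcut
print(answer_amortized(5000))             # outside the training questions
print(answer_with_test_time_compute(5000))  # slower, but the path is explicit
```

The shortcut is fast but opaque and only covers what was paid for in advance; the explicit path costs compute on every query but works on new questions and leaves the intermediate steps available for inspection.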
Uncertainties and things I would like to see:
Excited to read what you share!
Some personal reflections on the last year, and some thoughts for next:
Happy New Year everyone!
I suppose it depends on what one wants to do with their "understanding" of the system? Here's one AI safety case I worry about: if we (humans) don’t understand the lower-level ontology that gives rise to the phenomenon that we are more directly interested in (in this case I think that's something like an AI system's behavior/internal “mental” states - your "structurally what", if I'm understanding correctly, which to be honest I'm not very confident I am), then a sufficiently intelligent AI system that does understand that relationship will be able to exploit the extra degrees of freedom in the lower-level ontology to our disadvantage, and we won’t be able to see it coming.
I very much agree that structurally what matters a lot, but that seems like half the battle to me.
I think I disagree, or need some clarification. As an example, suppose the phenomenon in question is that the physical features of children look more or less like combinations of the parents' features. Is the right kind of abstraction a taxonomy and theory of physical features at the level of nose shapes and eyebrow thickness? Or is it at the low-level ontology of molecules and genes, or is it in the understanding of how those levels relate to each other?
Or is that not a good analogy?
Thanks. I really like this task!
It's hard for me to interpret these results without some indication of how good these networks actually are at the task though. E.g. it is possible that even though a network could solve a length=N task once out of however many attempts you made, it just got lucky, or is running some other heuristic that just happens to work that one time. I understand why you were interested in how things scale with length of problem, given your interest in recurrence and processing depth. But would it be hard to make a plot where the x axis is length of problem, and the y axis is accuracy or loss?
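Something like the following is all I have in mind for the numbers behind that plot. The record format (one `(length, solved)` pair per attempt) and the sample data are made up for illustration; I don't know how your runs are actually logged.

```python
from collections import defaultdict

def accuracy_by_length(records):
    """Fraction of successful attempts at each problem length.

    records: iterable of (length, solved) pairs, one per attempt.
    Returns a dict {length: accuracy}, ordered by increasing length.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for length, solved in records:
        totals[length] += 1
        correct[length] += int(solved)
    return {n: correct[n] / totals[n] for n in sorted(totals)}

# Hypothetical attempts: a single lucky success at length 8 shows up
# as low accuracy rather than as "can solve length 8".
records = [
    (2, True), (2, True),
    (4, True), (4, False),
    (8, False), (8, False), (8, True),
]
curve = accuracy_by_length(records)
for length, acc in curve.items():
    print(length, round(acc, 2))
```

The keys give the x axis and the values the y axis, so the result can be passed straight to whatever plotting tool you already use.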
Thanks, this is helpful. I'm still a bit unclear about how to use the word/concept "amortized inference" correctly. Is the first example you gave, of training an AI model on (query, well-thought guess), an example of amortized inference, relative to training on (query, a bunch of reasoning + well-thought out guess)?