Zoom Out: Distributions in Semantic Spaces
(This article is edited and expanded from a comment I made to someone in the newly started BAIF Slack community. Thanks for the inspiration 🙏)

Introduction

In this article I present an alternative paradigm for Mechanistic Interpretability (MI). This paradigm may turn out to be better or worse than, or may naturally combine with, the standard paradigm I often see implicitly extended from Chris Olah's "Zoom-In".

I've talked about this concept before, in various places. Someday I may collect them and try to present a strong case, including a survey of paradigms in the MI literature. For now, here is a relatively short introduction to the concept, assuming some familiarity with ML and MI.

Afaik, Chris Olah originally introduced the concepts of "features" and "circuits" in "Zoom-In" as a suggestion for a direction for exploration, not as a certainty. It worked very well for thinking about things like "circle" and "texture" detectors, which I think are a natural but incorrect way of understanding what is going on.

New Mechanistic Interpretability Paradigm?

I have been developing an alternate paradigm that I'm not currently sure anyone else is talking about. It is now common to think of the collective inputs or outputs of network layers as vectors rather than individual signals. The concept which I am uncertain anyone is focusing on is that each vector is representative of a semantic space in which distributions live.

Input Space

For example, in a cat-dog-labeling net, the input space is images, and there are two distributions living in this space. The cat-distribution is all possible images that are of cats. We can make some claims about that distribution, such as the idea that it is continuous and connected. The same is true of the dog-distribution, but additionally, the dog-distribution may be connected to the cat-distribution in several regions containing the set of images that are ambiguous: maybe a cat, maybe a dog. There is also implicitly a distribution of images th
Oh! That's where it is. Thank you : )