Or: Towards Bayesian Natural Language Semantics In Terms Of Interoperable Mental Content
Or: Towards a Theory of Interoperable Semantics
You know how natural language “semantics” as studied in e.g. linguistics is kinda bullshit? Like, there’s some fine math there, it just ignores most of the thing which people intuitively mean by “semantics”.
When I think about what natural language “semantics” means, intuitively, the core picture in my head is:
- I hear/read some words, and my brain translates those words into some kind of internal mental content.
- The mental content in my head somehow “matches” the mental content typically evoked in other peoples’ heads by the same words, thereby allowing us to communicate at all; the mental content is “interoperable” in some sense.
That interoperable mental content is “the semantics of” the words. That’s the stuff we’re going to try to model.
The main goal of this post is to convey what it might look like to “model semantics for real”, mathematically, within a Bayesian framework.
But Why Though?
There’s lots of reasons to want a real model of semantics, but here’s the reason we expect readers here to find most compelling:
The central challenge of ML interpretability is to faithfully and robustly translate the internal concepts of neural nets into human concepts (or vice versa). But today, we don’t have a precise understanding of what “human concepts” are. Semantics gives us an angle on that question: it’s centrally about what kind of mental content (i.e. concepts) can be interoperable (i.e. translatable) across minds.
Later in this post, we give a toy model for the semantics of nouns and verbs of rigid body objects. If that model were basically correct, it would give us a damn strong starting point on what to look for inside nets if we want to check whether they’re using the concept of a teacup or free-fall or free-falling teacups. This potentially gets us much of the way to calculating quantitative bounds on how well the net’s internal concepts match humans’, under conceptually simple (though substantive) mathematical assumptions.
Then compare that to today: Today, when working on interpretability, we’re throwing darts in the dark, don’t really understand what we’re aiming for, and it’s not clear when the darts hit something or what, exactly, they’ve hit. We can do better.
Overview
In the first section, we will establish the two central challenges of the problem we call Interoperable Semantics. The first is to characterize the stuff within a Bayesian world model (i.e. mental content) to which natural-language statements resolve; that’s the “semantics” part of the problem. The second aim is to characterize when, how, and to what extent two separate models can come to agree on the mental content to which natural language resolves, despite their respective mental content living in two different minds; that’s the “interoperability” part of the problem.
After establishing the goals of Interoperable Semantics, we give a first toy model of interoperable semantics based on the “words point to clusters in thingspace” mental model. As a concrete example, we quantify the model’s approximation errors under an off-the-shelf gaussian clustering algorithm on a small-but-real dataset. This example emphasizes the sort of theorems we want as part of the Interoperable Semantics project, and the sorts of tools which might be used to prove those theorems. However, the example is very toy.
Our second toy model sketch illustrates how to construct higher level Interoperable Semantics models using the same tools from the first model. This one is marginally less toy; it gives a simple semantic model for rigid body nouns and their verbs. However, this second model is more handwavy and has some big gaps; its purpose is more illustration than rigor.
Finally, we have a call to action: we’re writing this up in the hopes that other people (maybe you!) can build useful Interoperable Semantics models and advance our understanding. There’s lots of potential places to contribute.
What’s The Problem?
Central Problem 1: How To Model The Magic Box?
So we have two agents, Alice and Carol. (Yes, Bob is also hanging around, we’ll get to him shortly). We’re in a Bayesian framework, so they each have a probabilistic world model. Carol is telling a story and says “... and then the fox jumped!”. Alice hears these words, and updates her internal world model to include the fox jumping.[1]
Somewhere in the middle there, some magic box in Alice’s brain needed to turn the words “and then the fox jumped” into some stuff which she can condition on - e.g. things like , where is some random variable in Alice's world model. That magic box is the semantics box: words go in, mental content comes out. The first central problem of Interoperable Semantics, in a Bayesian frame, is how to model the box. The semantics of ‘and then the fox jumped’ is the stuff which Alice conditions on, like : i.e. the Bayesian “mental content” into which natural language is translated by the box.
Subproblem: What’s The Range Of Box Outputs?
A full Bayesian model of semantics would probably involve a full human-like world model. After all, the model would need to tell us exactly which interoperable-variable values are spit out by a magic semantics box for any natural language text.
A shorter-term intermediate question is: what even is the input set (i.e. domain) and output set (i.e. range) of the semantics box? Its inputs are natural language, but what about the outputs?
We can already give a partial answer: because we’re working in a Bayesian frame, the outputs of the semantics box need to be assignments of values to random variables in the world model, like . (Note that functions of random variables and data structures over random variables are themselves random variables, so the semantics box could e.g. output assignments of values to a whole list of functions of random variables.)
… but that’s only a partial answer, because the set of semantic outputs must be far smaller than the set of assignments of values to random variables. We must think of the children.
SubSubProblem: What Can Children Attach Words To?
Key observation: children need only a handful of examples - sometimes even just one - in order to basically-correctly learn the meaning of a new word (or short phrase[2].) So there can’t be that many possible semantic targets for a word.
For instance, we noted earlier that any function of random variables in Alice’s model is itself a random variable in Alice’s model - e.g. if is a (real-valued) random variable, then so is . Now imagine that a child’s model includes their visual input as a 1 megabit random variable, and any function of that visual input is a candidate semantic target. Well, there are functions of 1 million bits, so the child would need bits in order to pick out one of them. In other words: the number of examples that hypothetical child would need in order to learn a new word would be quite dramatically larger than the number of atoms in the known universe.
Takeaway: there can’t be that many possible semantic targets for words. The set of semantic targets for words (in humans) is at least exponentially smaller than the set of random variables in an agent’s world model.
That’s the main problem of interest to us, for purposes of this post: what’s the set of possible semantic targets for a word?[3]
Summary So Far
Roughly speaking, we want to characterize the set of things which children can attach words to, in a Bayesian frame. Put differently, we want to characterize the set of possible semantic targets for words. Some constraints on possible answers:
- Since we’re in a Bayesian frame, any semantic targets should be assignments of values (e.g. ) to random variables in an agent’s model. (Note that this includes functions of random variables in the agent’s model, and data structures of random variables in the agent’s model.)
- … but the set of possible semantic targets for words must be exponentially smaller than the full set of possible assignments of values to random variables.
So we know the set we’re looking for is a relatively-small subset of the whole set of assignments to random variables, but we haven’t said much yet about which subset it is. In order to narrow down the possibilities, we’ll need (at least) one more criterion… which brings us to Central Problem 2.
Central Problem 2: How Do Alice and Bob “Agree” On Semantics?
Let’s bring Bob into the picture.
Bob is also listening to Carol’s story. He also hears “... and then the fox jumped!”, and a magic semantics box in his brain also takes in those words and spits out some stuff which Bob can condition on - e.g. things like , where is some random variable in Bob’s world model.
Now for the key observation which will constrain our semantic model: in day-to-day practice, Alice and Bob mostly agree, in some sense, on what sentences mean. Otherwise, language couldn’t work at all.
Notice that in our picture so far, the output of Alice’s semantics-box consists of values of some random variables in Alice’s model, and the output of Bob’s semantics-box consists of values of some random variables in Bob’s model. With that picture in mind, it’s unclear what it would even mean for Alice and Bob to “agree” on the semantics of sentences. For instance, imagine that Alice and Bob are both Solomonoff inductors with a special module for natural language. They both find some shortest program to model the world, but the programs they find may not be exactly identical; maybe Alice and Bob are running slightly different Turing machines, so their shortest programs have somewhat different functions and variables internally. Their semantics-boxes then output values of variables in those programs. If those are totally different programs, what does it even mean for Alice and Bob to “agree” on the values of variables in these totally different programs?
The second central problem of Interoperable Semantics is to account for Alice and Bob’s agreement. In the Bayesian frame, this means that we should be able to establish some kind of (approximate) equivalence between at least some of the variables in the two agents’ world models, and the outputs of the magic semantics box should only involve those variables for which we can establish equivalence.
In other words: not only must our model express the semantics of language in terms of mental content (i.e. values of random variables, in a Bayesian frame), it must express the semantics of language in terms of interoperable mental content - mental content which has some kind of (approximate) equivalence to other agents’ mental content.
Summary: Interoperable Semantics
In technical terms: we’d ultimately like a model of (interoperable mental content)-valued semantics, for Bayesian agents. The immediate challenge which David and I call Interoperable Semantics is to figure out a class of random variables suitable for the “interoperable mental content” in such a model, especially for individual words. Specifically, we want a class of random variables which
- is rich enough to reasonably represent the semantics of most words used day-to-day in natural language, but
- small enough for each (word -> semantics) mapping to be plausibly learnable with only a few examples, and
- allows us to establish some kind of equivalence of variables across Bayesian agents in the same environment.
Beyond that, we of course want our class of random variables to be reasonably general and cognitively plausible as an approximation - e.g. we shouldn’t assume some specific parametric form.
At this point, we’re not even necessarily looking for “the right” class of random variables, just any class which satisfies the above criteria and seems approximately plausible.
The rest of this post will walk through a couple initial stabs at the problem. They’re pretty toy, but will hopefully illustrate what a solution could even look like in principle, and what sort of theorems might be involved.
First (Toy) Model: Clustering + Naturality
As a conceptual starting point, let’s assume that words point to clusters in thingspace in some useful sense. As a semantic model, the “clusters in thingspace” conceptual picture is both underspecified and nowhere near rich enough to support most semantics - or even the semantics of any single everyday word. But it will serve well as a toy model to illustrate some foundational constraints and theorems involved in Interoperable Semantics. Later on, we’ll use the “clusters in thingspace” model as a building block for a richer class of variables.
With that in mind: suppose Alice runs a bog-standard Bayesian clustering algorithm on some data. (Concrete example: below we’ll use an off-the-shelf expectation-maximization algorithm for a mixture of Gaussians with diagonal covariance, on ye olde iris dataset.) Out pop some latents: estimated cluster labels and cluster parameters. Then Bob runs his Bayesian clustering algorithm on the data - but maybe he runs a different Bayesian clustering algorithm than Alice, or maybe he runs the same algorithm with different initial conditions, or maybe he uses a different random subset of the data for training. (In the example below, it’s the ‘different-initial-conditions’ option.)
Insofar as words point to clusters in thingspace and the project of Interoperable Semantics is possible at all, we should be able to establish some sort of equivalence between the clusters found by Alice and Bob, at least under some reasonably-permissive conditions which Alice and Bob can verify. In other words, we should be able to take the cluster labels and/or parameters as the “set of random variables” representing interoperable mental content.
Equivalence Via Naturality
We want a set of random variables which allows for some kind of equivalence between variables in Alice’s model and variables in Bob’s model. For that, we’ll use the machinery of natural latents.
Here are the preconditions we need:
- Predictive Agreement: once Alice and Bob have both trained their clustering algorithms on the data, they must agree on the distribution of new data points. (This assumption does not require that they train on the same data, or that they use the same clusters to model the distribution of new data points.)
- Mediation: under both Alice and Bob’s trained models, (some subset of) the “features” must be independent within each cluster, i.e. the cluster label mediates between features.
- Redundancy: under both Alice and Bob’s trained models, the cluster label must be estimable to high confidence while ignoring any one feature.
The second two conditions (mediation and redundancy) allow for approximation via KL-divergence; see Natural Latents: The Math for the details. Below, we’ll calculate the relevant approximation errors for an example system.[4] We do not currently know how to handle approximation gracefully for the first condition; the first thing we tried didn’t work for that part.
So long as those conditions hold (approximately, where relevant), the cluster label is a(n approximate) natural latent, and therefore Alice’s cluster label is (approximately) isomorphic to Bob’s cluster label. (Quantitatively: each agent’s cluster label has bounded entropy given the other agent’s cluster label, with the bound going to zero linearly as the approximation error for the preconditions goes to zero.)
So, when the preconditions hold, we can use assignments of values to the cluster label (like e.g. “cluster_label = 2”) as semantic targets for words, and have an equivalence between the mental content which Alice and Bob assign to the relevant words. Or, in English: words can point to clusters.
A Quick Empirical Check
In order for cluster equivalence to apply, we needed three preconditions:
- Predictive Agreement: Alice and Bob must agree on the distribution of new data points.
- Mediation: under both Alice and Bob’s trained models, (some subset of) the “features” must be independent within each cluster, i.e. the cluster label mediates between features.
- Redundancy: under both Alice and Bob’s trained models, the cluster label must be estimable to high confidence while ignoring any one feature.
We don’t yet know how to usefully quantify approximation error for the first precondition. But we can quantify the approximation error for the mediation and redundancy conditions under a small-but-real model. So let’s try that for a simple clustering model: mixture of gaussians, with diagonal covariance, trained on ye olde iris dataset.
The iris dataset contains roughly 150 points in a 4 dimensional space of flower-attributes[5]. For the mixture of gaussians clustering, we David used the scikit implementation. Github repo here, which consists mainly of the methods to estimate the approximation errors for the mediation and redundancy conditions.
How do we estimate those approximation errors? (Notation: is one sample, is the cluster label for that sample.)
- Mediation: under this model, mediation holds exactly. The covariance is diagonal, so (under the model) the features are exactly independent within each cluster. Approximation error is 0.
- Redundancy: as somewhat-sneakily stated, our redundancy condition includes two pieces
- Actual Redundancy Condition: How many bits of information about are lost when dropping variable : , which can be rewritten as . Since the number of values of is the number of clusters, that last expression is easy to estimate by sampling a bunch of -values and averaging the for each of them.
- Determinism: Entropy of given [6]. Again, we sample a bunch of -values, and average the entropy of conditional on each -value.
Here are the redundancy approximation errors, in bits, under dropping each of the four components of , after two different training runs of the model:
Redundancy Error (bits) | Drop (0,) | Drop (1,) | Drop (2,) | Drop (3,) |
First run (“Alice”) | 0.0211 | 0.011 | 0.048 | 0.089 |
Second run (“Bob”) | 0.034 | 0.004 | 0.031 | 0.177 |
Recall that we use these approximation errors to bound the approximation error of isomorphism between the two agents’ cluster labels. Specifically, if we track the ’s through the proofs in Natural Latents: The Math, we’ll find that:
- Alice’s natural latent is a deterministic function of Bob’s to within (sum of Alice’s redundancy errors) + (entropy of Alice’s label given ) + (entropy of Bob’s label given )
- Bob’s natural latent is a deterministic function of Alice’s to within (sum of Bob’s redundancy errors) + (entropy of Alice’s label given ) + (entropy of Bob’s label given )
(Note: the proofs as-written in Natural Latents: The Math assume that the redundancy error for each component of is the same; dropping that assumption is straightforward and turns into ; thus the sum of redundancy errors.) In the two runs of our clustering algorithm above, we find:
- Sum of redundancy errors in the first run is 0.168 bits
- Sum of redundancy errors in the second run is 0.246 bits
- Entropy of label given in the first run is 0.099 bits
- Entropy of label given in the second run is 0.099 bits
So, ignoring the differing distribution over new data points under the two models, we should find:
- The first model’s cluster label is a deterministic function of the second to within 0.366 bits (i.e. entropy of first label given second is at most 0.366 bits)
- The second model’s cluster label is a deterministic function of the first to within 0.444 bits.
Though the differing distribution over new data points is still totally unaccounted-for, we can estimate those conditional entropies by averaging over the data, and we actually find:
- Entropy of first model’s cluster label given second model’s: 0.222 bits
- Entropy of second model’s cluster label given first model’s, is also: 0.222 bits
The entropies of the cluster labels under the two models are 1.570 and 1.571 bits, respectively, so indeed each model’s cluster label is approximately a deterministic function of the other, to within reasonable error (~0.22 bits of entropy out of ~1.57 bits).
Strengths and Shortcomings of This Toy Model
First, let’s walk through our stated goals for Interoperable Semantics. We want a class of (assignments of values to) random variables which:
- is rich enough to reasonably represent the semantics of most words used day-to-day in natural language, but
- small enough for each (word -> semantics) mapping to be plausibly learnable with only a few examples, and
- allows us to establish some kind of equivalence of variables across Bayesian agents in the same environment.
When the preconditions hold, how well does the cluster label variable fit these requirements?
We already discussed the equivalence requirement; that one works (as demonstrated numerically above), insofar as the preconditions hold to within reasonable approximation. The main weakness is that we don’t yet know how to handle approximation in the requirement that our two agents have the same distribution over new data points.
Can the (word -> semantics) mapping plausibly be learned with only a few examples? Yes! Since each agent already calculated the clusters from the data (much like a child), all that’s left to learn is which cluster gets attached to which word. So long as the clusters don’t overlap much (which turns out to be implied by the mediation and redundancy conditions), that’s easy: we just need ~1 example from each cluster with the corresponding word attached to it.
Is the cluster label rich enough to reasonably represent the semantics of most words used day-to-day in natural language? Lol no. What’s missing?
Well, we implicitly assumed that the two agents are clustering data in the same (i.e. isomorphic) space, with the same (i.e. isomorphic) choice of features (axes.) In order to “do semantics right”, we’d need to recurse: find some choice of features which comes with some kind of equivalence between the two agents.
Would equivalence between choice of features require yet another assumption of equivalence, on which we also need to recurse? Probably. Where does it ground out? Usually, I (John) am willing to assume that agents converge on a shared notion of spacetime locality, i.e. what stuff is “nearby” other stuff in the giant causal web of our universe. So insofar as the equivalence grounds out in a shared notion of which variables are local in space and time, I’m happy with that. Our second toy model will ground out at that level, though admittedly with some big gaps in the middle.
Aside: What Does “Grounding In Spacetime Locality” Mean?
The sort of machinery we’re using (i..e natural latents) needs to start from some random variable which is broken into components . The machinery of natural latents doesn’t care about how each component is represented; replacing with something isomorphic to doesn’t change the natural latents at all. But it does matter which components we break into.
I’m generally willing to assume that different agents converge on a shared notion of spacetime locality, i.e. which stuff is “near” which other stuff in space and time. With that assumption, we can break any random variable into spacetime-local components - i.e. each component represents the state of the world at one point in space and time. Thanks to the assumed convergent notion of spacetime locality, different agents agree on that decomposition into components (though they may have different representations of each component).
So when we talk about “grounding in spacetime locality”, we mean that our argument for equivalence of random variables between the two agents should start, at the lowest level, from the two agents having equivalent notions of how to break up their variables into components each representing state of the world at a particular place and time.
Second (Toy) Model Sketch: Rigid Body Objects
Our first model was very toy, but we walked through the math relatively thoroughly (including highlighting the “gaps” which our theorems don’t yet cover, like divergence of the agents’ predictive distributions). In this section we’ll be less rigorous, but aim to sketch a more ambitious model. In particular, we’ll sketch an Interoperable Semantic model for words referring to rigid-body objects - think “teacup” or “pebble”. We still don’t expect this model to be fully correct, even for rigid-body objects, but it will illustrate how to build higher-level semantic structures using building blocks similar to the clustering model.
First we’ll sketch out the model and argument for interoperability (i.e. naturality of the latents) for just one rigid-body object - e.g. a teacup. We’ll see that the model naturally involves both a “geometry” and a “trajectory” of the object. Then, we’ll introduce clusters of object-geometries as a model of (rigid body) noun semantics, and clusters of object-trajectories as a model of verb semantics.
Note that there will be lots of handwaving and some outright gaps in the arguments in this model sketch. We’re not aiming for rigor here, or even trying to be very convincing; we’re just illustrating how one might build up higher-level semantic structures.
The Teacup
Imagine a simple relatively low-level simulation of a teacup moving around. The teacup is represented as a bunch of particles. In terms of data structures, the code tracks the position and orientation of each particle at each time.
There’s a lot of redundancy in that representation; an agent can compress it a lot while still maintaining reasonable accuracy. For instance, if we approximate away vibrations/bending (i.e. a rigid body approximation), then we can represent the whole set of particle-trajectories using only:
- The trajectory of one reference particle’s position and orientation
- The initial position and orientation of each particle, relative to the reference particle
We’ll call these two pieces “trajectory” and “geometry”. Because the teacup’s shape stays the same, all the particles are always in the same relative positions, so this lower-dimensional factorization allows an agent to compress the full set of particle-trajectories.
Can we establish naturality (i.e. mediation + redundancy) for the geometry and trajectory, much like we did for clusters earlier?
Here’s the argument sketch for the geometry:
- Under the rigid body approximation, the geometry is conserved over time. So, if we take to be the state of all the particles at time , the geometry is redundantly represented over ; it should approximately satisfy the redundancy condition.
- If the teacup moves around randomly enough for long enough, the geometry will be the only information about the cup at an early time which is relevant to much later times. In that case, the geometry mediates between and for times and sufficiently far apart; it approximately satisfies the mediation condition.
So, intuitively, the geometry should be natural over the time-slices of our simulation.
Next, let’s sketch the argument for the trajectory:
- Under the rigid body approximation, if I know the trajectory of any one particle, that tells me the trajectory of the whole teacup, since the particles always maintain the same relative positions and orientations. So, the trajectory should approximately satisfy the redundancy condition over individual particle-trajectories.
- If I know the trajectory of the teacup overall, then under the rigid body approximation I can calculate the trajectory of any one particle, so the particle trajectories are technically all independent given the teacup trajectory. So, the trajectory should approximately satisfy the mediation condition over individual particle-trajectories.
That means the trajectory should be natural over individual particle trajectories.
Now, there’s definitely some subtlety here. For instance: our argument for naturality of the trajectory implicitly assumes that the geometry was also known; otherwise I don’t know where the reference particle is relative to the particle of interest. We could tweak it a little to avoid that assumption, or we could just accept that the trajectory is natural conditional on the geometry. That’s the sort of detail which would need to be nailed down in order to turn this whole thing into a proper Interoperable Semantics model.
… but we’re not aiming to be that careful in this section, so instead we’ll just imagine that there’s some way to fill in all the math such that it works.
Assuming there’s some way to make the math work behind the hand-wavy descriptions above, what would that tell us?
We get a class of random variables in our low-level simulation: “geometry” (natural latent over time-slices), and “trajectory” (natural latent over individual particle-trajectories). Let’s say that the simulation is Alice’s model. Then for the teacup’s geometry, naturality says:
- If two Alice and Bob both make the same predictions about the low-level particles constituting the teacup…
- and those predictions match a rigid body approximation reasonably well, but have lots of uncertainty over long-term motion…
- and there’s some variable in Alice’s model which both mediates between the particle-states at far-apart times and can be reconstructed from particle-states at any one time…
- and there’s some variable in Bob’s model which also both mediates between the particle-states at far-apart times and can be reconstructed from particle-states at any one time…
- … then Alice’s variable and Bob’s variable give the same information about the particles. Furthermore, if the two variables’ values can both be approximated reasonably well from the full particle-trajectories, then they’re approximately isomorphic.
In other words: we get an argument for some kind of equivalence between “geometry” in Alice’s model and “geometry” in Bob’s model, insofar as their models match predictively. Similarly, we get an argument for some kind of equivalence between “trajectory” in Alice’s model and “trajectory” in Bob’s model.
So:
- We have a class of random variables in Alice’s model and Bob’s model…
- … which isn’t too big (roughly speaking, natural latents are approximately unique, so there’s approximately just the two variables)
- … and for which we have some kind of equivalence between Alice and Bob’s variables.
What we don’t have, yet, is enough expressivity for realistic semantics. Conceptually, we have a class of interoperable random variables which includes e.g. any single instance of a teacup, and the trajectory of that teacup. But our class of interoperable random variables doesn’t contain a single variable representing the whole category of teacups (or any other category of rigid body objects), and it’s that category which the word “teacup” itself intuitively points to.
So let’s add a bit more to the model.
Geometry and Trajectory Clusters
Now we imagine that Alice’s model involves lots of little local models running simulations like the teacup, for different rigid-body objects around her. So there’s a whole bunch of geometries and trajectories which are natural over different chunks of the world.
Perhaps Alice notices that there’s some cluster-structure to the geometries, and some cluster-structure to the trajectories. For instance, perhaps there’s one cluster of similarly-shaped rigid-body geometries which one might call “teacups”. Perhaps there’s also one cluster of similar trajectories which one might call “free fall”. Perhaps this particular geometry-cluster and trajectory-cluster are particularly fun when combined.
Hopefully you can guess the next move: we have clusters, so let’s apply the naturality conditions for clusters from the first toy model. For both the geometry-clusters and the trajectory-clusters, we ask:
- Are (some subset of) the “features” approximately independent within each single cluster?
- Can the cluster-label of a point be estimated to high confidence ignoring any one of (the same subset of) the “features”?
If yes to both of these, then we have naturality, just like in the first toy model. Just like the first toy model, we then have an argument for approximate equivalence between Alice and Bob’s clusters, assuming they both satisfy the naturality conditions and make the same predictions.
(Note that we’ve said nothing at all about what the “features” are; that’s one of the outright gaps which we’re handwaving past.)
With all that machinery in place, we make the guess:
- (rigid body) nouns typically refer to geometry clusters; example: “teacup”
- (rigid body) verbs typically refer to trajectory clusters; example: “free fall”
With that, we have (hand-wavily)
- a class of random variables in each agent’s world model…
- which we expect to typically be small enough for each (word -> semantics) mapping to be learnable with only a handful of examples (i.e. a word plus one data point in a cluster is enough to label the cluster with the word)...
- and we have a story for equivalence of these random variables across two agents’ models…
- and we have an intuitive story on how this class of random variables is rich enough to capture the semantics of many typical rigid body nouns and verbs, like “teacup” or “free fall”.
Modulo handwaving and (admittedly large) gaps, we have hit all the core requirements of an Interoperable Semantics model.
Strengths and Shortcomings of This Toy Model
The first big strength of this toy model is that it starts from spacetime locality: the lowest-level natural latents are over time-slices of a simulation and trajectories of particles. (The trajectory part is not quite fully grounded, since “particles” sure are an ontological choice, but you could imagine converting the formulation from Lagrangian to Eulerian; that’s an already-reasonably-common move in mathematical modeling. In the Eulerian formulation, the ontological choices would all be grounded in spacetime locality.)
The second big strength is expressivity. We have a clear notion of individual (rigid body) objects and their geometries and trajectories, nouns and verbs point to clusters of geometries and trajectories, this all intuitively matches our day-to-day rigid-body models relatively nicely.
The shortcomings are many.
First, we reused the clustering machinery from the first toy model, so all the shortcomings of that model (other than limited expressivity) are inherited. Notably, that includes the “what features?” question. We did ground geometries in spacetime locality and trajectories in particles (which are a relatively easy step up from spacetime locality), so the “what features” question is handled for geometries and trajectories. But then we cluster geometries, and cluster trajectories. What are the “features” of a rigid body object’s geometry, for clustering purposes? What are the “features” of a rigid body object’s trajectory, for clustering purposes? We didn’t answer either of those questions. The underdetermination of feature choice at the clustering stage is probably the biggest “gap” which we’ve ignored in this model.
Second, when defining geometries and trajectories, note that we defined the geometry to be a random variable which “both mediates between the particle-states at far-apart times and can be reconstructed from particle-states at any one time”. That works fine if there’s only one rigid body object, but if there’s multiple rigid body objects in the same environment, then the “geometry” under that definition would be a single variable summarizing the geometry of all the rigid body objects. That’s not what we want; we want distinct variables for the geometry (and trajectory) of each object. So the model needs to be tweaked to handle multiple rigid bodies in the same environment.
Third, obviously we were very handwavy and didn’t prove anything in this section.
Fourth, obviously the model is still quite limited in expressivity. It doesn’t handle adjectives or adverbs or non-rigid-body nouns or …
Summary and Call To Action
The problem we call Interoperable Semantics is to find some class of random variables (in a Bayesian agent’s world model) which
- is rich enough to reasonably represent the semantics of most words used day-to-day in natural language, but
- small enough for each (word -> semantics) mapping to be plausibly learnable with only a few examples, and
- allows us to establish some kind of equivalence of variables across Bayesian agents in the same environment.
Beyond that, we of course want our class of random variables to be reasonably general and cognitively plausible as an approximation - e.g. we shouldn’t assume some specific parametric form.
At this point, we’re not even necessarily looking for “the right” class of random variables, just any class which satisfies the above criteria and seems approximately plausible.
That, we claim, is roughly what it looks like to “do semantics for real” - or at least to start the project.
Call To Action
We’re writing up this post now because it’s maybe, just barely, legible enough that other people could pick up the project and make useful progress on it. There’s lots of potential entry points:
- Extend the methods used in our toy models to handle more kinds of words and phrases:
- other kinds of nouns/verbs
- adjectives and adverbs
- subject/object constructions
- etc.
- Fill some of the gaps in the arguments
- Find some other arguments to establish equivalence across agents
- Take the toy models from this post, or some other Interoperable Semantics models, and go look for the relevant structures in real models and/or datasets (either small scale or large scale)
- Whatever other crazy stuff you come up with!
We think this sort of project, if it goes well, could pretty dramatically accelerate AI interpretability, and probably advance humanity’s understanding of lots of other things as well. It would give a substantive, quantitative, and non-ad-hoc idea of what stuff interpretability researchers should look for. Rather than just shooting in the dark, it would provide some actual quantitative models to test.
Thank you to Garret Baker, Jeremy Gillen, and Alexander Gietelink-Oldenziel for feedback on a draft of this post.
- ^
In this post, we’ll ignore Gricean implicature; our agents just take everything literally. Justification for ignoring it: first, the cluster-based model in this post is nowhere near the level of sophistication where lack of Gricean implicature is the biggest problem. Second, when it does come time to handle Gricean implicature, we do not expect that the high-level framework used here - i.e. Bayesian agents, isomorphism between latents - will have any fundamental trouble with it.
- ^
When we say “word” or “short phrase”, what we really mean is “atom of natural language.”
- ^
A full characterization of interoperable mental content / semantics requires specifying the possible mappings of larger constructions, like sentences, into interoperable mental content, not just words. But once we characterize the mental content which individual words can map to (i.e. their ‘semantic targets’,) we are hopeful that the mental content mapped to by larger constructions (e.g. sentences,) will usually be straightforwardly constructable from those smaller pieces. So if we can characterize “what children can attach words to”, then we’d probably be most of the way to characterizing the whole range of outputs of the magic semantics box.
Notably, going from words to sentences and larger constructs is the focus of the existing academic field of “semantics”. What linguists call “semantics” is mostly focused on constructing semantics of sentences and larger constructs from the semantics of individual words (“atoms”). From their standpoint, this post is mostly about characterizing the set of semantic values of atoms, assuming Bayesian agents.
- ^
For those who read Natural Latents: The Math before this post, note that we added an addendum shortly before this post went up. It contains a minor-but-load-bearing step for establishing approximate isomorphism between two agents’ natural latents.
- ^
Sepal length, sepal width, petal length, and petal width in case you were wondering, presumably collected from a survey of actual flowers last century.
- ^
Remember that addendum we mentioned in an earlier footnote? The determinism condition is for that part.
Yeah, this is an open problem that's on my radar. I currently have two main potential threads on it.
First thread: treat each bit in the representation of quantities as distinct random variables, so that e.g. the higher-order and lower-order bits are separate. Then presumably there will often be good approximate natural latents (and higher-level abstract structures) over the higher-order bits, moreso than the lower-order bits. I would say this is the most obvious starting point, but it also has a major drawback: "bits" of a binary number representation are an extremely artificial ontological choice for purposes of this problem. I'd strongly prefer an approach in which magnitudes drop out more naturally.
Thus the second thread: maxent. It continues to seem like there's probably a natural way to view natural latents in a maxent form, which would involve numerically-valued natural "features" that get added together. That would provide a much less artificial notion of magnitude. However, it requires figuring out the maxent thing for natural latents, which I've tried and failed at several times now (though with progress each time).