Generalised models: imperfect morphisms and informational entropy

Stuart_Armstrong

I've defined generalised models $M$ as being given by $F$, a set of features, and an (unnormalised) probability distribution $Q$ over $W = 2^{\overline{F}}$, the set of possible worlds defined by all the values of those features.

To make these into a category, a morphism $r$ between $M_{0} = (F_{0}, Q_{0})$ and $M_{1} = (F_{1}, Q_{1})$ was defined to be a relation $r$ between $W_{0}$ and $W_{1}$ (ie a subset of $W_{0} \times W_{1}$ ), that obeyed certain conditions with respect to the $Q$ 's.

If $r$ obeys those conditions, we can construct the "underlying model" for the morphism, a generalised model $M_{r} = (F_{0} ⊔ F_{1}, Q_{r})$, with $Q_{r}$ non-zero only on $r \subset W_{0} \times W_{1}$.

That underlying model result basically says:

There is an underlying reality $M_{r}$ of which $M_{0}$ and $M_{1}$ are different, consistent, facets.

That's well and good, but we also want to allow imperfect correspondences, where $Q_{0}$ or $Q_{1}$ (or both) might be known to be in error. After all, when we transitioned from Newtonian mechanics to relativity, it wasn't because there was an underlying reality that both were facets of. Instead, we realised that Newtonian mechanics and relativity were approximately though not perfectly equivalent in low-energy situations, and that when they diverged, relativity was overall more accurate.

We'd want to include these cases as generalised models, and measure how "imperfect" the morphism between them is, i.e. how much $Q_{0}$ and $Q_{1}$ diverge from being in perfect correspondence. We'll also look at how the $Q$ and the feature sets are related - how much information $Q$ carries relative to $F$.

Imperfect morphisms

So we'll loosen the definition of "morphisms" so that the $Q_{0}$ and $Q_{1}$ need not correspond exactly to each other or an underlying reality. If we have a relation $r$ between $W_{0}$ and $W_{1}$ , here are different $Q$ -consistency requirements that we could put on $r$ to make it into a morphism (the previous requirement has been renamed " $Q$ -preserving", condition 5):

1. General binary relations $r$; no connection assumed between $r$ and the $Q$s.
2. $Q$-relational: for all $w_{0} \in W_{0}$ with $Q_{0} (w_{0}) > 0$, there exists at least one $w_{1} \in W_{1}$ with both $Q_{1} (w_{1}) > 0$ and $(w_{0}, w_{1}) \in r$.
3. $Q$-functional: for all $w_{0} \in W_{0}$ with $Q_{0} (w_{0}) > 0$, there exists a unique $w_{1} \in W_{1}$ with both $Q_{1} (w_{1}) > 0$ and $(w_{0}, w_{1}) \in r$.
4. $Q$-birelational: $r$ and $r^{- 1}$ are both $Q$-relational.
5. $Q$-preserving: the same conditions as presented here. For all $w_{0} \in W_{0}$ and $w_{1} \in W_{1}$, $Q_{0} (w_{0}) \leq Q_{1} (r (w_{0}))$ and $Q_{1} (w_{1}) \leq Q_{0} (r^{- 1} (w_{1}))$.
6. $Q$-isomorphic: $r$ is $Q$-preserving; both $r$ and $r^{- 1}$ are $Q$-functional.
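For finite models, the relational, functional and birelational conditions can be checked mechanically. The following is a minimal sketch under assumptions that are not in the original: worlds are plain labels, each $Q$ is a dict from world to weight, $r$ is a set of pairs, and all function names are hypothetical.

```python
def inverse(r):
    """Inverse relation: swap the two components of every pair."""
    return {(w1, w0) for (w0, w1) in r}

def positive_images(r, w0, q1):
    """Worlds related to w0 by r that have non-zero weight under q1."""
    return {w1 for (a, w1) in r if a == w0 and q1.get(w1, 0) > 0}

def is_q_relational(r, q0, q1):
    """Every w0 of non-zero weight is related to at least one w1 of non-zero weight."""
    return all(len(positive_images(r, w0, q1)) >= 1
               for w0, weight in q0.items() if weight > 0)

def is_q_functional(r, q0, q1):
    """Every w0 of non-zero weight is related to exactly one w1 of non-zero weight."""
    return all(len(positive_images(r, w0, q1)) == 1
               for w0, weight in q0.items() if weight > 0)

def is_q_birelational(r, q0, q1):
    """Both r and its inverse are Q-relational."""
    return is_q_relational(r, q0, q1) and is_q_relational(inverse(r), q1, q0)

# Toy example (an assumption for illustration): worlds "a" and "b" both relate to "x".
q0 = {"a": 0.25, "b": 0.75}
q1 = {"x": 1.0}
r = {("a", "x"), ("b", "x")}

print(is_q_relational(r, q0, q1))            # True
print(is_q_functional(r, q0, q1))            # True
print(is_q_functional(inverse(r), q1, q0))   # False: "x" is related to two worlds
print(is_q_birelational(r, q0, q1))          # True
```

The $Q$-preserving condition is left out of this sketch, since it constrains the weights themselves rather than just which worlds have non-zero weight.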

The general results about these morphism classes are (proof found in this footnote^[1]):

All the above conditions are associative, in the sense of being closed under composition (hence they define classes of morphisms for different categories with the same objects): if $r$ and $p$ are both $Q$-relational, $Q$-functional, $Q$-birelational, $Q$-preserving, $Q$-isomorphic or disconnected from $Q$ entirely, then so is $p r = p \circ r$.
Every morphism fulfils the conditions of the morphisms above it on the list, except that $Q$ -preserving and $Q$ -birelational need not imply $Q$ -functional.
If $r$ is $Q$-isomorphic, then we can pair up the elements $w_{0} \in W_{0}$ and $w_{1} \in W_{1}$ of non-zero measure so that $(w_{0}, w_{1}) \in r$ and $Q_{0} (w_{0}) = Q_{1} (w_{1})$.
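The composition result can be illustrated on a small finite case. This is only a sketch: the dict-and-set encoding, the toy worlds and the function names are assumptions made for the example, and a single example is of course not a proof.

```python
def compose(p, r):
    """p∘r: relate w0 to w2 whenever some w1 has (w0, w1) in r and (w1, w2) in p."""
    return {(w0, w2) for (w0, w1) in r for (v1, w2) in p if v1 == w1}

def is_q_relational(r, q_src, q_dst):
    """Every source world of non-zero weight is related to some target world of non-zero weight."""
    return all(
        any(a == w and q_dst.get(b, 0) > 0 for (a, b) in r)
        for w, weight in q_src.items() if weight > 0
    )

# Three toy models, with every world of non-zero weight.
q0 = {"a": 0.5, "b": 0.5}
q1 = {"x": 0.5, "y": 0.5}
q2 = {"u": 1.0}
r = {("a", "x"), ("b", "y")}   # relation between M0 and M1
p = {("x", "u"), ("y", "u")}   # relation between M1 and M2

pr = compose(p, r)
print(pr == {("a", "u"), ("b", "u")})   # True
print(is_q_relational(r, q0, q1))       # True
print(is_q_relational(p, q1, q2))       # True
print(is_q_relational(pr, q0, q2))      # True
```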

Examples of morphisms

Here are four relations:

A coarsening is when multiple worlds get related to a single world, thus losing details. A refinement is the opposite: a single world gets related to multiple worlds, thus adding more details. An inclusion adds more worlds to the set. Its opposite, a restriction, removes worlds.

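In terms of morphisms, if we assume that all the worlds in those sets have non-zero probability, then coarsenings, refinements, and inclusions are all $Q$-relational, while restrictions are not. Coarsenings and inclusions are also $Q$-functional; refinements are not, since a single world is related to several worlds of non-zero probability, and neither are restrictions, since they are not even $Q$-relational. Coarsenings and refinements are $Q$-birelational, while inclusions and restrictions are not.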

As for $Q$-preserving, coarsenings are $Q$-preserving if, for all $w_{1} \in W_{1}$, the sum of $Q_{0} (w_{0})$ for all $(w_{0}, w_{1}) \in r$ is equal to $Q_{1} (w_{1})$. Similarly, refinements are $Q$-preserving if, for all $w_{0} \in W_{0}$, the sum of the $Q_{1} (w_{1})$ for all $(w_{0}, w_{1}) \in r$ is equal to $Q_{0} (w_{0})$. None of these four relations are $Q$-isomorphic (unless some of the worlds in the diagrams are of measure $0$).
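The sum condition for coarsenings is easy to check numerically. A minimal sketch, assuming that $r$ really is a coarsening (each $w_{0}$ related to exactly one $w_{1}$), that each $Q$ is a dict of weights, and with a hypothetical function name:

```python
from collections import defaultdict
from math import isclose

def coarsening_is_q_preserving(r, q0, q1):
    """Sum condition from the text: for each w1, the Q0-weights of the worlds
    related to it add up to Q1(w1). Assumes r is a coarsening."""
    totals = defaultdict(float)
    for w0, w1 in r:
        totals[w1] += q0.get(w0, 0.0)
    return all(isclose(totals[w1], weight, abs_tol=1e-12)
               for w1, weight in q1.items())

# Toy coarsening: "a" and "b" are merged into "x".
q0 = {"a": 0.25, "b": 0.75}
r = {("a", "x"), ("b", "x")}

print(coarsening_is_q_preserving(r, q0, {"x": 1.0}))   # True
print(coarsening_is_q_preserving(r, q0, {"x": 0.9}))   # False
```

Since a refinement is the inverse of a coarsening, the same function covers refinements by swapping the pairs of $r$ and the roles of the two weight dicts.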

What about Bayesian updates? If we start with $M_{0} = (\{f\} ⊔ F, Q_{0})$ and want to update on $f = c$ for some constant $c$, then this corresponds to relating $M_{0}$ to $M_{1} = (F, Q_{1})$ such that $(w_{0}, w_{1}) \in r$ if $w_{0} = (w_{1}, f = c)$. We'll also require $Q_{0} (w_{0}) = Q_{1} (w_{1})$ on any such worlds - and assume that that defines $Q_{1}$ entirely (we're ignoring renormalisation here, since we don't assume that $Q_{0}$ and $Q_{1}$ have measure $1$).

This $r$ is clearly a restriction, and hence does not meet any of the $Q$ -consistency conditions. However, we can define Bayesian updates as relations $r$ such that $r^{- 1}$ is $Q$ -functional and injective. They do form a category when seen this way (since $(r q)^{- 1} = q^{- 1} r^{- 1}$ ).
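To make the update relation concrete, here is a small sketch. The encoding is an assumption made for illustration: worlds of $M_{0}$ are pairs `(w1, f_value)`, worlds of $M_{1}$ are the `w1` alone, and the function names and weather labels are hypothetical.

```python
def bayesian_update(q0, c):
    """Condition on f == c. Returns the (unnormalised) Q1 and the relation r
    as a set of pairs (w0, w1), with w0 = (w1, c)."""
    q1 = {w1: weight for (w1, f_value), weight in q0.items() if f_value == c}
    r = {((w1, c), w1) for w1 in q1}
    return q1, r

def is_q_functional(r, q_src, q_dst):
    """Each source world of non-zero weight has exactly one related target of non-zero weight."""
    return all(
        sum(1 for (a, b) in r if a == w and q_dst.get(b, 0) > 0) == 1
        for w, weight in q_src.items() if weight > 0
    )

q0 = {("rain", 0): 0.1, ("rain", 1): 0.3, ("dry", 0): 0.4, ("dry", 1): 0.2}
q1, r = bayesian_update(q0, 1)
r_inv = {(w1, w0) for (w0, w1) in r}

print(q1)                               # {'rain': 0.3, 'dry': 0.2}
print(is_q_functional(r_inv, q1, q0))   # True
print(is_q_functional(r, q0, q1))       # False: worlds with f == 0 are related to nothing
```

In this toy case $r^{- 1}$ is also injective, since distinct worlds of $M_{1}$ are sent to distinct worlds of $M_{0}$.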

Comparing the $Q$ 's

Given two generalised models $M_{0} = (F_{0}, Q_{0})$ and $M_{1} = (F_{1}, Q_{1})$ , with a relation $r$ between them, we'll now compare $Q_{0}$ and $Q_{1}$ . We'll do this by defining a length operator $L$ that gives the "length" of $r$ , which is a measure of the divergence between $Q_{0}$ and $Q_{1}$ "along" the relation $r$ .

Let $(Q_{0}^{'}, Q_{1}^{'})$ be a pair of probability distributions on $W_{0}$ and $W_{1}$, respectively. We'll say the pair is $r$-compatible if $r$ is a $Q$-preserving morphism between $(F_{0}, Q_{0}^{'})$ and $(F_{1}, Q_{1}^{'})$.

Since $Q_{i}$ and $Q_{i}^{'}$ are distributions over the same $W_{i}$ , we can compare their $l_{1}$ norm, defined as: $| | Q_{i} - Q_{i}^{'} | |_{1} = \sum_{w_{i} \in W_{i}} | Q_{i} (w_{i}) - Q_{i}^{'} (w_{i}) |$ . Then define $L (r, Q_{0}^{'}, Q_{1}^{'})$ as the sum of the $l_{1}$ -distances to $Q_{0}$ and $Q_{1}$ :

$L (r, Q_{0}^{'}, Q_{1}^{'}) = | | Q_{0} - Q_{0}^{'} | |_{1} + | | Q_{1} - Q_{1}^{'} | |_{1} .$

We'll define $L (r)$ to be the minimum^[2] value of this norm among all the $r$ -compatible $(Q_{0}^{'}, Q_{1}^{'})$ . Because of its definition, it's immediately obvious that $L (r) = L (r^{- 1})$ . The other key properties - proved here^[3] - are that:

If $r$ is $Q$ -relational, then there exists a $Q_{1}^{'}$ such that $L (r) = L (r, Q_{0}, Q_{1}^{'})$ ; ie we can use $Q_{0}$ itself rather than finding a $Q_{0}^{'}$ .
If $r$ and $p$ are $Q$-birelational, then $L$ is a sensible length operator, in that $L (p r) \leq L (r) + L (p)$.
$r$ is $Q$ -preserving iff $L (r) = 0$ .

It's that last property that makes $L$ such a useful distance metric: it measures the extent to which $M_{0}$ and $M_{1}$ fail to be aspects of the same underlying reality.
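The quantity $L (r, Q_{0}^{'}, Q_{1}^{'})$ is just a sum of two $l_{1}$ distances, so it is simple to evaluate for any candidate pair. The sketch below does only that: it does not check $r$-compatibility and it does not perform the minimisation that defines $L (r)$, so each value it returns for an $r$-compatible pair is an upper bound on $L (r)$. The dict encoding, the toy coarsening and the function names are assumptions.

```python
from math import isclose

def l1_distance(q, q_prime):
    """Sum over all worlds of |Q(w) - Q'(w)|, treating missing worlds as weight 0."""
    worlds = set(q) | set(q_prime)
    return sum(abs(q.get(w, 0.0) - q_prime.get(w, 0.0)) for w in worlds)

def length_for_pair(q0, q1, q0_prime, q1_prime):
    """L(r, Q0', Q1'): l1 distance from Q0 to Q0' plus l1 distance from Q1 to Q1'."""
    return l1_distance(q0, q0_prime) + l1_distance(q1, q1_prime)

# Toy coarsening r = {(a, x), (b, x)} whose weights do not quite match.
q0 = {"a": 0.25, "b": 0.75}
q1 = {"x": 0.9}

# Two candidate pairs satisfying the coarsening sum condition (checked by hand):
# keep Q0 and move Q1, or keep Q1 and rescale Q0.
keep_q0 = (q0, {"x": 1.0})
keep_q1 = ({"a": 0.225, "b": 0.675}, q1)

print(isclose(length_for_pair(q0, q1, *keep_q0), 0.1, abs_tol=1e-9))   # True
print(isclose(length_for_pair(q0, q1, *keep_q1), 0.1, abs_tol=1e-9))   # True
print(length_for_pair(q0, q1, q0, q1))                                  # 0.0
```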

Relating features and probability distributions

In the above we've been looking at the relationship between $r$ and $Q$ , but have looked little at the features. Here we'll look at some of the relations between features and probability distributions. The idea is to measure how related the features are to each other.

There are several candidates for a measure of this type, but the most interesting seems to be a generalisation of mutual information. For any feature $f \in F$, we have the marginal distribution $Q_{f}$ of $Q$ over the feature $f$. Then if $H$ is the informational entropy of a probability distribution/random variable, we can define a measure over $Q$ given $F$ as:

$- H (Q) + \sum_{f \in F} H (Q_{f}) .$

If we define $Q_{F}$ as the product distribution $\prod_{f \in F} Q_{f}$, then that can also be defined as $D_{K L} (Q | | Q_{F})$, where $D_{K L}$ is the KL-divergence from $Q_{F}$ to $Q$.
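The agreement between the entropy form and the KL-divergence form can be checked numerically on a small case. This is a sketch under assumptions: $Q$ is normalised, worlds are tuples with one entry per feature, entropies are in bits, and the function names are hypothetical.

```python
from collections import defaultdict
from math import isclose, log2

def entropy(dist):
    """Shannon entropy in bits of a normalised distribution given as a dict."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(q, i):
    """Marginal of q over the i-th feature (the i-th entry of each world tuple)."""
    m = defaultdict(float)
    for world, p in q.items():
        m[world[i]] += p
    return dict(m)

def feature_measure(q, n_features):
    """Entropy form: minus H(Q) plus the sum of the marginal entropies."""
    return -entropy(q) + sum(entropy(marginal(q, i)) for i in range(n_features))

def kl_to_product(q, n_features):
    """KL form: divergence of Q from the product of its marginals."""
    margs = [marginal(q, i) for i in range(n_features)]
    total = 0.0
    for world, p in q.items():
        if p > 0:
            prod = 1.0
            for i, value in enumerate(world):
                prod *= margs[i][value]
            total += p * log2(p / prod)
    return total

# Two perfectly correlated binary features: one bit is shared between them.
q_corr = {(0, 0): 0.5, (1, 1): 0.5}
# Two independent binary features: nothing is shared.
q_indep = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

print(feature_measure(q_corr, 2), kl_to_product(q_corr, 2))     # 1.0 1.0
print(feature_measure(q_indep, 2), kl_to_product(q_indep, 2))   # 0.0 0.0
```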

Example: gas laws

As an illustration, consider the ideal gas laws: $P V = n R T$ , where $P$ is pressure, $V$ is volume, $n$ is the amount of substance, $R$ is the ideal gas constant, and $T$ is the temperature. We'll consider a simple model with constant amount of substance, setting $n R = 1$ , so the law reduces to:

$P V = T$

We'll allow these variables to take a few values: $P$ and $V$ are integers that range from $1$ to $4$ , while $T$ is an integer that ranges from $1$ to $16$ . The probability distribution $Q$ will give uniform probability^[4] $1 / 16$ to each $(P = p, V = v, T = p v)$ .

In this case, $Q$ is uniform among the $16$ worlds where it is non-zero; hence

$H (Q) = \log_{2} (16) = 4.$

This characterises $Q$, but doesn't touch the features $F$. So, let $Q_{P}$, $Q_{V}$, and $Q_{T}$ be the marginal distributions over the features. Both $Q_{P}$ and $Q_{V}$ are uniform over $4$ elements, so $H (Q_{P}) = H (Q_{V}) = 2$. As for $Q_{T}$, it is $1 / 16$ over $\{1, 9, 16\}$, $2 / 16$ over $\{2, 3, 6, 8, 12\}$, and $3 / 16$ over $\{4\}$. Some calculations then establish that $H (Q_{T})$ is $(54 - 3 \log_{2} (3)) / 16$. Then the KL-divergence from $Q_{F} = Q_{V} Q_{P} Q_{T}$ to $Q$ is:

$D_{K L} (Q | | Q_{F}) = - 4 + 2 + 2 + \frac{54 - 3 \log_{2} (3)}{16} = \frac{54 - 3 \log_{2} (3)}{16} \approx 3.08.$
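For reference, the entropy of $Q_{T}$ used above can be checked term by term from the marginal weights (three values of weight $1 / 16$, five of weight $2 / 16$ and one of weight $3 / 16$):

$H (Q_{T}) = 3 \cdot \frac{1}{16} \log_{2} (16) + 5 \cdot \frac{2}{16} \log_{2} (8) + \frac{3}{16} \log_{2} \left( \frac{16}{3} \right) = \frac{12 + 30 + 12 - 3 \log_{2} (3)}{16} = \frac{54 - 3 \log_{2} (3)}{16} .$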

Let us add another variable $T^{'}$ to the feature set, which is just equal to $T$, but with another name, and see how things change. Then $H (Q)$ is unchanged, and $D_{K L} (Q | | Q_{F})$ adds another $\frac{54 - 3 \log_{2} (3)}{16}$, corresponding to $H (Q_{T^{'}})$.
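The numbers in this example can be reproduced directly. A minimal sketch, assuming worlds are encoded as tuples $(P, V, T)$ and entropies are in bits; the function names are hypothetical.

```python
from collections import defaultdict
from math import isclose, log2

def entropy(dist):
    """Shannon entropy in bits of a normalised distribution given as a dict."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(q, i):
    """Marginal of q over the i-th entry of each world tuple."""
    m = defaultdict(float)
    for world, p in q.items():
        m[world[i]] += p
    return dict(m)

def measure(q):
    """Minus H(Q) plus the sum of the entropies of all the feature marginals."""
    n_features = len(next(iter(q)))
    return -entropy(q) + sum(entropy(marginal(q, i)) for i in range(n_features))

# Worlds (P, V, T) with T = P * V, each of weight 1/16.
q = {(p, v, p * v): 1 / 16 for p in range(1, 5) for v in range(1, 5)}
h_t = entropy(marginal(q, 2))
closed_form = (54 - 3 * log2(3)) / 16

# The same worlds with a duplicate feature T' = T appended.
q_dup = {(p, v, t, t): weight for (p, v, t), weight in q.items()}

print(isclose(entropy(q), 4.0))                         # True
print(isclose(h_t, closed_form))                        # True
print(round(measure(q), 2))                             # 3.08
print(isclose(measure(q_dup) - measure(q), h_t))        # True
```

Duplicating $T$ leaves the $16$ worlds and their weights unchanged, which is why only the sum of marginal entropies grows.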

[1] We've already shown that $Q$-preserving morphisms are associative. The composition of general binary relations is known to be associative too.

So now assume that $r : M_{0} \to M_{1}$ and $p : M_{1} \to M_{2}$ are both $Q$-relational. Let $w_{0} \in W_{0}$ be such that $Q_{0} (w_{0}) > 0$. Then, because $r$ is $Q$-relational, there exists a $w_{1} \in W_{1}$ with $Q_{1} (w_{1}) > 0$ and $(w_{0}, w_{1}) \in r$. Then since $p$ is $Q$-relational, there exists a $w_{2} \in W_{2}$ with $Q_{2} (w_{2}) > 0$ and $(w_{1}, w_{2}) \in p$. Combining the two gives $(w_{0}, w_{2}) \in p r$. Thus $Q$-relational morphisms are associative. Applying the same argument to $r^{- 1}$ shows that $Q$-birelational morphisms are also associative. A variant of the same argument with "there exists a unique $w_{1} \in W_{1}$" instead of "there exists a $w_{1} \in W_{1}$" shows that $Q$-functional morphisms are also associative. Hence $Q$-isomorphic morphisms are also associative.

Now assume that $r$ is $Q$-preserving and let $w_{0} \in W_{0}$ be such that $Q_{0} (w_{0}) > 0$. Then there exists an underlying model $M_{r} = (F_{0} ⊔ F_{1}, Q_{r})$ such that $Q_{0} (w_{0}) = \sum_{(w_{0}, w_{1}) \in r} Q_{r} (w_{0}, w_{1})$. Since this sum is greater than zero, there exists a $w_{1}$ such that $Q_{r} (w_{0}, w_{1}) > 0$. Then since $Q_{1} (w_{1}) = \sum_{(w_{0}^{'}, w_{1}) \in r} Q_{r} (w_{0}^{'}, w_{1})$ and all the terms are non-negative, $Q_{1} (w_{1}) > 0$. The same argument works if we started with a $w_{1} \in W_{1}$ such that $Q_{1} (w_{1}) > 0$; thus $r$ must be $Q$-birelational. This proves that $Q$-preserving implies $Q$-birelational.

It's trivial that $Q$ -birelational implies $Q$ -relational, and that $Q$ -functional also implies $Q$ -relational, since "exists a unique" is strictly stronger than "exists a". By definition, $Q$ -isomorphic implies $Q$ -preserving (hence $Q$ -birelational and $Q$ -relational) and $Q$ -functional.

Now let $r$ be $Q$-isomorphic, and let $w_{0}$ be such that $Q_{0} (w_{0}) > 0$. Since $r$ is $Q$-functional, there exists a unique $w_{1}$ with $(w_{0}, w_{1}) \in r$ and $Q_{1} (w_{1}) > 0$. Since $r^{- 1}$ is also $Q$-functional, there are no other $w_{0}^{'}$ with $(w_{0}^{'}, w_{1}) \in r$ and $Q_{0} (w_{0}^{'}) > 0$. So, among the elements of non-zero measure, $w_{0}$ and $w_{1}$ are related only to each other. Then since $r$ is $Q$-preserving, $Q_{0} (w_{0}) \leq Q_{1} (r (w_{0})) = Q_{1} (w_{1}) + 0$, and $Q_{1} (w_{1}) \leq Q_{0} (r^{- 1} (w_{1})) = Q_{0} (w_{0}) + 0$. Hence $Q_{0} (w_{0}) = Q_{1} (w_{1})$. ↩︎
[2] This will be a minimum, not an infimum. Let $(Q_{0}^{'}, Q_{1}^{'})$ be an $r$-compatible pair with $L (r, Q_{0}^{'}, Q_{1}^{'}) = μ$. Then if we restrict to pairs $(Q_{0}^{'}, Q_{1}^{'})$ with $L (r, Q_{0}^{'}, Q_{1}^{'}) \leq μ$, this is a compact non-empty set, so $L (r, -, -)$ must reach its minimum on this set. ↩︎
[3] For $r$ a relation between $M_{0} = (F_{0}, Q_{0})$ and $M_{1} = (F_{1}, Q_{1})$, let $\mathcal{Q}_{r}$ be the set of $r$-compatible pairs $(Q_{0}^{'}, Q_{1}^{'})$ that minimise $L (r, Q_{0}^{'}, Q_{1}^{'})$. Hence $L (r, -, -) = L (r)$ on $\mathcal{Q}_{r}$.

So we have this non-empty $\mathcal{Q}_{r}$; what we'll show is that, if $r$ is $Q$-relational, then there is a $Q_{1}^{'}$ such that $(Q_{0}, Q_{1}^{'}) \in \mathcal{Q}_{r}$. We'll need the following lemma:
- Lemma A: For any $(Q_{0}^{'}, Q_{1}^{'})$ in $\mathcal{Q}_{r}$ with $| | Q_{0} - Q_{0}^{'} | |_{1} > 0$, we can find another pair $(Q_{0}^{''}, Q_{1}^{''}) \in \mathcal{Q}_{r}$ with $Q_{0}^{''}$ closer (in the $l_{1}$ norm) to $Q_{0}$ than $Q_{0}^{'}$ is.
To prove the lemma, pick any $(Q_{0}^{'}, Q_{1}^{'}) \in \mathcal{Q}_{r}$ with $| | Q_{0} - Q_{0}^{'} | |_{1} > 0$. That $l_{1}$ norm is a sum of non-negative terms, so there must exist a $w_{0} \in W_{0}$ with $| Q_{0} (w_{0}) - Q_{0}^{'} (w_{0}) | > 0$.

Assume first that $Q_{0} (w_{0}) < Q_{0}^{'} (w_{0})$. Then, since $Q_{0}^{'} (w_{0}) > 0$ and $r$ is a $Q$-preserving morphism between $(F_{0}, Q_{0}^{'})$ and $(F_{1}, Q_{1}^{'})$, there is an underlying model $M_{r} = (F_{0} ⊔ F_{1}, Q_{r})$ so that $Q_{0}^{'} (w_{0}) = \sum_{(w_{0}, w_{1}) \in r} Q_{r} (w_{0}, w_{1}) > 0$. Thus there is a $(w_{0}, w_{1}) \in r$ with $Q_{r} (w_{0}, w_{1}) > 0$. Pick $ϵ > 0$ to be less than both $Q_{r} (w_{0}, w_{1})$ and $| Q_{0} (w_{0}) - Q_{0}^{'} (w_{0}) |$, and define:

$Q_{0}^{''} (w_{0}) = Q_{0}^{'} (w_{0}) - ϵ, \qquad Q_{1}^{''} (w_{1}) = Q_{1}^{'} (w_{1}) - ϵ,$

with $Q_{0}^{''} = Q_{0}^{'}$ and $Q_{1}^{''} = Q_{1}^{'}$ on all other points. Because $Q_{0}^{''} (w_{0})$ is closer to $Q_{0} (w_{0})$ (by $ϵ$) than $Q_{0}^{'} (w_{0})$ is, $| | Q_{0} - Q_{0}^{''} | |_{1} = | | Q_{0} - Q_{0}^{'} | |_{1} - ϵ$. Furthermore, $r$ is $Q$-preserving between $Q_{0}^{''}$ and $Q_{1}^{''}$ (the underlying model has the same $Q_{r}$ except decreased by $ϵ$ on $(w_{0}, w_{1})$), and $| | Q_{1} - Q_{1}^{''} | |_{1}$ has gone up by at most $ϵ$ over $| | Q_{1} - Q_{1}^{'} | |_{1}$; thus the sum $| | Q_{0} - Q_{0}^{''} | |_{1} + | | Q_{1} - Q_{1}^{''} | |_{1}$ has not increased. Hence $(Q_{0}^{''}, Q_{1}^{''}) \in \mathcal{Q}_{r}$.

Now consider the other case: $Q_{0} (w_{0}) > Q_{0}^{'} (w_{0})$. Then since $Q_{0} (w_{0}) > 0$ and $r$ is $Q$-relational, there must exist a $(w_{0}, w_{1}) \in r$. We define $Q_{0}^{''}$ and $Q_{1}^{''}$ as above, except that we add $ϵ$ instead of subtracting it; the rest of the argument is the same. This proves the lemma $□$.

Back to the main proof. Since $\mathcal{Q}_{r}$ is compact, the $l_{1}$ distance to $Q_{0}$ must reach a minimum on $\mathcal{Q}_{r}$. By lemma A, this minimum can only be $0$ (since if it were greater than $0$, we could find a pair with a smaller $l_{1}$ distance to $Q_{0}$). If $(Q_{0}^{'}, Q_{1}^{'}) \in \mathcal{Q}_{r}$ is a pair on which it reaches this minimum, we must have $| | Q_{0} - Q_{0}^{'} | |_{1} = 0$, ie $Q_{0}^{'} = Q_{0}$.

Now assume $r$ is $Q$-birelational between $M_{0}$ and $M_{1}$, while $p$ is $Q$-birelational between $M_{1}$ and $M_{2}$. Then $L (r) + L (p) = L (r^{- 1}) + L (p)$. Since $r^{- 1}$ and $p$ are $Q$-relational, there exist $Q_{0}^{'}$ and $Q_{2}^{'}$ such that this is $L (r^{- 1}, Q_{1}, Q_{0}^{'}) + L (p, Q_{1}, Q_{2}^{'})$, and $(Q_{0}^{'}, Q_{1})$ is $r$-compatible, while $(Q_{1}, Q_{2}^{'})$ is $p$-compatible.

This implies that $(Q_{0}^{'}, Q_{2}^{'})$ is $p r$ -compatible, and thus $L (p r, Q_{0}^{'}, Q_{2}^{'}) \geq L (p r)$ . However, $L (r^{- 1}, Q_{1}, Q_{0}^{'}) + L (p, Q_{1}, Q_{2}^{'}) = | | Q_{0} - Q_{0}^{'} | |_{1} + 0 + 0 + | | Q_{2} - Q_{2}^{'} | |_{1} = L (p r, Q_{0}^{'}, Q_{2}^{'})$ , giving our result:

$L (p r) \leq L (r) + L (p) .$

Finally, notice that $L (r) = 0$ implies that there exists an $r$-compatible $(Q_{0}^{'}, Q_{1}^{'})$ with $| | Q_{0} - Q_{0}^{'} | |_{1} = 0$ and $| | Q_{1} - Q_{1}^{'} | |_{1} = 0$. Thus $(Q_{0}, Q_{1})$ themselves are $r$-compatible, ie $r$ is $Q$-preserving. Conversely, if $(Q_{0}, Q_{1})$ are $r$-compatible, then $L (r) \leq | | Q_{0} - Q_{0} | |_{1} + | | Q_{1} - Q_{1} | |_{1} = 0$, and thus $L (r) = 0$. ↩︎
[4] Note that this means that many values of $T$ are impossible in this model, such as $7$ and $10$, which cannot be expressed as the product of two integers that are each $4$ or less. ↩︎
