A Preface
During the 1990s, a significant body of research examined how people process information, combining very different streams in psychology and related areas with explicit predictive models of how actual cognitive processes differ from the theoretical ideal. This is not only the literature by Kahneman and Tversky about cognitive biases, but includes research about memory, perception, scope insensitivity, and other areas. The rationalist community is very familiar with some of this literature, but fewer are familiar with a masterful synthesis produced by Richards Heuer for the intelligence community in 1999[1], which was intended to start combating these problems, a goal we share. I’m hoping to put together a series of posts based on that work, potentially expanding on it, or giving my own spin – but I encourage reading the book, Psychology of Intelligence Analysis, itself (PDF) as well[2]. (This essay is based on Chapter 3.)
This will hopefully be my first set of posts, so feedback is especially welcome, both to help me refine the ideas, and to refine my presentation.
Entropy, Pressure, and Metaphorical States of Matter
Eliezer recommends updating incrementally but has noted that it’s hard. The central point, that it is hard to do so, is one that some in our community have experienced and explicated, but there is deep theory I’ll attempt to outline, via an analogy, that I think explains how and why it occurs. The problem is that we are quick to form opinions and build models, because humans are good at pattern finding. We are less quick to discard them, due to limited mental energy. This is especially true when the pressure of evidence doesn’t shift overwhelmingly and suddenly.
I’ll attempt to answer the question of how this is true by stretching a metaphor and creating an intuition pump for thinking about how our minds might perform some of their thinking under uncertainty.
Frozen Perception
Heuer describes a stream of research about perception, and notes that “once an observer has formed an image – that is, once he or she has developed a mind set or expectation concerning the phenomenon being observed – this conditions future perceptions of that phenomenon.” This seems to follow standard Bayesian practice, but in fact, as Eliezer noted, people fail to update. The following set of images, which Heuer reproduced from a 1976 book by Robert Jervis, shows exactly this point:
Looking at each picture, starting on the left, and moving to the right, you see a face slowly change. At what point does the face no longer seem to appear? (Try it!) For me, it’s at about the seventh image that it’s clear it morphed into a sitting, bowed figure. But what if you start at the other end? The woman is still clearly there long past the point where we see a face, starting in the other direction. What’s going on?
We seem to attach too strongly to our first approach, decision, or idea. Specifically, our decision seems to “freeze” once it gets to one place, and needs much more evidence to start moving again. This has an analogue in physics, in the notion of freezing, which I think is more important than it first appears.
Entropy
To analyze this, I’ll drop into some basic probability theory, and physics, before (hopefully) we come out on the other side with a conceptually clearer picture. First, I will note that cognitive architecture has some way of representing theories, and implicitly assigns probabilities to various working theories. This is some sort of probability distribution over sample theories. Any probability distribution has a quantity called entropy[3], which is simply the probability of each state, multiplied by the logarithm of that probability, summed over all the states. (The probability is less than 1, so the logarithm is negative, but we traditionally flip the sign so entropy is a positive quantity.)
Need an example? Sure! I have two dice, and each can land on any number, 1-6. I’m assuming they are fair, so for a single die each face has probability 1/6, and the logarithm (base 2) of 1/6 is about -2.585. There are 6 states, so the total is 6 * (1/6) * 2.585 = 2.585 bits. (With both dice, I have 36 possible combinations, each with probability 1/36; log(1/36) is about -5.17, so the entropy is 5.17. You may have noticed that I doubled the number of dice involved, and the entropy doubled – because there is exactly twice as much that can happen, but the entropy per die is unchanged.) If I only have 2 possible states, such as a fair coin, each has probability 1/2, and log(1/2) = -1, so for two states, (-0.5 * -1) + (-0.5 * -1) = 1. An unfair coin, with a ¼ probability of tails and a ¾ probability of heads, has an entropy of about 0.81. Of course, this isn’t the lowest possible entropy – a trick coin with heads on both sides has only 1 state, with entropy 0. So unfair coins have lower entropy – because we know more about what will happen.
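For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. The function name entropy_bits and the variable names are my own choices for illustration; the only assumption is that we measure entropy in bits (base-2 logarithms), as in the examples.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits.

    Sums p * log2(1/p) over all states, which is the same as
    -p * log2(p). States with zero probability contribute nothing.
    """
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# The distributions from the examples in the text.
one_die = [1 / 6] * 6        # one fair die
two_dice = [1 / 36] * 36     # every ordered pair of two fair dice
fair_coin = [0.5, 0.5]
unfair_coin = [0.25, 0.75]   # 1/4 tails, 3/4 heads
trick_coin = [1.0]           # heads on both sides: a single state

for name, dist in [
    ("one die", one_die),
    ("two dice", two_dice),
    ("fair coin", fair_coin),
    ("unfair coin", unfair_coin),
    ("trick coin", trick_coin),
]:
    print(f"{name}: {entropy_bits(dist):.3f} bits")
```

Running this should reproduce the figures quoted above, up to rounding.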
Freezing, Melting, and Ideal Gases under Pressure
In physics, there is a deeply related concept, also called entropy, which, in the form we see on a macroscopic scale, is closely tied to temperature. If you remember your high school science classes, temperature is a description of how much molecules move around. I’m not a physicist, and this is a bit simplified[4], but the entropy of an object is how uncertain we are about its state – gases expand to fill their container, and the molecules could be anywhere, so they have higher entropy than a liquid, which stays in its container, which still has higher entropy than a solid, where the molecules don’t move much, which still has higher entropy than a crystal, where the molecules are sort of locked into place.
This partially lends intuition to the third law of thermodynamics: “the entropy of a perfect crystal at absolute zero is exactly equal to zero.” In our terms above, it’s like that trick coin – we know exactly where everything is in the crystal, and it doesn’t move. Interestingly, a perfect crystal at 0 Kelvin cannot exist in nature; no finite process can reduce entropy to that point; like infinite certainty, infinitely exact crystals are impossible to arrive at, unless you started there. So far, we could build a clever analogy between temperature and certainty, telling us that “you’re getting warmer” means exactly the opposite of what it does in common usage – but I think this is misleading[5].
In fact, I think that information in our analogy doesn’t change the temperature; instead, it reduces the volume! In the analogy, gases can become liquids or solids either by lowering temperature, or by increasing pressure – which is what evidence does. Specifically, evidence constrains the set of possibilities, squeezing our hypothesis space. The phrase “weight of evidence” is now metaphorically correct; it will actually constrain the space by applying pressure.
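To make "evidence squeezes the hypothesis space" concrete, here is a toy sketch in Python. Everything in it is an illustrative assumption rather than anything from Heuer: eight equally likely hypotheses, and a single hypothetical observation that is fully consistent with two of them and rules out the rest. The helper names entropy_bits and bayes_update are mine.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: sum of p * log2(1/p) over nonzero states."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def bayes_update(prior, likelihoods):
    """Posterior is proportional to prior * likelihood, then renormalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Assumed toy setup: 8 hypotheses, all equally likely to begin with.
prior = [1 / 8] * 8

# Hypothetical evidence: equally consistent with the first two
# hypotheses and impossible under the other six.
likelihoods = [1, 1, 0, 0, 0, 0, 0, 0]

posterior = bayes_update(prior, likelihoods)

print(f"entropy before evidence: {entropy_bits(prior):.1f} bits")
print(f"entropy after evidence:  {entropy_bits(posterior):.1f} bits")
```

The total probability is still 1 after the update; what changed is the volume it is spread over, from eight hypotheses down to two. Real evidence is rarely this clean, and a single surprising observation can temporarily spread the distribution out again, so treat this as the simplest case of the squeeze rather than a general rule.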
I think that by analogy, this explains the phenomenon we see with perception. While we are uncertain, information increases pressure, and our conceptual estimate can condense from an uncertain gas to a relatively contained liquid state – not because we have less probability to distribute, but because the evidence has constrained the space over which we can distribute it. Alternatively, we can settle on a lower energy state on our own, unassisted by evidence. If our minds settle too quickly on a theory or idea, the gas settles into a corner of the available space, and if we fail to apply enough energy to the problem, our unchallenged opinion can even freeze into place.
Our mental models can be liquid, gaseous, or frozen in place – either by our prior certainty, our lack of energy required to update, or an immense amount of evidential pressure. When we look at those faces, our minds settle into a model quickly, and once there, fail to apply enough energy to re-evaporate our decision until the pressure of the new pictures is relatively immense. If we had started at picture 3 or 6, we could much more easily update away from our estimates; our minds are less willing to let the cloud settle into a puddle of probable answers, much less freeze into place. We can easily see the face, or the woman, moving between just these two images.
When we begin to search for a mental model to describe some phenomenon, whether it be patterns of black and white on a page, or the way in which our actions will affect a friend, I am suggesting we settle into a puddle of likely options, and when not actively investing energy into the question, we are likely to freeze into a specific model.
What does this approach retrodict, or better, forbid?
Because our minds have limited energy, the process of maintaining an uncertain stance should be difficult. This seems to be borne out by personal and anecdotal experience, but I have not yet searched the academic literature to find more specific validation.
We should have more trouble updating away from a current model than we would have arriving at the new model from the beginning. As Heuer puts it, “Initial exposure to… ambiguous stimuli interferes with accurate perception even after more and better information becomes available.” He notes that this was shown in Bruner and Potter, 1964, “Interference in Visual Recognition,” and that “the early but incorrect impression tends to persist because the amount of information necessary to invalidate a hypothesis is considerably greater than the amount of information required to make an initial interpretation.”
Potential avenues of further thought
The pressure of evidence should reduce the mental effort needed to switch models, but “leaky” hypothesis sets, where a class of model is not initially considered, should allow the pressure to metaphorically escape into the larger hypothesis space.
There is potential for making this analogy more exact by discussing entropy in graphical models (Bayesian Networks), especially in sets of graphical models with explicit uncertainty attached. I don’t have the math needed for this, but would be interested in hearing from those who do.
[1] I would like to thank both Abram Demski (Interviewed here) for providing a link to this material, and my dissertation chair, Paul Davis, who was able to point me towards how this has been used and extended in the intelligence community.
[2] There is a follow-up book and training course, which are also available, but I've neither read them nor seen them online. A shorter version of the main points of that book is here (PDF), which I have only glanced through.
[3] Eliezer discusses this idea in Entropy and short codes, but I’m heading a slightly different direction.
[4] We have a LW Post, Entropy and Temperature that explains this a bit. For a different, simplified explanation, try this: http://www.nmsea.org/Curriculum/Primer/what_is_entropy.htm. For a slightly more complete version, try Wikipedia: https://en.wikipedia.org/wiki/Introduction_to_entropy. For a much more complete version, learn the math, talk to a PhD in thermodynamics, then read some textbooks yourself.
[5] I think this, of course, because I was initially heading in that direction. Instead, I realized there was a better analogy – but if we wanted to develop it in this direction instead, I’d point to the energy required to change phases of matter as a reason that our minds have trouble moving from their initial estimate. On reflection, I think this should be a small part of the story, if not entirely negligible.