I sometimes wonder just how useful probability and statistics are. There is the theoretical argument that Bayesian probability is the fundamental method of correct reasoning, and that logical reasoning is just the limit as p=0 or 1 (although that never seems to be applied at the meta-level: what is the probability that Bayes' Theorem is true?), but today I want to consider the practice.
Casinos, lotteries, and quantum mechanics: no problem. The information required for deterministic measurement is simply not available, by adversarial design in the first two cases, and by we know not what in the third.

Insurance: by definition, this only works when it's impossible to predict the catastrophes insured against. No-one will offer insurance against a risk that is certain to happen, and no-one will buy it for a risk that is certain not to.

Randomised controlled trials are the gold standard of medical testing; but over on OB Robin Hanson points out from time to time that the marginal dollar of medical spending has little effectiveness. And we don't actually know how a lot of treatments work.

Quality control: test a random sample from your production run and judge the whole batch from the results. Fine -- it may be too expensive to test every widget, or impossible if the test is destructive. But wherever someone is doing statistical quality control of how accurately you're filling jam jars to the weight printed on the label, someone else will be thinking about how to weigh every single one, and how to make the filling process more accurate. (And someone else will be trying to get the labelling regulations amended to let you sell the occasional 15-ounce pound of jam.)
But when you can make real measurements, that's the way to go. Here is a technical illustration.
Prof. Sagredo has assigned a problem to his two students Simplicio and Salviati: "X is difficult to measure accurately. Predict it in some other way."
Simplicio collects a large experimental data set of pairs (X, Y) and finds, with high confidence, a correlation of 0.6 between X and Y. So given the value y of Y, his best prediction for the value of X is 0.6y. [Edit: that formula is mistaken. The regression line of Y against X is Y = (bc/a)X, assuming the means have been normalised to zero, where a and b are the standard deviations of X and Y respectively, and c is their correlation. For the Y = X + D1 model below, bc/a is equal to 1.]
Salviati instead tries to measure X, and finds a variable Z which is experimentally found to have a good chance of lying close to X. Let us suppose that the standard deviation of Z-X is 10% that of X.
How do these two approaches compare?
A correlation of 0.6 is generally considered pretty high in psychology and social science, especially if it's established with p=0.001 to be above, say, 0.5. So Simplicio is quite pleased with himself.
A measurement whose range of error is 10% of the range of the thing measured is about as bad as it could be and still be called a measurement. (One might argue that any sort of entanglement whatever is a measurement, but one would be wrong.) It's a rubber tape measure. By that standard, Salviati is doing rather badly.
In effect, Simplicio is trying to predict someone's weight from their height, while Salviati is putting them on a (rather poor) weighing machine (and both, presumably, are putting their subjects on a very expensive and accurate weighing machine to obtain their true weights).
So we are comparing a good correlation with a bad measurement. How do they stack up? Let us suppose that the underlying reality is that Y = X + D1 and Z = X + D2, where X, D1, and D2 are normally distributed and uncorrelated (and causally unrelated, which is a stronger condition). I'm choosing the normal distribution because it's easy to calculate exact numbers, but I don't believe the conclusions would be substantially different for other distributions.
For convenience, assume the variables are normalised to all have mean zero, and let X, D1, and D2 have standard deviations 1, d1, and d2 respectively.
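As a sanity check, this setup can be simulated directly. The sketch below is my own illustration of the model just described, not part of the original analysis; the sample size and random seed are arbitrary choices.

```python
import numpy as np

# Model: X standard normal; Y = X + D1 with sd(D1) = 4/3 (chosen so that
# corr(X, Y) = 0.6); Z = X + D2 with sd(D2) = 0.1. All independent.
rng = np.random.default_rng(0)
n = 500_000
X = rng.normal(0.0, 1.0, n)
Y = X + rng.normal(0.0, 4.0 / 3.0, n)   # Simplicio's correlate
Z = X + rng.normal(0.0, 0.1, n)         # Salviati's noisy measurement

c_xy = np.corrcoef(X, Y)[0, 1]  # should come out near 0.6
c_xz = np.corrcoef(X, Z)[0, 1]  # should come out near 0.995
print(round(c_xy, 3), round(c_xz, 3))
```

With half a million samples the empirical correlations land very close to the theoretical values derived below.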
Z - X is D2, so d2 = 0.1. The correlation between Z and X is c(X,Z) = cov(X,Z)/(sd(X)sd(Z)) = 1/sqrt(1 + d2²) = 0.995.
The correlation between X and Y is c(X,Y) = 1/sqrt(1 + d1²) = 0.6, so d1 = 1.333.
We immediately see something suspicious here. Even a terrible measurement yields a sky-high correlation. Or put the other way round, if you're bothering to measure correlations, your data are rubbish. Even this "good" correlation gives a signal to noise ratio of less than 1. But let us proceed to calculate the mutual informations. How much do Y and Z tell you about X, separately or together?
For the bivariate normal distribution, the mutual information between variables A and B with correlation c is lg(I), where lg is the binary logarithm and I = sd(A)/sd(A|B). (The denominator here -- the standard deviation of A conditional on the value of B -- happens to be independent of the particular value of B for this distribution.) This works out to I = 1/sqrt(1 - c²), so the mutual information is -lg(sqrt(1 - c²)).
| | corr. | mut. inf. |
|---|---|---|
| Simplicio | 0.6 | 0.3219 |
| Salviati | 0.995 | 3.3291 |
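The figures in the table follow directly from the formula above; here is a minimal check, using the exact value c(X,Z) = 1/sqrt(1.01) rather than the rounded 0.995.

```python
import math

def mutual_information_bits(c):
    """Mutual information (in bits) between two jointly normal
    variables with correlation c: -lg(sqrt(1 - c^2))."""
    return -0.5 * math.log2(1.0 - c * c)

mi_simplicio = mutual_information_bits(0.6)                   # ~0.3219 bits
mi_salviati = mutual_information_bits(1.0 / math.sqrt(1.01))  # ~3.3291 bits
print(round(mi_simplicio, 4), round(mi_salviati, 4))
```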
What can you do with one third of a bit? If Simplicio tries to predict just the sign of X from the sign of Y, he will be right only 70% of the time (i.e. cos⁻¹(-c(X,Y))/π). Salviati will be right 96.8% of the time. Salviati's estimate will even be in the right decile 89% of the time, while on that task Simplicio can hardly do better than chance. So even a good correlation is useless as a measurement.
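The sign-agreement probabilities can be computed from that arccos formula; a quick check, again using the exact correlation for Z:

```python
import math

def sign_agreement_prob(c):
    """P(sign(A) == sign(B)) for jointly normal A, B with correlation c,
    i.e. arccos(-c)/pi (equivalently 1/2 + arcsin(c)/pi)."""
    return math.acos(-c) / math.pi

p_simplicio = sign_agreement_prob(0.6)                    # ~0.705
p_salviati = sign_agreement_prob(1.0 / math.sqrt(1.01))   # ~0.968
print(round(p_simplicio, 3), round(p_salviati, 3))
```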
Simplicio and Salviati show their results to Prof. Sagredo. Simplicio can't figure out how Salviati did so much better without taking measurements on thousands of samples. Salviati seemed to just think about the problem and come up with a contraption out of nowhere that did the job, without doing a single statistical test. "But at least," says Simplicio, "you can't throw away my 0.3219, it all adds up!" Sagredo points out that it literally does not add up. The information gained about X from Y and Z together is not 0.3219 + 3.3291 = 3.6510 bits. The correct result is found from the standard deviation of X conditional on both Y and Z, which is sqrt(1/(1 + 1/d1² + 1/d2²)). The information gained is then lg(sqrt(1 + 1/d1² + 1/d2²)) = 0.5*lg(101.5625) = 3.3331. The extra information over knowing just Z is only 0.0040 = 1/250 of a bit, because nearly all of Simplicio's information is already included in Salviati's.
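That joint-information arithmetic can be verified in a few lines, plugging in d1 = 4/3 and d2 = 0.1:

```python
import math

d1, d2 = 4.0 / 3.0, 0.1

# Conditioning on independent noisy views adds their precisions:
# sd(X | Y, Z) = sqrt(1 / (1 + 1/d1^2 + 1/d2^2)).
precision = 1.0 + 1.0 / d1**2 + 1.0 / d2**2     # = 101.5625
joint_bits = 0.5 * math.log2(precision)          # info from Y and Z together
z_only_bits = 0.5 * math.log2(1.0 + 1.0 / d2**2) # info from Z alone
extra = joint_bits - z_only_bits                 # what Y adds on top of Z
print(round(joint_bits, 4), round(extra, 4))
```

The extra contribution from Y on top of Z indeed comes out at about four thousandths of a bit.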
Sagredo tells Simplicio to go away and come up with some real data.
Now that I've read it, I have to say I agree with you: it is not good evidence. At best, it's an application of PCT to generate an interesting hypothesis or two.
I'm not sure why you'd expect akrasia to be a simple circuit. If it were a simple conflict, between exactly two things, you'd likely be able to resolve it consciously without much effort. A few weeks ago, I did a workshop where we charted a portion of one person's control structure in the area of not working on the iPhone app they wanted to write. It took a couple hours and filled most of a page with the relevant cognitive-level variables and their interconnections.
This is quite consistent with e.g. Ainslie's model of akrasia as involving multiple competing "interests"; I see PCT as an improvement over Ainslie in providing a straightforward implementation mapping, plus simplified management of Ainslie's notion of "appetites", which is not very well worked out (IMO) and a little too handwavy.
Replacing Ainslie's idea of "interests" having "appetites" with controllers measuring time-averaged variables seems like a straightforward win: instead of two entities, you have just one entity that's structurally similar to things we know our brains/nervous systems already have. (Also, Ainslie has no worked-out model for how prioritization and agreement between interests occur; PCT on the other hand has hierarchy and reference levels to account for them.)
Individually, the circuits are simple; collectively, the networks are not. I used to think things were simpler than they are, because I focused only on the things (functional beliefs) that were effectively connections between control circuits. I rarely addressed the settings of the circuits themselves, or used them as a springboard to identifying other beliefs or variables.
There's a difference between having a false label applied to a true experience, and having a false experience. The existence of perceptions such as "how much work I've gotten done lately" or "how much fun I'm having" is certainly some evidence for PCT's notion of time-averaged perceptual variables that can influence decision-making. It's also parsimonious to assume that the brain is unlikely to have evolved specific circuits for these perceptions, rather than simply having a basis for acquiring new perceptions.
In effect, the PCT model of cognitive variables explains how we represent all the things we "just know" or "just feel", including expert intuition in specialized subjects. The PCT prediction would be that if someone is skilled enough in a subject to have a specific intuition about something, we should be able to find a specific neural signal whose intensity corresponds to the degree of that intuition, and which is a time-averaged function of other (possibly gated) input signals.
I don't see how any of this seems extraordinary or controversial in the slightest, on the perception side.
Control, perhaps, might be more controversial... especially given the implication that we don't control our own actions directly, but can only do so through interaction with the control network. But for me, that implication is uncontroversial, because I've been writing about that (independently formed) idea since 2005.
Powers hypothesizes that "awareness" simply is a debugger that can go in and inspect any part of the network, injecting settings or testing hypotheticals. Anything we do by direct conscious intention would therefore consist of "manually" setting control values in the network, which of course would have no long-term effect if a higher-level controller puts the settings right back when you're done. What's more, if your conscious meddling is interfering with something in an "important" (high) position in the network, it's likely to reorganize in such a way that you no longer want to meddle with the network in that particular way!
And that actually sounds like the most straightforward explanation of akratic behaviors, ever, and is also 100% consistent with everything I've already previously observed about mind hacking.
That is, we really don't control our own behaviors: our networks do. Free will is really just a special case, even if it doesn't seem that way at first glance. PCT just offers a better explanation than my rough models had for why/how that works.
Good. The experiment is, however, very good evidence for the hypothesis that R.S. Marken is a crank, and explains the quote from his farewell speech that didn't make sense to me before: