"Gnomish helms should not function. Their very construction seems to defy the nature of thaumaturgical law. In fact, they are impossible. Like most products of gnomish minds, they include a large number of bells and whistles, and very little substance. Those that work usually have a minor helm contained within, always hidden away, disguised to appear innocuous and inessential."
-- Spelljammer campaign set
We have seen that knowledge implies mutual information between a mind and its environment, and we have seen that this mutual information is negentropy in a very physical sense: If you know where molecules are and how fast they're moving, you can turn heat into work via a Maxwell's Demon / Szilard engine.
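To put a rough number on "a very physical sense", here is a worked figure under the usual idealized Szilard-engine assumptions: a single molecule, a reversible cycle, and a heat bath at temperature T. The value T = 300 K is chosen only for illustration. Each bit of information about which side of the partition the molecule is on lets you extract at most k_B T ln 2 of work, so n bits yield at most:

```latex
W_{\max} = n\,k_B T \ln 2,
\qquad
k_B T \ln 2 \approx (1.380649\times 10^{-23}\,\mathrm{J\,K^{-1}})(300\,\mathrm{K})(0.6931) \approx 2.87\times 10^{-21}\,\mathrm{J}\ \text{per bit}.
```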
We have seen that forming true beliefs without evidence is the same sort of improbability as a hot glass of water spontaneously reorganizing into ice cubes and electricity. Rationality takes "work" in a thermodynamic sense, not just the sense of mental effort; minds have to radiate heat if they are not perfectly efficient. This cognitive work is governed by probability theory, of which thermodynamics is a special case. (Statistical mechanics is a special case of statistics.)
If you saw a machine continually spinning a wheel, apparently without being plugged into a wall outlet or any other source of power, then you would look for a hidden battery, or a nearby broadcast power source - something to explain the work being done, without violating the laws of physics.
So if a mind is arriving at true beliefs, and we assume that the second law of thermodynamics has not been violated, that mind must be doing something at least vaguely Bayesian - at least one process with a sort-of Bayesian structure somewhere - or it couldn't possibly work.
In the beginning, at time T=0, a mind has no mutual information with a subsystem S in its environment. At time T=1, the mind has 10 bits of mutual information with S. Somewhere in between, the mind must have encountered evidence - under the Bayesian definition of evidence, because all Bayesian evidence is mutual information and all mutual information is Bayesian evidence, they are just different ways of looking at it - and processed at least some of that evidence, however inefficiently, in the right direction according to Bayes on at least some occasions. The mind must have moved in harmony with the Bayes at least a little, somewhere along the line - either that or violated the second law of thermodynamics by creating mutual information from nothingness.
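The claim that Bayesian evidence and mutual information are two views of one quantity can be checked numerically in a toy case. This is a minimal sketch under an assumed setup that is not in the original: S is a fair binary variable, and the evidence E comes from a noisy sensor that reports S correctly 90% of the time. The function names are hypothetical. The mutual information I(S;E), computed directly from the joint table, equals the expected drop in uncertainty about S after a Bayesian update on E, that is, H(S) minus the expected entropy of the posterior P(S|E).

```python
import math

def entropy(dist):
    """Shannon entropy, in bits, of a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def marginals(joint):
    """Return P(S) (row sums) and P(E) (column sums) of joint[s][e]."""
    p_s = [sum(row) for row in joint]
    p_e = [sum(col) for col in zip(*joint)]
    return p_s, p_e

def mutual_information(joint):
    """I(S;E) in bits, computed directly from the joint table."""
    p_s, p_e = marginals(joint)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_s[i] * p_e[j]))
    return mi

def expected_entropy_drop(joint):
    """H(S) minus the expected entropy of the Bayesian posterior P(S|E)."""
    p_s, p_e = marginals(joint)
    expected_posterior_h = 0.0
    for j, pe in enumerate(p_e):
        if pe > 0:
            # Bayes: P(S=i | E=j) = P(S=i, E=j) / P(E=j)
            posterior = [joint[i][j] / pe for i in range(len(joint))]
            expected_posterior_h += pe * entropy(posterior)
    return entropy(p_s) - expected_posterior_h

# Assumed toy example: fair S, sensor correct with probability 0.9.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
print(mutual_information(joint))     # both ≈ 0.531 bits
print(expected_entropy_drop(joint))
```

If the sensor's readings were independent of S (every cell 0.25), both quantities would come out to zero: no evidence, no mutual information.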
In fact, any part of a cognitive process that contributes usefully to truth-finding must have at least a little Bayesian structure - must harmonize with Bayes, at some point or another - must partially conform with the Bayesian flow, however noisily - despite however many disguising bells and whistles - even if this Bayesian structure is only apparent in the context of surrounding processes. Or it couldn't even help.
How philosophers pondered the nature of words! All the ink spent on the true definitions of words, and the true meaning of definitions, and the true meaning of meaning! What collections of gears and wheels they built, in their explanations! And all along, it was a disguised form of Bayesian inference!
I was actually a bit disappointed that no one in the audience jumped up and said: "Yes! Yes, that's it! Of course! It was really Bayes all along!"
But perhaps it is not quite as exciting to see something that doesn't look Bayesian on the surface, revealed as Bayes wearing a clever disguise, if: (a) you don't unravel the mystery yourself, but read about someone else doing it (Newton had more fun than most students taking calculus), and (b) you don't realize that searching for the hidden Bayes-structure is this huge, difficult, omnipresent quest, like searching for the Holy Grail.
It's a different quest for each facet of cognition, but the Grail always turns out to be the same. It has to be the right Grail, though - and the entire Grail, without any parts missing - and so each time you have to go on the quest looking for a full answer whatever form it may take, rather than trying to artificially construct vaguely hand-waving Grailish arguments. Then you always find the same Holy Grail at the end.
It was previously pointed out to me that I might be losing some of my readers with the long essays, because I hadn't "made it clear where I was going"...
...but it's not so easy to just tell people where you're going, when you're going somewhere like that.
It's not very helpful to merely know that a form of cognition is Bayesian, if you don't know how it is Bayesian. If you can't see the detailed flow of probability, you have nothing but a password - or, a bit more charitably, a hint at the form an answer would take; but certainly not an answer. That's why there's a Grand Quest for the Hidden Bayes-Structure, rather than being done when you say "Bayes!" Bayes-structure can be buried under all kinds of disguises, hidden behind thickets of wheels and gears, obscured by bells and whistles.
The way you begin to grasp the Quest for the Holy Bayes is that you learn about cognitive phenomenon XYZ, which seems really useful - and there's this bunch of philosophers who've been arguing about its true nature for centuries, and they are still arguing - and there's a bunch of AI scientists trying to make a computer do it, but they can't agree on the philosophy either -
And - Huh, that's odd! - this cognitive phenomenon didn't look anything like Bayesian on the surface, but there's this non-obvious underlying structure that has a Bayesian interpretation - but wait, there's still some useful work getting done that can't be explained in Bayesian terms - no wait, that's Bayesian too - OH MY GOD this completely different cognitive process, that also didn't look Bayesian on the surface, ALSO HAS BAYESIAN STRUCTURE - hold on, are these non-Bayesian parts even doing anything?
- Yes: Wow, those are Bayesian too!
- No: Dear heavens, what a stupid design. I could eat a bucket of amino acids and puke a better brain architecture than that.
Once this happens to you a few times, you kinda pick up the rhythm. That's what I'm talking about here, the rhythm.
Trying to talk about the rhythm is like trying to dance about architecture.
This left me in a bit of a pickle when it came to trying to explain in advance where I was going. I know from experience that if I say, "Bayes is the secret of the universe," some people may say "Yes! Bayes is the secret of the universe!"; and others will snort and say, "How narrow-minded you are; look at all these other ad-hoc but amazingly useful methods, like regularized linear regression, that I have in my toolbox."
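The "ad-hoc" tool named above makes a small illustration of Bayes wearing a disguise. This is a minimal sketch assuming a one-weight linear model with no intercept, Gaussian noise of variance `noise_var`, and a Gaussian prior w ~ N(0, `prior_var`). All names and data are hypothetical, chosen for illustration. Under those assumptions the ridge objective equals the negative log posterior times a constant factor, so the ridge solution with `lam = noise_var / prior_var` is the MAP estimate. This is the standard textbook correspondence, sketched in plain Python.

```python
def ridge_loss(w, xs, ys, lam):
    """Squared error plus an L2 penalty: the 'ad-hoc' regularized objective."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + lam * w * w

def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of ridge_loss for a single weight (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def neg_log_posterior(w, xs, ys, noise_var, prior_var):
    """-log P(w | data) up to an additive constant, assuming
    y = w*x + N(0, noise_var) noise and a prior w ~ N(0, prior_var)."""
    sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return sse / (2 * noise_var) + w * w / (2 * prior_var)

# Hypothetical data and variances, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
noise_var, prior_var = 1.0, 0.5
lam = noise_var / prior_var  # the regularization strength the prior implies

w_map = ridge_weight(xs, ys, lam)
print(w_map)  # ≈ 1.866
# Same objective, rescaled: neg_log_posterior == ridge_loss / (2 * noise_var)
print(neg_log_posterior(w_map, xs, ys, noise_var, prior_var),
      ridge_loss(w_map, xs, ys, lam) / (2 * noise_var))
```

Read the other way, choosing a penalty strength `lam` amounts to choosing how strongly you expect the weight to be near zero before seeing the data.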
I hoped that with a specific example in hand of "something that doesn't look all that Bayesian on the surface, but turns out to be Bayesian after all" - and an explanation of the difference between passwords and knowledge - and an explanation of the difference between tools and laws - maybe then I could convey such of the rhythm as can be understood without personally going on the quest.
Of course this is not the full Secret of the Bayesian Conspiracy, but it's all that I can convey at this point. Besides, the complete secret is known only to the Bayes Council, and if I told you, I'd have to hire you.
To see through the surface adhockery of a cognitive process, to the Bayesian structure underneath - to perceive the probability flows, and know how, not just know that, this cognition too is Bayesian - as it always is - as it always must be - to be able to sense the Force underlying all cognition - this, is the Bayes-Sight.
"...And the Queen of Kashfa sees with the Eye of the Serpent."
"I don't know that she sees with it," I said. "She's still recovering from the operation. But that's an interesting thought. If she could see with it, what might she behold?"
"The clear, cold lines of eternity, I daresay. Beneath all Shadow."
-- Roger Zelazny, Prince of Chaos
A hard question. I know no good solid answer; people have tried to explain 'why couldn't that rock over there be processing a mind under the right representation?' It's one of those obscene questions - we know when a physics model is simulating nature, and when a computation is doing nothing like simulating nature, but we have no universally accepted criterion. Eliezer has written some entries on this topic, though I don't have them to hand.