I get why homomorphisms between lambda-expressions are the main homomorphism problem for AGI in general. But for humans, I'd guess that homomorphisms between some drastically simpler subset of models would explain most or all of our abstraction capacity. For example, if humans can find morphisms between causal models, or between spatial models, or between some easily-modeled agents (e.g., agents similar to the human in question), or some combination of these, then it feels like that would be sufficient for most of the abstract reasoning we actually perform.
Denotational equivalence is actually undecidable for any class of languages stronger than deterministic pushdown automata.
This doesn't mean that we can't obtain certain evidence that two languages/automata are equal/equivalent via some mechanism other than a decision algorithm, of course. It also doesn't mean that we can't assign a probability of equality in an entirely sensible way. In fact, in probabilistic programming, probabilistic extensional equality of random variables is trivial to model: the problem is that you're talking, there, about zero-free-parameter thunks rather than arbitrarily parameterized lambdas.
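To make that concrete, here is a minimal sketch of what I mean (illustrative Python, not tied to any particular probabilistic programming system): take two zero-free-parameter thunks, draw samples from each, and compare their empirical output distributions. The hard case this pointedly does not cover is the one with free parameters ranging over an unbounded space.

```python
import random
from collections import Counter

def thunk_a():
    # Zero-free-parameter thunk: flip two fair coins and return the sum.
    return random.randint(0, 1) + random.randint(0, 1)

def thunk_b():
    # A differently-written thunk with the same output distribution.
    return random.choices([0, 1, 2], weights=[0.25, 0.5, 0.25])[0]

def empirical_dist(thunk, n=100_000):
    counts = Counter(thunk() for _ in range(n))
    return {value: count / n for value, count in counts.items()}

def probably_equal(thunk_x, thunk_y, n=100_000, tol=0.01):
    """Crude evidence of probabilistic extensional equality: compare empirical distributions."""
    dx, dy = empirical_dist(thunk_x, n), empirical_dist(thunk_y, n)
    support = set(dx) | set(dy)
    return all(abs(dx.get(v, 0.0) - dy.get(v, 0.0)) < tol for v in support)

print(probably_equal(thunk_a, thunk_b))  # True with high probability
```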
So we can't really decide the denotational equivalence of lambda expressions (or recurrent neural networks), but I think that decision algorithms aren't really useful, from a cognitive perspective, for more than obtaining a 100% likelihood of equivalence. That's powerful, when you can do it, but you should also be able to get non-100% likelihoods in other ways.
The various forms of probabilistic static analysis can probably handle that problem.
So, you're thinking that human abstraction ability derives from probable morphisms rather than certain morphisms over weaker classes? That makes a lot of sense.
On the other hand, from what I've seen in CS classes, humans do not seem very good at recognizing equivalences even between pushdown automata beyond a few simple cases. A human equipped with pencil and lots of paper can do a good job, but that's an awful lot more powerful than just a human.
A human equipped with pencil and lots of paper can do a good job, but that's an awful lot more powerful than just a human.
A professional scientist falls more under "human equipped with pencil and lots of paper". Nobody said second-level learning was easy.
So, you're thinking that human abstraction ability derives from probable morphisms rather than certain morphisms over weaker classes?
Maybe? I think that first what happens is that we identify a probable morphism, and then we adjust the two theories at hand in order to eliminate the uncertainty from the morphism by pushing it "upwards" (into the imprecision of the higher-level model) and "downwards" (into the parameter dimensionality of the lower-level model).
I think that, at least according to the breakdown Churchland gave (which may not accurately reflect what the brain is doing all the time, but at least he tried), object permanence belongs more to the transition from feature-governed concepts to causal-role concepts. In fact, it is arguably one of the first causal-role concepts a child learns: appearances-of-thingies (feature-governed concepts) are caused by actual thingies (causal-role concepts for the same objective domains), which are more permanent than the appearances-of-thingies.
Hmm. I would think that generalizing from "mommy disappeared then came back" and "teddy disappeared then came back" to "things that disappear tend to come back", and then deducing that "I don't see my blankie, but it is out there", very much counts as second level.
It's "second level" in the sense of being recursive causal learning. It's not "Second Level" in Churchland's sense, of taking multiple preexisting theories and finding a way to reduce one to another.
Harper’s Fishing Nets: a review of Plato’s Camera by Paul Churchland
Abstract
Paul Churchland published Plato’s Camera to defend the thesis that abstract objects and properties are both real and natural, consisting in learned mental representations of the timeless, abstract features of the mind’s environment. He holds that the brain learns, without supervision, high-dimensional maps of objective feature domains – which he calls Domain-Portrayal Semantics. He further elaborates that homomorphisms between these high-dimensional maps allow the brain to occasionally repurpose a higher-quality map to understand a completely different domain, reducing the latter to the former. He finally adds a Map-Portrayal Semantics of language to his Domain-Portrayal Semantics of thought by considering the linguistic, cultural, and educational dimensions of human learning.
Part I
Introduction
Surely the title of this review already sounds like some terrible joke is about to be perpetrated, but in fact it merely indicates a philosophical difference between myself and Paul Churchland. Churchland wrote Plato’s Camera[3] not merely to explain a view on philosophy of mind to laypeople and other philosophers, but with the specific goal of defending Platonism about abstract, universal properties and objects (such as those used in mathematics) by naturalizing it. The contrast between such naturalist philosophers as Churchland, Dennett, Flanagan, and Railton and non-naturalist or weakly naturalist philosophy lies precisely in this fact: the latter consider many abstract or intuitive concepts to necessarily form their own part of reality, amenable strictly to philosophical investigation, while the former seek and demand a consilience of causal explanation for what’s going on in our lives. The results are a breath of fresh air to read.
A great benefit of reading strongly naturalistic philosophy and philosophers is that, in the course of researching a philosophical position, they tend to absorb so much scientific material that they can’t help but achieve a degree of insight and accuracy in their core thesis – even when getting almost all the details wrong! So it is with Plato’s Camera: reading in 2015 a book published in 2012 that mostly cites no scientific research from the preceding five to ten years, the details can’t help but seem somewhat dated and unrealistic, at least to those of us who’ve been doing our own reading in related scientific literature (or possibly just have partisan opinions). And yet, Plato’s Camera captures and supports a core thesis, this being more or less:
It is these core theses which I regard as largely correct, even where their supporting details are based on old research or, in the view of the present reviewer, the wrong research. I even believe that, had Churchland investigated my favorite school of computational cognitive science as thoroughly, it would have reinforced his thesis and given him enough material for two books instead of just one. In fact, my disagreements with Churchland can be summed up quite succinctly:
And yet, for all that these may sound substantial, they are the sum total of my objections. Churchland has otherwise written an excellent book that gets its point across well, and whose many moments of snark against non-naturalistic philosophies of mind, especially the linguaformal “Hilbert proof system theory of mind”, are actually enjoyable (at least, to one who enjoys snark).
In fact, in addition to just describing Churchland’s work, I will spend some of my review noting where other work bolsters it, particularly from the rational analysis (and resource-rational) school of cognitive science[9]. This school of thought aims to understand the mind by first assuming that the mind is posed particular, constrained problems by its environment, then positing how these problems can be optimally solved, and then comparing the resulting theoretical solutions with experimental data. The mind is thus understood as an approximately boundedly rational engine of inference, forced by its environment to deal with shortages of sample data and computational power in the most efficient way possible, but ultimately trying to perform well-defined tasks such as predicting environmental stimuli or planning rewarding actions for the embodied organism to take.
Why “Harper’s Fishing Nets”, then? Well, because treating abstract universals as computational objects learned by generalizing over many domains seems more along the lines of Robert Harper’s “computational trinitarianism” than true Platonism, and because the noisy, always-incomplete process of recursive learning seems more like a succession of fishing nets, with their ropes spaced differently to catch specific species of fish, than like a camera that takes a single, complete picture. All learning algorithms aim to capture the structural information in their input samples while ignoring the noise, but the difference is, of course, undecidable[10]. Recursive pattern recognition - the unsupervised recognition of patterns in already-transformed feature representations - may thus be applicable for capturing additional levels of structural information, especially where causal learning prevents collapsing all levels of hierarchy into a single function. Or, as Churchland himself puts it:
Churchland is especially to be congratulated for approaching cognition as a capability that must have evolved in gradual steps, and coming up with a theory that allows for nonhuman animals to have great cognitive abilities in First-Level Learning, even if not in Second and Third.
Choice quotes from the Introductory section:
Part II
First-Level Learning
It is no exaggeration to say that First-Level Learning forms the shining star of Churchland’s book. It is the process by which the brain forms and updates increasingly accurate maps of conceptual and causal reality, a deeply Pragmatic process shared with nonhuman animals and taking place largely below conscious awareness. In machine-learning terms, First-Level Learning consists mainly of classification and regression problems: classifying hierarchies of regions of compacted metric spaces to form concepts using feedforward neural learning, and regressing trajectories through state-spaces to form causal understanding using recurrent neural networks. One full chapter each is spent on the former and the latter subjects.
1 First-Level Conceptual Learning
He begins in his first chapter on First-Level Learning with a basic introduction to many-layered feedforward neural networks, their training via supervised backpropagation of errors, and their usage for classification of feature-based concepts. He talks about the nonlinear activation functions, like sign and sigmoid, necessary to allow feedforward networks to approximate arbitrary total functions. He gives examples of face-recognition neural networks, which will probably be old-hat for any student of machine learning, but are extremely necessary for laypeople and philosophers untrained in computational approaches to modelling perception. Churchland is also careful to specify that these are not the neural networks of the real human mind, but instead specific examples of what can be done with neural networks. Finally, Churchland begins defending his thesis about Platonism when talking about an artificial neural network designed to classify colors:
Or put simply, the kinds of abstract, higher-level features learned by multi-layer neural networks serve to represent certain objective facts about the environment, with each successively lower layer of the network filtering out some perceptual noise and capturing some important structural information.
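To keep this machinery concrete, here is a minimal, purely illustrative sketch (toy data and arbitrary numbers, nothing from Churchland’s own examples) of the kind of feedforward classifier with sigmoid nonlinearities he describes, trained by backpropagation. It also already exhibits the metric compaction discussed just below: same-class inputs end up bunched into a prototype region of the hidden space.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: two noisy clusters in the plane, labelled 0 and 1.
X = np.vstack([rng.normal([-1.0, -1.0], 0.3, size=(50, 2)),
               rng.normal([+1.0, +1.0], 0.3, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50, dtype=float).reshape(-1, 1)

# One hidden layer of three units, sigmoid activations throughout.
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

for _ in range(2000):                      # plain batch gradient descent on squared error
    H = sigmoid(X @ W1 + b1)               # the hidden "map" of the input space
    out = sigmoid(H @ W2 + b2)
    d_out = (out - y) * out * (1 - out)    # backpropagated error signals
    d_H = (d_out @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ d_out / len(X)
    b2 -= 0.5 * d_out.mean(axis=0)
    W1 -= 0.5 * X.T @ d_H / len(X)
    b1 -= 0.5 * d_H.mean(axis=0)

# Same-class inputs land in a compacted "prototype region" of the hidden space.
H = sigmoid(X @ W1 + b1)
within = np.linalg.norm(H[:50] - H[:50].mean(axis=0), axis=1).mean()
between = np.linalg.norm(H[:50].mean(axis=0) - H[50:].mean(axis=0))
print(f"within-class spread: {within:.3f}   between-prototype distance: {between:.3f}")
```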
Churchland also elaborates, in several places, on the metric-space compaction produced by the nonlinear transformations encoded in neural networks. Neural networks don’t spread their training data uniformly in the output space (or in any of the spaces formed by the intermediate layers of the network)! In fact, they tend to push their training points into highly compacted prototype regions in their output spaces, and when later activated they will try to “divert” any given vector into one of those compacted regions, depending on how closely it resembles them in the first place. Since all neural networks receive and produce vectors, and vector spaces are metric spaces, Churchland notes that these neural-network concepts innately and necessarily carry distance metrics for gauging the similarities or differences between any two sensory feature-vectors (or, Churchland implies, real-world objects represented by abstract feature vectors). Churchland even notes, in a rare mention of probability in his book, that these compactions into distinct prototype regions for classes or clusters of training data can even be taken as a sort of emergent set of probability density functions over the training data:
Churchland deploys the vector-completion effect in feedforward networks as an example of primitive abductive reasoning himself:
Churchland deploys his metaphor of concepts as maps of feature-spaces again and again to great effect; I only wish he had taken greater effort to talk of his rarely-mentioned “deformed grids” as topographical maps, measures over the training data, and of the nonlinear transformations taking vectors from their input spaces into topologically-measured maps as flows or rivers. I cannot tell if he took seriously the notion of neural-network training as learning topographical maps of the non-input spaces, or of those topographies as measures in the sense of probability theory. Certainly, the physical metaphor of a river’s flow provides a good intuition pump for describing how a well-trained neural network carves out paths from where drops of rain fall to where they ought to go, by whatever criterion trains the network. Certainly, he seems to be thinking something along these lines when he uses metric-space compaction to examine category effects:
Looking at neural-network training data as measurable would also help us think about how mere perception generates “sensorily simple” random variables, representing qualitative measurements of the world that correspond to the world, which would then be of use according to probabilistic theories of cognition. Certainly, a number of cognitive scientists and neuroscientists have been researching neural mechanisms for representing probabilities[13, 19]. A number of these even provide exactly the kind of approximate Bayesian inference one would require when working with open-world models that can have countably infinitely many separate random variables, an important component of working with Turing-complete modelling domains. One paper even proposes that the neural implementation and learning of probability density/mass functions can explain certain deviations of human judgements from the probabilistic optimum[13]. Again: Churchland’s book, published in 2012 and sent to press with little mention of probability, still clearly prefigured neural encodings of probability, which have turned out to be a productive research effort. This is a testament to how well Churchland has generalized from what previous neuroscientific research he did have!
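To give a flavor of the kind of approximation involved (this is only a generic importance-sampling toy in Python, not the neural mechanism proposed in [19]), one can approximate a posterior expectation by drawing hypotheses from the prior and weighting them by likelihood:

```python
import random
import math

# Toy model: theta ~ Normal(0, 1); x | theta ~ Normal(theta, 0.5).
# We observe x_obs and estimate E[theta | x_obs] by importance sampling,
# using the prior as the proposal, so the weights are just likelihoods.

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_mean_is(x_obs, n=50_000):
    thetas = [random.gauss(0.0, 1.0) for _ in range(n)]        # prior samples
    weights = [normal_pdf(x_obs, t, 0.5) for t in thetas]      # likelihood weights
    z = sum(weights)
    return sum(w * t for w, t in zip(weights, thetas)) / z     # self-normalized estimate

print(posterior_mean_is(1.0))
# Conjugate-Gaussian exact answer for comparison: 1.0 / (1 + 0.5**2) = 0.8
```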
Of course, Churchland himself decries any notion of sensorily simple variables:
Churchland would also have done quite well to cover the Blessing of Abstraction and hierarchical modelling (first mentioned in [7]) for their unique effect: they allow training data to be shared across tasks and categories, and thus ameliorate the Curse of Dimensionality. They are how real embodied minds compress their sensory features so as to reduce the necessary sample-complexity of learning to the absolute minimum: sometimes even one single example[18]. I personally hypothesize that the same effect is at work in hierarchical Bayesian modelling as in the recent fad for “deep” learning in artificial neural networks, which learn hierarchies of features: breadth in the lower layers of the model/network provides large amounts of information to quickly train the higher, “abstract” layer of the model/network, which then provides a strong inductive bias to the lower layers. He does mention something like this, however:
This certainly gives an insight into why deep neural networks with sparse later layers work so well: sample information is aggregated in the top layers and then backpropagated to lower layers.
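A toy sketch of the Blessing of Abstraction, with invented numbers and a deliberately simple conjugate-Gaussian setup (not any model from [7] or [18]): data pooled across many small categories pins down a shared, abstract-level parameter, which then acts as a strong inductive bias when a brand-new category is observed just once.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy hierarchy: a shared "abstract" mean mu governs many categories;
# each category c has mean theta_c ~ Normal(mu, tau), and observations
# x ~ Normal(theta_c, sigma). All numbers are illustrative.
mu_true, tau, sigma = 5.0, 0.5, 2.0
n_categories, n_per_category = 20, 10

theta = rng.normal(mu_true, tau, size=n_categories)
data = [rng.normal(theta[c], sigma, size=n_per_category) for c in range(n_categories)]

# Blessing of abstraction: the hyper-level mean is estimated from ALL the
# data at once, so it is pinned down well even though each category is small.
mu_hat = np.mean([x.mean() for x in data])

# A brand-new category seen only once: combine its single observation with
# the abstract prior (standard Gaussian posterior-mean shrinkage formula).
theta_new_true = rng.normal(mu_true, tau)
x_one = rng.normal(theta_new_true, sigma)

prior_precision, like_precision = 1 / tau**2, 1 / sigma**2
theta_new_hat = (prior_precision * mu_hat + like_precision * x_one) / (prior_precision + like_precision)

print(f"single raw observation:        {x_one:.2f}")
print(f"hierarchical one-shot estimate: {theta_new_hat:.2f} (true value {theta_new_true:.2f})")
```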
This brings us right back to the Platonism for which Churchland is trying to argue. As usual, we wish to operate under the “game rules” of a very strong naturalism, in which Platonic entities are surely not allowed to be any kind of ontologically “spooky” stuff. After all, we don’t observe any spooky processes interfering in ordinary physical and computational causality to generate thoughts about Platonic Forms or mathematical structures. Instead, we observe embodied, resource-bounded creatures generalizing from data, even if Churchland is a pure connectionist while I favor a probabilistic language of thought. What sort of Platonism would help us explain what goes on in real minds? I think a productive avenue is to view Platonic abstractions as concepts (necessarily compositional concepts of the kind Churchland doesn’t address much, but which are now sometimes described as stochastic functions[8]) which optimally compress a given type of experiential data. We could thus propose Platonic realism about abstract concepts which any reasoner must necessarily develop as they approach the limit of increasing sample data and computational power, and simultaneously Platonic antirealism about abstract concepts which tend to disappear as reasoners gain more information and compute further.
This will probably sound somewhat overwrought and unnecessary to theorists from backgrounds in algorithmic information theory and artificial intelligence. What need does the optimally intelligent “agent”, AIXI, have for Platonic concepts of anything[12]? It just updates a distribution over all possible causal structures and uses it to make predictions. The key is that AIXI evaluates K(x), the Kolmogorov complexity of each possible Turing-machine program. This function allows a Solomonoff Inducer to perfectly separate the random information in its sensory data from the structural information, yielding an optimal distribution over representations that contain nothing but causal structure. But K(x) is incomputable, so AIXI can only update optimally on sensory information by falling back on its infinite computing power. Such a reasoner, it seems, has no need to compose or decompose causal structures, no need for concepts; but for everyone else, hierarchical representations compress data very efficiently[14]. They also map well onto probabilistic modelling. This trade-off between the decomposability and the degree of compression achieved by any given representation of a concept will have to play a part in a more complete theory of abstract objects as optimally compressed stochastic functions.
Here, though, is a reason for learned representations to be “white-box”, open to introspection and decomposition into smaller concepts: counterfactual-causal reasoning involves zeroing in on a particular random variable in a model and cutting its links to its causal parents. Only white-box representations allow this “graph surgery”; only open-box representations are friendly to causal reasoning about independent, composable concepts rather than whole possible-worlds.
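A minimal sketch of what such “graph surgery” looks like computationally (illustrative Python, with a made-up rain/sprinkler example rather than anything from the book): each variable is an explicit, inspectable mechanism, and an intervention simply replaces one mechanism with a constant while leaving the rest of the model intact.

```python
import random

# A tiny structural causal model: rain -> sprinkler, (rain, sprinkler) -> wet.
# Each variable is an explicit, inspectable mechanism ("white-box").
mechanisms = {
    "rain":      lambda v: random.random() < 0.3,
    "sprinkler": lambda v: (random.random() < 0.1) if v["rain"] else (random.random() < 0.6),
    "wet":       lambda v: v["rain"] or v["sprinkler"],
}
order = ["rain", "sprinkler", "wet"]  # topological order of the causal graph

def sample(model):
    values = {}
    for name in order:
        values[name] = model[name](values)
    return values

def do(model, var, value):
    """Graph surgery: cut var's links to its parents by fixing its mechanism."""
    surgered = dict(model)
    surgered[var] = lambda v: value
    return surgered

def prob(model, query, n=100_000):
    return sum(sample(model)[query] for _ in range(n)) / n

print("P(wet)                      =", prob(mechanisms, "wet"))
print("P(wet | do(sprinkler=True)) =", prob(do(mechanisms, "sprinkler", True), "wet"))
```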
2 Causal reasoning as recurrent-activation-space trajectories
And Churchland does cover causal reasoning! Or at least, he covers reasoning and learning in sequence-prediction tasks, with an elaborate theory of First-Level Learning in recurrent neural networks. Whether this counts as causal reasoning or not depends on whether the reader considers causal reasoning to require modelling counterfactuals and doing graph-surgery to support interventions. Churchland begins by explaining exactly why an embodied organism should want to reason about temporal sequences:
Churchland starts his chapter on temporal and causal learning thusly, noting that for an embodied animal, temporal reasoning provides not only an essential way to handle ecologically necessary tasks, but a dramatic improvement on the performance of moment-to-moment cognitive distinctions. Thus he theorizes that creatures understand causal models as trajectories through metrically-sculpted activation spaces of recurrent neural networks, isomorphic to the execution traces of a computer program. In fact, he tells the reader, extending an animal’s reasoning in Time helps it to cut reality at the joints, so much so that temporal reasoning may have come first.
He further points out that, since the function of the autonomic nervous system has always been to regulate cyclical bodily processes, recurrent neural networks may actually be the norm in living animals, and could easily have evolved first for autonomic functions before being adapted to aid in temporal cognition. The brain, then, is conceived as a network-of-networks, capable of activating the recurrent evolution of its sub-networks whenever it needs to imagine how some temporal (or computational) process might proceed:
Much of the material from the previous chapter on supervised learning, Hebbian unsupervised learning, and map metaphors is repeated and carried over in this chapter, the better to hammer it home.
3 Criticisms
Now the unfortunate negative. Churchland’s account of conceptual and causal First-Level Learning spends too little explanatory effort, for my tastes at least, on causal-role concepts in particular. Philosophy of mind has long recognized both feature-governed and role-governed notions of concepts, and the cognitive sciences have shown how general learning mechanisms can produce concepts governed by mixtures of sensory features and causal or relational roles[17]. In fact, causal-role concepts appear to form a bedrock for uniquely human thought: humans and other highly intelligent, social animals learn concepts abstracted from their available feature data, of “what something does” rather than “how something looks”. This is how human thought gains its infinitely productive compositionality. In fact, we often utilize concepts grounded so thoroughly in causal role, and so little in feature data, that we forget they “look like” anything at all (more on that when we cover Second-Level Learning and naturalization)! Churchland explicitly mentions how we ought to be able to “index” our “maps” via multiple input modalities, thus enabling us to use concepts abstracted from any one way of obtaining or producing feature data:
He just doesn’t say how the brain does so.
He also gives a theory for identifying maps with each-other, which is to find a homomorphism taking the contents of one map into the contents of the other:
This works just fine for his given example of two-dimensional highway maps, but (at least we have solid reason to think) cannot work when the maps themselves come to express a Turing-complete mode of computation, as in recurrent neural networks. The equality of lambda expressions in general is undecidable[1], after all; the only open question is whether we can determine equality in some useful, though algorithmically random, subset of cases (as is common in theoretical computer science), or whether we can find some sort of approximate equality-by-degrees that works well enough for creatures with limited information.
The “map” metaphor also elides the fact that computation, in neural networks, takes place at the synapses, not in the neurons. The actual work is done by the nonlinear transformations of vectors between layers of neurons.
Churchland also fails to elaborate on the differences between training neural networks via backpropagation of errors and training them via Hebbian update rules. This is important: as far as my own background reading can tell, backpropagation of errors suffices to train a neural network to approximate any circuit (or even any computable partial function, if we deal with recurrent networks), while even the most general form of unsupervised Hebbian learning seems to learn the directions of variation within a set of feature vectors, rather than general total or partial recursive functions over the input data.
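The contrast is easy to see in a toy sketch (illustrative Python; Oja’s rule is one standard stabilized Hebbian update, used here merely as a stand-in for unsupervised Hebbian learning in general): the Hebbian weight vector converges to the leading direction of variation in the inputs, not to an arbitrary target function.

```python
import numpy as np

rng = np.random.default_rng(2)

# Zero-mean data with one dominant direction of variation (roughly along [2, 1]).
X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 1.0], [0.2, -0.4]])

# Oja's rule: a stabilized Hebbian update, w += lr * y * (x - y * w), with y = w.x
w = rng.normal(size=2)
lr = 0.005
for x in X:
    y = w @ x
    w += lr * y * (x - y * w)

# The weight vector converges (up to sign) to the leading principal component --
# a direction of variation, not an arbitrary learned input-output function.
pc1 = np.linalg.eigh(np.cov(X.T))[1][:, -1]
print("Oja direction:", w / np.linalg.norm(w))
print("leading PC:   ", pc1)
```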
4 Loose-Leaf Highlights
Churchland on free will:
He extends the matter up to whole societies:
On unsupervised learning without a preestablished system of propositions (as is used in most current Bayesian methods), in defense of connectionism:
Part III
Second-Level Learning: Reductionism, Hierarchies of Theories, Naturalization, and the Progress of the Sciences
If Churchland’s material on First-Level Learning seems, in some ways, like so much outmoded hype about neural networks, his material on Second-Level Learning remains sufficient justification to read his book. Second-Level Learning, the process by which the mind notices that it can repurpose its available conceptual “maps”, and thus comes to form an increasingly unified and coherent picture of the world, is where Churchland hits his (as ever, understated) stride. In addressing Second-Level Learning, Churchland covers the well-worn philosophy-of-science progression of physics from Aristotelian intuitive theories up through Newton and then, eventually, Einstein. This is also where he begins to talk about normatively rational reasoning:
Second-Level Learning is described as just turning old ideas to new uses. The brain more-or-less randomly notices the partial homomorphism of two conceptual “maps” (again: high-dimensional vector spaces with metric compaction based on Hebbian learning in neural networks) and repurposes (and re-trains) the more accurate, detailed, and general “map” (call it the larger map) to predict and describe the phenomena once encompassed by the less accurate, less detailed, and less general “map” (call it the smaller one). Viewed in the larger historical context Churchland gives it, however, Second-Level Learning is the methodology of scientific thought as we have come to understand it. Churchland gives solid reason to hypothesize that by means of Second-Level Learning, human beings and humankind have come to understand our world.
In larger terms, Second-Level Learning consists of naturalizing concepts in terms of other concepts, forming hierarchies of theories.
Our knowledge begins as a vast, disconnected, disparate mish-mash of independent concepts and theories, none of which makes sense in terms of the others, and which leaves us no recourse to any universal terms of explanation. Worse, our intuitive theories are often so disconnected that we may have only one modality of causal access to the objective reality behind any particular concept, perhaps even one so utterly unreliable as subjective introspection.
As we proceed to assemble interlocking hierarchies of theories, however, the increased connectedness of our theories allows us to spread the training information derived from experience and experiment throughout, letting us use the feature-modality behind one concept to inquire about the objective reality behind a seemingly different concept. By judicious application of Second-Level Learning, we develop an increasingly coherent, predictive, unified body of knowledge about the objective reality in which we find ourselves. We also become able to dissolve concepts that no longer make sense by showing what explains their training experiences, and sometimes come to be rationally obligated to reject concepts and theories that just no longer fit our experiences. Consilience can thus be seen as the key to truth, overcoming the exclaimed cries - “But thou must!” - of intuition or apparently-logical argumentation.
This is where Churchland feels a definite need to argue with other major philosophers of science, particularly Karl Popper’s falsificationism (still a staple of many methodology and philosophy-of-science lessons given to grad students everywhere):
Heavy and contentious words already, but well in line with the basic facts about learning and inference discovered by the pioneers of statistical learning theory: as long as one’s theory remains fully deterministic and one’s reasoning fully deductive, one must place absolute faith in experience (which, experience tells us, is unreliable) and can meaningfully eliminate hypotheses only slowly, if ever. Abductive inference, not deductive, forms the core of real-world scientific reasoning, and one is reminded of Broad’s calling inductive reasoning “the glory of Science” and yet “the scandal of Philosophy”. Having adopted abduction of inferred models, subject to revision, we can now justify those inferences much better than we could when philosophers talked of inductive reasoning about the certain truth or falsity of propositions. Churchland continues into territory even surer to arouse controversy, among the public if not among professional scientists or philosophers:
Throughout this latter portion of the book, Churchland takes numerous other shots at superstition, religion, model-theoretic philosophical theories of semantics, non-natural normativity, and various other forms of belief in the spooky and weird (whatever joke I may appear to be making here is paraphrased straight from Churchland’s own views). Regarding the last item on the list in particular, Churchland does indeed take an explicit stand in favor of naturalizing normative rationality via Second-Level Learning:
This objection to the “is-ought gap” should be happily received by cognitive scientists everywhere: it is certainly impossible to prove that an algorithm solves a given problem optimally, or even approximately, when we do not know what the problem is. What certain schools of thinking about rationality tend to fail to appreciate is that, particularly when dealing with highly constrained problems of abductive reasoning, we also cannot prove that a certain algorithm is very bad (in failing to approximate or approach an optimal solution, even in the limit of increasing resources) without knowing what the problem to be solved actually is.
Churchland backs up these ideas with a cogent analogy:
5 Hierarchies of Theories and Reductionism
How, then, does Second-Level Learning proceed in the actual, physical brain?
Churchland has, earlier in the book, already proposed an algorithm for inferring the degree to which two maps seem to portray the same domain, and he is deploying it here to explain how the brain can perform inter-theoretic reductions. The only problem, to my eyes, is that, as stated above, this algorithm proposes to solve an undecidable problem once we begin to deal with the Turing-complete hypothesis-space represented by recurrent neural networks (and treating finite recurrent networks as learned deterministic finite-state automata only trades undecidability for intractability: the equivalence check itself is cheap, but the extracted automata can have exponentially many states).
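For what it’s worth, the equivalence check really is the easy part. Here is a minimal sketch (illustrative Python, toy machines) of deciding language equivalence of two deterministic finite automata by searching the product automaton for a state where one machine accepts and the other does not; the expense, for automata extracted from recurrent networks, lies in how many states such a search might have to visit.

```python
from collections import deque

# Each DFA: (start_state, accepting_states, transition dict {(state, symbol): state}).
# Illustrative example: both machines accept binary strings with an even number of 1s.
ALPHABET = "01"

dfa_a = ("e", {"e"}, {("e", "0"): "e", ("e", "1"): "o",
                      ("o", "0"): "o", ("o", "1"): "e"})

dfa_b = ("p", {"p"}, {("p", "0"): "p", ("p", "1"): "q",
                      ("q", "0"): "q", ("q", "1"): "p"})

def equivalent(dfa1, dfa2):
    """BFS over the product automaton; a mismatch in acceptance is a counterexample."""
    (s1, acc1, t1), (s2, acc2, t2) = dfa1, dfa2
    seen, queue = {(s1, s2)}, deque([(s1, s2)])
    while queue:
        q1, q2 = queue.popleft()
        if (q1 in acc1) != (q2 in acc2):
            return False
        for a in ALPHABET:
            nxt = (t1[(q1, a)], t2[(q2, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

print(equivalent(dfa_a, dfa_b))  # True: same language; the check itself is cheap.
```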
On the question of how we come to intertheoretic reductions, Churchland opined that they occur more-or-less randomly, or at least unpredictably:
Thanks to later work, we know that Churchland erred at least somewhat on this point, but that doesn’t make Churchland’s view of intertheoretic reductions irredeemable. Quite to the contrary, later work has ridden to the rescue of Churchland’s Second-Level Learning, presenting us with a map of the landscape of scientific hierarchies. The statistical nature of this map of maps is worth quoting directly for its elegance[16]:
The objective reality we confront on a daily basis not only can be modelled at multiple levels of abstraction, but in order to utilize our experiential data as efficiently as possible, we must model it at multiple levels of abstraction. Macroscopic models explain more of the variation in observable data with fewer parameters, while microscopic models successfully explain a larger portion of the total available data by including even the “sloppier” parameters. How large is the trade-off between these models, in terms of necessary data and generalization power? Extremely large:
The amounts of variation explained by expanding combinations of parameters are distributed exponentially: the plurality of variation can usually be captured with very few parameters (as with intuitive theories that are “fuzzy” even on the mesoscopic scale), the majority with relatively few parameters (as with macroscopically accurate models that ignore microscopic reality), and the whole of variation explained only by recourse to increasingly many parameters (as in microscopic models). Note that this exponential distribution of variance explanation adds weight to the Platonism of optimal compressions advocated above, and to Churchland’s Platonism: in order to make efficient use of available experiential data to explain variance and predict well in varying environments, we must form certain abstract concepts, and we must form them into hierarchies (or, to borrow a term from mathematical logic, entailment preorders of probabilistic conditioning). An embodied mind most likely cannot feasibly function in real-time without modelling what Churchland calls “the timeless landscape of abstract universals that collectively structure the universe” (even if one doesn’t accord those abstracts any vaunted metaphysical status).
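A small numerical sketch of the claimed spectrum, in the spirit of [16] but with arbitrary made-up parameters: for a toy sum-of-exponentials model, the eigenvalues of the parameter-sensitivity matrix span several orders of magnitude, so a handful of stiff parameter combinations carry most of the model’s behavior.

```python
import numpy as np

# Toy "sloppy" model: y(t) = sum_i exp(-k_i * t), with illustrative rate constants.
t = np.linspace(0.0, 5.0, 200)
rates = np.array([0.3, 0.7, 1.1, 1.9, 2.5])

# Jacobian of the model outputs with respect to each rate constant:
# d/dk exp(-k * t) = -t * exp(-k * t).
J = np.stack([-t * np.exp(-k * t) for k in rates], axis=1)

# Sensitivity spectrum: eigenvalues of J^T J, in descending order.
eigvals = np.linalg.eigvalsh(J.T @ J)[::-1]
print("sensitivity eigenvalues:", np.round(eigvals, 6))
print("fraction captured by the stiffest direction:", round(eigvals[0] / eigvals.sum(), 4))
```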
What, then, can we call an intertheoretic reduction, on a modelling level? The perfect answer would be: a deterministic, continuous function from the high-dimensional parameter space of a microscopic model (which has a simple deterministic component but vast uncertainty about parameters) to the low-dimensional parameter space of a macroscopic model (which makes less precise, more stochastic predictions, but allows for more certainty about parameters). In a rare few cases, we can even construct such a function: consider temperature as the average kinetic energy, thus derived from the average velocity, of a body of particles. Even though we cannot feasibly obtain the sample data to know the individual velocity of each of the roughly 10^22 particles in a jar of air, our microscopic model tells us that averaging over those myriad parameters will give us the single macroscopic parameter we call temperature, which is as directly observable as anything via a simple thermometer (whose usage is just another model for the human scientist to learn and employ). Churchland even gives us an example of how these connections between theories aid a nonhuman creature in its everyday cognition:
Usually, intertheoretic reductions are more probabilistic than this, though. Newton generalized his Laws of Motion and calculated the motion of the planets under his laws of gravitation for himself, rather than possessing a function that would construct Kepler’s equations from his. This looks more like evaluating a likelihood function and selecting as his “microscopic” theory the one which gave a higher likelihood to the available data while having a larger support set, as in probabilistic interpretations of scientific reasoning.
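A schematic sketch of that kind of likelihood comparison (illustrative Python with synthetic data; obviously not Newton’s or Kepler’s actual calculations): fit two candidate “theories” to the same noisy observations and prefer the one that assigns the data higher likelihood.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for intertheoretic comparison: noisy observations of y(x),
# scored under a "linear theory" and a "quadratic theory" by log-likelihood.
x = np.linspace(0, 3, 40)
y_obs = 1.0 + 0.5 * x**2 + rng.normal(0, 0.3, size=x.shape)
sigma = 0.3  # assumed observation noise

def log_likelihood(prediction):
    resid = y_obs - prediction
    return float(np.sum(-0.5 * (resid / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))))

# Fit each "theory" by least squares, then score it on the data it must explain.
lin_fit = np.polyfit(x, y_obs, 1)
quad_fit = np.polyfit(x, y_obs, 2)

print("linear theory    log-likelihood:", round(log_likelihood(np.polyval(lin_fit, x)), 1))
print("quadratic theory log-likelihood:", round(log_likelihood(np.polyval(quad_fit, x)), 1))
# The theory assigning the data higher likelihood is preferred -- the schematic
# sense in which Newton's mechanics outscores Kepler's bare regularities.
```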
6 Naturalization and the Progress of the Sciences
We face a substantial difficulty in employing hierarchies of theories to explain the natural world around us: our meso-scale observable variables are very distantly abstracted from the microscopic phenomena that, under our best scientific theories, form the foundations of reality. On the one hand, this is reassuring: our microscopic theories require huge amounts of free parameters precisely because they reduce large, complex things to aggregations of smaller, simpler things. Since we need many small things to make a large thing, we should find that thinking of the large thing in terms of its constituent small things requires huge amounts of information. However, this also implies that our descriptions of fundamental reality are far more theory-laden than our descriptions of our everyday surroundings. We suffer from a polarization in which humanly intuitive theories and theories of the fundamentals of reality come to occupy the opposite sides of our hierarchy. Thus:
We might call it a symptom of that very polarization that human beings require strict intellectual training to successfully think in a naturalistic, scientific way – Churchland has really switched to philosophy of science instead of mind in this part of the book. Our intuitive theories tend to explain most of the variance visible in our observables, but nonetheless don’t predict all that well. As a result, we tend to just intuitively accept that we can’t entirely understand the world. In fact, modern science has obtained more success from trying to find additional observables that will let us get accurate data about the (usually) less influential, smaller-scale structure and parameters of reality. As Churchland describes it:
“Naturalization” of concepts thus turns out to come in two kinds of inference rather than one. “Upwards” naturalizations, let us say, string a connection from more microscopic theories to more macroscopic concepts. “Downwards” naturalizations, the traditional mode of intertheoretic reduction, connect existing macroscopic concepts and theories to more microscopic theories, exploiting the thoroughness and simplicity of the microscopic theory to provide a well-informed inductive bias to the more macroscopic theory. This inductive bias embodies what we learned, as we developed the microscopic theory, about all the observables we used to learn that theory. We can thus see that both kinds of naturalizations connect our concepts and theories to additional observable variables, thus enabling quicker and more accurate inductive training.
In combination with causal-role concepts and theories thereof, this all comes back to Churchland’s defense of the thesis that abstract objects and properties are both real and natural. The greater the degree of unity we attain in our hierarchical forests of abstract concepts and theories, the more we can justify those abstractions by reference to their role in successful causal description of concrete observations, rather than by abstracted argumentation. The more we naturalize our concepts, the more we feel licensed by Indispensability Arguments to call them real abstract universals (or at least, real abstract generalities of the neighborhood of reality we happen to live in), despite their being mere inferred theories bound ultimately to empirical data[15].
Certain naive forms of scientific realism would thus say that we are thus, through our scientific progress, coming to understand reality on a single, supreme, fundamental level. Churchland disagrees, and I concur with his disagreement.
To the contrary, a single Ur-map would be an extremely high-dimensional model, would require an extremely large amount of data to train, and would carry an extraordinarily large chance of overfitting after we had trained it. Entailment preorders of maps compress and represent experiential data far more efficiently than a single Ur-map, even if we know there exists a single underlying objective reality. In fact, we might often possess multiple maps of similar, or even identical, objective domains:
Churchland emphasizes that the final emphasis must be on empiricism and (sometimes counterfactual) observability:
Churchland is, of course, reciting the naturalist creed by stating that “every aspect of reality is somehow in causal interaction with the rest of reality” (or at least, it was in its past or will be in its future). This is a bullet both he and I can gladly bite, however. I can also add that since Second-Level Learning enables us to cohere our concepts into vast, inter-related preorders over time, it also enables us to gain increasing certainty about which conceptual maps refer to real abstract objects (optimal generalizations of properties of other maps), real concrete objects (which participate directly in causality), and apparent objects actually derived from erroneous inferences. As we learn more and integrate our concepts, real concrete and abstract objects come to be tied together, whereas unreal concrete objects (like superstitions) or abstract objects (like false philosophical frameworks) come to be increasingly isolated in our framework of maps of the world. A more integrated, naturalistic explanation for the experiential phenomena which originally gave birth to a model of unreal concrete or abstract objects can, if we allow ourselves to admit it into our worldview, clear up the experiential confusion and clear away the “zombie concepts”.
Part IV
Third-Level Learning: Cultural Progress
In the third and shortest major part of the book, we finally arrive at the domain of learning and thought in which we deal exclusively with human beings communicating via language. Churchland opens the chapter almost apologetically:
Unfortunately, this statement appears to ignore the close links between probabilistic inference and the entire rest of statistical learning theory, including the neural networks that form the foundation for Churchland’s theory of cognition in the First-Level Learning chapters. Alas.
Still, Churchland’s skepticism regarding the “language of thought” hypothesis makes a great deal of intuitive sense. It takes thorough study to learn the difference between the formal systems (sets of axioms demonstrated to have a model) of the foundations of mathematics and the formal languages (notations for computations) of the science of computing, although Douglas Hofstadter did write the world’s premier “pop comp-sci” text on exactly that matter[11]. Furthermore, any given spoken or written sentence, in formal or informal language, contains fairly little communicable information relative to the size of an entire mental model of a relevant domain, as Churchland has spotted:
Unlike in much of analytic philosophy, the science of computing takes programs and programming languages to simply be different ways of writing down calculations, to the point that the field of denotational semantics for programming remains small relative to the study of proving which computations a program carries out. A hypothesis regarding neurocomputation which can explain how learning and commonsense reasoning take place would apply, via the Church-Turing Thesis, to neural nets as well as Turing machines.
Third-Level Learning is perhaps a misnomer, since as far as I know, it does not actually come third in any particular causal or historical ordering. After all, humans communicated ideas, and thus carried out Third-Level Learning, long before we ever engaged seriously in reductionist science, and if standardized test scores show anything at all, they surely show that our societies have invented sophisticated systems devoted to ensuring that existing ideas are passed down to children as-is. In fact, the educational system often performs quite reliably, in the sense that the children consistently pass their exams, even if we all ritually lament the failure to pass down the true understanding and clarity once achieved by discoverers, inventors, and teachers. Such true understanding, Churchland would say, involves a high-dimensional conceptual map sculpted by large sums of experiential data. Perhaps we indeed ought to pessimistically expect that such high-dimensional understanding cannot be passed down accurately, even though teaching is a well-developed science (albeit, one prone to fads whose occasional serious results are also often ignored in favor of “how it’s always been done” or “the strong students will survive”). After all, as Churchland says:
Third-Level Learning, then, consists in using a Map-Portrayal Semantics for language (and other forms of human communication) to pass down maps that, according to the Domain Portrayal Semantics Churchland posits, accurately portray some piece of local reality. It may come before or after Second-Level Learning in our history, but it surely occurs. By means of evocative and descriptive language, human beings can index each-other’s maps and even, through carefully chosen series of evocations, describe their conceptual maps to each-other. Although other vocalizing species - such as wolves, nonhuman great apes, and some marine mammals - display the former ability to signal to each-other with sound, humans are unique in having the latter ability: to systematically educate each-other, passing on whole conceptual frameworks from their original discoverers to vast social peer-groups. By this means, human intellectual life surpasses the individual human:
One might think that little can be said about education by someone other than a professional expert on education, but Churchland does have an important point to make in describing Third-Level Learning: it is a form of learning, not a form of something other than learning. In particular, he explicitly criticizes the “memetic” theory of cultural “evolution”, for attempting to ground culture in Darwinist principles without making any reference to such obvious participants in culture as the mind and brain:
Churchland also notes that reasoning can work, even when individual reasoners don’t quite understand how or why they reason, as in the case of scientists with too little knowledge of methodology:
In fact, he even demands that we account for the Third-Level Learning and reasoning of others in such “unclean” fields as politics:
Churchland also notes how successful Third-Level Learning ultimately requires engaging, sometimes, in successful Second-Level Learning, attributed to Kuhnian “paradigm shifts”:
He then ends the book on a positive note:
Unfortunately, I do feel that this upbeat ending opens Churchland to a substantive criticism, namely: he has failed to address anything outside the sciences. Since most actually existing humans are neither scientists nor science hobbyists, one would think that a book about the brain would bother to address the vast domains of human life outside the halls of academic science, lest one be reminded of Professor Smith in Piled Higher and Deeper justifying the professorial career pyramid just by making everything outside academic science sound scary.
I suppose that Churchland’s own career and position as a philosopher of mind and science led him to write as chiefly addressing domains he thoroughly understands, but I, at least, think his core thesis draws strength from its potential applications outside those domains. If Churchland, and much other literature, can explain a naturalistic theory of how the brain comes to understand abstract, immaterial objects and properties in such domains as science and mathematics, then why not in, say, aesthetics, ethics, or the emotional life? Among the first abstract properties posited at the beginnings of any human culture are beauty and goodness, among the first abstract objects, the soul. It may sound suddenly religious to speak of the soul when talking about science and statistical modelling, but eliminativism on these “soulful” objects and properties has always stood as the largest bullet for naturalists to bite. Having a constructive-naturalist theory to apply to “soulful” subjects of inquiry could turn the bitter bullet into a harmless sugar pill.
Churchland also spent an entire book talking about the brain without ever once mentioning subjective consciousness/experience, for reasons, I suspect, of the same sort of greedy eliminativism.
However, that might just mean I need to put Churchland’s earlier work - like Matter and Consciousness[4] and The Engine of Reason, the Seat of the Soul[5] - as well as Patricia Churchland’s Braintrust[2] on my reading list to see what they have to say on such subjects.
References
[1] Alonzo Church. An unsolvable problem of elementary number theory. American Journal of Mathematics, 58(2):345–363, April 1936.
[2] Patricia Smith Churchland. Braintrust: What Neuroscience Tells Us about Morality. Princeton University Press, Princeton, N.J., 2011.
[3] Paul Churchland. Plato’s Camera: How the Physical Brain Captures a Landscape of Abstract Universals. MIT Press, 2012.
[4] Paul Churchland. Matter and Consciousness. MIT Press, Cambridge, 2013.
[5] Paul M. Churchland. The Engine of Reason, the Seat of the Soul: A Philosophical Journey into the Brain. MIT Press, Cambridge, 1995.
[6] C. E. Freer, D. M. Roy, and J. B. Tenenbaum. Towards common-sense reasoning via conditional simulation: Legacies of Turing in Artificial Intelligence. Turing’s Legacy (ASL Lecture Notes in Logic), 2012.
[7] N. D. Goodman, T. D. Ullman, and J. B. Tenenbaum. Learning a theory of causality. Psychological Review, 2011.
[8] Noah D Goodman, Joshua B Tenenbaum, and T Gerstenberg. Concepts in a probabilistic language of thought. MIT Press, 2015.
[9] T. L. Griffiths, F. Lieder, and N. D. Goodman. Rational use of cognitive resources: Levels of analysis between the computational and the algorithmic. Topics in Cognitive Science, To appear.
[10] Peter D. Grünwald and Paul M. B. Vitányi. Algorithmic information theory. CoRR, abs/0809.2754, 2008.
[11] Douglas R. Hofstadter. Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, Inc., New York, NY, USA, 1979.
[12] Marcus Hutter. Universal algorithmic intelligence: A mathematical top-down approach. In B. Goertzel and C. Pennachin, editors, Artificial General Intelligence, Cognitive Technologies, pages 227–290. Springer, Berlin, 2007.
[13] Milad Kharratzadeh and Thomas Shultz. Neural implementation of probabilistic models of cognition.
[14] John C. Kieffer. A tutorial on hierarchical lossless data compression. In Moshe Dror, Pierre L’Ecuyer, and Ferenc Szidarovszky, editors, Modeling Uncertainty, volume 46 of International Series in Operations Research & Management Science, pages 711–733. Springer US, 2005.
[15] David Liggins. Quine, Putnam, and the ‘Quine-Putnam’ indispensability argument. Erkenntnis, 68(1):113–127, 2008.
[16] Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, and James P. Sethna. Parameter space compression underlies emergent theories and predictive models. Science, 342(6158):604–607, 2013.
[17] Noah D. Goodman, Joshua B. Tenenbaum, Thomas L. Griffiths, and Jacob Feldman. Compositionality in rational analysis: Grammar-based induction for concept learning. In Nick Chater and Mike Oaksford, editors, The Probabilistic Mind: Prospects for Bayesian Cognitive Science. Oxford University Press, 2008.
[18] Ruslan Salakhutdinov, Joshua B. Tenenbaum, and Antonio Torralba. One-shot learning with a hierarchical nonparametric Bayesian model. Journal of Machine Learning Research - Proceedings Track, 27:195–206, 2012.
[19] Lei Shi and Thomas L. Griffiths. Neural implementation of hierarchical Bayesian inference by importance sampling. In Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1669–1677. Curran Associates, Inc., 2009.