Don't You Care If It Works? - Part 2
Part 2 – Winstrumental
The forgotten fifth virtue
Remember, you can't be wrong unless you take a position. Don't fall into that trap.
-- Scott Adams, Dogbert's Top Secret Management Handbook
CronoDAS posted this in a reply to my poem, and I dismissed him because my typical mind is typical. I would never make that mistake, so I didn’t think it was a big deal. But it is. In the comments on part 1, a lot of people are heartily disagreeing with everything I wrote. I admire and respect them. I have already corrected one part of the post that was wrong. Unfortunately, a lot of people reading this couldn’t disagree if they wanted to, because they don’t have an account. I get that lurking is fun, but if you’re spending hours and hours on LessWrong and not posting anything, I think you’re doing yourself a disservice.
In part 1 I speculated a lot about what goes on in Eliezer’s mind, knowing full well that Eliezer could read this and say that I’m wrong, and I would have no comeback but pure embarrassment. What kind of foolhardy dunce would risk such a thing? Let me answer with another question: how else could I possibly change my mind? After a year of reading the sequences, I have strong opinions on their goals and lessons, and the only way to find out if I’m right or wrong is to open myself up to challenge. Worst case: people agree with me and I get sweet sweet karma. Best case: I become wiser. Am I at risk of sticking to an opinion too long just because I wrote it down? Yes, but I know I have that bias, and a known bias is one I can adjust for. If I don’t argue, I don’t know what I don’t know.
If you want a chance to change your opinions, you have to put them where they can hurt you. Or, to use an Umeshism: if you’ve never been proven an idiot on the internet, you’re not learning enough from the internet.
Back to Harvard
Why don’t the psychologists at Harvard switch to reviewing nameless CVs? Well, why would they? They are tenured Harvard professors; they already won! No bias showed up in the assessment of stellar CVs, only of those on the margins. So they’re not missing out on any superstars; at worst they hire some gentleman who would be their 32nd-strongest faculty member instead of a lady who would be their 29th. Would you cause a fuss if you were there?
In “Thinking, Fast and Slow” Kahneman writes that he noticed he was suffering from the halo effect when grading student exams. If a student did well on her first essay, Kahneman gave her the benefit of the doubt on later questions. He switched to grading all the answers to question 1, then all the answers to question 2, and so on. It took more time, but the grades were more accurate and fair. What’s my point? I guess it’s possible to “win at rationality” without a strong incentive; it just maybe takes a Nobel-level rationalist to do so.
Winning isn’t everything?
Vince Lombardi said that “Winning isn’t everything, it’s the only thing.” Aren’t you jealous of him? It’s so simple! I think the most common question asked of our community, mostly by our community, is why we don’t “win” as much as we think we are supposed to. In a rare display of good sense, I’m not going to speculate about why any of you don’t win; I’ll talk about myself.
My job isn’t as interesting, meaningful, and full of potential as I would like. Why don’t I apply rationality to win at building a better career? Because when I think about it, I remember that my job is also decently paying, secure, and full of decent people. My job is easy, and winning is hard. When I read about Nate Soares trying to save the world, I feel a little inspired and a little ashamed that I’m not. Nate is almost certainly a better mathematician than I am, but I don’t think there’s a gargantuan gap between us. The big gap between Nate and me is in the desire to win. In my heart of hearts, I just don’t want to save the world as much as he does.
Love wins
What could I possibly want more than saving the world?
There are two ladies; let’s call them Rachel and Leah, since my username is reminiscent of the Biblical Jacob. I met Rachel at the desert well (OKCupid) and we went on a few dates, and at the same time Leah also replied to me on OKCupid and we went on a few dates too. Then there were some situations and complications, and between those and my desire not to be an asshole, I decided that I had to choose one. The basic heuristic I would normally use pointed slightly to Rachel, but I kept vacillating back and forth for a few days; they were both much more attractive than any other girl I had ever met through the site. Suddenly it hit me like Chuck Norris: this is an important decision, with huge stakes, one that I would have to make based on incomplete information with my brain biology trying to trip me up every step of the way. Might this not call for some EWOR?
I got to work. I introspected on past relationships and read the relevant science literature to come up with a weighted list of the qualities I’m looking for, to maximize my chances of a happy long-term relationship. I wrote down all the evidence that could affect my assessment of each quality for each lady, and employed every method I could think of to debias myself and give my best guess at the ratings. Then I peeked at the final score for the first time, and it was very surprising. My gut expected Rachel to be slightly ahead, but Leah won handily. I stared at the numbers for a while. Maybe I was too critical here? Overweighted this category there? No! The ghost of Eliezer wouldn’t let me change the bottom line from a formula to a value cell. And then, after 30 minutes of staring at the numbers, my intuition started catching up. For example, my impression from the first date was that Leah wasn’t very funny, and it stuck. When I actually wrote down the evidence, I remembered that she cracked me up once on our second date and a couple of times on our third date as she was slowly beginning to open up and trust me. I gave her a higher rating on humor compatibility than I thought I would. I closed the spreadsheet and went to sleep. Two days later I broke up with Rachel.
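The post doesn’t share the spreadsheet itself, so here is a minimal hypothetical sketch of the weighted-sum idea it describes: each quality gets a weight, each candidate gets a rating per quality, and the total is computed by a formula rather than typed in. The quality names, weights, ratings, and the generic labels "A" and "B" are all invented for illustration.

```python
# Hypothetical weighted-sum decision matrix. All qualities, weights, and
# ratings below are made up for illustration; they are not the author's data.

def weighted_score(weights: dict[str, float], ratings: dict[str, float]) -> float:
    """Return the sum of weight * rating over every quality in `weights`."""
    missing = weights.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(w * ratings[q] for q, w in weights.items())


weights = {"humor": 0.2, "kindness": 0.4, "shared_goals": 0.4}

candidates = {
    "A": {"humor": 8, "kindness": 6, "shared_goals": 6},
    "B": {"humor": 6, "kindness": 8, "shared_goals": 8},
}

# The bottom line is always recomputed from the inputs, never typed in.
scores = {name: weighted_score(weights, r) for name, r in candidates.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Because the total is recomputed from the inputs, the only way to move the bottom line is to change a weight or a rating on the record, which is the discipline described above.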
Was I accurate in assessing Leah? Not exactly. She’s above and beyond anything I could’ve guessed. If I don’t “win” a single thing more from my rationality training than the few months I have gotten to spend so far with her, I’ve won enough.
Did I just praise disagreement?
I told this story about Leah to someone at a rationalist gathering. I thought he might congratulate me on my achievement in rationality or denounce me as a cold and heartless robot. His actual reaction caught me completely by surprise: he just flat out didn’t believe me. He said that I had probably used a spreadsheet to justify, after the fact, a decision that my gut had already made. The idea of applying something like EWOR, which belongs on internet forums, to something like picking a woman to date was so foreign to him that he rejected it outright. I could almost hear him screaming “separate magisteria!”
Getting to the points
I’m no good at writing pithy summaries. If you saw a good point anywhere in those two posts, grab it. I can’t help you. For what it’s worth, here’s Jacobian’s guide to actually using rationality to win:
1. If you don’t believe you can, Luke, don’t bother. But if you’re not sure whether it works, wouldn’t it be interesting to find out?
2. Taking ideas seriously requires work, maybe even *gasp* doing math. If you disagree with Eliezer or anyone else on a matter of math or science, sit down and figure it out. Don’t just read stuff, write stuff. Write a bit of code that simulates a probability problem. Derive something from Schrödinger’s equation on a piece of paper. Reading stuff is useful, but it’s not work; rationality is work.
3. If there’s an opinion that you’re afraid you may be irrationally attached to and you have a real desire to find out the truth, post it on LessWrong. Don’t post things you’re 99.999% sure of; they’re probably true. Post what you’re 80% sure about; that’s a 20% chance to really learn something. People will call you an idiot online; that’s what the internet is for. Losing karma is how you become smarter, and it’s quite a thrill.
4. Rationality will not change your entire life at once. Pick one thing that you want to win at and apply rationality to it. Just one, but one where you’ll know if you won or lost, so “being wiser” doesn’t count. Getting laid counts. If you take an L, you’ll learn a lot. If you win, you’ll know that the force is yours to command.
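As one illustration of point 2’s “write a bit of code that simulates a probability problem”, here is a short Monte Carlo simulation of the Monty Hall problem in Python. The choice of problem is ours, not the author’s; it’s just a classic puzzle that people tend to argue about rather than test.

```python
import random


def monty_hall(trials: int, switch: bool, seed: int = 0) -> float:
    """Estimate the win rate of the stick or switch strategy by simulation."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(3)  # door hiding the car
        pick = rng.randrange(3)   # contestant's first choice
        # The host opens a door that is neither the contestant's pick nor the prize.
        opened = rng.choice([d for d in range(3) if d != pick and d != prize])
        if switch:
            # Move to the single remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == prize
    return wins / trials


print(f"stick:  {monty_hall(100_000, switch=False):.3f}")
print(f"switch: {monty_hall(100_000, switch=True):.3f}")
```

With 100,000 trials the two estimates should land close to 1/3 and 2/3, which settles the argument faster than debating it.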
Who knows, maybe in a few years you’ll think you’re strong enough to save the world or something.
Rationality Reading Group: Part F: Politics and Rationality
This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.
Welcome to the Rationality reading group. This fortnight we discuss Part F: Politics and Rationality (pp. 255-289). This post summarizes each article of the sequence, linking to the original LessWrong post where available.
F. Politics and Rationality
57. Politics is the Mind-Killer - People act funny when they talk about politics. In the ancestral environment, being on the wrong side might get you killed, and being on the correct side might get you sex, food, or let you kill your hated rival. If you must talk about politics (for the purpose of teaching rationality), use examples from the distant past. Politics is an extension of war by other means. Arguments are soldiers. Once you know which side you're on, you must support all arguments of that side, and attack all arguments that appear to favor the enemy side; otherwise, it's like stabbing your soldiers in the back - providing aid and comfort to the enemy. If your topic legitimately relates to attempts to ban evolution in school curricula, then go ahead and talk about it, but don't blame it explicitly on the whole Republican/Democratic/Liberal/Conservative/Nationalist Party.
58. Policy Debates Should Not Appear One-Sided - Debates over outcomes with multiple effects will have arguments both for and against, so you must integrate the evidence, not expect the issue to be completely one-sided.
59. The Scales of Justice, the Notebook of Rationality - People have an irrational tendency to simplify their assessment of things into how good or bad they are without considering that the things in question may have many distinct and unrelated attributes.
60. Correspondence Bias - Also known as the fundamental attribution error, this is the tendency to attribute the behavior of others to intrinsic dispositions, while excusing one's own behavior as the result of circumstance.
61. Are Your Enemies Innately Evil? - People want to think that the Enemy is an innately evil mutant. But, usually, the Enemy is acting as you might in their circumstances. They think that they are the hero in their story and that their motives are just. That doesn't mean that they are right. Killing them may be the best option available. But it is still a tragedy.
62. Reversed Stupidity Is Not Intelligence - The world's greatest fool may say the Sun is shining, but that doesn't make it dark out. Stalin also believed that 2 + 2 = 4. Neither stupidity nor human evil anticorrelates with truth. Arguing against weaker advocates proves nothing, because even the strongest idea will attract weak advocates.
63. Argument Screens Off Authority - There are many cases in which we should take the authority of experts into account when we decide whether or not to believe their claims. But if technical arguments are available, these can screen off the authority of experts.
64. Hug the Query - The more directly your arguments bear on a question, without intermediate inferences, the more powerful the evidence. We should try to observe evidence that is as near to the original question as possible, so that it screens off as many other arguments as possible.
65. Rationality and the English Language - George Orwell's writings on language and totalitarianism are critical to understanding rationality. Orwell was an opponent of the use of words to obscure meaning, or to convey ideas without their emotional impact. Language should get the point across - when the effort to convey information gets lost in the effort to sound authoritative, you are acting irrationally.
66. Human Evil and Muddled Thinking - It's easy to think that rationality and seeking truth are merely an intellectual exercise, but this ignores the lessons of history. Cognitive biases and muddled thinking allow people to hide from their own mistakes and allow evil to take root. Spreading the truth makes a real difference in defeating evil.
This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group, though, is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Part G: Against Rationalization (pp. 293-339). The discussion will go live on Wednesday, 12 August 2015, right here on the discussion forum of LessWrong.
Astronomy, Astrobiology, & The Fermi Paradox I: Introductions, and Space & Time
This is the first in a series of posts I am putting together on a personal blog I started just two days ago as a collection of my musings on astrobiology ("The Great A'Tuin" - sorry, I couldn't help it), and which I will be reposting here. Much has been written here about the Fermi paradox and the 'great filter'. It seems to me that going back to a somewhat more basic level of astronomy and astrobiology is extremely informative on these questions, and so this is what I will be doing. The bloggery is intended for a slightly more general audience than this site (hence much of the content of the introduction), but I think it will be of interest. Many of the points I will be making are ones I have touched on in previous comments here, and I hope to explore them in more detail.
This post references my first two posts - an introduction, and a discussion of our apparent position in space and time in the universe. The blog posts may be found at:
http://thegreatatuin.blogspot.com/2015/07/whats-all-this-about.html
http://thegreatatuin.blogspot.com/2015/07/space-and-time.htm
Experiences in applying "The Biodeterminist's Guide to Parenting"
I'm posting this because LessWrong was very influential on how I viewed parenting, particularly the emphasis on helping one's brain work better. In this context, creating and influencing another person's brain is an awesome responsibility.
It turned out to be a lot more anxiety-provoking than I expected. I don't think that's necessarily a bad thing, as the possibility of screwing up someone's brain should make a parent anxious, but it's something to be aware of. I've heard some blithe "Rational parenting could be a very high-impact activity!" statements from childless LWers who may be interested to hear about the experience of actually applying that.
One thing that really scared me about trying to raise a child with the healthiest-possible brain and body was the possibility that I might not love her if she turned out to not be smart. 15 months in, I'm no longer worried. Evolution has been very successful at producing parents and children that love each other despite their flaws, and our family is no exception. Our daughter Lily seems to be doing fine, but if she turns out to have disabilities or other problems, I'm confident that we'll roll with the punches.
Cross-posted from The Whole Sky.
Before I got pregnant, I read Scott Alexander's (Yvain's) excellent Biodeterminist's Guide to Parenting and was so excited to have this knowledge. I thought how lucky my child would be to have parents who knew and cared about how to protect her from things that would damage her brain.
Real life, of course, got more complicated. It's one thing to intend to avoid neurotoxins, but another to arrive at the grandparents' house and find they've just had ant poison sprayed. What do you do then?
Here are some tradeoffs Jeff and I have made between things that are good for children in one way but bad in another, or things that are good for children but really difficult or expensive.
Germs and parasites
The hygiene hypothesis states that lack of exposure to germs and parasites increases the risk of autoimmune disease. Our pediatrician recommended letting Lily play in the dirt for this reason.
While exposure to animal dander and pollution increases the risk of asthma later in life, it seems that being exposed to these in the first year of life actually protects against asthma. Apparently if you're going to live in a house with roaches, you should do it in the first year or not at all.
Except some stuff in dirt is actually bad for you.
Scott writes:
Parasite-infestedness of an area correlates with national IQ at about r = -0.82. The same is true of US states, with a slightly reduced correlation coefficient of -0.67 (p<0.0001). . . . When an area eliminates parasites (like the US did for malaria and hookworm in the early 1900s) the IQ for the area goes up at about the right time.
Living with cats as a child seems to increase risk of schizophrenia, apparently via toxoplasmosis. But in order to catch toxoplasmosis from a cat, you have to eat its feces during the two weeks after it first becomes infected (which it’s most likely to do by eating birds or rodents carrying the disease). This makes me guess that most kids get it through tasting a handful of cat litter, dirt from the yard, or sand from the sandbox rather than simply through cat ownership. We live with indoor cats who don’t seem to be mousers, so I’m not concerned about them giving anyone toxoplasmosis. If we build Lily a sandbox, we’ll keep it covered when not in use.
The evidence is mixed about whether infections like colds during the first year of life increase or decrease your risk of asthma later. After the newborn period, we defaulted to being pretty casual about germ exposure.
Toxins in buildings
Our experiences with lead. Our experiences with mercury.
In some areas, it’s not that feasible to live in a house with zero lead. We live in Boston, where 87% of the housing was built before lead paint was banned. Even in a new building, we’d need to go far out of town before reaching soil that wasn’t near where a lead-painted building had been.
It is possible to do some renovations without exposing kids to lead. Jeff recently did some demolition of walls with lead paint, very carefully sealed off and cleaned up, while Lily and I spent the day elsewhere. Afterwards her lead level was no higher than it had been.
But Jeff got serious lead poisoning as a toddler while his parents did major renovations on their old house. If I didn’t think I could keep the child away from the dust, I wouldn’t renovate.
Recently a house across the street from us was gutted, with workers throwing debris out the windows and creating big plumes of dust (presumably lead-laden) that blew all down the street. Later I realized I should have called city building inspection services, which would have at least made them carry the debris into the dumpster instead of throwing it from the second story.
Floor varnish releases formaldehyde and other nasties as it cures. We kept Lily out of the house for a few weeks after Jeff redid the floors. We found it worthwhile to pay rent at our previous house in order to not have to live in the new house while this kind of work was happening.
Pressure-treated wood was treated with arsenic and chromium until around 2004 in the US. It has a greenish tint, though this may have faded with time. Playing on playsets or decks made of such wood increases children's cancer risk. It should not be used for furniture (I thought this would be obvious, but apparently it wasn't to some of my handyman relatives).
I found it difficult to know how to deal with fresh paint and other fumes in my building at work while I was pregnant. Women of reproductive age have a heightened sense of smell, and many pregnant women have heightened aversion to smells, so you can literally smell things some of your coworkers can’t (or don’t mind). The most critical period of development is during the first trimester, when most women aren’t telling the world they’re pregnant (because it’s also the time when a miscarriage is most likely, and if you do lose the pregnancy you might not want to have to tell the world). During that period, I found it difficult to explain why I was concerned about the fumes from the roofing adhesive being used in our building. I didn’t want to seem like a princess who thought she was too good to work in conditions that everybody else found acceptable. (After I told them I was pregnant, my coworkers were very understanding about such things.)
Food
Recommendations usually focus on what you should eat during pregnancy, but obviously children’s brain development doesn’t stop there. I’ve opted to take precautions with the food Lily and I eat for as long as I’m nursing her.
Claims that pesticide residues are poisoning children scare me, although most scientists seem to think the paper cited is overblown. Other sources say the levels of pesticides in conventionally grown produce are fine. We buy organic produce at home but eat whatever we’re served elsewhere.
I would love to see a study with families randomly selected to receive organic produce for the first 8 years of the kids’ lives, then looking at IQ and hyperactivity. But no one’s going to do that study because of how expensive 8 years of organic produce would be.
The Biodeterminist’s Guide doesn’t mention PCBs in the section on fish, but fish (particularly farmed salmon) are a major source of these pollutants. They don’t seem to be as bad as mercury, but they are neurotoxic. Unfortunately, their half-life in the body is around 14 years, so if you have even a vague idea of getting pregnant ever in your life, you shouldn’t be eating Atlantic or farmed salmon, bluefish, wild striped bass, white and Atlantic croaker, blackback or winter flounder, summer flounder, or blue crab.
I had the best intentions of eating lots of the right kind of high-omega-3, low-pollutant fish during and after pregnancy. Unfortunately, fish was the only food I developed an aversion to. Now that Lily is eating food on her own, we tried several sources of omega-3 and found that kippered herring was the only success. Lesson: it’s hard to predict what foods kids will eat, so keep trying.
In terms of hassle, I underestimated how long I would be “eating for two” in the sense that anything I put in my body ends up in my child’s body. Counting pre-pregnancy (because mercury has a half-life of around 50 days in the body, so sushi you eat before getting pregnant could still affect your child), pregnancy, breastfeeding, and presuming a second pregnancy, I’ll probably spend about 5 solid years feeding another person via my body, sometimes two children at once. That’s a long time in which you have to consider the effect of every medication, every cup of coffee, every glass of wine on your child. There are hardly any medications considered completely safe during pregnancy and lactation—most things are in Category C, meaning there’s some evidence from animal trials that they may be bad for human children.
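The half-lives mentioned in this section (around 50 days for mercury, around 14 years for PCBs) translate into “how much is left” through the standard exponential-decay formula, remaining fraction = 0.5 ** (elapsed / half_life). This sketch assumes a single exposure, simple first-order elimination, and no further intake; real body burdens from an ongoing diet are more complicated.

```python
def fraction_remaining(elapsed: float, half_life: float) -> float:
    """Fraction of a single dose left after `elapsed` time.

    Assumes first-order (exponential) elimination and no further intake.
    `elapsed` and `half_life` must be in the same units.
    """
    return 0.5 ** (elapsed / half_life)


# Mercury, half-life ~50 days (figure from the text).
for days in (50, 100, 150):
    print(f"mercury after {days} days: {fraction_remaining(days, 50):.1%}")

# PCBs, half-life ~14 years (figure from the text).
for years in (14, 28, 42):
    print(f"PCBs after {years} years: {fraction_remaining(years, 14):.1%}")
```

After three half-lives about an eighth of a single dose remains. For mercury that takes roughly five months, which is why the pre-pregnancy window matters; for PCBs the same three half-lives take about 42 years.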
Fluoride
Too much fluoride is bad for children’s brains. The CDC recently recommended lowering fluoride levels in municipal water (though apparently because of concerns about tooth discoloration more than neurotoxicity). Around the same time, the American Dental Association began recommending the use of fluoride toothpaste as soon as babies have teeth, rather than waiting until they can rinse and spit.
Cavities are actually a serious problem even in baby teeth, because of the pain and possible infection they cause children. Pulling them messes up the alignment of adult teeth. Drilling on children too young to hold still requires full anesthesia, which is dangerous itself.
But Lily isn’t particularly at risk for cavities. 20% of children get a cavity by age six, and they are disproportionately poor, African-American, and particularly Mexican-American children (presumably because of different diet and less ability to afford dentists). 75% of cavities in children under 5 occur in 8% of the population.
We decided to have Lily brush without toothpaste, avoid juice and other sugary drinks, and see the dentist regularly.
Home pesticides
One of the most commonly applied insecticides makes kids less smart. This isn’t too surprising, given that it kills insects by disabling their nervous system. But it’s not something you can observe on a small scale, so it’s not surprising that the exterminator I talked to brushed off my questions with “I’ve never heard of a problem!”
If you get carpenter ants in your house, you basically have to choose between poisoning them or letting them structurally damage the house. We’ve only seen a few so far, but if the problem progresses, we plan to:
1) remove any rotting wood in the yard where they could be nesting
2) have the perimeter of the building sprayed
3) place gel bait in areas kids can’t access
4) only then spray poison inside the house.
If we have mice we’ll plan to use mechanical traps rather than poison.
Flame retardants
Starting in the 1970s, California required a high degree of flame resistance from furniture. This basically meant that US manufacturers sprayed flame-retardant chemicals on anything made of polyurethane foam, such as sofas, rug pads, nursing pillows, and baby mattresses.
The law recently changed, due to growing acknowledgement that the carcinogenic and neurotoxic chemicals were more dangerous than the fires they were supposed to be preventing. Even firefighters opposed the use of the flame retardants, because when people die in fires it’s usually from smoke inhalation rather than burns, and firefighters don’t want to breathe the smoke from your toxic sofa (which will eventually catch fire even with the flame retardants).
We’ve opted to use furniture from companies that have stopped using flame retardants (like Ikea and others listed here). Apparently futons are okay if they’re stuffed with cotton rather than foam. We also have some pre-1970s furniture that tested clean for flame retardants. You can get foam samples tested for free.
The main way children ingest flame retardants is that the chemicals settle into dust on the floor, and children crawl around in the dust. If you don’t want to get rid of your furniture, frequent damp-mopping would probably help.
The standards for mattresses are so stringent that the chemical sprays aren’t generally used, and instead most mattresses are wrapped in a flame-resistant barrier which apparently isn’t toxic. I contacted the companies that made our mattresses and they’re fine.
Ratings for chemical safety of children’s car seats here.
Thoughts on IQ
A lot of people, when I start talking like this, say things like “Well, I lived in a house with lead paint/played with mercury/etc. and I’m still alive.” And yes, I played with mercury as a child, and Jeff is still one of the smartest people I know even after getting acute lead poisoning as a child.
But I do wonder if my mind would work a little better without the mercury exposure, and if Jeff would have had an easier time in school without the hyperactivity (a symptom of lead exposure). Given the choice between a brain that works a little better and one that works a little worse, who wouldn’t choose the one that works better?
We’ll never know how an individual’s nervous system might have been different with a different childhood. But we can see population-level effects. The Environmental Protection Agency, for example, is fine with calculating the expected benefit of making coal plants stop releasing mercury by looking at the expected gains in terms of children’s IQ and increased earnings.
Scott writes:
A 15 to 20 point rise in IQ, which is a little more than you get from supplementing iodine in an iodine-deficient region, is associated with half the chance of living in poverty, going to prison, or being on welfare, and with only one-fifth the chance of dropping out of high-school (“associated with” does not mean “causes”).
Salkever concludes that for each lost IQ point, males experience a 1.93% decrease in lifetime earnings and females experience a 3.23% decrease. If Lily would earn about what I do, saving her one IQ point would save her $1600 a year or $64000 over her career. (And that’s not counting the other benefits she and others will reap from her having a better-functioning mind!) I use that for perspective when making decisions. $64000 would buy a lot of the posh prenatal vitamins that actually contain iodine, or organic food, or alternate housing while we’re fixing up the new house.
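The post doesn’t spell out how it gets from 3.23% to $1600 a year and $64000 over a career. If the loss is a flat percentage of flat annual earnings, the stated figures imply annual earnings of about $49,500 and a 40-year career; this sketch backs those numbers out. Both implied values are our inference, not figures from the post.

```python
def implied_inputs(annual_loss: float, career_loss: float,
                   pct_per_point: float) -> tuple[float, float]:
    """Back out the salary and career length behind a per-IQ-point loss estimate.

    Assumes the loss is a flat percentage of flat annual earnings
    (no raises, inflation, or discounting).
    """
    salary = annual_loss / (pct_per_point / 100)
    years = career_loss / annual_loss
    return salary, years


# Figures from the post: $1600/year, $64000/career, 3.23% per IQ point (females).
salary, years = implied_inputs(annual_loss=1600, career_loss=64000, pct_per_point=3.23)
print(f"implied annual earnings: ${salary:,.0f}")
print(f"implied career length: {years:.0f} years")
```

Salkever’s percentages refer to lifetime earnings, so treating them as a fixed share of a flat salary is only a rough approximation.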
Conclusion
There are times when Jeff and I prioritize social relationships over protecting Lily from everything that might harm her physical development. It’s awkward to refuse to go to someone’s house because of the chemicals they use, or to refuse to eat food we’re offered. Social interactions are good for children’s development, and we value those as well as physical safety. And there are times when I’ve had to stop being so careful because I was getting paralyzed by anxiety (literally perched in the rocker with the baby trying not to touch anything after my in-laws scraped lead paint off the outside of the house).
But we also prioritize neurological development more than most parents, and we hope that will have good outcomes for Lily.
Entrepreneurial autopsies
Entrepreneurial ideas come and go. Some I don't give a second thought to. For others, I begin market research, examine the competitive landscape, and explore the feasibility of development. This can be time-consuming, and it has yet to produce any tangible, commercialized product.
I figure it's about time I took the time I would spend applying my existing repertoire of knowledge to developing an idea, and spent it instead on exploring parsimonious, efficient techniques for assessing viability.
In my search I found Autopsy.io, a startup graveyard where founders concisely describe why their startups failed. It made me think about my past startup ideas and why they haven't taken off.
I'm going to work that out, put it in a spreadsheet, and trace it back to whatever problem keeps popping up. Then I'll work on improving my subject-matter knowledge in that domain. For example, if it's the feasibility of implementing with existing technology, I might learn more about the current technological landscape in general. Or I might learn more about existing services for investors, if my product is a service for investors, like my last startup idea, which I have autopsied in detail here.
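A minimal sketch of the spreadsheet step, assuming each past idea is tagged with one or more failure reasons. It uses a simple frequency count rather than an actual statistical regression, and the idea names and reason tags are placeholders, not the author's autopsies.

```python
from collections import Counter

# Hypothetical autopsy log: idea -> reasons it didn't go anywhere.
autopsies = {
    "idea_1": ["technical feasibility", "market size"],
    "idea_2": ["technical feasibility"],
    "idea_3": ["domain knowledge", "technical feasibility"],
}

# Count how often each failure reason appears across all ideas.
reason_counts = Counter(reason for reasons in autopsies.values() for reason in reasons)

# Most frequent reasons first: the top one is the domain to study next.
for reason, count in reason_counts.most_common():
    print(f"{count}  {reason}")
```

With only a handful of past ideas, a count like this is about as much structure as the data can support; a real regression would need far more autopsies.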
I just thought I'd share my general strategy for anyone who'd want to copy this procedure for startup autopsy. Please use this space to suggest other appropriate diagnostic methods.
edit 1: Thanks for pointing out the typos :)
'Charge for something and make more than you spend' - Marco Arment, Founder of Instapaper
Zooming your mind in and out
I recently noticed I had two mental processes opposing one another in an interesting way.
The first mental process was instilled by reading Daniel Kahneman on the focusing illusion and Paul Graham on procrastination. This process encourages me to "zoom out" when engaging in low-value activities so I can see they don't deliver much value in the grand scheme of things.
The second mental process was instilled by reading about the importance of just trying things. (These articles could be seen as steelmanning Mark Friedenbach's recent Less Wrong critique.) This mental process encourages me to "zoom in" and get my hands dirty through experimentation.
Both these processes seem useful. Instead of spending long stretches of time in either the "zoomed in" or "zoomed out" state, I think I'd do better flip-flopping between them. For example, if I'm wandering down internet rabbit holes, I'm spending too much time zoomed in. Asking "why" repeatedly could help me realize I'm doing something low value. If I'm daydreaming or planning lots with little doing, I'm spending too much time zoomed out. Asking "how" repeatedly could help me identify a first step.
This fits in with construal level theory, aka "near/far theory" as discussed by Robin Hanson. (I recommend the reviews Hanson links to; they gave me a different view of the concept than his standard presentation.) To be more effective, maybe one should increase cross communication between the "near" and "far" modes, so the parts work together harmoniously instead of being at odds.
If Hanson's view is right, maybe the reason people become uncomfortable when they realize they are procrastinating (or not Just Trying It) is that this maps to getting caught red-handed in an act of hypocrisy in the ancestral environment. You're pursuing near interests (watching Youtube videos) instead of working towards far ideals (doing your homework)? For shame!
(Possible cure: Tell yourself that there's nothing to be ashamed of if you get stuck zoomed in; it happens to everyone. Just zoom out.)
Part of me is reluctant to make this post, because I just had this idea and it feels like I should test it out more before writing about it. So here are my excuses:
1. If I wait until I develop expertise in everything, it may be too late to pass it on.
2. In order to see if this idea is useful, I'll need to pay attention to it. And writing about it publicly is a good way to help myself pay attention to it, since it will become part of my identity and I'll be interested to see how people respond.
There might be activities people already do on a regular basis that consist of repeated zooming in and out. If so, engaging in them could be a good way to build this mental muscle. Can anyone think of something like this?
The Pre-Historical Fallacy
One fallacy that I see frequently in works of popular science -- and also here on LessWrong -- is the belief that we have strong evidence of the way things were in pre-history, particularly when someone argues that we can explain various aspects of our culture, psychology, or personal experience because we evolved in a certain way. Moreover, it is often implicitly assumed that because we have this 'strong evidence', it must be relevant to the topic at hand. While it is true that the environment did affect our evolution and thus the way we are today, the evolution and anthropology of pre-historic societies are emphasized to a much greater extent than rational thought would indicate is appropriate.
As a matter of course, you should remember these points whenever you hear a claim about prehistory:
- Most of what we know (or guess) is based on less data than you would expect, and the publish or perish mentality is alive and well in the field of anthropology.
- Most of the information is limited and technical, which means that anyone writing for a popular audience will have strong motivation to generalize and simplify.
- It has been found time and time again that for any statement we can make about human culture and behavior, there is (or was) a society somewhere that will serve as a counterexample.
- Anthropologists and members of related fields very rarely have finely tuned critical thinking skills or a strong background in the philosophy of science, and they are highly motivated to come up with interpretations of results that match their previous theories and expectations.
Results in which you can have reasonable confidence should be framed in generalities, not absolutes. E.g., "The great majority of human cultures that we have observed have distinct and strong religious traditions", and not "humans evolved to have religion". It may be true that we have areas in our brain that evolved not only 'consistent with holding religion', but actually 'specifically for the purpose of experiencing religion'... but it would be very hard to prove this second statement, and anyone who makes it should be regarded with suspicion.
Perhaps more importantly, these statements are almost always a red herring. It may make you feel better that humans evolved to be violent, to fit in with the tribe, to eat meat, to be spiritual, to die at the age of thirty... But rarely do we see these claims in a context where the stated purpose is to make you feel better. Instead they are couched in language indicating that they are making a normative statement -- that this is, in some sense, the way things should be. (This is specifically the argumentum ad antiquitatem, or appeal to tradition, and should not be confused with the historical fallacy, but it is certainly a fallacy.)
It is fine to identify, for example, that your fear of flying has an evolutionary basis. However, it is foolish to therefore refuse to fly because it is unnatural, or to undertake gene therapy to correct the fear. Whether or not the explanation is valid, it does not tell you what to do.
Obviously, this doesn't mean that we shouldn't study evolution or the effects evolution has on behavior. However, any time you hear someone refer to this information in order to support any argument outside the fields of biology or anthropology, you should look carefully at why they are taking the time to distract you from the practical implications of the matter under discussion.
Harper’s Fishing Nets: a review of Plato’s Camera by Paul Churchland
Abstract
Paul Churchland published Plato’s Camera to defend the thesis that abstract objects and properties are both real and natural, consisting in learned mental representations of the timeless, abstract features of the mind’s environment. He holds that the brain learns, without supervision, high-dimensional maps of objective feature domains, which he calls Domain-Portrayal Semantics. He further elaborates that homomorphisms between these high-dimensional maps allow the brain to occasionally repurpose a higher-quality map to understand a completely different domain, reducing the latter to the former. He finally adds a Map-Portrayal Semantics of language to his Domain-Portrayal Semantics of thought by considering the linguistic, cultural, and educational dimensions of human learning.
Part I
Introduction
Surely the title of this review already sounds like some terrible joke is about to be perpetrated, but in fact it merely indicates a philosophical difference between myself and Paul Churchland. Churchland wrote Plato’s Camera[3] not merely to explain a view on philosophy of mind to laypeople and other philosophers, but with the specific goal of defending Platonism about abstract, universal properties and objects (such as those used in mathematics) by naturalizing it. The contrast between such naturalist philosophers as Churchland, Dennett, Flanagan, and Railton and non-naturalist or weakly naturalist philosophy lies precisely in this fact: the latter consider many abstract or intuitive concepts to necessarily form their own part of reality, amenable strictly to philosophical investigation, while the former seek and demand a consilience of causal explanation for what’s going on in our lives. The results are a breath of fresh air to read.
A great benefit of reading strongly naturalistic philosophy and philosophers is that, in the course of researching a philosophical position, they tend to absorb so much scientific material that they can’t help but achieve a degree of insight and accuracy in their core thesis – even when getting almost all the details wrong! So it is with Plato’s Camera: to a reader in 2015, a book published in 2012 that mostly does not cite any scientific research from the past five to ten years can’t help but seem somewhat dated and unrealistic in its details, at least to those of us who’ve been doing our own reading in related scientific literature (or possibly just have partisan opinions). And yet, Plato’s Camera captures and supports a core thesis, this being more or less:
- The brain contains or embodies (high-dimensional) maps of objective domains, and by Hebbian updating over time, the map comes to resemble the territory, be it conceptual (as with mono-directional neural networks) or causal (as with recurrent networks). This is Churchland’s Domain-Portrayal Semantics theory of thought, and Churchland calls the learning process behind it First-Level Learning.
- Homomorphisms between these (high-dimensional) maps, albeit imperfect ones, allow the brain to notice when one objective domain is reducible to another, and thus deploy its existing conceptual knowledge in new ways. Churchland calls this process Second-Level Learning, and it further bolsters the organism’s ability to navigate reality (as well as placing reductionism at the heart of Churchland’s epistemology). In one of the more insightful points in Churchland’s book, this reduction does not invalidate the old map, but in fact supports its veracity, the accuracy with which the map portrays its territory, in the subdomain where the old map works at all. Churchland thus argues for an “optimistic meta-induction”, by which he means that in a Pragmatic Empiricist sense, our past, present, and future scientific knowledge is and will be reliable knowledge about the world, to the extent it agrees with data, even in the absence of a Grand Unified Theory of All Reality.
- While the senses allow nonhuman animals to index their maps (a “You Are Here!” marker is how Churchland describes it), language allows humans to deliberately and artificially index each other’s maps, thus allowing us to create long-lived cultural and institutional traditions of knowledge that accumulate over time rather than dying with individuals. Progress thus extends beyond the span of an individual lifetime. This Third-Level Learning allows Churchland to add an implicit Map-Portrayal Semantics theory of language to his Domain-Portrayal theory of thought, although I do not recall him naming that implicit theory as such.
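To make the first thesis concrete, here is a toy sketch of Hebbian updating making a "map" resemble its "territory". Plain Hebbian learning (w += rate * y * x) grows without bound, so this swaps in Oja's rule, a normalized Hebbian variant; it is not Churchland's own model. The 2-D input distribution, learning rate, and starting weights are all assumptions chosen for illustration. After training, the weight vector should point along the main axis of the inputs, i.e. the single unit has learned the dominant structure of its environment.

```python
import math
import random

def oja_learn(samples, lr=0.002, w=(1.0, 0.0)):
    """Oja's rule: Hebbian term y*x minus a decay y^2*w that keeps |w| near 1."""
    w1, w2 = w
    for x1, x2 in samples:
        y = w1 * x1 + w2 * x2              # postsynaptic activity
        w1 += lr * y * (x1 - y * w1)
        w2 += lr * y * (x2 - y * w2)
    return w1, w2

# Hypothetical "territory": 2-D inputs stretched along the diagonal (1, 1)/sqrt(2).
rng = random.Random(0)
s = 1 / math.sqrt(2)
samples = []
for _ in range(5000):
    a, b = rng.gauss(0, 3.0), rng.gauss(0, 0.5)   # wide along the diagonal, narrow across it
    samples.append((s * (a + b), s * (a - b)))

w1, w2 = oja_learn(samples)
norm = math.hypot(w1, w2)
alignment = abs(w1 * s + w2 * s) / norm           # |cos| between w and the diagonal
print(f"|w| = {norm:.2f}, alignment with main axis = {alignment:.3f}")
```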
It is these core theses which I regard as largely correct, even where their supporting details are based on old research or the wrong research in the view of the present reviewer. I even believe that had Churchland done as much investigation into my favorite school of computational cognitive science, it would have reinforced his thesis and given him enough material for two books instead of just one. In fact, my disagreements with Churchland can be summed up quite succinctly:
- I believe, and will supply citations for the belief, that probabilistic representations play a larger role in human cognition than their near-absence from Plato’s Camera would suggest. In particular, I find Churchland’s defense of Hebbian learning for encoding causal knowledge in recursive deep neural-nets somewhat unconvincing, preferring instead the presentation of [6].
- I find Churchland’s thesis that recursive, many-layered learning allows animals (not only humans) to map abstract features of their environment incredibly insightful, but disagree that this can correctly be called Platonism. Platonism concerns itself with abstract universals (and Churchland says it does). I feel that recursive, many-layered learning allows organisms to map the abstract features of their local environment, while making no guarantees regarding the universal applicability of maps learned from finite information about local territory.
- Platonism is also often about specific objects (such as those of mathematics or ethics) that are claimed to abstractly exist. This notion brings in the important spectrum in cognitive science between feature-governed concepts and causal role-governed concepts. “Electron”, for instance, is actually a theory-laden concept defined chiefly by the causal role(s) involved – but we usually think of electrons as “not very Platonic” while metric spaces are “more Platonic” and Categorical Imperatives are “very extremely Platonic”. I feel that while the mind may posit objects which model certain feature-spaces and fill certain causal roles very elegantly, if those objects are not available, even counterfactually, to multiple modalities from which to sample feature data, I can’t help but suspect they might not really “exist” in a mind-independent sense. This probably sounds like quite a nitpick, but immense portions of the things dreamt-of in human philosophies depend on one’s position on this question. (In fact, confusing a causal role with an object or substance lies at the heart of many superstitions.) On the other hand, we should consider it an open question whether or not “Platonic” abstractions form a necessary component of resource-rational cognition.
- I feel that imperfect (implied to be linear) homomorphism between maps doesn’t work very well as a theory of Second-Level Learning, as any real computational system capable of representing the entire physical world would have to be Turing-complete. Since the representation language would be Turing-complete, the total extensional equivalence of any two models would necessarily be undecidable[1]. And this undecidability arises long before the creature begins to think in the kinds of self-referential terms for which undecidability theorems have been made famous! Dealing with this issue in a sane way remains a major open research problem for anyone proposing to theorize on the workings of the mind.
And yet, for all that these may sound substantial, they are the sum total of my objections. Churchland has otherwise written an excellent book that gets its point across well, and whose many moments of snark against non-naturalistic philosophies of mind, especially the linguaformal “Hilbert proof system theory of mind”, are actually enjoyable (at least, to one who enjoys snark).
In fact, in addition to just describing Churchland’s work, I will spend some of my review noting where other work bolsters it, particularly from the rational analysis (and resource-rational) school of cognitive science[9]. This school of thought aims to understand the mind by first assuming that the mind is posed particular, constrained problems by its environment, then positing how these problems can be optimally solved, and then comparing the resulting theoretical solutions with experimental data. The mind is thus understood as an approximately boundedly rational engine of inference, forced by its environment to deal with shortages of sample data and computational power in the most efficient way possible, but ultimately trying to perform well-defined tasks such as predicting environmental stimuli or planning rewarding actions for the embodied organism to take.
Why “Harper’s Fishing Nets”, then? Well, because treating abstract universals as computational objects learned by generalizing over many domains seems more along the lines of Robert Harper’s “computational trinitarianism” than true Platonism, and because the noisy, always-incomplete process of recursive learning seems more like a succession of fishing nets, with their ropes spaced differently to catch specific species of fish, than like a camera that takes a single, complete picture. All learning algorithms aim to capture the structural information in their input samples while ignoring the noise, but the difference is, of course, undecidable[10]. Recursive pattern recognition - the unsupervised recognition of patterns in already-transformed feature representations - may thus be applicable for capturing additional levels of structural information, especially where causal learning prevents collapsing all levels of hierarchy into a single function. Or, as Churchland himself puts it:
Since these thousands of spaces or ‘maps’ are all connected to one another by billions of axonal projections and trillions of synaptic junctions, such specific locational information within one map can and does provoke subsequent pointlike activations in a sequence of downstream representational spaces, and ultimately in one or more motor-representation spaces, whose unfolding activations are projected onto the body’s muscle systems, thereby to generate cognitively informed behaviors.
Churchland is especially to be congratulated for approaching cognition as a capability that must have evolved in gradual steps, and coming up with a theory that allows for nonhuman animals to have great cognitive abilities in First-Level Learning, even if not in Second and Third.
Choice quotes from the Introductory section:
- Since these thousands of spaces or ‘maps’ are all connected to one another by billions of axonal projections and trillions of synaptic junctions, such specific locational information within one map can and does provoke subsequent pointlike activations in a sequence of downstream representational spaces, and ultimately in one or more motor-representation spaces, whose unfolding activations are projected onto the body’s muscle systems, thereby to generate cognitively informed behaviors.
- The whole point of the synapse-adjusting learning process discussed above was to make the behavior of neurons that are progressively higher in the information-processing hierarchy profoundly and systematically dependent on the activities of the neurons below them.
- [Trained neural networks represent] a space that has a robust and built-in probability metric against which to measure the likelihood, or unlikelihood, of the objective feature represented by that position’s ever being instantiated.
- Indeed, the “justified-true-belief” approach is misconceived from the outset, since it attempts to make concepts that are appropriate only at the level of cultural or language-based learning do the job of characterizing cognitive achievements that lie predominantly at the sublinguistic level.
Part II
First-Level Learning
It is no understatement to say that First-Level Learning forms the shining star of Churchland’s book. It is the process by which the brain forms and updates increasingly accurate maps of conceptual and causal reality, a deeply Pragmatic process shared with nonhuman animals and taking place largely below conscious awareness. In machine-learning terms, First-Level Learning consists mainly of classification and regression problems: classifying hierarchies of regions of compacted metric spaces to form concepts using feedforward neural learning, and regressing trajectories through state-spaces to form causal understanding using recurrent neural networks. Churchland devotes one full chapter to each.
1 First-Level Conceptual Learning
He begins his first chapter on First-Level Learning with a basic introduction to many-layered feedforward neural networks, their training via supervised backpropagation of errors, and their usage for classification of feature-based concepts. He talks about the nonlinear activation functions, like sign and sigmoid, necessary to allow feedforward networks to approximate arbitrary total functions. He gives examples of face-recognition neural networks, which will probably be old hat for any student of machine learning, but are extremely necessary for laypeople and philosophers untrained in computational approaches to modelling perception. Churchland is also careful to specify that these are not the neural networks of the real human mind, but instead specific examples of what can be done with neural networks. Finally, Churchland begins defending his thesis about Platonism when talking about an artificial neural network designed to classify colors:
[W]e can here point to the first advantage of the information-compression effected by the Hurvich network: it gives us grip on any object’s objective color that is independent of the current background level of illumination.
Or put simply, the kinds of abstract, higher-level features learned by multi-layer neural networks serve to represent certain objective facts about the environment, with each successive layer of the network filtering out some perceptual noise and capturing some important structural information.
Churchland also elaborates, in several places, on the compaction of metric spaces produced by the nonlinear transformations encoded in neural networks. Neural networks don’t spread their training data uniformly in the output space (or in any of the spaces formed by the intermediate layers of the network)! In fact, they tend to push their training points into highly compacted prototype regions in their output spaces, and when later activated they will try to “divert” any given vector into one of those compacted regions, depending on how well it resembles them in the first place. Since all neural networks receive and produce vectors, and vector spaces are metric spaces, Churchland notes that these neural-network concepts innately and necessarily carry distance metrics for gauging the similarities or differences between any two sensory feature-vectors (or, Churchland implies, real-world objects represented by abstract feature vectors). Churchland even notes, in a rare mention of probability in his book, that these compactions into distinct prototype regions for classes or clusters of training data can be taken as a sort of emerging set of probability density functions over the training data:
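Why the nonlinear activation functions mentioned above matter can be shown with the textbook XOR case. This is a hedged sketch, not one of Churchland's networks: the weights are set by hand rather than learned by backpropagation, and the grid search over single threshold units only illustrates (it does not prove) the standard fact that XOR is not linearly separable. The hidden layer re-describes the input as "at least one" and "both", features from which the answer becomes a simple threshold.

```python
from itertools import product

def step(v):
    """Threshold ("sign"-style) nonlinearity: 1 if v > 0 else 0."""
    return 1 if v > 0 else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def linear_unit_solves_xor():
    """Search a coarse grid for ONE threshold unit on the raw inputs that computes XOR."""
    grid = [k / 2 for k in range(-6, 7)]          # weights and bias in [-3, 3], step 0.5
    for w1, w2, b in product(grid, repeat=3):
        if all(step(w1 * x1 + w2 * x2 + b) == t for (x1, x2), t in XOR.items()):
            return True
    return False

def two_layer_xor(x1, x2):
    """Hand-set hidden layer: h_or fires for 'at least one', h_and for 'both'."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)              # 'at least one but not both'

print(linear_unit_solves_xor())                  # False
print([two_layer_xor(*x) for x in XOR])          # [0, 1, 1, 0]
```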
The regions of the two prototypical hot spots represent the highest probability of activation; the region of stretched lines in between them represents a very low probability; and the empty regions outside the deformed grid are not acknowledged as possibilities at all—no activity at the input layer will produce [such] a second-rung activation pattern[.]
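One way to read this passage as a density, under assumptions that are mine rather than Churchland's: treat the activation space as 2-D, put one isotropic Gaussian on each of two hypothetical prototype hot spots, and average them. The density is high at a hot spot, low in the stretched region between them, and vanishingly small far outside. A Gaussian never reaches exactly zero, so "not acknowledged as possibilities at all" becomes "negligibly probable" in this sketch.

```python
import math

# Hypothetical 2-D "second-rung" activation space with two prototype hot spots.
PROTOTYPES = [(-1.0, 0.0), (1.0, 0.0)]
WIDTH = 0.3   # assumed spread of training activations around each prototype

def activation_density(point):
    """Equal-weight mixture of isotropic 2-D Gaussians centred on the prototypes."""
    total = 0.0
    for cx, cy in PROTOTYPES:
        d2 = (point[0] - cx) ** 2 + (point[1] - cy) ** 2
        total += math.exp(-d2 / (2 * WIDTH ** 2)) / (2 * math.pi * WIDTH ** 2)
    return total / len(PROTOTYPES)

for label, p in [("hot spot", (1.0, 0.0)), ("between", (0.0, 0.0)), ("outside", (0.0, 3.0))]:
    print(f"{label:>8}: {activation_density(p):.3g}")
```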
Churchland deploys the vector-completion effect in feedforward networks as an example of primitive abductive reasoning himself:
Accordingly, it is at least tempting to see, in this charming capacity for relevant vector completion, the first and most basic instances of what philosophers have called “inference-to-the-best-explanation,” and have tried, with only limited success, to explicate in linguistic or propositional terms.
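The vector-completion effect can be caricatured as nearest-prototype lookup: given a partial input, find the prototype closest on the entries that were actually observed, and fill in the rest from it. A trained network does this implicitly through its weights. The explicit lookup, the prototype names, and the four-entry vectors below are hypothetical stand-ins chosen to keep the idea visible.

```python
import math

# Hypothetical prototype vectors a trained network might have carved out.
PROTOTYPES = {
    "face A": (0.9, 0.1, 0.8, 0.2),
    "face B": (0.1, 0.9, 0.2, 0.7),
}

def complete(partial):
    """Fill missing entries (None) from the prototype nearest on the observed entries."""
    observed = [i for i, v in enumerate(partial) if v is not None]

    def distance(proto):
        return math.sqrt(sum((partial[i] - proto[i]) ** 2 for i in observed))

    name = min(PROTOTYPES, key=lambda k: distance(PROTOTYPES[k]))
    proto = PROTOTYPES[name]
    return name, tuple(proto[i] if v is None else v for i, v in enumerate(partial))

print(complete((0.8, 0.2, None, None)))   # ('face A', (0.8, 0.2, 0.8, 0.2))
```

The "best explanation" of the half-seen input is whichever prototype it most resembles, and the completion is that prototype's prediction for the unseen half.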
Churchland deploys his metaphor of concepts as maps of feature-spaces again and again to great effect; I only wish he had taken greater effort to talk of his rarely-mentioned “deformed grids” as topographical maps, measures over the training data, and of the nonlinear transformations taking vectors from their input spaces into topologically-measured maps as flows or rivers. I cannot tell if he took seriously the notion of neural-network training as learning topographical maps of the non-input spaces, or of those topographies as measures in the sense of probability theory. Certainly, the physical metaphor of a river’s flow provides a good intuition pump for describing how a well-trained neural network carves out paths from where drops of rain fall to where they ought to go, by whatever criterion trains the network. Indeed, he seems to be thinking something along these lines when he uses metric-space compaction to examine category effects:
This gives us, incidentally, a plausible explanation for so-called ‘category effects’ in perceptual judgments. This is the tendency of normal humans, and of creatures generally, to make similarity judgments that group any two within-category items as being much more similar (to each other) than any other two items, one of which is inside and one of which is outside the familiar category. Humans display this tilt even when, by any “objective measure” over the unprocessed sensory inputs, the similarity measures across the two pairs are the same.
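The category effect falls out of any squashing nonlinearity placed across a category boundary. In this sketch a single steep sigmoid unit, with an assumed boundary at 0.5 and an assumed gain of 10, stands in for a trained network. Two pairs of inputs have the same raw distance, but the pair that straddles the boundary ends up several times farther apart after the transformation than the within-category pair.

```python
import math

def category_unit(x, boundary=0.5, gain=10.0):
    """A steep sigmoid: compresses inputs on either side of an assumed category boundary."""
    return 1.0 / (1.0 + math.exp(-gain * (x - boundary)))

within = (0.1, 0.3)   # both below the boundary: same category
across = (0.4, 0.6)   # straddles the boundary: different categories

for name, (a, b) in [("within", within), ("across", across)]:
    raw = abs(b - a)
    transformed = abs(category_unit(b) - category_unit(a))
    print(f"{name}: raw distance {raw:.2f}, transformed distance {transformed:.3f}")
```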
Looking at neural-network training data as measurable would also help us think about how mere perception generates “sensorily simple” random variables, representing qualitative measurements of the world that correspond to the world, which would then be of use according to probabilistic theories of cognition. Certainly, a number of cognitive scientists and neuroscientists have been researching neural mechanisms for representing probabilities[13, 19]. A number of these even provide exactly the kind of approximate Bayesian inference one would require when working with open-world models that can have countably infinitely many separate random variables, an important component of working with Turing-complete modelling domains. One paper even proposes that the neural implementation and learning of probability density/mass functions can explain certain deviations of human judgements from the probabilistic optimum[13]. Again: Churchland’s book, published in 2012 and sent to press with little mention of probability, still clearly prefigured neural encodings of probability, which have turned out to be a productive research effort. This is a testament to how well Churchland generalized from the neuroscientific research available to him!
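As one illustration of how neurons might carry probabilities, here is a minimal sketch of a probabilistic population code: independent Poisson neurons with Gaussian tuning curves, whose spike counts implicitly define a posterior over the stimulus. This is a well-known family of proposals, not necessarily the scheme in the cited papers, and every parameter below (17 neurons, gain, width, grid, flat prior) is an assumption. For a deterministic demo the "spike counts" are set to their expected values; real counts would be Poisson samples.

```python
import math

CENTRES = [c / 2 for c in range(-8, 9)]   # preferred stimuli of 17 hypothetical neurons
GAIN, WIDTH = 10.0, 1.0                    # assumed peak rate and tuning width
GRID = [i / 10 for i in range(-50, 51)]    # candidate stimulus values

def rate(centre, s):
    """Gaussian tuning curve: expected spike count of one neuron for stimulus s."""
    return GAIN * math.exp(-((s - centre) ** 2) / (2 * WIDTH ** 2))

def posterior(counts):
    """Flat prior + independent Poisson neurons -> normalized posterior over GRID."""
    log_post = [sum(r * math.log(rate(c, s)) - rate(c, s) for r, c in zip(counts, CENTRES))
                for s in GRID]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]   # subtract the max for stability
    z = sum(weights)
    return [w / z for w in weights]

true_s = GRID[60]                            # stimulus value 1.0
counts = [rate(c, true_s) for c in CENTRES]  # noise-free "spike counts"
post = posterior(counts)
best = max(range(len(GRID)), key=post.__getitem__)
print(GRID[best], round(sum(post), 6))
```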
Of course, Churchland himself decries any notion of sensorily simple variables:
This story thus assumes that our sensations also admit of a determinate decomposition into an antecedent alphabet of ‘simples,’ simples that correspond, finally, to an antecedent alphabet of ‘simple’ properties in the environment.
Churchland would also have done quite well to cover the Blessing of Abstraction and hierarchical modelling (first mentioned in [7]) for their unique effect: they allow training data to be shared across tasks and categories, and thus ameliorate the Curse of Dimensionality. They are how real embodied minds compress their sensory features so as to reduce the necessary sample-complexity of learning to the absolute minimum: sometimes even one single example[18]. I personally hypothesize that the same effect is at work in hierarchical Bayesian modelling as in the recent fad for “deep” learning in artificial neural networks, which learn hierarchies of features: breadth in the lower layers of the model/network provides large amounts of information to quickly train the higher, “abstract” layer of the model/network, which then provides a strong inductive bias to the lower layers. He does mention something like this, however:
[A]s the original sensory input vector pursues its transformative journey up the processing ladder, successively more background information gets tapped from the synaptic matrices driving each successive rung.
This certainly gives an insight into why deep neural networks with sparse later layers work so well: sample information is aggregated in the top layers and then backpropagated to lower layers.
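The Blessing of Abstraction can be seen in a toy bags-of-marbles setup. The sketch assumes a two-level model: each bag's colour proportions are drawn from a Dirichlet whose concentration alpha expresses "how uniform bags tend to be". Alpha is fitted to a few hypothetical training bags by grid search over the marginal likelihood (an empirical-Bayes shortcut, not a full hierarchical posterior). Having learned that bags are uniform, the model predicts the colour of a new bag's second marble confidently after seeing just one, while a flat Dirichlet(1,1,1) prior stays at 50%.

```python
import math

K = 3  # number of marble colours

def log_marginal(counts, alpha):
    """Dirichlet-multinomial evidence for one bag, base measure alpha/K per colour.
    (The multinomial coefficient is dropped; it does not depend on alpha.)"""
    n = sum(counts)
    return (math.lgamma(alpha) - math.lgamma(alpha + n)
            + sum(math.lgamma(alpha / K + c) - math.lgamma(alpha / K) for c in counts))

def predictive_same_colour(alpha, seen=1):
    """P(next marble matches) after `seen` marbles of one colour from a new bag."""
    return (seen + alpha / K) / (seen + alpha)

# Hypothetical training bags, each uniform in colour (10 marbles of one colour).
bags = [[10, 0, 0], [0, 10, 0], [0, 0, 10], [10, 0, 0]]

alpha_grid = [10 ** (k / 10) for k in range(-20, 21)]   # 0.01 ... 100
best_alpha = max(alpha_grid, key=lambda a: sum(log_marginal(b, a) for b in bags))

print(f"learned alpha = {best_alpha:.2g}")
print(f"after one marble: hierarchical {predictive_same_colour(best_alpha):.3f}, "
      f"flat Dirichlet(1,1,1) {predictive_same_colour(K):.3f}")
```

The higher-level parameter is shared across all bags, which is how data from many categories buys one-shot learning in a new one.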
This brings us right back to the Platonism for which Churchland is trying to argue. As usual, we wish to operate under the “game rules” of a very strong naturalism, in which Platonic entities are surely not allowed to be any kind of ontologically “spooky” stuff. After all, we don’t observe any spooky processes interfering in ordinary physical and computational causality to generate thoughts about Platonic Forms or mathematical structures. Instead, we observe embodied, resource-bounded creatures generalizing from data, even if Churchland is a pure connectionist while I favor a probabilistic language of thought. What sort of Platonism would help us explain what goes on in real minds? I think a productive avenue is to view Platonic abstractions as concepts (necessarily compositional concepts of the kind Churchland doesn’t address much, but which are now sometimes described as stochastic functions[8]) which optimally compress a given type of experiential data. We could thus propose Platonic realism about abstract concepts which any reasoner must necessarily develop as they approach the limit of increasing sample data and computational power, and simultaneously Platonic antirealism about abstract concepts which tend to disappear as reasoners gain more information and compute further.
This will probably sound somewhat overwrought and unnecessary to theorists from backgrounds in algorithmic information theory and artificial intelligence. What need does the optimally intelligent “agent”, AIXI, have for Platonic concepts of anything[12]? It just updates a distribution over all possible causal structures and uses it to make predictions. The key is that AIXI weights each possible Turing-machine program by its length, so that every hypothesis is penalized according to its Kolmogorov complexity K(x), the length of the shortest program that outputs x. This weighting allows a Solomonoff Inducer to perfectly separate the random information in its sensory data from the structural information, yielding an optimal distribution over representations that contain nothing but causal structure. K(x) is incomputable, however, so AIXI can update optimally on sensory information only by falling back on infinite computing power. Such a reasoner, it seems, has no need to compose or decompose causal structures, no need for concepts, but for everyone else, hierarchical representations compress data very efficiently[14]. They also map well onto probabilistic modelling. This trade-off between the decomposability and the degree of compression achieved by any given representation of a concept will have to play a part in a more complete theory of abstract objects as optimally compressed stochastic functions.
Here, though, is a reason for learned representations to be “white-box”, open to introspection and decomposition into smaller concepts: counterfactual-causal reasoning involves zeroing in on a particular random variable in a model and cutting its links to its causal parents. Only white-box representations allow this “graph surgery”; only white-box representations are friendly to causal reasoning about independent, composable concepts rather than whole possible-worlds.
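Graph surgery is easy to see in a hypothetical three-variable model where Z causes both X and Y, and X also causes Y; all probabilities are invented for illustration. Conditioning on X = 1 also tells us about Z, whereas intervening with do(X = 1) replaces X's mechanism with a constant and so cuts the Z -> X link. The two answers differ, and computing the second requires being able to reach inside the model and swap out one mechanism.

```python
from itertools import product

P_Z1 = 0.5                       # hypothetical P(Z = 1)

def p_x1_given_z(z):
    return 0.9 if z == 1 else 0.1

def p_y1_given_xz(x, z):
    return 0.2 + 0.1 * x + 0.6 * z

def bernoulli(p1, v):
    return p1 if v == 1 else 1.0 - p1

def joint(x_mechanism):
    """Enumerate P(z, x, y), given a mechanism that returns P(X = 1 | z)."""
    return {(z, x, y): bernoulli(P_Z1, z)
                       * bernoulli(x_mechanism(z), x)
                       * bernoulli(p_y1_given_xz(x, z), y)
            for z, x, y in product((0, 1), repeat=3)}

def prob_y1_given_x1(table):
    num = sum(p for (z, x, y), p in table.items() if x == 1 and y == 1)
    den = sum(p for (z, x, y), p in table.items() if x == 1)
    return num / den

observational = prob_y1_given_x1(joint(p_x1_given_z))
# Graph surgery: cut Z -> X by replacing X's mechanism with the constant X = 1.
interventional = prob_y1_given_x1(joint(lambda z: 1.0))
print(round(observational, 3), round(interventional, 3))   # 0.84 0.6
```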
2 Causal reasoning as recurrent-activation-space trajectories
And Churchland does cover causal reasoning! Or at least, he covers reasoning and learning in sequence-prediction tasks, with an elaborate theory of First-Level Learning in recurrent neural networks. Whether this counts as causal reasoning or not depends on whether the reader considers causal reasoning to require modelling counterfactuals and doing graph-surgery to support interventions. Churchland begins by explaining exactly why an embodied organism should want to reason about temporal sequences:
Two complex interacting objects were each outfitted with roughly two dozen small lights attached to various critical parts thereof, and the room lights were then completely extinguished, leaving only the attached lights as effective visual stimuli for any observer. A single snapshot of the two objects, in the midst of their mobile interaction, presented a meaningless and undecipherable scatter of luminous dots to any naïve viewer. But if a movie film or video clip of those two invisible objects were presented, instead of the freeze-frame snapshot, most viewers could recognize, within less than a second and despite the spatial poverty of the visual stimuli described, that they were watching two humans ballroom-dancing in the pitch dark.
Churchland starts his chapter on temporal and causal learning thusly, noting that for an embodied animal, temporal reasoning provides not only an essential way to handle ecologically necessary tasks, but a dramatic improvement on the performance of moment-to-moment cognitive distinctions. Thus he theorizes that creatures understand causal models as trajectories through metrically-sculpted activation spaces of recurrent neural networks, isomorphic to the execution traces of a computer program. In fact, he tells the reader, extending an animal’s reasoning in Time helps it to cut reality at the joints, so much so that temporal reasoning may have come first.
[I]t is at least worth considering the hypothesis that prototypical causal or nomological processes are what the brain grasps first and best. The creation and fine-tuning of useful trajectories in activation space may be the primary obligation of our most basic mechanisms of learning.
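A "trajectory in activation space" can be made concrete with the smallest possible recurrent network. In this toy sketch (a linear two-unit network, far simpler than the nonlinear, learned networks Churchland discusses) the recurrent weights form a rotation matrix, so each state determines the next and the activation vector traces a closed cycle, returning to its start after a fixed number of steps: a minimal version of a learned rhythm or a predicted sequence. The period and starting state are arbitrary assumptions.

```python
import math

def rotation_net(period):
    """Weights of a 2-unit linear recurrent network that rotates its state by 2*pi/period."""
    a = 2 * math.pi / period
    return ((math.cos(a), -math.sin(a)), (math.sin(a), math.cos(a)))

def run(weights, state, steps):
    """Unroll h_{t+1} = W h_t and return the whole trajectory through activation space."""
    (w11, w12), (w21, w22) = weights
    trajectory = [state]
    for _ in range(steps):
        h1, h2 = trajectory[-1]
        trajectory.append((w11 * h1 + w12 * h2, w21 * h1 + w22 * h2))
    return trajectory

traj = run(rotation_net(12), (1.0, 0.0), 12)
for t in (0, 3, 6, 12):
    print(t, tuple(round(v, 3) for v in traj[t]))
```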
He further points out that, since the function of the autonomic nervous system has always been to regulate cyclical bodily processes, recurrent neural networks may actually be the norm in living animals, and could easily have evolved first for autonomic functions before being adapted to aid in temporal cognition. The brain, then, is conceived as a network-of-networks, capable of activating the recurrent evolution of its sub-networks whenever it needs to imagine how some temporal (or computational) process might proceed:
Our network, that is, is also capable of imaginative activity. It is capable of representational activity that is prompted not by a sensory encounter with an instance of the objective reality therein represented, but rather by ‘top-down’ stimulation of some kind from elsewhere within a larger encompassing network or ‘brain.’
Much of the material from the previous chapter on supervised learning, Hebbian unsupervised learning, and map metaphors is repeated and carried over in this chapter, the better to hammer it home.
3 Criticisms
Now the unfortunate negative. Churchland’s account of conceptual and causal First-Level Learning spends too little explanatory effort, for my tastes at least, on causal-role concepts in particular. Philosophy of mind has long recognized both feature-governed and role-governed notions of concept, and the cognitive sciences have shown how general learning mechanisms can produce concepts governed by mixtures of sensory features and causal or relational roles[17]. Causal-role concepts appear to form a bedrock for distinctively human thought: humans and other highly intelligent social animals learn concepts abstracted away from their available feature data, concepts of “what something does” rather than “how something looks”. This is how human thought gains its infinitely productive compositionality. Indeed, we often use concepts grounded so thoroughly in causal role, and so little in feature data, that we forget they “look like” anything at all (more on that when we cover Second-Level Learning and naturalization)! Churchland explicitly mentions that we ought to be able to “index” our “maps” via multiple input modalities, which lets us use concepts abstracted away from any one way of obtaining or producing feature data:
Choose any family of familiar observational terms, for whichever sensory modality you like (the family of terms for temperature, for example), and ask what happens to the semantic content of those terms, for their users, if every user has the relevant sensory modality suddenly and permanently disabled. The correct answer is that the relevant family of terms, which used to be at least partly observational for those users, has now become a family of purely theoretical terms. But those terms can still play, and surely will play, much the same descriptive, predictive, explanatory, and manipulative roles that they have always played in the conceptual commerce of those users.
He just doesn’t say how the brain does so.
He also gives a method for identifying two maps with each other, which is to find a homomorphism taking the contents of one map into the contents of the other:
[T]hey do indeed embody the same portrayal, then, for some superposition of respective map elements, the across-map distances, between any distinct map element in (a) and its ‘nearest distinct map element’ in (b), will fall collectively to zero for some rotation of map (b) around some appropriate superposition point.
This works just fine for his given example of two-dimensional highway maps, but, we have solid reason to think, it cannot work when the maps themselves come to express a Turing-complete mode of computation, as in recurrent neural networks. The equality of lambda expressions in general is undecidable, after all; the only open question is whether we can determine equality in some useful, though algorithmically random, subset of cases (as is common in theoretical computer science), or whether we can find some sort of approximate equality-by-degrees that works well enough for creatures with limited information.
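For the two-dimensional case, where the procedure does work, it fits in a few lines. This is a minimal sketch under my own assumptions, not Churchland's code: each map is a list of (x, y) points, each map's centroid serves as the superposition point, rotations are tried on a fixed grid of angles, and the score is the mean distance from each element of map (a) to its nearest element of the rotated map (b). The function names and the toy maps are hypothetical.

```python
import math

def centre(points):
    """Translate a map so its centroid sits at the origin (our assumed superposition point)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]

def rotate(points, theta):
    """Rotate every map element about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def mismatch(a, b):
    """Mean distance from each element of map a to its nearest element of map b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def best_alignment(a, b, steps=3600):
    """Brute-force search over rotations of map b for the smallest across-map distances."""
    a, b = centre(a), centre(b)
    best = min(range(steps), key=lambda k: mismatch(a, rotate(b, 2 * math.pi * k / steps)))
    theta = 2 * math.pi * best / steps
    return theta, mismatch(a, rotate(b, theta))

# Hypothetical maps: (b) is (a) rotated by 90 degrees and shifted, i.e. the same portrayal.
town_a = [(0, 0), (4, 1), (1, 3), (5, 5)]
town_b = [(10 - y, 2 + x) for x, y in town_a]
theta, err = best_alignment(town_a, town_b)
print(round(math.degrees(theta)), round(err, 6))
```

The search should report a rotation of 270 degrees (undoing the 90-degree turn) with a residual of essentially zero. A grid search is the crudest possible choice; closed-form alignment methods such as Procrustes or Kabsch would do better in practice, but none of them helps once the "maps" are programs rather than point sets.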
The “map” metaphor also elides the fact that computation, in neural networks, takes place at the synapses, not in the neurons. The actual work is done by the nonlinear transformations of vectors between layers of neurons.
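To make the point concrete, here is a minimal sketch in the standard feedforward formalism (my assumption, not Churchland's notation): all of a layer's learned structure lives in its weight matrix, one number per synapse, while the neurons merely apply a fixed squashing function to the weighted sums they receive. The weights are made-up values for illustration.

```python
import math

def layer(weights, activations):
    """One layer: each row of `weights` holds the synaptic strengths onto one downstream neuron."""
    return [math.tanh(sum(w * a for w, a in zip(row, activations))) for row in weights]

x = [1.0, -0.5, 0.25]              # activation vector in the upstream layer
synapses_1 = [[0.5, -1.0, 2.0],    # hypothetical weights: the mapping's
              [1.5, 0.0, -0.5]]    # learned structure sits entirely here
synapses_2 = [[1.0, -1.0]]

hidden = layer(synapses_1, x)
output = layer(synapses_2, hidden)

# Same neurons, same nonlinearity, different synapses: a different transformation.
rewired = layer([[-w for w in row] for row in synapses_1], x)
```

Changing only the synaptic weights changes the vector-to-vector transformation; the neurons themselves contribute nothing but the fixed `tanh`.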
Churchland also fails to elaborate on the differences between training neural networks via backpropagation of errors and training them via Hebbian update rules. This is important: as far as I can tell, backpropagation of errors suffices to train a neural network to approximate any circuit (or even any computable partial function if we deal with recurrent networks), while even the most general form of unsupervised Hebbian learning seems to learn only the directions of variation within a set of feature vectors, rather than general total or partial recursive functions over the input data.
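The Hebbian side of that contrast can be seen in miniature with Oja's rule, a standard normalized variant of the Hebbian update (my choice of example; Churchland does not discuss it). Trained on zero-mean two-dimensional data stretched along one direction, the weight vector converges toward the data's leading principal direction: the main direction of variation, not an arbitrary input-output function. The data-generating parameters, learning rate, and seeds are assumptions for illustration.

```python
import math
import random

def oja(samples, rate=0.005, epochs=10, seed=0):
    """Oja's rule: Hebbian growth (y * x) minus a decay term (y^2 * w) that keeps |w| near 1."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    for _ in range(epochs):
        for x in samples:
            y = w[0] * x[0] + w[1] * x[1]  # postsynaptic activity
            w = [wi + rate * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Hypothetical data: variance 9 along u = (1, 1)/sqrt(2), variance 0.25 across it.
rng = random.Random(1)
u = (1 / math.sqrt(2), 1 / math.sqrt(2))
v = (-u[1], u[0])
data = []
for _ in range(2000):
    a, b = rng.gauss(0, 3), rng.gauss(0, 0.5)
    data.append((a * u[0] + b * v[0], a * u[1] + b * v[1]))

w = oja(data)
norm = math.hypot(*w)
alignment = abs(w[0] * u[0] + w[1] * u[1]) / norm  # |cos| of the angle between w and u
print(round(norm, 2), round(alignment, 3))
```

The learned vector ends up close to unit length and nearly parallel (or antiparallel) to u. A supervised learner trained by backpropagation could instead be fit to essentially any target labels on the same inputs.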
4 Loose-Leaf Highlights
Churchland on free will:
Freedom, on this view, is ultimately a matter of knowledge—knowledge sufficient to see at least some distance into one’s possible futures at any given time, and knowledge of how to behave so as to realize, or to enhance the likelihood of, some of those alternatives at the expense of the others.
He extends the matter up to whole societies:
And as humanity’s capacity for anticipating and shaping our economic, medical, industrial, and ecological futures slowly expands, so does our collective freedom as a society. That capacity, note well, resides in our collective scientific knowledge and in our well-informed legislative and executive institutions.
On unsupervised learning without a pre-established system of propositions (as used in most current Bayesian methods), in defense of connectionism:
What is perhaps most important about this kind of learning process, beyond its being biologically realistic right down to the synaptic level of physiological activity, is that it does not require a conceptual framework already in place, a framework fit for expressing propositions, some of which serve as hypotheses about the world, and some of which serve as evidence for or against those hypotheses.
Part III
Second-Level Learning: Reductionism, Hierarchies of Theories, Naturalization, and the Progress of the Sciences
If Churchland’s material on First-Level Learning seems, in some ways, like so much outmoded hype about neural networks, his material on Second-Level Learning remains sufficient justification to read his book. Second-Level Learning, the process by which the mind notices that it can repurpose its available conceptual “maps”, and thus comes to form an increasingly unified and coherent picture of the world, is where Churchland hits his (as ever, understated) stride. In addressing Second-Level Learning, Churchland covers the well-worn philosophy-of-science progression of physics from Aristotelian intuitive theories up through Newton and then, eventually, Einstein. This is also where he begins to talk about normatively rational reasoning:
Both history and current experience show that humans are all too ready to interpret puzzling aspects of the world in various fabulous or irrelevant terms, and then rationalize away their subsequent predictive/manipulative failures, if those failures are noticed at all.
Second-Level Learning is described as just turning old ideas to new uses. The brain more-or-less randomly notices the partial homomorphism of two conceptual “maps” (again: high-dimensional vector spaces with metric compaction based on Hebbian learning in neural networks) and repurposes (and re-trains) the more accurate, detailed, and general “map” (call it the larger map) to predict and describe the phenomena once encompassed by the less accurate, less detailed, and less general “map” (call it the smaller one). Viewed in the larger historical context Churchland gives it, however, Second-Level Learning is the methodology of scientific thought as we have come to understand it. Churchland gives solid reason to hypothesize that by means of Second-Level Learning, human beings and humankind have come to understand our world.
In larger terms, Second-Level Learning consists of naturalizing concepts in terms of other concepts, forming hierarchies of theories.
Our knowledge begins as a vast, disconnected, disparate mish-mash of independent concepts and theories, none of which makes sense in terms of the others, and which leaves us no recourse to any universal terms of explanation. Worse, our intuitive theories are often so disconnected that we may have only one modality of causal access to the objective reality behind any particular concept, perhaps even one so utterly unreliable as subjective introspection.
As we proceed to assemble interlocking hierarchies of theories, however, the increased connectedness of our theories allows us to spread the training information derived from experience and experiment throughout, letting us use the feature-modality behind one concept to inquire about the objective reality behind a seemingly different concept. By judicious application of Second-Level Learning, we develop an increasingly coherent, predictive, unified body of knowledge about the objective reality in which we find ourselves. We also become able to dissolve concepts that no longer make sense by showing what explains their training experiences, and we sometimes come to be rationally obligated to reject concepts and theories that simply no longer fit our experiences. Consilience can thus be seen as the key to truth, overriding the insistent cries of “But thou must!” from intuition or apparently logical argumentation.
This is where Churchland feels a definite need to argue with other major philosophers of science, particularly Karl Popper’s falsificationism (still a staple of many methodology and philosophy-of-science lessons given to grad students everywhere):
Popper’s story of the proper relation between science and experience was also too simple-minded. Formally speaking, we can always conjure up an ‘auxiliary premise’ that will put any lunatic metaphysical hypothesis into the required logical contact with a possible refuting observation statement. …
The supposedly possible refutation of a scientific hypothesis “H” at the hands of “if H then not-O” and “O” can be only as certain as one’s confidence in the truth of “O.”… Unfortunately, given the theory-laden character of all concepts, and the contextual contingencies surrounding all perceptions, no observation statement is ever known with certainty, a point Popper himself acknowledges. So no hypothesis, even a legitimately scientific one, can be refuted with certainty – not ever. One might shrug one’s shoulders and acquiesce in this welcome consequence, resting content with the requirement that possible observations can at least contradict a genuinely scientific hypothesis, if not refute it with certainty.
Heavy and contentious words already, but well in line with the basic facts about learning and inference discovered by the pioneers of statistical learning theory: as long as one’s theory remains fully deterministic and one’s reasoning fully deductive, one must either place absolute faith in experience (which experience itself tells us is unreliable) or eliminate hypotheses only slowly, if ever. Abductive inference, not deductive, forms the core of real-world scientific reasoning, and one is reminded of Broad’s description of inductive reasoning as “the glory of Science” and yet “the scandal of Philosophy”. Having adopted abduction of inferred models, subject to revision, we can now justify those inferences much better than we could when philosophers talked of inductive reasoning about the certain truth or falsity of propositions. Churchland continues into territory even surer to arouse controversy, among the public if not among professional scientists or philosophers:
But this [revision to Popper given above] won’t draw the required distinction either, even if we let go of the requirement of decisive refutability for generalized hypotheses. The problem is that presumptive metaphysics can also creep into our habits of perceptual judgment, as when an unquestioning devout sincerely avers, “I feel God’s disapproval” or “I see God’s happiness,” when the rest of us would simply say, “I feel guilty” or “I see a glorious sunset.” This possibility is not just a philosopher’s a priori complaint: millions of religious people reflexively approach the perceivable world with precisely the sorts of metaphysical concepts just cited.
Throughout this latter portion of the book, Churchland takes numerous other shots at superstition, religion, model-theoretic philosophical theories of semantics, non-natural normativity, and various other forms of belief in the spooky and weird (whatever joke I may appear to be making here is paraphrased straight from Churchland’s own views). Regarding the last item on the list in particular, Churchland does indeed take an explicit stand in favor of naturalizing normative rationality via Second-Level Learning:
Since we cannot derive an “ought” from an “is,” continues the objection, any descriptive account of the de facto operations of a brain must be strictly irrelevant to the question of how our representational states can be justified, and to the question of how a rational brain ought to conduct its cognitive affairs. … An immediate riposte points out that our normative convictions in any domain always have systematic factual presuppositions about the nature of that domain. … A second riposte points out that a deeper descriptive appreciation of how the cognitive machinery of a normal or typical brain actually functions, so as to represent the world, is likely to give us a much deeper insight into the manifold ways in which it can occasionally fail to function to our representational advantage, and a deeper insight into what optimal functioning might amount to.
This reply to the “is-ought gap” objection should be happily received by cognitive scientists everywhere: it is certainly impossible to prove that an algorithm solves a given problem optimally, or even approximately, when we do not know what the problem is. What certain schools of thinking about rationality tend not to appreciate is that, particularly when dealing with highly constrained problems of abductive reasoning, we also cannot prove that a certain algorithm is very bad (in failing to approximate or approach an optimal solution, even in the limit of increasing resources) without knowing what the problem to be solved actually is.
Churchland backs up these ideas with a cogent analogy:
Imagine now a possible eighteenth century complaint, raised just as microbiology and biochemistry were getting started, that such descriptive scientific undertakings were strictly speaking a waste of our time, at least where normative matters such as Health are concerned, a complaint based on the ‘principle’ that “you can’t derive an ought from an is.” … Our subsequent appreciation of the various viral and bacteriological origins of the pantheon of diseases that plague us, of the operations of the immune system, and of the endless sorts of degenerative conditions that undermine our normal metabolic functions, gave us an unprecedented insight into the underlying nature of Health and its many manipulable dimensions. Our normative wisdom increased a thousand-fold, and not just concerning means-to-ends, but concerning the identity and nature of the ‘ultimate’ ends themselves.
… The nature of Rationality, in sum, is something we humans have only just begun to penetrate, and the cognitive neurosciences are sure to play a central role in advancing our normative as well as our descriptive understanding, just as in the prior case of Health.
5 Hierarchies of Theories and Reductionism
How, then, does Second-Level Learning proceed in the actual, physical brain?
Here the issue is whether the acquired structure of one of our maps mirrors in some way (that is, whether it is homomorphic with) some substructure of the second map under consideration. Is the first map, perhaps, simply a more familiar and more parochial version of a smallish part of the larger and more encompassing second map?
Churchland has, earlier in the book, already proposed an algorithm for inferring the degree to which two maps seem to portray the same domain, and he deploys it here to explain how the brain can perform intertheoretic reductions. The only problem, to my eyes, is that, as noted above, this algorithm proposes to solve an undecidable problem once we begin to deal with the Turing-complete hypothesis space represented by recurrent neural networks (and treating finite recurrent networks as learning deterministic finite-state automata only reduces the problem from undecidable to decidable, though still potentially exponential in the number of units, since a network of n binary units can occupy up to 2^n distinct states).
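For the finite-state case, "do these two maps portray the same thing?" becomes a search over pairs of states. The sketch below is the textbook product-construction check, assuming (for illustration only) that each map has already been abstracted into a complete DFA given as a transition dictionary, a start state, and a set of accepting states. Its running time grows with the product of the two machines' state counts, so it is only as cheap as the automata are small. The parity automata are hypothetical examples.

```python
from collections import deque

def equivalent(dfa1, dfa2, alphabet):
    """BFS over reachable state pairs; a pair where exactly one DFA accepts witnesses a difference."""
    (delta1, start1, accept1), (delta2, start2, accept2) = dfa1, dfa2
    seen = {(start1, start2)}
    queue = deque([(start1, start2, "")])
    while queue:
        s1, s2, word = queue.popleft()
        if (s1 in accept1) != (s2 in accept2):
            return False, word  # a shortest distinguishing input
        for symbol in alphabet:
            pair = (delta1[s1, symbol], delta2[s2, symbol])
            if pair not in seen:
                seen.add(pair)
                queue.append((*pair, word + symbol))
    return True, None

# Two hypothetical "maps" of parity of 1s: one minimal, one with redundant states.
parity_small = ({("even", "1"): "odd", ("even", "0"): "even",
                 ("odd", "1"): "even", ("odd", "0"): "odd"}, "even", {"even"})
parity_big = ({(q, s): (q + (s == "1")) % 4 for q in range(4) for s in "01"}, 0, {0, 2})
print(equivalent(parity_small, parity_big, "01"))  # (True, None)
```

Reusing the four-state machine with accepting set `{0}` (count of 1s divisible by four) makes the check fail and return the witness `"11"`.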
On the question of how we arrive at intertheoretic reductions, Churchland opines that they occur more or less randomly, or at least unpredictably:
Most importantly, such singular events are flatly unpredictable, being the expression of the occasionally turbulent transitions, from one stable regime to another, of a highly nonlinear dynamical system: the brain.
Thanks to later work, we know that Churchland erred at least somewhat on this point, but that doesn’t make Churchland’s view of intertheoretic reductions irredeemable. Quite to the contrary, later work has ridden to the rescue of Churchland’s Second-Level Learning, presenting us with a map of the landscape of scientific hierarchies. The statistical nature of this map of maps is worth quoting directly for its elegance[16]:
Recent studies of nonlinear, multiparameter models drawn from disparate areas in science have shown that predictions from these models largely depend only on a few ’stiff’ combinations of parameters [6, 8, 9]. This recurring characteristic (termed ’sloppiness’) appears to be an inherent property of these models and may be a manifestation of an underlying universality [11]. Indeed, many of the practical and philosophical implications of sloppiness are identical to those of the renormalization group (RG) and continuum limit methods of statistical physics: models show weak dependance of macroscopic observables (defined at long length and time scales) on microscopic details. They thus have a smaller effective model dimensionality than their microscopic parameter space [12].
The objective reality we confront on a daily basis not only can be modelled at multiple levels of abstraction, but in order to utilize our experiential data as efficiently as possible, we must model it at multiple levels of abstraction. Macroscopic models explain more of the variation in observable data with fewer parameters, while microscopic models successfully explain a larger portion of the total available data by including even the “sloppier” parameters. How large is the trade-off between these models, in terms of necessary data and generalization power? Extremely large:
Eigenvalues [of the Fisher Information Matrix] are normalized to unit stiffest value; only the first 10 decades are shown. This means that inferring the parameter combination whose eigenvalue is smallest shown would require ~10^10 times more data than the stiffest parameter combination. Conversely, this means that the least important parameter combination is sqrt(10^10) times less important for understanding system behavior.
The amounts of variation explained by successively larger combinations of parameters fall off exponentially: a plurality of the variation can usually be captured with very few parameters (as with intuitive theories that are “fuzzy” even on the mesoscopic scale), the majority with relatively few parameters (as with macroscopically accurate models that ignore microscopic reality), and the whole of it only by recourse to increasingly many parameters (as in microscopic models). Note that this exponential distribution of variance explanation adds weight to the Platonism of optimal compressions advocated above, and to Churchland’s Platonism: in order to make efficient use of available experiential data to explain variance and predict well in varying environments, we must form certain abstract concepts, and we must form them into hierarchies (or, to borrow from mathematical logic, entailment preorders of probabilistic conditioning). An embodied mind most likely cannot function in real time without modelling what Churchland calls “the timeless landscape of abstract universals that collectively structure the universe” (even if one doesn’t accord those abstractions any vaunted metaphysical status).
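A toy version of the sloppiness result can be reproduced with the classic sum-of-exponentials example. This is my own illustrative sketch, not code from the cited paper. Assume the model y(t) = exp(-k1 t) + ... + exp(-k5 t) with unit Gaussian measurement noise, so that the Fisher Information Matrix is J^T J for the Jacobian J of the predictions with respect to the rates. The rates and sampling times are made up, and the small Jacobi eigenvalue routine is there only to keep the example dependency-free.

```python
import math

def jacobi_eigenvalues(matrix, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-30 * sum(a[i][i] ** 2 for i in range(n)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

rates = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical decay rates k_i
times = [0.1 * (j + 1) for j in range(50)]  # hypothetical sampling times
# Jacobian of y(t) = sum_i exp(-k_i t) with respect to each k_i.
jac = [[-t * math.exp(-k * t) for k in rates] for t in times]
n = len(rates)
fim = [[sum(row[i] * row[j] for row in jac) for j in range(n)] for i in range(n)]
eigs = jacobi_eigenvalues(fim)
print([round(math.log10(e / eigs[0]), 1) for e in eigs])  # decades below the stiffest
```

Even with only five parameters, the normalized eigenvalues spread across several decades: the stiffest combination of rates is pinned down by the data many times more tightly than the sloppiest.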
What, then, can we call an intertheoretic reduction, on a modelling level? The ideal answer would be: a deterministic, continuous function from the high-dimensional parameter space of a microscopic model (which has a simple deterministic component but vast uncertainty about parameters) to the low-dimensional parameter space of a macroscopic model (which makes less precise, more stochastic predictions, but allows for more certainty about parameters). In a rare few cases, we can even construct such a function: consider temperature as proportional to the average kinetic energy, and thus derived from the mean squared velocity, of a body of particles. Even though we cannot feasibly obtain the sample data to know the individual velocities of the order of 10^22 particles in a jar of air, our microscopic model tells us that averaging over all those parameters will give us the single macroscopic parameter we call temperature, which is as directly observable as anything via a simple thermometer (whose usage is just another model for the human scientist to learn and employ). Churchland even gives us an example of how these connections between theories aid a nonhuman creature in its everyday cognition:
Who would have felt that the local speed of sound was something which could be felt? But it can, and quite accurately, too. Of what earthly use might that be? Well, suppose you are a bat, for example. The echo-return time of a probing squeak, to which bats have accurate access, gives you the exact distance to an edible target moth, if you have running access to the local speed of sound.
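The two chains of inference in this passage, from molecular velocities up to temperature and from temperature up to the bat's range estimate, can each be written as a short formula. This is a sketch under textbook ideal-gas assumptions: the mean translational kinetic energy per particle is (3/2) k_B T, the speed of sound in dry air is approximately 331.3 sqrt(1 + θ/273.15) m/s for a Celsius temperature θ, and the range is half the round-trip echo time multiplied by that speed. The particle mass, velocities, and echo time are hypothetical values chosen for illustration.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)

def temperature_from_velocities(mass, velocities):
    """Ideal gas: mean translational kinetic energy (1/2) m <v^2> equals (3/2) k_B T."""
    mean_sq_speed = sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in velocities) / len(velocities)
    return mass * mean_sq_speed / (3 * BOLTZMANN)

def speed_of_sound(celsius):
    """Common approximation for dry air, in m/s."""
    return 331.3 * math.sqrt(1 + celsius / 273.15)

def echo_range(echo_seconds, celsius):
    """Distance to the target: the squeak travels out and back, so halve the round trip."""
    return speed_of_sound(celsius) * echo_seconds / 2

# Hypothetical sample: nitrogen molecules (about 4.65e-26 kg) each moving at 500 m/s.
n2_mass = 4.65e-26
sample = [(500.0, 0.0, 0.0), (0.0, -500.0, 0.0), (0.0, 0.0, 500.0)]
kelvin = temperature_from_velocities(n2_mass, sample)
print(round(kelvin, 1), round(echo_range(0.01, kelvin - 273.15), 3))
```

Three molecules are of course no sample at all; the point is only that the macroscopic parameter is a fixed function of the microscopic ones, and that a creature with running access to that parameter can put it to direct use.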
Usually, intertheoretic reductions are more probabilistic than this, though. Newton generalized his Laws of Motion and calculated the motion of the planets under his laws of gravitation for himself, rather than possessing a function that would construct Kepler’s equations from his. This looks more like evaluating a likelihood function and selecting as his “microscopic” theory the one which gave a higher likelihood to the available data while having a larger support set, as in probabilistic interpretations of scientific reasoning.
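Here is a minimal sketch of that selection criterion, under assumptions of my own. Two candidate theories each predict a set of observations, measurement error is Gaussian with a known standard deviation, and we compare total log-likelihoods. The data and both prediction functions are hypothetical stand-ins, not Newton's or Kepler's actual calculations, and the sketch covers only the likelihood half of the story, not the comparison of support sets.

```python
import math

def gaussian_log_likelihood(observed, predicted, sigma):
    """Sum of log N(observed | predicted, sigma^2) over independent measurements."""
    return sum(-0.5 * ((o - p) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
               for o, p in zip(observed, predicted))

xs = [0, 1, 2, 3, 4]
observations = [0.1, 0.9, 4.2, 8.8, 16.1]  # hypothetical measurements
theories = {
    "theory_a": lambda x: 2 * x,           # hypothetical rival predictions
    "theory_b": lambda x: x * x,
}
scores = {name: gaussian_log_likelihood(observations, [f(x) for x in xs], sigma=0.5)
          for name, f in theories.items()}
best = max(scores, key=scores.get)
print(best, {k: round(v, 1) for k, v in scores.items()})
```

The theory whose predictions sit closer to the data, relative to the assumed noise, earns the higher log-likelihood; no function mapping one theory's equations onto the other's is needed.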
6 Naturalization and the Progress of the Sciences
We face a substantial difficulty in employing hierarchies of theories to explain the natural world around us: our meso-scale observable variables are very distantly abstracted from the microscopic phenomena that, under our best scientific theories, form the foundations of reality. On the one hand, this is reassuring: our microscopic theories require huge numbers of free parameters precisely because they reduce large, complex things to aggregations of smaller, simpler things. Since we need many small things to make a large thing, we should find that thinking of the large thing in terms of its constituent small things requires huge amounts of information. On the other hand, this also implies that our descriptions of fundamental reality are far more theory-laden than our descriptions of our everyday surroundings. We suffer from a polarization in which humanly intuitive theories and theories of the fundamentals of reality come to occupy opposite ends of our hierarchy. Thus:
The process presents itself as a meaningless historical meander, without compass or convergence. Except, it would seem, in the domain of the natural sciences.
We might call it a symptom of that very polarization that human beings require strict intellectual training to think successfully in a naturalistic, scientific way (Churchland has effectively switched from philosophy of mind to philosophy of science in this part of the book). Our intuitive theories tend to explain most of the variance visible in our observables, but nonetheless don’t predict all that well. As a result, we tend to just intuitively accept that we can’t entirely understand the world. Modern science, by contrast, has obtained much of its success by seeking out additional observables that let us get accurate data about the (usually) less influential, smaller-scale structure and parameters of reality. As Churchland describes it:
Such experimental indexings can also be evaluated for their consistency with distinct experimental indexings made within distinct but partially overlapping interpretive practices, as when one uses distinct measuring instruments and distinct but overlapping conceptual maps to try to ‘get at’ one and the same phenomenon.
“Naturalization” of concepts thus turns out to come in two kinds of inference rather than one. “Upwards” naturalizations, let us say, string a connection from more microscopic theories to more macroscopic concepts. “Downwards” naturalizations, the traditional mode of intertheoretic reduction, connect existing macroscopic concepts and theories to more microscopic theories, exploiting the thoroughness and simplicity of the microscopic theory to provide a well-informed inductive bias to the more macroscopic theory. This inductive bias embodies what we learned, as we developed the microscopic theory, about all the observables we used to learn that theory. We can thus see that both kinds of naturalizations connect our concepts and theories to additional observable variables, thus enabling quicker and more accurate inductive training.
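One way to picture the "downwards" case, under Bayesian assumptions that are mine rather than Churchland's: treat what the microscopic theory already implies about a macroscopic quantity as an informative Gaussian prior, and update that prior with a handful of noisy macroscopic measurements. In the standard conjugate normal-normal update, precisions simply add, so a well-informed prior buys the same posterior certainty that extra measurements would. All numbers are hypothetical.

```python
def normal_update(prior_mean, prior_sd, measurements, noise_sd):
    """Conjugate update for an unknown mean with known Gaussian measurement noise."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = len(measurements) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    sample_mean = sum(measurements) / len(measurements)
    post_mean = (prior_precision * prior_mean + data_precision * sample_mean) / post_precision
    return post_mean, post_precision ** -0.5  # posterior mean and standard deviation

readings = [21.3, 19.8, 20.9]  # hypothetical noisy macroscopic measurements
vague = normal_update(0.0, 100.0, readings, noise_sd=1.0)    # almost no prior knowledge
informed = normal_update(20.0, 0.5, readings, noise_sd=1.0)  # bias inherited from a microscopic theory
print(vague, informed)
```

With the same three readings, the informed posterior is noticeably tighter than the vague one, which is the sense in which the microscopic theory lets the macroscopic one learn faster.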
In combination with causal-role concepts and theories thereof, this all comes back to Churchland’s defense of the thesis that abstract objects and properties are both real and natural. The greater the degree of unity we attain in our hierarchical forests of abstract concepts and theories, the more we can justify those abstractions by reference to their role in successful causal description of concrete observations, rather than by abstracted argumentation. The more we naturalize our concepts, the more we feel licensed by Indispensability Arguments to call them real abstract universals (or at least, real abstract generalities of the neighborhood of reality we happen to live in), despite their being mere inferred theories bound ultimately to empirical data[15].
Certain naive forms of scientific realism would thus say that, through our scientific progress, we are coming to understand reality on a single, supreme, fundamental level. Churchland disagrees, and I concur with his disagreement.
That our sundry overlapping maps frequently enjoy adjustments that bring them into increasing conformity with one another (even as their predictive accuracy continues to increase) need not mean that there is some Ur map toward which all are tending.
To the contrary, a single Ur-map would be an extremely high-dimensional model, would require an extremely large amount of data to train, and would carry an extraordinarily large chance of overfitting after we had trained it. Entailment preorders of maps compress and represent experiential data far more efficiently than a single Ur-map, even if we know there exists a single underlying objective reality. In fact, we might often possess multiple maps of similar, or even identical, objective domains:
Two maps can differ substantially from each other, and yet still be, both of them, highly accurate maps of the same objective reality. For they may be focusing on distinct aspects or orthogonal dimensions of that shared reality. Reality, after all, is spectacularly complex, and it is asking too much of any given map that it capture all of reality (see, e.g., Giere 2006).
Churchland emphasizes that the final emphasis must be on empiricism and (sometimes counterfactual) observability:
What is important, for any map to be taken seriously as a representation of reality, is that somehow or other, however indirectly, it is possible to index it. …So long as every aspect of reality is somehow in causal interaction with the rest of reality, then every aspect of reality is, in principle, at least accessible to critical cognitive activity. Nothing guarantees that we will succeed in getting a grip on any given aspect. But nothing precludes it, either.
Churchland is, of course, reciting the naturalist creed by stating that “every aspect of reality is somehow in causal interaction with the rest of reality” (or at least, it was in its past or will be in its future). This is a bullet both he and I can gladly bite, however. I can also add that since Second-Level Learning enables us to cohere our concepts into vast, inter-related preorders over time, it also enables us to gain increasing certainty about which conceptual maps refer to real abstract objects (optimal generalizations of properties of other maps), real concrete objects (which participate directly in causality), and apparent objects actually derived from erroneous inferences. As we learn more and integrate our concepts, real concrete and abstract objects come to be tied together, whereas unreal concrete objects (like superstitions) or abstract objects (like false philosophical frameworks) come to be increasingly isolated in our framework of maps of the world. A more integrated, naturalistic explanation for the experiential phenomena which originally gave birth to a model of unreal concrete or abstract objects can, if we allow ourselves to admit it into our worldview, clear up the experiential confusion and clear away the “zombie concepts”.
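A toy rendering of that last point, which is entirely my own construction: represent concepts as nodes, link two concepts whenever some theory connects them, and mark which concepts are indexed directly by observation. A breadth-first search from the observational nodes then finds every concept tied, however indirectly, to experience, and whatever is left over is a candidate "zombie concept". The concept names and links are hypothetical, standing in for a state of knowledge after a better explanation has absorbed the old concepts' observational roles.

```python
from collections import deque

def unanchored(links, observed):
    """Concepts that no chain of theoretical links connects to any observable."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    reached, queue = set(observed), deque(observed)
    while queue:
        for neighbour in graph.get(queue.popleft(), ()):
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append(neighbour)
    return set(graph) - reached

links = [("thermometer reading", "temperature"), ("temperature", "molecular motion"),
         ("molecular motion", "pressure"), ("caloric", "phlogiston")]
observed = ["thermometer reading", "pressure"]
print(sorted(unanchored(links, observed)))  # ['caloric', 'phlogiston']
```

Real conceptual frameworks are weighted and probabilistic rather than a plain graph, so this captures only the topology of the argument: integration ties real concepts together and leaves the others stranded.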
Part IV
Third-Level Learning: Cultural Progress
In the third and shortest major part of the book, we finally arrive at the domain of learning and thought in which we deal exclusively with human beings communicating via language. Churchland opens the chapter almost apologetically:
The reader will have noticed, in all of the preceding chapters, a firm skepticism concerning the role or significance of linguaformal structures in the business of both learning and deploying a conceptual framework. This skepticism goes back almost four decades …. In the intervening period, my skepticism on this point has only expanded and deepened, as the developments - positive and negative - in cognitive psychology, classical AI, and the several neurosciences gave both empirical and theoretical substance to those skeptical worries. As I saw these developments, they signaled the need to jettison the traditional epistemological playing field of accepted or rejected sentences, and the dynamics of logical or probabilistic inference that typically went with it.
Unfortunately, this statement appears to ignore the close links between probabilistic inference and the entire rest of statistical learning theory, including the neural networks that form the foundation for Churchland’s theory of cognition in the First-Level Learning chapters. Alas.
Still, Churchland’s skepticism regarding the “language of thought” hypothesis makes a great deal of intuitive sense. It takes thorough study to learn the difference between formal systems (sets of axioms demonstrated to have a model), from the foundations of mathematics, and formal languages (notations for computations), from the science of computing, although Douglas Hofstadter did write the world’s premier “pop comp-sci” text on exactly that matter[11]. Furthermore, any given spoken or written sentence, in formal or informal language, contains fairly little communicable information relative to the size of an entire mental model of a relevant domain, as Churchland has spotted:
We must doubt this [sentential] perspective, indeed, we must reject it, because theories themselves are not sets of sentences at all. Sentences are just the comparatively low-dimensional public stand-ins that allow us to make rough mutual coordinations of our endlessly idiosyncratic conceptual frameworks or theories, so that we can more effectively apply them and evaluate them.
Unlike much of analytic philosophy, the science of computing takes programs and programming languages to be simply different ways of writing down calculations, to the point that the field of denotational semantics for programming remains small relative to the study of proving which computations a program carries out. A hypothesis regarding neurocomputation that can explain how learning and commonsense reasoning take place would apply, via the Church-Turing Thesis, to neural nets as well as to Turing machines.
Third-Level Learning is perhaps a misnomer, since as far as I know, it does not actually come third in any particular causal or historical ordering. After all, humans communicated ideas, and thus carried out Third-Level Learning, long before we ever engaged seriously in reductionist science, and if standardized test scores show anything at all, they surely show that our societies have invented sophisticated systems devoted to ensuring that existing ideas are passed down to children as-is. In fact, the educational system often performs quite reliably, in the sense that the children consistently pass their exams, even if we all ritually lament the failure to pass down the true understanding and clarity once achieved by discoverers, inventors, and teachers. Such true understanding, Churchland would say, involves a high-dimensional conceptual map sculpted by large amounts of experiential data. Perhaps we ought, pessimistically, to expect that such high-dimensional understanding cannot be passed down accurately, even though teaching is a well-developed science (albeit one prone to fads, whose occasional serious results are often ignored in favor of “how it’s always been done” or “the strong students will survive”). After all, as Churchland says:
[W]e have no public access to those raw sensory activation patterns [which sculpted our conceptual frameworks], as such.
Third-Level Learning, then, consists in using a Map-Portrayal Semantics for language (and other forms of human communication) to pass down maps that, according to the Domain Portrayal Semantics Churchland posits, accurately portray some piece of local reality. It may come before or after Second-Level Learning in our history, but it surely occurs. By means of evocative and descriptive language, human beings can index each other’s maps and even, through carefully chosen series of evocations, describe their conceptual maps to each other. Although other vocalizing species (such as wolves, nonhuman great apes, and some marine mammals) display the former ability to signal to each other with sound, humans alone have the latter ability: to systematically educate each other, passing on whole conceptual frameworks from their original discoverers to vast social peer-groups. By this means, human intellectual life surpasses the individual human:
While the collective cognitive process steadily loses some of its participants to old age, and adds fresh participants in the form of newborns, the collective process itself is now effectively immortal. It faces no inevitable termination.
One might think that little can be said about education by someone other than a professional expert on education, but Churchland does have an important point to make in describing Third-Level Learning: it is a form of learning, not a form of something other than learning. In particular, he explicitly criticizes the “memetic” theory of cultural “evolution”, for attempting to ground culture in Darwinist principles without making any reference to such obvious participants in culture as the mind and brain:
The dynamical parallels between a virus-type and a theory-type are pretty thin. …Dawkins’ story, though novel and agreeably naturalistic, once again attempts, like so many other accounts before it, to characterize the essential nature of the scientific enterprise without making any reference to the unique kinematics and dynamics of the brain.
…
Similarly, no account of science or rationality that confines itself to social-level mechanisms alone will ever get to the heart of that matter. For that, the microstructure of the brain and the nature of its microactivities are also uniquely essential.
Churchland also notes that reasoning can work, even when individual reasoners don’t quite understand how or why they reason, as in the case of scientists with too little knowledge of methodology:
For the scientists themselves may indeed be confabulating their explanations within a methodological framework that positively misrepresents the real causal factors and the real dynamics of their own cognitive behaviors.
In fact, he even demands that we account for the Third-Level Learning and reasoning of others in such “unclean” fields as politics:
For better or for worse, the moral convictions of those agents will play a major role in determining their voting behavior. To be sure, one may be deeply skeptical of the moral convictions of the citizens, or the senators, involved. Indeed, one may reject those convictions entirely, on the grounds that they presuppose some irrational religion, for example. But it would be foolish to make a policy of systematically ignoring those assembled moral convictions (even if they are dubious), if one wants to understand the voting behavior of the individuals involved.
Churchland also notes that successful Third-Level Learning ultimately requires, at times, successful Second-Level Learning, which he identifies with Kuhnian “paradigm shifts”:
As we have seen, Kuhn describes such periods of turmoil as ‘crisis science,’ and he explains in some illustrative detail how the normal pursuit of scientific inquiry is rather well-designed to produce such crises, sooner or later. I am compelled to agree with his portrayal, for, on the present account, ‘normal science,’ as discussed at length by Kuhn, just is the process of trying to navigate new territory under the guidance of an existing map, however rough, and of trying to articulate its often vague outlines and to fill in its missing details as the exploration proceeds.
He then ends the book on a positive note:
All told, the metabolisms of humans are wrapped in the benign embrace of an interlocking system of mechanisms that help to sustain, regulate, and amplify their (one hopes) healthy activities, just as the cognitive organs of humans are wrapped in the benign embrace of an interlocking system of mechanisms that help to sustain, regulate, and amplify their (one hopes) rational activities.
Unfortunately, I do feel that this “up-ending” opens Churchland to a substantive criticism, namely: he has failed to address anything outside the sciences. Since most actually existing humans are neither scientists nor science hobbyists, one would think that a book about the brain would bother to address the vast domains of human life outside the halls of academic science, lest one be reminded of Professor Smith in Piled Higher and Deeper justifying the professorial career pyramid just by making everything outside academic science sound scary.
I suppose that Churchland’s own career and position as a philosopher of mind and science led him to write chiefly about domains he thoroughly understands, but I, at least, think his core thesis draws strength from its potential applications outside those domains. If Churchland, and much other literature, can give a naturalistic account of how the brain comes to understand abstract, immaterial objects and properties in such domains as science and mathematics, then why not in, say, aesthetics, ethics, or the emotional life? Among the first abstract properties posited at the beginnings of any human culture are beauty and goodness, and among the first abstract objects is the soul. It may sound suddenly religious to speak of the soul when talking about science and statistical modelling, but eliminativism about these “soulful” objects and properties has always stood as the largest bullet for naturalists to bite. Having a constructive-naturalist theory to apply to “soulful” subjects of inquiry could turn the bitter bullet into a harmless sugar pill.
Churchland also spent an entire book talking about the brain without once mentioning subjective consciousness or experience, for reasons that, I suspect, stem from the same sort of greedy eliminativism.
However, that might just mean I need to put both Churchland’s earlier work (like Matter and Consciousness[4] and The Engine of Reason, the Seat of the Soul[5]) and Patricia Churchland’s Braintrust[2] on my reading list to see what they have to say on such subjects.
References
[1] Alonzo Church. An unsolvable problem of elementary number theory. American Journal of Mathematics, 58(2):345–363, April 1936.
[2] Patricia Smith Churchland. Braintrust: What Neuroscience Tells Us about Morality. Princeton University Press, Princeton, N.J., 2011.
[3] Paul Churchland. Plato’s Camera: How the Physical Brain Captures a Landscape of Abstract Universals. MIT Press, 2012.
[4] Paul Churchland. Matter and Consciousness. MIT Press, Cambridge, 2013.
[5] Paul M. Churchland. The Engine of Reason, the Seat of the Soul: A Philosophical Journey into the Brain. MIT Press, Cambridge, 1995.
[6] C. E. Freer, D. M. Roy, and J. B. Tenenbaum. Towards common-sense reasoning via conditional simulation: Legacies of Turing in Artificial Intelligence. Turing’s Legacy (ASL Lecture Notes in Logic), 2012.
[7] N. D. Goodman, T. D. Ullman, and J. B. Tenenbaum. Learning a theory of causality. Psychological Review, 2011.
[8] Noah D. Goodman, Joshua B. Tenenbaum, and T. Gerstenberg. Concepts in a probabilistic language of thought. MIT Press, 2015.
[9] T. L. Griffiths, F. Lieder, and N. D. Goodman. Rational use of cognitive resources: Levels of analysis between the computational and the algorithmic. Topics in Cognitive Science, To appear.
[10] Peter D. Grünwald and Paul M. B. Vitányi. Algorithmic information theory. CoRR, abs/0809.2754, 2008.
[11] Douglas R. Hofstadter. Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, Inc., New York, NY, USA, 1979.
[12] Marcus Hutter. Universal algorithmic intelligence: A mathematical top-down approach. In B. Goertzel and C. Pennachin, editors, Artificial General Intelligence, Cognitive Technologies, pages 227–290. Springer, Berlin, 2007.
[13] Milad Kharratzadeh and Thomas Shultz. Neural implementation of probabilistic models of cognition.
[14] John C. Kieffer. A tutorial on hierarchical lossless data compression. In Moshe Dror, Pierre L’Ecuyer, and Ferenc Szidarovszky, editors, Modeling Uncertainty, volume 46 of International Series in Operations Research & Management Science, pages 711–733. Springer US, 2005.
[15] David Liggins. Quine, Putnam, and the ‘Quine-Putnam’ indispensability argument. Erkenntnis, 68(1):113–127, 2008.
[16] Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, and James P. Sethna. Parameter space compression underlies emergent theories and predictive models. Science, 342(6158):604–607, 2013.
[17] Noah D. Goodman, Joshua B. Tenenbaum, Thomas L. Griffiths, and Jacob Feldman. Compositionality in rational analysis: Grammar-based induction for concept learning. In Nick Chater and Mike Oaksford, editors, The Probabilistic Mind: Prospects for Bayesian Cognitive Science. Oxford University Press, 2008.
[18] Ruslan Salakhutdinov, Joshua B. Tenenbaum, and Antonio Torralba. One-shot learning with a hierarchical nonparametric Bayesian model. Journal of Machine Learning Research - Proceedings Track, 27:195–206, 2012.
[19] Lei Shi and Thomas L. Griffiths. Neural implementation of hierarchical Bayesian inference by importance sampling. In Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1669–1677. Curran Associates, Inc., 2009.
Pattern-botching: when you forget you understand
It’s all too easy to let a false understanding of something replace your actual understanding. Sometimes this is an oversimplification, but it can also take the form of an overcomplication. I have an illuminating story:
Years ago, when I was young and foolish, I found myself in a particular romantic relationship that would later end for epistemic reasons, when I was slightly less young and slightly less foolish. Anyway, this particular girlfriend of mine was very into healthy eating: raw, organic, home-cooked, etc. During her visits my diet would change substantially for a few days. At one point, we got in a tiny fight about something, and in a not-actually-desperate attempt to placate her, I semi-jokingly offered: “I’ll go vegetarian!”
“I don’t care,” she said with a sneer.
…and she didn’t. She wasn’t a vegetarian. Duhhh... I knew that. We’d made some ground beef together the day before.
So what was I thinking? Why did I say “I’ll go vegetarian” as an attempt to appeal to her values?
(I’ll invite you to take a moment to come up with your own model of why that happened. You don't have to, but it can help you avoid the hindsight bias that makes an explanation seem obvious once you've heard it.)
(Got one?)
Here's my take: I pattern-matched a bunch of actual preferences she had with a general "healthy-eating" cluster, and then I went and pulled out something random that felt vaguely associated. It's telling, I think, that I don't even explicitly believe that vegetarianism is healthy. But to my pattern-matcher, they go together nicely.
I'm going to call this pattern-botching.† Pattern-botching is when you pattern-match a thing "X", as following a certain model, but then implicit queries to that model return properties that aren't true about X. What makes this different from just having false beliefs is that you know the truth, but you're forgetting to use it because there's a botched model that is easier to use.
†Maybe this already has a name, but I've read a lot of stuff and it feels like a distinct concept to me.
Examples of pattern-botching
So, that's pattern-botching, in a nutshell. Now, examples! We'll start with some simple ones.
Calmness and pretending to be a zen master
In my Againstness Training video, past!me tries a bunch of things to calm down. In the pursuit of "calm", I tried things like...
- dissociating
- trying to imitate a zen master
- speaking really quietly and timidly
None of these are the desired state. The desired state is being present and authentic, and able to project well while speaking assertively.
But that would require actually being in a different state, which to my brain at the time seemed hard. So my brain constructed a pattern around the target state, and said "what's easy and looks vaguely like this?" and generated the list above. Not as a list, of course! That would be too easy. It generated each one individually as a plausible course of action, which I then tried, and which Val then called me out on.
Personality Types
I'm quite gregarious, extraverted, and generally unflappable by noise and social situations. Many people I know describe themselves as HSPs (Highly Sensitive Persons) or as very introverted, or as "not having a lot of spoons". These concepts are related—or perhaps not related, but at least correlated—but they're not the same. And even if these three terms did all mean the same thing, individual people would still vary in their needs and preferences.
Just this past week, I found myself talking with an HSP friend L, and noting that I didn't really know what her needs were. Like I knew that she was easily startled by loud noises and often found them painful, and that she found motion in her periphery distracting. But beyond that... yeah. So I told her this, in the context of a more general conversation about her HSPness, and I said that I'd like to learn more about her needs.
L responded positively, and suggested we talk about it at some point. I said, "Sure," then added, "though it would be helpful for me to know just this one thing: how would you feel about me asking you about a specific need in the middle of an interaction we're having?"
"I would love that!" she said.
"Great! Then I suspect our future interactions will go more smoothly," I responded. I realized what had happened was that I had conflated L's HSPness with... something else. I'm not exactly sure what, but a preference for indirect communication, perhaps? I have another friend, who is also sometimes short on spoons, who I model as finding that kind of question stressful because it would kind of put them on the spot.
I've only just recently started realizing this, so I suspect that I'm still doing a ton of this pattern-botching with people in ways I haven't specifically noticed.
Of course, having clusters makes it easier to have heuristics about what people will do, without knowing them too well. A loose cluster is better than nothing. I think the issue is when we do know the person well, but we're still relying on this cluster-based model of them. It's telling that I was not actually surprised when L said that she would like it if I asked about her needs. On some level I kind of already knew it. But my botched pattern was making me doubt what I knew.
False aversions
CFAR teaches a technique called "Aversion Factoring", in which you try to break down the reasons why you don't do something, and then consider each reason. In some cases, the reasons are sound reasons, so you decide not to try to force yourself to do the thing. If not, then you want to make the reasons go away. There are three types of reasons, each calling for a different approach.
One is for when you have a legitimate issue, and you have to redesign your plan to avert that issue. The second is where the thing you're averse to is real but isn't actually bad, and you can kind of ignore it, or maybe use exposure therapy to get yourself more comfortable with it. The third is... when the outcome would be an issue, but it's not actually a necessary outcome of the thing. As in, it's a fear that's vaguely associated with the thing at hand, but the thing you're afraid of isn't real.
All of these share a structural similarity with pattern-botching, but the third one in particular is a great example. The aversion is generated from a property that the thing you're averse to doesn't actually have. Unlike a miscalibrated aversion (#2 above) it's usually pretty obvious under careful inspection that the fear itself is based on a botched model of the thing you're averse to.
Taking the training wheels off of your model
One other place this structure shows up is in the difference between what something looks like when you're learning it versus what it looks like once you've learned it. Many people learn to ride a bike while actually riding a four-wheeled vehicle: training wheels. I don't think anyone makes the mistake of thinking that the ultimate bike will have training wheels, but in other contexts it's much less obvious.
The remaining three examples look at how pattern-botching shows up in learning contexts, where people implicitly forget that they're only partway there.
Rationality as a way of thinking
CFAR runs 4-day rationality workshops, which currently are evenly split between specific techniques and how to approach things in general. Let's consider what kinds of behaviours spring to mind when someone encounters a problem and asks themselves: "what would be a rational approach to this problem?"
- someone with a really naïve model, who hasn't actually learned much about applied rationality, might pattern-match "rational" to "hyper-logical", and think "What Would Spock Do?"
- someone who is somewhat familiar with CFAR and its instructors but who still doesn't know any rationality techniques, might complete the pattern with something that they think of as being archetypal of CFAR-folk: "What Would Anna Salamon Do?"
- CFAR alumni, especially new ones, might pattern-match "rational" as "using these rationality techniques" and conclude that they need to "goal factor" or "use trigger-action plans"
- someone who gets rationality would simply apply that particular structure of thinking to their problem
In the case of a bike, we see hundreds of people biking around without training wheels, and so that becomes the obvious example from which we generalize the pattern of "bike". In other learning contexts, though, most people—including, sometimes, the people at the leading edge—are still in the early learning phases, so the training wheels are the rule, not the exception.
So people start thinking that the figurative bikes are supposed to have training wheels.
Incidentally, this can also be the grounds for strawman arguments where detractors of the thing say, "Look at these bikes [with training wheels]! How are you supposed to get anywhere on them?!"
Effective Altruism
We potentially see a similar effect with topics like Effective Altruism. It's a movement that is still in its infancy, which means that nobody has it all figured out. So when trying to answer "How do I be an effective altruist?" our pattern-matchers might pull up a bunch of examples of things that EA-identified people have been commonly observed to do.
- donating 10% of one's income to a strategically selected charity
- going to a coding bootcamp and switching careers, in order to Earn to Give
- starting a new organization to serve an unmet need, or to serve a need more efficiently
- supporting the Against Malaria Foundation
...and this generated list might be helpful for various things, but be wary of thinking that it represents what Effective Altruism is. It's possible—it's almost inevitable—that we don't actually know what the most effective interventions are yet. We will potentially never actually know, but we can expect that in the future we will generally know more than at present. Which means that the current sampling of good EA behaviours likely does not actually even cluster around the ultimate set of behaviours we might expect.
Creating a new (platform for) culture
At my intentional community in Waterloo, we're building a new culture. But that's actually a by-product: our goal isn't to build this particular culture but to build a platform on which many cultures can be built. It's like how as a company you don't just want to be building the product but rather building the company itself, or "the machine that builds the product,” as Foursquare founder Dennis Crowley puts it.
What I started to notice, though, is that we had begun to confuse the particular, transitional culture that we have at our house with either (a) the particular target culture that we're aiming for, or (b) the more abstract range of cultures that will be constructable on our platform.
So from a training wheels perspective, we might totally eradicate words like "should". I did this! It was really helpful. But once I had removed the word from my idiolect, it became unhelpful to still be treating it as being a touchy word. Then I heard my mentor use it, and I remembered that the point of removing the word wasn't to not ever use it, but to train my brain to think without a particular structure that "should" represented.
This shows up on much larger scales too. Val from CFAR was talking about a particular kind of fierceness, "hellfire", that he sees as fundamental and important, and he noted that it seemed to be incompatible with the kind of culture my group is building. I initially agreed with him, which was kind of dissonant for my brain, but then I realized that hellfire was only incompatible with our training culture, not the entire set of cultures that could ultimately be built on our platform. That is, engaging with hellfire would potentially interfere with the learning process, but it's not ultimately proscribed by our culture platform.
Conscious cargo-culting
I think it might be helpful to repeat the definition:
Pattern-botching is when you pattern-match a thing "X", as following a certain model, but then implicit queries to that model return properties that aren't true about X. What makes this different from just having false beliefs is that you know the truth, but you're forgetting to use it because there's a botched model that is easier to use.
It's kind of like if you were doing a cargo-cult, except you knew how airplanes worked.
(Cross-posted from malcolmocean.com)
Alien neuropunk slaver civilizations
Here's some blue-sky speculation about one way alien sapients' civilizations might develop differently from our own. Alternatively, you can consider it conworlding. Content note: torture, slavery.
Looking at human history, after we developed electronics, we painstakingly constructed machines that can perform general computation, then built software which approximates the workings of the human brain. For instance, we nowadays use in-silico reinforcement learning and neural nets to solve various "messy" problems like computer vision and robot movement. In the future, we might scan brains and then emulate them on computers. This all seems like a very circuitous course of development - those algorithms have existed all around us for thousands of years in the form of brains. Putting them on computers requires an extra layer of technology.
Suppose that some alien species' biology is a lot more robust than ours: their homeostatic systems are less failure-prone than our own, due to some difference in their environment or evolutionary history. They don't get brain-damaged just from holding their breath for a couple minutes, and open wounds don't easily get infected.
Now suppose that after they invent agriculture but before they invent electronics, they study biology and neurology. Combined with their robust biology, this leads to a world where things that are electronic in our world are instead controlled by vat-grown brains. For instance, a car-building robot could be constructed by growing a brain in a vat, hooking it up to some actuators and sensors, then dosing it with happy chemicals when it correctly builds a car, and stimulating its nociceptors when it makes mistakes. This rewarding and punishing can be done by other lab-grown "overseer" brains trained specifically for the job, which are in turn manually rewarded at the end of the day by their owner for the total number of cars successfully built. Custom-trained brains could control chemical plants, traffic lights, surveillance systems, etc. The actuators and sensors could be either biologically-based (lab-grown eyes, muscles, etc., powered with liquefied food) or powered with combustion engines or steam engines or even wound springs.
Obviously this is a pretty terrible world, because many minds will live lives with very little meaning, never grasping the big picture, at the mercy of unmerciful alien or vat-brain overseers, without even the option of suicide. Brains wouldn't necessarily be designed or drugged to be happy overall - maybe a brain in pain does its job better. I don't think the owners would be very concerned about the ethical problems - look at how humans treat other animals.
You can see this technology as a sort of slavery set up so that slaves are cheap and unsympathetic and powerless. They won't run away, because: they'll want to perform their duties, for the drugs; many won't be able to survive without owners to top up their food drips; they could be developed or drugged to ensure docility; you could prevent them from even getting the idea of emancipation, by not giving them the necessary sensors; perhaps you could even set things up so the overseer brains can read the thoughts of their charges directly, and punish bad thoughts. This world has many parallels to Hanson's brain emulation world.
Is this scenario at all likely? Would these civilizations develop biological superintelligent AGI, or would they only be able to create superintelligent AGI once they develop electronic computing?