Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

A Year of Spaced Repetition Software in the Classroom

tanagrabeast 04 July 2015 10:30PM

Last year, I asked LW for some advice about spaced repetition software (SRS) that might be useful to me as a high school teacher. With said advice came a request to write a follow-up after I had accumulated some experience using SRS in the classroom. This is my report.

Please note that this was not a scientific experiment to determine whether SRS "works." Prior studies are already pretty convincing on this point and I couldn't think of a practical way to run a control group or "blind" myself. What follows is more of an informal debriefing for how I used SRS during the 2014-15 school year, my insights for others who might want to try it, and how the experience is changing how I teach.

Summary

SRS can raise student achievement even with students who won't use the software on their own, and even with frequent disruptions to the study schedule. Gains are most apparent with the already high-performing students, but are also meaningful for the lowest students. Deliberate efforts are needed to get student buy-in, and getting the most out of SRS may require changes in course design.

The software

After looking into various programs, including the game-like Memrise, and even writing my own simple SRS, I ultimately went with Anki for its multi-platform availability, cloud sync, and ease-of-use. I also wanted a program that could act as an impromptu catch-all bin for the 2,000+ cards I would be producing on the fly throughout the year. (Memrise, in contrast, really needs clearly defined units packaged in advance).

The students

I teach 9th and 10th grade English at an above-average suburban American public high school in a below-average state. Mine are the lower "required level" students at a school with high enrollment in honors and Advanced Placement classes. Generally speaking, this means my students are mostly not self-motivated, are only very weakly motivated by grades, and will not do anything school-related outside of class no matter how much it would be in their interest to do so. There are, of course, plenty of exceptions, and my students span an extremely wide range of ability and apathy levels.

The procedure

First, what I did not do. I did not make Anki decks, assign them to my students to study independently, and then quiz them on the content. With honors classes I taught in previous years I think that might have worked, but I know my current students too well. Only about 10% of them would have done it, and the rest would have blamed me for their failing grades—with some justification, in my opinion.

Instead, we did Anki together, as a class, nearly every day.

As initial setup, I created a separate Anki profile for each class period. With a third-party add-on for Anki called Zoom, I enlarged the display font sizes to be clearly legible on the interactive whiteboard at the front of my room.

Nightly, I wrote up cards to reinforce new material and integrated them into the deck in time for the next day's classes. This averaged about 7 new cards per lesson period. These cards came in many varieties, but the three main types were:

  1. concepts and terms, often with reversed companion cards, sometimes supplemented with "what is this an example of" scenario cards.
  2. vocabulary, 3 cards per word: word/def, reverse, and fill-in-the-blank example sentence
  3. grammar, usually in the form of "What change(s), if any, does this sentence need?" Alternative cards had different permutations of the sentence.

Weekly, I updated the deck to the cloud for self-motivated students wishing to study on their own.

Daily, I led each class in an Anki review of new and due cards for an average of 8 minutes per study day, usually as our first activity, at a rate of about 3.5 cards per minute. As each card appeared on the interactive whiteboard, I would read it out loud while students willing to share the answer raised their hands. Depending on the card, I might offer additional time to think before calling on someone to answer. Depending on their answer, and my impressions of the class as a whole, I might elaborate or offer some reminders, mnemonics, etc. I would then quickly poll the class on how they felt about the card by having them show a color by way of a small piece of card-stock divided into green, red, yellow, and white quadrants. Based on my own judgment (informed only partly by the poll), I would choose and press a response button in Anki, determining when we should see that card again.
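For readers unfamiliar with how those response buttons translate into scheduling, here is a rough Python sketch of the SM-2 family of algorithms that Anki's scheduler descends from. The specific ease values and multipliers below are illustrative assumptions, not Anki's exact implementation:

```python
# Simplified sketch of SM-2-style scheduling, the family of algorithms
# behind Anki's response buttons. The ease values and modifiers here are
# illustrative assumptions, not Anki's exact numbers.

def next_interval(interval_days: float, ease: float, response: str):
    """Return (new_interval_days, new_ease) after one review.

    response: 'again' (failed), 'hard', 'good', or 'easy'.
    """
    if response == "again":
        return 1.0, max(1.3, ease - 0.2)   # relearn tomorrow, lower ease
    if response == "hard":
        return interval_days * 1.2, max(1.3, ease - 0.15)
    if response == "easy":
        return interval_days * ease * 1.3, ease + 0.15
    return interval_days * ease, ease       # 'good': multiply by ease

# A card answered 'good' repeatedly spaces out roughly geometrically:
interval, ease = 1.0, 2.5
for _ in range(4):
    interval, ease = next_interval(interval, ease, "good")
print(round(interval, 1))  # 39.1 days after four successful reviews
```

The point of the growing multiplier is that each successful recall buys exponentially more time before the next review, which is what makes a few minutes a day cover thousands of cards.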

End-of-year summary for one of my classes

[Data shown is from one of my five classes. We didn't start using Anki until a couple weeks into the school year.]

Opportunity costs

8 minutes is a significant portion of a 55 minute class period, especially for a teacher like me who fills every one of those minutes. Something had to give. For me, I entirely cut some varieties of written vocab reinforcement, and reduced the time we spent playing the team-based vocab/term review game I wrote for our interactive whiteboards some years ago. To a lesser extent, I also cut back on some oral reading comprehension spot-checks that accompany my whole-class reading sessions. On balance, I think Anki was a much better way to spend the time, but it's complicated. Keep reading.

Whole-class SRS not ideal

Every student is different, and each would get the most out of having a personal Anki profile determine when they should see each card. Also, most individuals could study many more cards per minute on their own than we averaged doing it together. (To be fair, a small handful of my students did use the software independently, judging from Ankiweb download stats.)

Getting student buy-in

Before we started using SRS I tried to sell my students on it with a heartfelt, over-prepared 20 minute presentation on how it works and the superpowers to be gained from it. It might have been a waste of time. It might have changed someone's life. Hard to say.

As for the daily class review, I induced engagement partly through participation points that were part of the final semester grade, and which students knew I tracked closely. Raising a hand could earn a kind of bonus currency, but was never required—unlike looking up front and showing colors during polls, which I insisted on. When I thought students were just reflexively holding up the same color and zoning out, I would sometimes spot check them on the last card we did and penalize them if warranted.

But because I know my students are not strongly motivated by grades, I think the most important influence was my attitude. I made it a point to really turn up the charm during review and play the part of the engaging game show host. Positive feedback. Coaxing out the lurkers. Keeping that energy up. Being ready to kill and joke about bad cards. Reminding classes how awesome they did on tests and assignments because they knew their Anki stuff.

(This is a good time to point out that the average review time per class period stabilized at about 8 minutes because I tried to end reviews before student engagement tapered off too much, which typically started happening at around the 6-7 minute mark. Occasional short end-of-class reviews mostly account for the difference.)

I also got my students more on the Anki bandwagon by showing them how it was directly linked to reduced note-taking requirements. If I could trust that they would remember something through Anki alone, why waste time waiting for them to write it down? They were unlikely to study from those notes anyway. And if they aren't looking down at their paper, they'll be paying more attention to me. I'd better come up with more cool things to tell them!

Making memories

Everything I had read about spaced repetition suggested it was a great reinforcement tool but not a good way to introduce new material. With that in mind, I tried hard to find or create memorable images, examples, mnemonics, and anecdotes that my Anki cards could become hooks for, and to get those cards into circulation as soon as possible. I even gave this method a mantra: "vivid memory, card ready".

When a student during review raised their hand, gave me a pained look, and said, "like that time when...." or "I can see that picture of..." as they struggled to remember, I knew I had done well. (And I would always wait a moment, because they would usually get it.)

Baby cards need immediate love

Unfortunately, if the card wasn't introduced quickly enough—within a day or two of the lesson—the entire memory often vanished and had to be recreated, killing the momentum of our review. This happened far too often—not because I didn't write the card soon enough (I stayed really on top of that), but because it didn't always come up for study soon enough. There were a few reasons for this:

  1. We often had too many due cards to get through in one session, and by default Anki puts new cards behind due ones.
  2. By default, Anki only introduces 20 new cards in one session (I soon uncapped this).
  3. Some cards were in categories that I gave lower priority to.

Two obvious cures for this problem:

  1. Make fewer cards. (I did get more selective as the year went on.)
  2. Have all cards prepped ahead of time and introduce new ones at the end of the class period they go with. (For practical reasons, not the least of which was the fact that I didn't always know what cards I was making until after the lesson, I did not do this. I might be able to next year.)

Days off suck

SRS is meant to be used every day. When you take weekends off, you get a backlog of due cards. Not only do my students take every weekend and major holiday off (slackers), they have a few 1-2 week vacations built into the calendar. Coming back from a week's vacation means a 9-day backlog (due to the weekends bookending it). There's no good workaround for students who won't study on their own. The best I could do was run longer or multiple Anki sessions on return days to try to catch up with the backlog. It wasn't enough. The "caught up" condition was not normal for most classes at most points during the year, but rather something to aspire to and occasionally applaud ourselves for reaching. Some cards spent weeks or months on the bottom of the stack. Memories died. Baby cards emerged stillborn. Learning was lost.
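To see why even a modest shortfall compounds, here is a toy simulation of the backlog dynamic. The daily due-card and review-capacity numbers are invented for illustration, not measured from my classes:

```python
# Toy simulation of the due-card backlog: cards come due every day, but
# reviews only happen on school days with a fixed capacity. The numbers
# (30 cards due/day, 28 reviewed/class) are hypothetical.

def backlog_after(days: int, due_per_day: int = 30, capacity: int = 28) -> int:
    backlog = 0
    for day in range(days):
        backlog += due_per_day                # cards fall due every day
        if day % 7 < 5:                       # weekday: class meets
            backlog = max(0, backlog - capacity)
    return backlog

print(backlog_after(7))    # 70: one weekend plus a small daily shortfall
print(backlog_after(14))   # 140: the deficit compounds week over week
```

With any sustained shortfall, the steady state is a permanently growing pile, which is exactly what the "caught up as aspiration" experience above looks like from the inside.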

Needless to say, the last weeks of the school year also had a certain silliness to them. When the class will never see the card again, it doesn't matter whether I push the button that says 11 days or the one that says 8 months. (So I reduced polling and accelerated our cards/minute rate.)

Never before SRS did I fully appreciate the loss of learning that must happen every summer break.

Triage

I kept each course's master deck divided into a few large subdecks. This was initially for organizational reasons, but I eventually started using it as a prioritizing tool. This happened after a curse-worthy discovery: if you tell Anki to review a deck made from subdecks, due cards from subdecks higher up in the stack are shown before cards from decks listed below, no matter how overdue they might be. From that point on, on days when we were backlogged (most days), I would specifically review the concept/terminology subdeck for the current semester before any other subdecks, as these were my highest priority.

On a couple of occasions, I also used Anki's study deck tools to create temporary decks of especially high-priority cards.

Seizing those moments

Veteran teachers start acquiring a sense of when it might be a good time to go off book and teach something that isn't in the unit, and maybe not even in the curriculum. Maybe it's teaching exactly the right word to describe a vivid situation you're reading about, or maybe it's advice on what to do in a certain type of emergency that nearly happened. As the year progressed, I found myself humoring my instincts more often because of a new confidence that I can turn an impressionable moment into a strong memory and lock it down with a new Anki card. I don't even care if it will ever be on a test. This insight has me questioning a great deal of what I thought I knew about organizing a curriculum. And I like it.

A lifeline for low performers

An accidental discovery came from having written some cards that were, it was immediately obvious to me, much too easy. I was embarrassed to even be reading them out loud. Then I saw which hands were coming up.

In any class you'll get some small number of extremely low performers who never seem to be doing anything that we're doing, and, when confronted, deny that they have any ability whatsoever. Some of the hands I was seeing were attached to these students. And you better believe I called on them.

It turns out that easy cards are really important because they can give wins to students who desperately need them. Knowing a 6th grade level card in a 10th grade class is no great achievement, of course, but the action takes what had been negative morale and nudges it upward. And it can trend. I can build on it. A few of these students started making Anki the thing they did in class, even if they ignored everything else. I can confidently name one student I'm sure passed my class only because of Anki. Don't get me wrong—he just barely passed. Most cards remained over his head. Anki was no miracle cure here, but it gave him and me something to work with that we didn't have when he failed my class the year before.

A springboard for high achievers

It's not even fair. The lowest students got something important out of Anki, but the highest achievers drank it up and used it for rocket fuel. When people ask who's widening the achievement gap, I guess I get to raise my hand now.

I refuse to feel bad for this. Smart kids are badly underserved in American public schools thanks to policies that encourage staff to focus on that slice of students near (but not at) the bottom—the ones who might just barely be able to pass the state test, given enough attention.

Where my bright students might have been used to high Bs and low As on tests, they were now breaking my scales. You could see it in the multiple choice, but it was most obvious in their writing: they were skillfully working in terminology at an unprecedented rate, and making way more attempts to use new vocabulary—attempts that were, for the most part, successful.

Given the seemingly objective nature of Anki, it might seem counterintuitive that the benefits would be more obvious in writing than in multiple choice. But it makes sense when I consider that, even without SRS, these students probably would have known the terms and the vocab well enough to get multiple choice questions right, yet might have lacked the confidence to use them on their own initiative. Anki gave them that extra confidence.

A wash for the apathetic middle?

I'm confident that about a third of my students got very little out of our Anki review. They were either really good at faking involvement while they zoned out, or didn't even try to pretend and just took the hit to their participation grade day after day, no matter what I did or who I contacted.

These weren't even necessarily failing students—just the apathetic middle that's smart enough to remember some fraction of what they hear and regurgitate some fraction of that at the appropriate times. Review of any kind holds no interest for them. It's a rerun. They don't really know the material, but they tell themselves that they do, and they don't care if they're wrong.

On the one hand, these students are no worse off with Anki than they would have been with the activities it replaced, and nobody cries when average kids get average grades. On the other hand, I'm not ok with this... but so far I don't like any of my ideas for what to do about it.

Putting up numbers: a case study

For unplanned reasons, I taught a unit at the start of a quarter that I didn't formally test them on until the end of said quarter. Historically, this would have been a disaster. In this case, it worked out well. For five weeks, Anki was the only ongoing exposure they were getting to that unit, but it proved to be enough. Because I had given the same test as a pre-test early in the unit, I have some numbers to back it up. The test was all multiple choice, with two sections: the first was on general terminology and concepts related to the unit. The second was a much harder reading comprehension section.

As expected, scores did not go up much on the reading comprehension section. Overall reading levels are very difficult to boost in the short term and I would not expect any one unit or quarter to make a significant difference. The average score there rose by 4 percentage points, from 48 to 52%.

Scores in the terminology and concept section were more encouraging. For material we had not covered until after the pre-test, the average score rose by 22 percentage points, from 53 to 75%. No surprise there either, though; it's hard to say how much credit we should give to SRS for that.

But there were also a number of questions about material we had already covered before the pretest. Being the earliest material, I might have expected some degradation in performance on the second test. Instead, the already strong average score in that section rose by an additional 3 percentage points, from 82 to 85%. (These numbers are less reliable because of the smaller number of questions, but they tell me Anki at least "locked in" the older knowledge, and may have strengthened it.)

Some other time, I might try reserving a section of content that I teach before the pre-test but don't make any Anki cards for. This would give me a way to compare Anki to an alternative review exercise.

What about formal standardized tests?

I don't know yet. The scores aren't back. I'll probably be shown some "value added" analysis numbers at some point that tell me whether my students beat expectations, but I don't know how much that will tell me. My students were consistently beating expectations before Anki, and the state gave an entirely different test this year because of legislative changes. I'll go back and revise this paragraph if I learn anything useful.

Those discussions...

If I'm trying to acquire a new skill, one of the first things I try to do is listen to skilled practitioners talk about it to each other. What are the terms-of-art? How do they use them? What does this tell me about how they see their craft? Their shorthand is a treasure trove of crystallized concepts; once I can use it the same way they do, I find I'm working at a level of abstraction much closer to theirs.

Similarly, I was hoping Anki could help make my students more fluent in the subject-specific lexicon that helps you score well in analytical essays. After introducing a new term and making the Anki card for it, I made extra efforts to use it conversationally. I used to shy away from that because so many students would have forgotten it immediately and tuned me out for not making any sense. Not this year. Once we'd seen the card, I used the term freely, with only the occasional reminder of what it meant. I started using multiple terms in the same sentence. I started talking about writing and analysis the way my fellow experts do, and so invited them into that world.

Even though I was already seeing written evidence that some of my high performers had assimilated the lexicon, the high quality discussions of these same students caught me off guard. You see, I usually dread whole-class discussions with non-honors classes because good comments are so rare that I end up dejectedly spouting all the insights I had hoped they could find. But by the end of the year, my students had stepped up.

I think what happened here was, as with the writing, as much a boost in confidence as a boost in fluency. Whatever it was, they got into some good discussions where they used the terminology and built on it to say smarter stuff.

Don't get me wrong. Most of my students never got to that point. But on average even small groups without smart kids had a noticeably higher level of discourse than I am used to hearing when I break up the class for smaller discussions.

Limitations

SRS is inherently weak when it comes to the abstract and complex. No card I've devised enables a student to develop a distinctive authorial voice, or write essay openings that reveal just enough to make the reader curious. Yes, you can make cards about strategies for this sort of thing, but these were consistently my worst cards—the overly difficult "leeches" that I eventually suspended from my decks.

A less obvious limitation of SRS is that students with a very strong grasp of a concept often fail to apply that knowledge in more authentic situations. For instance, they may know perfectly well the difference between "there", "their", and "they're", but never pause to think carefully about whether they're using the right one in a sentence. I am very open to suggestions about how I might train my students' autonomous "System 1" brains to have "interrupts" for that sort of thing... or even just a reflex to go back and check after finishing a draft.

Moving forward

I absolutely intend to continue using SRS in the classroom. Here's what I intend to do differently this coming school year:

  • Reduce the number of cards by about 20%, to maybe 850-950 for the year in a given course, mostly by reducing the number of variations on some overexposed concepts.
  • Be more willing to add extra Anki study sessions to stay better caught-up with the deck, even if this means my lesson content doesn't line up with class periods as neatly.
  • Be more willing to press the red button on cards we need to re-learn. I think I was too hesitant here because we were rarely caught up as it was.
  • Rework underperforming cards to be simpler and more fun.
  • Use more simple cloze deletion cards. I only had a few of these, but they worked better than I expected for structured idea sets like, "characteristics of a tragic hero".
  • Take a less linear and more opportunistic approach to introducing terms and concepts.
  • Allow for more impromptu discussions where we bring up older concepts in relevant situations and build on them.
  • Shape more of my lessons around the "vivid memory, card ready" philosophy.
  • Continue to reduce needless student note-taking.
  • Keep a close eye on 10th grade students who had me for 9th grade last year. I wonder how much they retained over the summer, and I can't wait to see what a second year of SRS will do for them.

Suggestions and comments very welcome!

The Pre-Historical Fallacy

6 Tem42 03 July 2015 08:21PM

One fallacy that I see frequently in works of popular science -- and also here on LessWrong -- is the belief that we have strong evidence of the way things were in pre-history, particularly when someone argues that we can explain various aspects of our culture, psychology, or personal experience because we evolved in a certain way. Moreover, it is implicitly assumed that because we have this 'strong evidence', it must be relevant to the topic at hand. While it is true that the environment did affect our evolution and thus the way we are today, the evolution and anthropology of pre-historic societies are emphasized to a much greater extent than rational thought would indicate is appropriate. 

As a matter of course, you should remember these points whenever you hear a claim about prehistory:

  • Most of what we know (or guess) is based on less data than you would expect, and the publish or perish mentality is alive and well in the field of anthropology.
  • Most of the information is limited and technical, which means that anyone writing for a popular audience will have strong motivation to generalize and simplify.
  • It has been found time and time again that for any statement we can make about human culture and behavior, there is (or was) a society somewhere that will serve as a counterexample. 
  • Very rarely do anthropologists or members of related fields have finely tuned critical thinking skills or a strong background in the philosophy of science, and they are highly motivated to come up with interpretations of results that match their previous theories and expectations. 

Results that you should have reasonable levels of confidence in should be framed in generalities, not absolutes. E.g., "The great majority of human cultures that we have observed have distinct and strong religious traditions", and not "humans evolved to have religion". It may be true that we have areas in our brain that evolved not only 'consistent with holding religion', but actually evolved 'specifically for the purpose of experiencing religion'... but it would be very hard to prove this second statement, and anyone who makes it should be highly suspect. 

Perhaps more importantly, these statements are almost always a red herring. It may make you feel better that humans evolved to be violent, to fit in with the tribe, to eat meat, to be spiritual, to die at the age of thirty.... But rarely do we see these claims in a context where the stated purpose is to make you feel better. Instead they are couched in language indicating that they are making a normative statement -- that this is the way things in some way should be. (This is specifically the argumentum ad antiquitatem or appeal to tradition, and should not be confused with the historical fallacy, but it is certainly a fallacy). 

It is fine to identify, for example, that your fear of flying has an evolutionary basis. However, it is foolish to therefore refuse to fly because it is unnatural, or to undertake gene therapy to correct the fear. Whether or not the explanation is valid, it is not meaningful. 

Obviously, this doesn't mean that we shouldn't study evolution or the effects evolution has on behavior. However, any time you hear someone refer to this information in order to support any argument outside the fields of biology or anthropology, you should look carefully at why they are taking the time to distract you from the practical implications of the matter under discussion. 

 

Weekly LW Meetups

2 FrankAdamek 03 July 2015 06:05PM

This summary was posted to LW Main on June 26th. The following week's summary is here.

New meetups (or meetups with a hiatus of more than a year) are happening in:

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.


A Federal Judge on Biases in the Criminal Justice System.

13 Costanza 03 July 2015 03:17AM

A well-known American federal appellate judge, Alex Kozinski, has written a commentary on systemic biases and institutional myths in the criminal justice system.

The basic thrust of his criticism will be familiar to readers of the sequences and rationalists generally. Lots about cognitive biases (but some specific criticisms of fingerprints and DNA evidence as well). Still, it's interesting that a prominent federal judge -- the youngest when appointed, and later chief of the Ninth Circuit -- would treat some sacred cows of the judiciary so ruthlessly. 

This is specifically a criticism of U.S. criminal justice, but, ceteris paribus, much of it applies not only to other areas of U.S. law, but to legal practices throughout the world as well.

The Unfriendly Superintelligence next door

34 jacob_cannell 02 July 2015 06:46PM

Markets are powerful decentralized optimization engines - it is known.  Liberals see the free market as a kind of optimizer run amuck, a dangerous superintelligence with simple non-human values that must be checked and constrained by the government - the friendly SI.  Conservatives just reverse the narrative roles.

In some domains, where the incentive structure aligns with human values, the market works well.  In our current framework, the market works best for producing gadgets. It does not work so well for pricing intangible information, and most specifically it is broken when it comes to health.

We treat health as just another gadget problem: something to be solved by pills.  Health is really a problem of knowledge; it is a computational prediction problem.  Drugs are useful only to the extent that you can package the results of new knowledge into a pill and patent it.  If you can't patent it, you can't profit from it.

So the market is constrained to solve human health by coming up with new patentable designs for mass-producible physical objects which go into human bodies.  Why did we add that constraint - thou shalt solve health, but thou shalt only use pills?  (Ok technically the solutions don't have to be ingestible, but that's a detail.)

The gadget model works for gadgets because we know how gadgets work - we built them, after all.  The central problem with health is that we do not completely understand how the human body works - we did not build it.  Thus we should be using the market to figure out how the body works - completely - and arguably we should be allocating trillions of dollars towards that problem.

The market optimizer analogy runs deeper when we consider the complexity of instilling values into a market.  Lawmakers cannot program the market with goals directly, so instead they attempt to engineer desirable behavior by ever more layers and layers of constraints.  Lawmakers are deontologists.

As an example, consider the regulations on drug advertising.  Big pharma is unsafe - its profit function does not encode anything like "maximize human health and happiness" (which of course itself is an oversimplification).  If left to its own devices, there are strong incentives to sell subtly addictive drugs, to create elaborate hyped false advertising campaigns, etc.  Thus all the deontological injunctions.  I take that as a strong indicator of a poor solution - a value alignment failure.

What would healthcare look like in a world where we solved the alignment problem?

To solve the alignment problem, the market's profit function must encode long term human health and happiness.  This really is a mechanism design problem - it's not something lawmakers are even remotely trained or qualified for.  A full solution is naturally beyond the scope of a little blog post, but I will sketch out the general idea.

To encode health into a market utility function, first we create financial contracts with an expected value which captures long-term health.  We can accomplish this with a long-term contract that generates positive cash flow when a human is healthy, and negative when unhealthy - basically an insurance contract.  There is naturally much complexity in getting those contracts right, so that they measure what we really want.  But assuming that is accomplished, the next step is pretty simple - we allow those contracts to trade freely on an open market.
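As a very rough sketch of what such a contract might look like, here is the expected discounted cash flow of one contract. All payouts, probabilities, and the discount rate are invented for illustration:

```python
# Hypothetical sketch of the health-linked contract described above: the
# holder receives +1 unit per healthy year and pays -5 per unhealthy year.
# All payouts, probabilities, and the discount rate are invented.

def contract_value(p_healthy_by_year, healthy_pay=1.0, sick_pay=-5.0,
                   discount=0.97):
    """Expected discounted cash flow of one contract."""
    value = 0.0
    for t, p in enumerate(p_healthy_by_year):
        cash = p * healthy_pay + (1 - p) * sick_pay
        value += cash * discount ** t
    return value

baseline     = [0.9, 0.85, 0.8]     # expected health without intervention
intervention = [0.95, 0.93, 0.9]    # after, say, correcting a deficiency

# A trader who knows how to raise these probabilities can buy the contract
# at its baseline price and pocket the difference.
edge = contract_value(intervention) - contract_value(baseline)
print(round(edge, 3))
```

The "edge" is what rewards whoever actually improves health outcomes, which is the alignment property the post is after: the profit signal points at health itself rather than at patentable pills.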

There are some interesting failure modes and considerations that are mostly beyond scope but worth briefly mentioning.  This system probably needs to be asymmetric.  The transfers on poor health outcomes should partially go to cover medical payments, but it may be best to have a portion of the wealth simply go to nobody/everybody - just destroyed.

In this new framework, designing and patenting new drugs can still be profitable, but it is now put on even footing with preventive medicine.  More importantly, the market can now actually allocate the correct resources towards long term research.

To make all this concrete, let's use an example of a trillion dollar health question - one that our current system is especially ill-equipped to solve:

What are the long-term health effects of abnormally low levels of solar radiation?  What levels of sun exposure are ideal for human health?

This is a big, important question, and you've probably read some of the hoopla and debate about vitamin D.  Below I briefly summarize a general abstract theory, one that I would bet heavily on if we lived in a more rational world where such bets were possible.

In a sane world where health is solved by a proper computational market, I could make enormous - ridiculous, really - amounts of money if I happened to be an early researcher who discovered the full health effects of sunlight.  I would bet on my theory simply by buying up contracts for the individuals/demographics who had the most health to gain by correcting their sunlight deficiency.  I would then publicize the theory and evidence, and perhaps even raise a heap of money to create a strong marketing engine to help ensure that my investments - my patients - were taking the necessary actions to correct their sunlight deficiency.  Naturally I would use complex machine learning models to guide the trading strategy.

Now, just as an example, here is the brief 'pitch' for sunlight.

If we go back and look across all of time, there is a mountain of evidence which more or less screams - proper sunlight is important to health.  Heliotherapy has a long history.

Humans, like most mammals, and most other earth organisms in general, evolved under the sun.  A priori we should expect that organisms will have some 'genetic programs' which take approximate measures of incident sunlight as an input.  The serotonin -> melatonin mediated blue-light pathway is an example of one such light detecting circuit which is useful for regulating the 24 hour circadian rhythm.

The vitamin D pathway has existed since the time of algae such as the Coccolithophore.  It is a multi-stage pathway that can measure solar radiation over a range of temporal frequencies.  It starts with synthesis of fat-soluble cholecalciferol, which has a very long half-life measured in months. [1] [2]

The rough pathway is:

  • Cholecalciferol (HL ~ months) becomes 
  • 25(OH)D (HL ~ 15 days), which finally becomes 
  • 1,25(OH)2 D (HL ~ 15 hours)
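
One way to see how a cascade of different half-lives can measure radiation "over a range of temporal frequencies" is to model each stage as a first-order exponential store. This is purely an illustrative signal-processing toy with invented parameters, not a physiological model:

```python
import math

def decay_store(sun, half_life_days):
    """First-order store: driven by a daily input, decays with given half-life."""
    k = math.log(2) / half_life_days   # decay fraction per day
    level, out = 0.0, []
    for s in sun:
        level += s - k * level         # gain today's input, lose a fraction
        out.append(level)
    return out

# Two years of daily sunlight: a rectified seasonal sine (no noise).
sun = [max(0.0, math.sin(2 * math.pi * d / 365)) for d in range(730)]

slow = decay_store(sun, half_life_days=60)  # long-lived store (months-scale)
fast = decay_store(sun, half_life_days=1)   # short-lived active form

# The slow store retains a sizeable fraction of its summer peak through
# winter (a seasonal integrator); the fast one collapses to ~0 within days.
```

The long-half-life store behaves like a running seasonal average of sun exposure, while the short-half-life stage tracks only the last few days - exactly the kind of multi-timescale measurement described above.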

The main recognized role for this pathway in regards to human health - at least according to the current Wikipedia entry - is to enhance "the internal absorption of calcium, iron, magnesium, phosphate, and zinc".  Ponder that for a moment.

Interestingly, this pathway still works as a general solar clock and radiation detector for carnivores - as they can simply eat the precomputed measurement in their diet.

So, what is a long term sunlight detector useful for?  One potential application could be deciding appropriate resource allocation towards DNA repair.  Every time an organism is in the sun it is accumulating potentially catastrophic DNA damage that must be repaired when the cell next divides.  We should expect that genetic programs would allocate resources to DNA repair and various related activities dependent upon estimates of solar radiation.

I should point out - just in case it isn't obvious - that this general idea does not imply that cranking up the sunlight hormone to insane levels will lead to much better DNA/cellular repair.  There are always tradeoffs, etc.

One other obvious use of a long term sunlight detector is to regulate general strategic metabolic decisions that depend on the seasonal clock - especially for organisms living far from the equator.  During the summer when food is plentiful, the body can expect easy calories.  As winter approaches calories become scarce and frugal strategies are expected.

So first off, we'd expect to see a huge range of complex effects showing up as correlations between low vit D levels and various illnesses - specifically illnesses connected to DNA damage (such as cancer) and/or BMI.

Now it turns out that BMI itself is also strongly correlated with a huge range of health issues.  So the first key question to focus on is the relationship between vit D and BMI.  And - perhaps not surprisingly - there is pretty good evidence for such a correlation[3][4], and this has been known for a while.

Now we get into the real debate.  Numerous vit D supplement intervention studies have now been run, and the results are controversial.  In general, the vit D experts (such as my father, who started the vit D council and publishes some related research[5]) say that the only studies that matter are those that supplement at doses high enough to elevate vit D levels into a 'proper' range that substitutes for sunlight - in general, around 5,000 IU/day on average, depending completely on genetics and lifestyle (to the point that any one-size-fits-all recommendation is probably terrible).

The mainstream basically ignores all that and funds studies at tiny RDA doses - say 400 IU or less - and then does meta-analyses over those studies and concludes, unsurprisingly, that the big meta-analysis doesn't show a statistically significant effect.  However, these studies still show small effects.  Often the meta-analysis is corrected for BMI, which of course also tends to remove any vit D effect, to the extent that low vit D/sunlight is a cause of both weight gain and a bunch of other outcomes.
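
That adjustment problem is easy to demonstrate with a toy simulation (all probabilities invented): if low vit D raises BMI, and BMI raises illness risk, then stratifying on BMI erases a genuine upstream vit D effect.

```python
import random

random.seed(42)

def simulate_person():
    deficient = random.random() < 0.5
    # Deficiency raises the chance of high BMI (the mediated pathway).
    high_bmi = random.random() < (0.6 if deficient else 0.3)
    # In this toy model, illness risk depends only on BMI.
    ill = random.random() < (0.4 if high_bmi else 0.1)
    return deficient, high_bmi, ill

people = [simulate_person() for _ in range(200_000)]

def p_ill(rows):
    return sum(r[2] for r in rows) / len(rows)

# Marginal comparison: deficiency raises illness risk substantially.
marginal_gap = (p_ill([r for r in people if r[0]])
                - p_ill([r for r in people if not r[0]]))

# Within a single BMI stratum, the vit D "effect" vanishes.
high = [r for r in people if r[1]]
stratum_gap = (p_ill([r for r in high if r[0]])
               - p_ill([r for r in high if not r[0]]))
```

In causal-inference terms, BMI here is a mediator, and conditioning on a mediator blocks the very pathway under study.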

So let's look at two studies for vit D and weight loss.

First, this recent 2015 study of 400 overweight Italians (sorry - the actual paper doesn't appear to be available yet) tested vit D supplementation for weight loss.  The three groups received 0 IU/day, ~1,000 IU/day, and ~3,000 IU/day, and the observed average weight losses were 1 kg, 3.8 kg, and 5.4 kg respectively.  I don't know if the 0 IU group received a placebo.  Regardless, it looks promising.

On the other hand, this 2013 meta-analysis of 9 studies with 1,651 adults total (mainly women) supposedly found no significant weight loss effect for vit D.  However, the studies dosed at between 200 IU/day and 1,100 IU/day, with most between 200 and 400 IU.  Five of the studies also used calcium, and five showed weight loss (not necessarily the same five - it's unclear).  This does not show - at all - what the study claims in its abstract.

In general, medical researchers should not be doing statistics.  That is a job for the tech industry.

Now the vit D and sunlight issue is complex, and it will take much research to really work out all of what is going on.  The current medical system does not appear to be handling this well - why?  Because there is insufficient financial motivation.

Is Big Pharma interested in the sunlight/vit D question?  Well, yes - but only to the extent that they can create a patentable analogue!  The various vit D analogue drugs developed or in development are evidence that Big Pharma is at least paying attention.  But assuming that the sunlight hypothesis is mainly correct, there is very little profit in actually fixing the real problem.

There is probably more to sunlight than just vit D and serotonin/melatonin.  Consider the interesting correlation between birth month and a number of disease conditions[6].  Perhaps there is a little grain of truth to astrology after all.

Thus concludes my little vit D pitch.  

In a more sane world I would have already bet on the general theory.  In a really sane world it would have been solved well before I would expect to make any profitable trade.  In that rational world you could actually trust health advertising, because you'd know that health advertisers are strongly financially motivated to convince you of things actually truly important for your health.

Instead of charging by the hour or per treatment, like a mechanic, doctors and healthcare companies should literally invest in their patients long-term health, and profit from improvements to long term outcomes.  The sunlight health connection is a trillion dollar question in terms of medical value, but not in terms of exploitable profits in today's reality.  In a properly constructed market, there would be enormous resources allocated to answer these questions, flowing into legions of profit motivated startups that could generate billions trading on computational health financial markets, all without selling any gadgets.

So in conclusion: the market could solve health, but only if we allowed it to, and only if we set up appropriate financial mechanisms to encode the correct value function.  This is the UFAI problem next door.


MIRI needs an Office Manager (aka Force Multiplier)

9 alexvermeer 03 July 2015 01:10AM

(Cross-posted from MIRI's blog.)

MIRI's looking for a full-time office manager to support our growing team. It’s a big job that requires organization, initiative, technical chops, and superlative communication skills. You’ll develop, improve, and manage the processes and systems that make us a super-effective organization. You’ll obsess over our processes (faster! easier!) and our systems (simplify! simplify!). Essentially, it’s your job to ensure that everyone at MIRI, including you, is able to focus on their work and Get Sh*t Done.

That’s a super-brief intro to what you’ll be working on. But first, you need to know if you’ll even like working here.

A Bit About Us

We’re a research nonprofit working on the critically important problem of superintelligence alignment: how to bring smarter-than-human artificial intelligence into alignment with human values.1 Superintelligence alignment is a burgeoning field, and arguably the most important and under-funded research problem in the world. Experts largely agree that AI is likely to exceed human levels of capability on most cognitive tasks in this century—but it’s not clear when, and we aren’t doing a very good job of preparing for the possibility. Given how disruptive smarter-than-human AI would be, we need to start thinking now about AI’s global impact. Over the past year, a number of leaders in science and industry have voiced their support for prioritizing this endeavor:

People are starting to discuss these issues in a more serious way, and MIRI is well-positioned to be a thought leader in this important space. As interest in AI safety grows, we’re growing too—we’ve gone from a single full-time researcher in 2013 to what will likely be a half-dozen research fellows by the end of 2015, and intend to continue growing in 2016.

All of which is to say: we really need an office manager who will support our efforts to hack away at the problem of superintelligence alignment!

If our overall mission seems important to you, and you love running well-oiled machines, you’ll probably fit right in. If that’s the case, we can’t wait to hear from you.

What it’s like to work at MIRI

We try really hard to make working at MIRI an amazing experience. We have a team full of truly exceptional people—the kind you’ll be excited to work with. Here’s how we operate:

Flexible Hours

We do not have strict office hours. Simply ensure you’re here enough to be available to the team when needed, and to fulfill all of your duties and responsibilities.

Modern Work Spaces

Many of us have adjustable standing desks with multiple large external monitors. We consider workspace ergonomics important, and try to rig up work stations to be as comfortable as possible.

Living in the Bay Area

We’re located in downtown Berkeley, California. Berkeley’s monthly average temperature ranges from 60°F in the winter to 75°F in the summer. From our office you’re:

  • A 10-second walk to the roof of our building, from which you can view the Berkeley Hills, the Golden Gate Bridge, and San Francisco.
  • A 30-second walk to the BART (Bay Area Rapid Transit), which can get you around the Bay Area.
  • A 3-minute walk to UC Berkeley Campus.
  • A 5-minute walk to dozens of restaurants (including ones in Berkeley’s well-known Gourmet Ghetto).
  • A 30-minute BART ride to downtown San Francisco.
  • A 30-minute drive to the beautiful west coast.
  • A 3-hour drive to Yosemite National Park.

Vacation Policy

Our vacation policy is that we don’t have a vacation policy. That is, take the vacations you need to be a happy, healthy, productive human. There are checks in place to ensure this policy isn’t abused, but we haven’t actually run into any problems since initiating the policy.

We consider our work important, and we care about whether it gets done well, not about how many total hours you log each week. We’d much rather you take a day off than extend work tasks just to fill that extra day.

Regular Team Dinners and Hangouts

We get the whole team together every few months, order a bunch of food, and have a great time.

Top-Notch Benefits

We provide top-notch health and dental benefits. We care about our team’s health, and we want you to be able to get health care with as little effort and annoyance as possible.

Agile Methodologies

Our ops team follows standard Agile best practices, meeting regularly to plan, as a team, the tasks and priorities over the coming weeks. If the thought of being part of an effective, well-functioning operation gets you really excited, that’s a promising sign!

Other Tidbits

  • Moving to the Bay Area? We’ll cover up to $3,500 in travel expenses.
  • Use public transit to get to work? You get a transit pass with a large monthly allowance.
  • All the snacks and drinks you could want at the office.
  • You’ll get a smartphone and full plan.
  • This is a salaried position. (That is, your job is not to sit at a desk for 40 hours a week. Your job is to get your important work done, even if this occasionally means working on a weekend or after hours.)

It can also be surprisingly motivating to realize that your day job is helping people explore the frontiers of human understanding, mitigate global catastrophic risk, etc., etc. At MIRI, we try to tackle the very largest problems facing humanity, and that can be a pretty satisfying feeling.

If this sounds like your ideal work environment, read on! It’s time to talk about your role.

What an office manager does and why it matters

Our ops team and researchers (and collection of remote contractors) are swamped making progress on the huge task we’ve taken on as an organization.

That’s where you come in. An office manager is the oil that keeps the engine running. They’re indispensable. Office managers are force multipliers: a good one doesn’t merely improve their own effectiveness—they make the entire organization better.

We need you to build, oversee, and improve all the “behind-the-scenes” things that ensure MIRI runs smoothly and effortlessly. You will devote your full attention to looking at the big picture and the small details and making sense of it all. You’ll turn all of that into actionable information and tools that make the whole team better. That’s the job.

Sometimes this looks like researching and testing out new and exciting services. Other times this looks like stocking the fridge with drinks, sorting through piles of mail, lugging bags of groceries, or spending time on the phone on hold with our internet provider. But don’t think that the more tedious tasks are low-value. If the hard tasks don’t get done, none of MIRI’s work is possible. Moreover, you’re actively encouraged to find creative ways to make the boring stuff more efficient—making an awesome spreadsheet, writing a script, training a contractor to take on the task—so that you can spend more time on what you find most exciting.

We’re small, but we’re growing, and this is an opportunity for you to grow too. There’s room for advancement at MIRI (if that interests you), based on your interests and performance.

Sample Tasks

You’ll have a wide variety of responsibilities, including, but not necessarily limited to, the following:

  • Orienting and training new staff.
  • Onboarding and offboarding staff and contractors.
  • Managing employee benefits and services, like transit passes and health care.
  • Payroll management; handling staff questions.
  • Championing our internal policies and procedures wiki—keeping everything up to date, keeping everything accessible, and keeping staff aware of relevant information.
  • Managing various services and accounts (ex. internet, phone, insurance).
  • Championing our work space, with the goal of making the MIRI office a fantastic place to work.
  • Running onsite logistics for introductory workshops.
  • Processing all incoming mail packages.
  • Researching and implementing better systems and procedures.

Your “value-add” is taking responsibility for making all of these things happen. Having a competent individual in charge of this diverse set of tasks at MIRI is extremely valuable!

A Day in the Life

A typical day in the life of MIRI’s office manager may look something like this:

  • Come in.
  • Process email inbox.
  • Process any incoming mail, scanning/shredding/dealing-with as needed.
  • Stock the fridge, review any low-stocked items, and place an order online for whatever’s missing.
  • Onboard a new contractor.
  • Spend some time thinking of a faster/easier way to onboard contractors. Implement any hacks you come up with.
  • Follow up with Employee X about their benefits question.
  • Outsource some small tasks to TaskRabbit or Upwork. Follow up with previously outsourced tasks.
  • Notice that you’ve spent a few hours per week the last few weeks doing xyz. Spend some time figuring out whether you can eliminate the task completely, automate it in some way, outsource it to a service, or otherwise simplify the process.
  • Review the latest post drafts on the wiki. Polish drafts as needed and move them to the appropriate location.
  • Process email.
  • Go home.

You’re the one we’re looking for if:

  • You are authorized to work in the US. (Prospects for obtaining an employment-based visa for this type of position are slim; sorry!)
  • You can solve problems for yourself in new domains; you find that you don’t generally need to be told what to do.
  • You love organizing information. (There’s a lot of it, and it needs to be super-accessible.)
  • Your life is organized and structured.
  • You enjoy trying things you haven’t done before. (How else will you learn which things work?)
  • You’re way more excited at the thought of being the jack-of-all-trades than at the thought of being the specialist.
  • You are good with people—good at talking about things that are going great, as well as things that aren’t.
  • People thank you when you deliver difficult news. You’re that good.
  • You can notice all the subtle and wondrous ways processes can be automated, simplified, streamlined… while still keeping the fridge stocked in the meantime.
  • You know your way around a computer really well.
  • Really, really well.
  • You enjoy eliminating unnecessary work, automating automatable work, outsourcing outsourcable work, and executing on everything else.
  • You want to do what it takes to help all other MIRI employees focus on their jobs.
  • You’re the sort of person who sees the world, organizations, and teams as systems that can be observed, understood, and optimized.
  • You think Sam is the real hero in Lord of the Rings.
  • You have the strong ability to take real responsibility for an issue or task, and ensure it gets done. (This doesn’t mean it has to get done by you; but it has to get done somehow.)
  • You celebrate excellence and relentlessly pursue improvement.
  • You lead by example.

Bonus Points:

  • Your technical chops are really strong. (Dabbled in scripting? HTML/CSS? Automator?)
  • Involvement in the Effective Altruism space.
  • Involvement in the broader AI-risk space.
  • Previous experience as an office manager.

Experience & Education Requirements

  • Let us know about anything that’s evidence that you’ll fit the bill.

How to Apply

by July 31, 2015!

P.S. Share the love! If you know someone who might be a perfect fit, we’d really appreciate it if you pass this along!


  1. More details on our About page. 

Meetup : Rationality Reading Group (57-61)

1 CBHacking 03 July 2015 06:03AM

Discussion article for the meetup : Rationality Reading Group (57-61)

WHEN: 07 July 2015 06:30:00PM (-0700)

WHERE: Paul G. Allen Center (185 Stevens Way, Seattle, WA) Room 503

Reading group for Yudkowsky's "Rationality: AI to Zombies", which is basically an organized and updated version of the Sequences from LW (see http://wiki.lesswrong.com/wiki/Sequences).

The group meets to discuss the topics in the book, how to apply and benefit from them, and related topics in areas like cognitive biases, applied rationality, and effective altruism. You can get a copy of the book here: https://intelligence.org/rationality-ai-zombies/

The reading list for this week is five topics from the "How to actually change your mind" section. These are the same as from last week's meetup, as that didn't really happen. They are (actually sections 57-61; LW's auto-formatting is screwing up the numbering):

  1. Politics is the Mind-Killer

  2. Policy Debates Should Not Appear One-Sided

  3. The Scales of Justice, the Notebook of Rationality

  4. Correspondence Bias

  5. Are Your Enemies Innately Evil?

We previously covered the "Map and territory" sequence a few months ago, but please don't feel a need to have read everything up to this point to participate in the group.

Event is also on Facebook: https://www.facebook.com/events/1685501021668755/

We're meeting on the 5th floor. If the door into the room is locked, knock, and if nobody answers, look around for us elsewhere on the fifth floor. If the doors to the building seem locked, try the other ones - and don't believe the little red lights; try anyway. If the doors are, in fact, locked, we'll try to have somebody there to let people in.

There are usually snacks at the meetup, but feel free to bring something. We usually get dinner afterward, around 9PM or so.


A Roadmap: How to Survive the End of the Universe

7 turchin 02 July 2015 11:01AM

In a sense, this plan needs to be perceived with irony, because it is almost irrelevant: we have very small chances of surviving even the next 1000 years, and if we do, we have a lot of things to do before it becomes reality. And even then, our successors will have completely different plans.

There is one important exception: there are suggestions that collider experiments may lead to a vacuum phase transition, which would begin at one point and spread across the visible universe. We could thus destroy ourselves and our universe in this century, but it would happen so quickly that we would not have time to notice it. (The term "universe" hereafter refers to the observable universe, that is, the three-dimensional world around us resulting from the Big Bang.)

We can also solve this problem in the next century if we create superintelligence.

The purpose of this plan is to show that actual immortality is possible: that we have an opportunity to live not just billions and trillions of years, but an unlimited duration. My hope is that the plan will encourage us to invest more in life extension and prevention of global catastrophic risks. Our life could be eternal and thus have meaning forever.

Anyway, the end of the observable universe is not an absolute end: it's just one more problem on which the future human race will be able to work. And even at the negligible level of knowledge about the universe that we have today, we are still able to offer more than 50 ideas on how to prevent its end.

In fact, to assemble and come up with these 50 ideas I spent about 200 working hours, and if I had spent more time on it, I'm sure I would have found many new ideas. In the distant future we can find more ideas, choose the best of them, prove them, and prepare for their implementation.

First of all, we need to understand exactly what kind of end to the universe we should expect in the natural course of things. There are many hypotheses on this subject, which can be divided into two large groups:

1. The universe is expected to have a relatively quick and abrupt end, known as the Big Crunch or Big Rip (accelerating expansion of the universe causes it to break apart), or the decay of the false vacuum. Vacuum decay can occur at any time; a Big Rip could happen in about 10-30 billion years, and the Big Crunch has a timescale of hundreds of billions of years.

2. Another scenario assumes an infinitely long existence of an empty, flat and cold universe, which would experience so-called "heat death": the gradual halting of all processes and then the disappearance of all matter.

The choice between these scenarios depends on the geometry of the universe, which is determined by the equations of general relativity and, above all, by the behavior of an almost unknown parameter: dark energy.

The recent discovery of dark energy has made the Big Rip the most likely scenario, but it is clear that our picture of the end of the universe will change several more times.

You can find more at: http://en.wikipedia.org/wiki/Ultimate_fate_of_the_universe

There are five general approaches to solve the end of the universe problem, each of them includes many subtypes shown in the map:

1.     Surf the Wave: Utilize the nature of the process which is ending the universe. (The best known of these solutions is the Omega Point by Tipler, where the energy of the universe's collapse is used to make infinite calculations.)

2.     Go to parallel world

3.     Prevent the end of the universe

4.     Survive the end of the universe

5.     Dissolve the problem

Some of the ideas are at the level of the wildest possible speculation, and I hope you will enjoy them.

The new feature of this map is that many of the ideas mentioned are linked to corresponding wiki pages in the pdf.

Download the pdf of the map here: http://immortality-roadmap.com/unideatheng.pdf

Harper’s Fishing Nets: a review of Plato’s Camera by Paul Churchland

10 eli_sennesh 02 July 2015 02:19AM

Harper’s Fishing Nets: a review of Plato’s Camera by Paul Churchland

Eli Sennesh

July 1, 2015

Abstract

Paul Churchland published Plato’s Camera to defend the thesis that abstract objects and properties are both real and natural, consisting in learned mental representations of the timeless, abstract features of the mind’s environment. He holds that the brain learns, without supervision, high-dimensional maps of objective feature domains – which he calls Domain-Portrayal Semantics. He further elaborates that homomorphisms between these high-dimensional maps allow the brain to occasionally repurpose a higher-quality map to understand a completely different domain, reducing the latter to the former. He finally adds a Map-Portrayal Semantics of language to his Domain-Portrayal Semantics of thought by considering the linguistic, cultural, educational dimensions of human learning.

Part I
Introduction

Surely the title of this review already sounds like some terrible joke is about to be perpetrated, but in fact it merely indicates a philosophical difference between myself and Paul Churchland. Churchland wrote Plato’s Camera[3] not merely to explain a view on philosophy of mind to laypeople and other philosophers, but with the specific goal of defending Platonism about abstract, universal properties and objects (such as those used in mathematics) by naturalizing it. The contrast between such naturalist philosophers as Churchland, Dennett, Flanagan, and Railton and non-naturalist or weakly naturalist philosophy lies precisely in this fact: the latter consider many abstract or intuitive concepts to necessarily form their own part of reality, amenable strictly to philosophical investigation, while the former seek and demand a consilience of causal explanation for what’s going on in our lives. The results are a breath of fresh air to read.

A great benefit of reading strongly naturalistic philosophy and philosophers is that, in the course of researching a philosophical position, they tend to absorb so much scientific material that they can’t help but achieve a degree of insight and accuracy in their core thesis – even when getting almost all the details wrong! So it is with Plato’s Camera: reading in 2015 a book published in 2012, one that mostly does not cite any scientific research from the past five to ten years, the details can’t help but seem somewhat dated and unrealistic, at least to those of us who’ve been doing our own reading in the related scientific literature (or who possibly just have partisan opinions). And yet, Plato’s Camera captures and supports a core thesis, which is more or less:

  • The brain contains or embodies (high-dimensional) maps of objective domains, and by Hebbian updating over time, the map comes to resemble the territory, be it conceptual (as with mono-directional neural networks) or causal (as with recurrent networks). This is Churchland’s Domain-Portrayal Semantics theory of thought, and Churchland calls the learning process behind it First-Level Learning.
  • Homomorphisms between these (high-dimensional) maps, albeit imperfect ones, allow the brain to notice when one objective domain is reducible to another, and thus deploy its existing conceptual knowledge in new ways. Churchland calls this process Second-Level Learning, and it further bolsters the organism’s ability to navigate reality (as well as implementing reductionism at the heart of Churchland’s epistemology). In a rather more insightful point for a reader to take away from Churchland’s book, this reduction does not invalidate the old map, but in fact supports its veracity, the accuracy with which the map portrays its territory, in the subdomain where the old map works at all. Churchland thus argues for an “optimistic meta-induction”, by which he means that in a Pragmatic Empiricist sense, our past, present, and future scientific knowledge is and will be reliable knowledge about the world, to the extent it agrees with data, even in the absence of a Grand Unified Theory of All Reality.
  • While the senses allow nonhuman animals to index their maps (a “You Are Here!” marker is how Churchland describes it), language allows humans to deliberately and artificially index each other’s maps, thus allowing us to create long-lived cultural and institutional traditions of knowledge that accumulate over time rather than dying with individuals. Progress thus extends beyond the span of an individual lifetime. This Third-Level Learning allows Churchland to add an implicit Map-Portrayal Semantics theory of language to his Domain-Portrayal theory of thought, although I do not recall him naming that implicit theory as such.

It is these core theses which I regard as largely correct, even where their supporting details are based on old research or, in the view of the present reviewer, on the wrong research. I even believe that had Churchland investigated my favorite school of computational cognitive science as deeply, it would have reinforced his thesis and given him enough material for two books instead of just one. In fact, my disagreements with Churchland can be summed up quite succinctly:

  • I believe, and will supply citations for the belief, that probabilistic representations play a larger role in human cognition than Plato’s Camera allows, despite making little appearance in the book. In particular, I find Churchland’s defense of Hebbian learning for encoding causal knowledge in recursive deep neural-nets somewhat unconvincing, preferring instead the presentation of [6].
  • I find Churchland’s thesis that recursive, many-layered learning allows animals (not only humans) to map abstract features of their environment incredibly insightful, but disagree that this can correctly be called Platonism. Platonism concerns itself with abstract universals (and Churchland says it does). I feel that recursive, many-layered learning allows organisms to map the abstract features of their local environment, while making no guarantees regarding the universal applicability of maps learned from finite information about local territory.
  • Platonism is also often about specific objects (such as those of mathematics or ethics) that are claimed to abstractly exist. This notion brings in the important spectrum in cognitive science between feature-governed concepts and causal role-governed concepts. “Electron”, for instance, is actually a theory-laden concept defined chiefly by the causal role(s) involved – but we usually think of electrons as “not very Platonic” while metric spaces are “more Platonic” and Categorical Imperatives are “extremely Platonic”. I feel that while the mind may posit objects which model certain feature-spaces and fill certain causal roles very elegantly, if those objects are not available, even counterfactually, to multiple modalities from which to sample feature data, I can’t help but suspect they might not really “exist” in a mind-independent sense. This probably sounds like quite a nitpick, but immense portions of the things dreamt of in human philosophies depend on one’s position on this question. (In fact, confusing a causal role with an object or substance lies at the heart of many superstitions.) On the other hand, we should consider it an open question whether or not “Platonic” abstractions form a necessary component of resource-rational cognition.
  • I feel that imperfect (implied to be linear) homomorphism between maps doesn’t work very well as a theory of Second-Level Learning, as any real computational system capable of representing the entire physical world would have to be Turing-complete. Since the representation language would be Turing-complete, the total extensional equivalence of any two models would necessarily be undecidable[1]. And this undecidability arises long before the creature begins to think in the kinds of self-referential terms for which undecidability theorems have been made famous! Dealing with this issue in a sane way remains a major open research problem for anyone proposing to theorize on the workings of the mind.

And yet, for all that these may sound substantial, they are the sum total of my objections. Churchland has otherwise written an excellent book that gets its point across well, and whose many moments of snark against non-naturalistic philosophies of mind, especially the linguaformal “Hilbert proof system theory of mind”, are actually enjoyable (at least, to one who enjoys snark).

In fact, in addition to just describing Churchland’s work, I will spend some of my review noting where other work bolsters it, particularly from the rational analysis (and resource-rational) school of cognitive science[9]. This school of thought aims to understand the mind by first assuming that the mind is posed particular, constrained problems by its environment, then positing how these problems can be optimally solved, and then comparing the resulting theoretical solutions with experimental data. The mind is thus understood as an approximately boundedly rational engine of inference, forced by its environment to deal with shortages of sample data and computational power in the most efficient way possible, but ultimately trying to perform well-defined tasks such as predicting environmental stimuli or planning rewarding actions for the embodied organism to take.

Why “Harper’s Fishing Nets”, then? Well, because treating abstract universals as computational objects learned by generalizing over many domains seems more along the lines of Robert Harper’s “computational trinitarianism” than true Platonism, and because the noisy, always-incomplete process of recursive learning seems more like a succession of fishing nets, with their ropes spaced differently to catch specific species of fish, than like a camera that takes a single, complete picture. All learning algorithms aim to capture the structural information in their input samples while ignoring the noise, but distinguishing the two is, of course, undecidable[10]. Recursive pattern recognition - the unsupervised recognition of patterns in already-transformed feature representations - may thus capture additional levels of structural information, especially where causal learning prevents collapsing all levels of hierarchy into a single function. Or, as Churchland himself puts it:

Since these thousands of spaces or ‘maps’ are all connected to one another by billions of axonal projections and trillions of synaptic junctions, such specific locational information within one map can and does provoke subsequent pointlike activations in a sequence of downstream representational spaces, and ultimately in one or more motor-representation spaces, whose unfolding activations are projected onto the body’s muscle systems, thereby to generate cognitively informed behaviors.

Churchland is especially to be congratulated for approaching cognition as a capability that must have evolved in gradual steps, and coming up with a theory that allows for nonhuman animals to have great cognitive abilities in First-Level Learning, even if not in Second and Third.

Choice quotes from the Introductory section:

  • Since these thousands of spaces or ‘maps’ are all connected to one another by billions of axonal projections and trillions of synaptic junctions, such specific locational information within one map can and does provoke subsequent pointlike activations in a sequence of downstream representational spaces, and ultimately in one or more motor-representation spaces, whose unfolding activations are projected onto the body’s muscle systems, thereby to generate cognitively informed behaviors.
  • The whole point of the synapse-adjusting learning process discussed above was to make the behavior of neurons that are progressively higher in the information-processing hierarchy profoundly and systematically dependent on the activities of the neurons below them.
  • [Trained neural networks represent] a space that has a robust and built-in probability metric against which to measure the likelihood, or unlikelihood, of the objective feature represented by that position’s ever being instantiated.
  • Indeed, the “justified-true-belief” approach is misconceived from the outset, since it attempts to make concepts that are appropriate only at the level of cultural or language-based learning do the job of characterizing cognitive achievements that lie predominantly at the sublinguistic level.

 

Part II
First-Level Learning

It is no exaggeration to say that First-Level Learning is the shining star of Churchland’s book. It is the process by which the brain forms and updates increasingly accurate maps of conceptual and causal reality, a deeply Pragmatic process shared with nonhuman animals and taking place largely below conscious awareness. In machine-learning terms, First-Level Learning consists mainly of classification and regression problems: classifying hierarchies of regions of compacted metric spaces to form concepts using feedforward neural learning, and regressing trajectories through state-spaces to form causal understanding using recurrent neural networks. A full chapter is devoted to each of these subjects.

 

1 First-Level Conceptual Learning

He begins in his first chapter on First-Level Learning with a basic introduction to many-layered feedforward neural networks, their training via supervised backpropagation of errors, and their usage for classification of feature-based concepts. He talks about the nonlinear activation functions, like sign and sigmoid, necessary to allow feedforward networks to approximate arbitrary total functions. He gives examples of face-recognition neural networks, which will probably be old-hat for any student of machine learning, but are extremely necessary for laypeople and philosophers untrained in computational approaches to modelling perception. Churchland is also careful to specify that these are not the neural networks of the real human mind, but instead specific examples of what can be done with neural networks. Finally, Churchland begins defending his thesis about Platonism when talking about an artificial neural network designed to classify colors:

[W]e can here point to the first advantage of the information-compression effected by the Hurvich network: it gives us grip on any object’s objective color that is independent of the current background level of illumination.

Or put simply, the kinds of abstract, higher-level features learned by multi-layer neural networks serve to represent certain objective facts about the environment, with each successively lower layer of the network filtering out some perceptual noise and capturing some important structural information.
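For readers without a machine-learning background, the kind of feedforward network Churchland describes can be sketched very concretely: two sigmoid layers trained by backpropagation of errors on XOR, a function no single linear layer can compute. This is a generic textbook construction, not any network from the book; the layer sizes and learning rate below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four XOR input/output pairs: not linearly separable, so the hidden
# layer's nonlinear transformation is doing essential work.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A 2-8-1 network with randomly initialized weights.
W1 = rng.normal(0, 1, (2, 8))
b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1))
b2 = np.zeros(1)

lr = 1.0
losses = []
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)          # hidden-layer activation vectors
    out = sigmoid(h @ W2 + b2)        # output layer
    losses.append(float(((out - y) ** 2).sum()))
    # Backpropagation of errors for squared-error loss.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(0)

# Training typically drives the rounded outputs to the XOR pattern.
print(np.round(out.ravel()))
```

The point of the sketch is only the one Churchland makes: the synapse-adjusting learning process makes the higher layer's behavior systematically dependent on the transformed activity of the layer below it.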

Churchland also elaborates, in several places, on the compaction of metric-space produced by the nonlinear transformations encoded in neural networks. Neural networks don’t spread their training data uniformly in the output space (or in any of the spaces formed by the intermediate layers of the network)! In fact, they tend to push their training points into highly compacted prototype regions in their output spaces, and when later activated they will try to “divert” any given vector into one of those compacted regions, depending on how well it resembles them in the first place. Since all neural networks receive and produce vectors, and vector spaces are metric spaces, Churchland notes that these neural-network concepts innately and necessarily carry distance metrics for gauging the similarities or differences between any two sensory feature-vectors (or, Churchland implies, real-world objects represented by abstract feature vectors). Churchland even notes, in a rare mention of probability in his book, that these compactions into distinct prototype regions for classes or clusters of training data can even be taken as a sort of emerging set of probability density functions over the training data:

The regions of the two prototypical hot spots represent the highest probability of activation; the region of stretched lines in between them represents a very low probability; and the empty regions outside the deformed grid are not acknowledged as possibilities at all—no activity at the input layer will produce [such] a second-rung activation pattern[.]
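The prototype-region picture can be miniaturized as nearest-prototype lookup in a metric activation space. The prototype points below are invented for illustration; the sketch shows only that a vector space’s distance metric doubles as a similarity judgment, with novel vectors “diverted” to the nearest compacted region.

```python
import numpy as np

# Hypothetical prototype "hot spots" in a 3-D activation space.
prototypes = {
    "dog":  np.array([0.9, 0.1, 0.2]),
    "cat":  np.array([0.8, 0.2, 0.7]),
    "fish": np.array([0.1, 0.9, 0.5]),
}

def classify(v):
    # Classification as diversion to the nearest prototype region,
    # measured by the space's built-in Euclidean metric.
    return min(prototypes, key=lambda k: np.linalg.norm(v - prototypes[k]))

novel = np.array([0.85, 0.15, 0.3])   # a never-before-seen activation vector
print(classify(novel))                # nearest prototype: "dog"
```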

Churchland deploys the vector-completion effect in feedforward networks as an example of primitive abductive reasoning himself:

Accordingly, it is at least tempting to see, in this charming capacity for relevant vector completion, the first and most basic instances of what philosophers have called “inference-to-the-best-explanation,” and have tried, with only limited success, to explicate in linguistic or propositional terms.
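Vector completion itself is easy to exhibit with a small Hopfield-style autoassociative network (a standard illustration, not Churchland’s own example): a degraded input vector is “completed” to the nearest stored prototype, a primitive form of inference-to-the-best-explanation.

```python
import numpy as np

# Two stored prototype patterns over six bipolar units.
patterns = np.array([
    [ 1,  1,  1, -1, -1, -1],
    [ 1, -1,  1, -1,  1, -1],
])

# Hebbian outer-product storage rule, with self-connections zeroed.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

# The first pattern with its third unit corrupted.
probe = np.array([1, 1, -1, -1, -1, -1])
for _ in range(5):
    probe = np.where(W @ probe >= 0, 1, -1)   # synchronous threshold update

print(probe)   # completed back to the first stored pattern
```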

Churchland deploys his metaphor of concepts as maps of feature-spaces again and again to great effect; I only wish he had taken greater effort to speak of his rarely-mentioned “deformed grids” as topographical maps and measures over the training data, and of the nonlinear transformations taking vectors from their input spaces into those topologically-measured maps as flows or rivers. I cannot tell whether he took seriously the notion of neural-network training as learning topographical maps of the non-input spaces, or of those topographies as measures in the sense of probability theory. Certainly, the physical metaphor of a river’s flow provides a good intuition pump for describing how a well-trained neural network carves out paths from where drops of rain fall to where they ought to go, by whatever criterion trains the network. Certainly, he seems to be thinking something along these lines when he uses metric-space compaction to examine category effects:

This gives us, incidentally, a plausible explanation for so-called ‘category effects’ in perceptual judgments. This is the tendency of normal humans, and of creatures generally, to make similarity judgments that group any two within-category items as being much more similar (to each other) than any other two items, one of which is inside and one of which is outside the familiar category. Humans display this tilt even when, by any “objective measure” over the unprocessed sensory inputs, the similarity measures across the two pairs are the same.

Looking at neural-network training data as measurable would also help us think about how mere perception generates “sensorily simple” random variables, representing qualitative measurements of the world that correspond to the world, which would then be of use according to probabilistic theories of cognition. Certainly, a number of cognitive scientists and neuroscientists have been researching neural mechanisms for representing probabilities[13-19]. A number of these even provide exactly the kind of approximate Bayesian inference one would require when working with open-world models that can have countably infinitely many separate random variables, an important component of working with Turing-complete modelling domains. One paper even proposes that the neural implementation and learning of probability density/mass functions can explain certain deviations of human judgements from the probabilistic optimum[13]. Again: Churchland’s book, published in 2012 and sent to press with little mention of probability, still clearly prefigured neural encodings of probability, which have turned out to be a productive research effort. This is a testament to how well Churchland has generalized from what previous neuroscientific research he did have!
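To make “neural representation of probability” concrete in the simplest possible way (a generic sketch, not any of the specific models cited above): a layer’s unnormalized activations can be turned into a probability mass function by divisive normalization, here implemented as softmax.

```python
import numpy as np

def softmax(a):
    # Subtracting the max is a standard numerical-stability trick;
    # the division plays the role of the partition function.
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical pre-normalization firing rates of three units.
activations = np.array([2.0, 1.0, 0.1])
p = softmax(activations)
print(p.round(3), p.sum())   # nonnegative weights summing to 1: a valid pmf
```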

Of course, Churchland himself decries any notion of sensorily simple variables:

This story thus assumes that our sensations also admit of a determinate decomposition into an antecedent alphabet of ‘simples,’ simples that correspond, finally, to an antecedent alphabet of ‘simple’ properties in the environment.

Churchland would also have done quite well to cover the Blessing of Abstraction and hierarchical modelling (first mentioned in [7]) for their unique effect: they allow training data to be shared across tasks and categories, and thus ameliorate the Curse of Dimensionality. They are how real embodied minds compress their sensory features so as to reduce the necessary sample-complexity of learning to the absolute minimum: sometimes even one single example[18]. I personally hypothesize that the same effect is at work in hierarchical Bayesian modelling as in the recent fad for “deep” learning in artificial neural networks, which learn hierarchies of features: breadth in the lower layers of the model/network provides large amounts of information to quickly train the higher, “abstract” layer of the model/network, which then provides a strong inductive bias to the lower layers. He does mention something like this, however:

[A]s the original sensory input vector pursues its transformative journey up the processing ladder, successively more background information gets tapped from the synaptic matrices driving each successive rung.

This certainly gives an insight into why deep neural networks with sparse later layers work so well: sample information is aggregated in the top layers and then backpropagated to lower layers.

This brings us right back to the Platonism for which Churchland is trying to argue. As usual, we wish to operate under the “game rules” of a very strong naturalism, in which Platonic entities are surely not allowed to be any kind of ontologically “spooky” stuff. After all, we don’t observe any spooky processes interfering in ordinary physical and computational causality to generate thoughts about Platonic Forms or mathematical structures. Instead, we observe embodied, resource-bounded creatures generalizing from data, even if Churchland is a pure connectionist while I favor a probabilistic language of thought. What sort of Platonism would help us explain what goes on in real minds? I think a productive avenue is to view Platonic abstractions as concepts (necessarily compositional concepts of the kind Churchland doesn’t address much, but which are now sometimes described as stochastic functions[8]) which optimally compress a given type of experiential data. We could thus propose Platonic realism about abstract concepts which any reasoner must necessarily develop as they approach the limit of increasing sample data and computational power, and simultaneously Platonic antirealism about abstract concepts which tend to disappear as reasoners gain more information and compute further.

This will probably sound somewhat overwrought and unnecessary to theorists from backgrounds in algorithmic information theory and artificial intelligence. What need does the optimally intelligent “agent”, AIXI, have for Platonic concepts of anything[12]? It just updates a distribution over all possible causal structures and uses it to make predictions. The key is that AIXI evaluates K(x), the Kolmogorov complexity of each possible Turing-machine program. This function allows a Solomonoff Inducer to perfectly separate the random information in its sensory data from the structural information, yielding an optimal distribution over representations that contain nothing but causal structure. But K(x) is incomputable: AIXI can update optimally on sensory information only by falling back on its infinite computing power. Such a reasoner, it seems, has no need to compose or decompose causal structures, and no need for concepts; for everyone else, hierarchical representations compress data very efficiently[14]. They also map well onto probabilistic modelling. This trade-off between the decomposability of a representation and the degree of compression it achieves will have to play a part in a more complete theory of abstract objects as optimally compressed stochastic functions.
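K(x) being incomputable, the structure-versus-noise separation it would ideally provide can only be gestured at; an ordinary compressor makes a crude, computable stand-in. The sketch below (generic, not tied to any system discussed here) shows that structured data compresses far better than algorithmically random-looking data.

```python
import random
import zlib

random.seed(0)

# Highly regular data: a repeating 16-symbol cycle.
structured = bytes(i % 16 for i in range(4096))
# Incompressible-looking data: uniformly random bytes.
noise = bytes(random.randrange(256) for _ in range(4096))

c_struct = len(zlib.compress(structured))
c_noise = len(zlib.compress(noise))
print(c_struct, c_noise)   # the structured string compresses far more
```

The compressed length is an upper bound on K(x) up to a constant, which is why compression-based proxies for algorithmic complexity are a common practical stand-in.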

Here, though, is a reason for learned representations to be “white-box”, open to introspection and decomposition into smaller concepts: counterfactual-causal reasoning involves zeroing in on a particular random variable in a model and cutting its links to its causal parents. Only white-box representations allow this “graph surgery”; only open-box representations are friendly to causal reasoning about independent, composable concepts rather than whole possible-worlds.

 

2 Causal reasoning as recurrent-activation-space trajectories

And Churchland does cover causal reasoning! Or at least, he covers reasoning and learning in sequence-prediction tasks, with an elaborate theory of First-Level Learning in recurrent neural networks. Whether this counts as causal reasoning or not depends on whether the reader considers causal reasoning to require modelling counterfactuals and doing graph-surgery to support interventions. Churchland begins by explaining exactly why an embodied organism should want to reason about temporal sequences:

Two complex interacting objects were each outfitted with roughly two dozen small lights attached to various critical parts thereof, and the room lights were then completely extinguished, leaving only the attached lights as effective visual stimuli for any observer. A single snapshot of the two objects, in the midst of their mobile interaction, presented a meaningless and undecipherable scatter of luminous dots to any naïve viewer. But if a movie film or video clip of those two invisible objects were presented, instead of the freeze-frame snapshot, most viewers could recognize, within less than a second and despite the spatial poverty of the visual stimuli described, that they were watching two humans ballroom-dancing in the pitch dark.

Churchland starts his chapter on temporal and causal learning thus, noting that for an embodied animal, temporal reasoning provides not only an essential way to handle ecologically necessary tasks, but a dramatic improvement in the performance of moment-to-moment cognitive distinctions. He thus theorizes that creatures understand causal models as trajectories through the metrically-sculpted activation spaces of recurrent neural networks, isomorphic to the execution traces of a computer program. In fact, he tells the reader, extending an animal’s reasoning in Time helps it to cut reality at the joints, so much so that temporal reasoning may have come first.

[I]t is at least worth considering the hypothesis that prototypical causal or nomological processes are what the brain grasps first and best. The creation and fine-tuning of useful trajectories in activation space may be the primary obligation of our most basic mechanisms of learning.

He further points out that, since the function of the autonomic nervous system has always been to regulate cyclical bodily processes, recurrent neural networks may actually be the norm in living animals, and could easily have evolved first for autonomic functions before being adapted to aid in temporal cognition. The brain, then, is conceived as a network-of-networks, capable of activating the recurrent evolution of its sub-networks whenever it needs to imagine how some temporal (or computational) process might proceed:

Our network, that is, is also capable of imaginative activity. It is capable of representational activity that is prompted not by a sensory encounter with an instance of the objective reality therein represented, but rather by ‘top-down’ stimulation of some kind from elsewhere within a larger encompassing network or ‘brain.’

Much of the material from the previous chapter on supervised learning, Hebbian unsupervised learning, and map metaphors is repeated and carried over in this chapter, the better to hammer it home.
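The “trajectories through activation space” picture can be exhibited in miniature with a hand-built (not learned) recurrent update whose weight matrix is a small rotation, so the hidden state traces a closed orbit: a toy stand-in for the cyclic autonomic processes Churchland mentions, run in purely “top-down” imaginative mode with no sensory input.

```python
import numpy as np

# Recurrent weight matrix: a rotation by one-eighth of a full turn.
theta = 2 * np.pi / 8
W_rec = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

h = np.array([1.0, 0.0])        # initial activation vector
trajectory = [h]
for _ in range(8):
    h = W_rec @ h               # recurrent evolution, no external input
    trajectory.append(h)

# After eight steps the state returns to its starting point: the
# network's activation traces a closed, limit-cycle-like orbit.
print(np.allclose(trajectory[0], trajectory[-1]))
```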

 

3 Criticisms

Now for the unfortunate negatives. Churchland’s account of conceptual and causal First-Level Learning spends too little explanatory effort, for my tastes at least, on causal-role concepts in particular. Philosophy of mind has long recognized both feature-governed and role-governed notions of concepts, and the cognitive sciences have shown how general learning mechanisms can produce concepts governed by mixtures of sensory features and causal or relational roles[17]. In fact, causal-role concepts appear to form a bedrock for uniquely human thought: humans and other highly intelligent, social animals learn concepts abstracted away from their available feature data, of “what something does” rather than “how something looks”. This is how human thought gains its infinitely productive compositionality. In fact, we often utilize concepts grounded so thoroughly in causal role, and so little in feature data, that we forget they “look like” anything at all (more on that when we cover Second-Level Learning and naturalization)! Churchland explicitly mentions how we ought to be able to “index” our “maps” via multiple input modalities, thus enabling us to use concepts abstracted from any one way of obtaining or producing feature data:

Choose any family of familiar observational terms, for whichever sensory modality you like (the family of terms for temperature, for example), and ask what happens to the semantic content of those terms, for their users, if every user has the relevant sensory modality suddenly and permanently disabled. The correct answer is that the relevant family of terms, which used to be at least partly observational for those users, has now become a family of purely theoretical terms. But those terms can still play, and surely will play, much the same descriptive, predictive, explanatory, and manipulative roles that they have always played in the conceptual commerce of those users.

He just doesn’t say how the brain does so.

He also gives a theory for identifying maps with each-other, which is to find a homomorphism taking the contents of one map into the contents of the other:

[T]hey do indeed embody the same portrayal, then, for some superposition of respective map elements, the across-map distances, between any distinct map element in (a) and its ‘nearest distinct map element’ in (b), will fall collectively to zero for some rotation of map (b) around some appropriate superposition point.

This works just fine for his given example of two-dimensional highway maps, but (at least we have solid reason to think) cannot work when the maps themselves come to express a Turing-complete mode of computation, as in recurrent neural networks. The equality of lambda expressions in general is undecidable, after all; the only open question is whether we can determine equality in some useful, though algorithmically random, subset of cases (as is common in theoretical computer science), or whether we can find some sort of approximate equality-by-degrees that works well-enough for creatures with limited information.
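For the two-dimensional case where Churchland’s criterion does work, it is simple to implement: search over rotations for the superposition that drives the summed nearest-element distances toward zero. Everything below (the point count, the hidden angle relating the two “maps”) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
map_a = rng.uniform(-1, 1, (20, 2))       # "map (a)": 20 landmark points

angle = 0.7                                # hidden rotation relating the maps
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
map_b = map_a @ R.T                        # "map (b)": the same territory, rotated

def mismatch(theta):
    # Rotate map (b) back by theta and sum each element's distance to
    # its nearest element of map (a) -- Churchland's criterion.
    Rt = np.array([[np.cos(theta),  np.sin(theta)],
                   [-np.sin(theta), np.cos(theta)]])
    rotated = map_b @ Rt.T
    d = np.linalg.norm(rotated[:, None, :] - map_a[None, :, :], axis=2)
    return float(d.min(axis=1).sum())

# Brute-force search over candidate rotations.
thetas = np.linspace(0, 2 * np.pi, 3600)
best = min(thetas, key=mismatch)
print(round(float(best), 2), round(mismatch(best), 3))
```

The mismatch collapses toward zero only near the true relating rotation, recovering the superposition under which the two maps “embody the same portrayal.” The Turing-completeness objection in the text is precisely that no analogous finite search exists once the maps express arbitrary computations.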

The “map” metaphor also elides the fact that computation, in neural networks, takes place at the synapses, not in the neurons. The actual work is done by the nonlinear transformations of vectors between layers of neurons.

Churchland also fails to elaborate on the differences between training neural networks via backpropagation of errors and training them via Hebbian update rules. This is important: as far as I can determine, backpropagation of errors suffices to train a neural network to approximate any circuit (or even any computable partial function, if we deal with recurrent networks), while even the most general form of unsupervised Hebbian learning seems to learn the directions of variation within a set of feature vectors, rather than general total or partial recursive functions over the input data.
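That contrast can be miniaturized. Oja’s rule, a stabilized Hebbian update (my choice of example; Churchland does not commit to a specific rule), converges to the leading principal component of its input data: a direction of variation, not an arbitrary function.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-D data with most of its variance along one oblique direction.
data = rng.normal(0, 1, (2000, 2)) @ np.array([[2.0, 1.8],
                                               [0.2, 0.3]])

w = rng.normal(0, 1, 2)
eta = 0.01
for x in data:
    y = w @ x
    w += eta * y * (x - y * w)   # Oja's rule: Hebbian term plus decay

w /= np.linalg.norm(w)

# Compare against the top eigenvector of the empirical covariance.
cov = data.T @ data / len(data)
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, -1]
print(abs(float(w @ pc1)))   # near 1: the rule found the principal direction
```

No target outputs were ever supplied, which is exactly the point: the rule extracts structure in the input statistics, whereas backpropagation shapes the network toward an externally specified function.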

 

4 Loose-Leaf Highlights

Churchland on free will:

Freedom, on this view, is ultimately a matter of knowledge—knowledge sufficient to see at least some distance into one’s possible futures at any given time, and knowledge of how to behave so as to realize, or to enhance the likelihood of, some of those alternatives at the expense of the others.

He extends the matter up to whole societies:

And as humanity’s capacity for anticipating and shaping our economic, medical, industrial, and ecological futures slowly expands, so does our collective freedom as a society. That capacity, note well, resides in our collective scientific knowledge and in our well-informed legislative and executive institutions.

On unsupervised learning without a preestablished system of propositions (as is used in most current Bayesian methods), in defense of connectionism:

What is perhaps most important about this kind of learning process, beyond its being biologically realistic right down to the synaptic level of physiological activity, is that it does not require a conceptual framework already in place, a framework fit for expressing propositions, some of which serve as hypotheses about the world, and some of which serve as evidence for or against those hypotheses.

 

Part III
Second-Level Learning: Reductionism, Hierarchies of Theories, Naturalization, and the Progress of the Sciences

If Churchland’s material on First-Level Learning seems, in some ways, like so much outmoded hype about neural networks, his material on Second-Level Learning remains sufficient justification to read his book. Second-Level Learning, the process by which the mind notices that it can repurpose its available conceptual “maps”, and thus comes to form an increasingly unified and coherent picture of the world, is where Churchland hits his (as ever, understated) stride. In addressing Second-Level Learning, Churchland covers the well-worn philosophy-of-science progression of physics from Aristotelian intuitive theories up through Newton and then, eventually, Einstein. This is also where he begins to talk about normatively rational reasoning:

Both history and current experience show that humans are all too ready to interpret puzzling aspects of the world in various fabulous or irrelevant terms, and then rationalize away their subsequent predictive/manipulative failures, if those failures are noticed at all.

Second-Level Learning is described as just turning old ideas to new uses. The brain more-or-less randomly notices the partial homomorphism of two conceptual “maps” (again: high-dimensional vector spaces with metric compaction based on Hebbian learning in neural networks) and repurposes (and re-trains) the more accurate, detailed, and general “map” (call it the larger map) to predict and describe the phenomena once encompassed by the less accurate, less detailed, and less general “map” (call it the smaller one). Viewed in the larger historical context Churchland gives it, however, Second-Level Learning is the methodology of scientific thought as we have come to understand it. Churchland gives solid reason to hypothesize that by means of Second-Level Learning, human beings and humankind have come to understand our world.

In larger terms, Second-Level Learning consists of naturalizing concepts in terms of other concepts, forming hierarchies of theories.

Our knowledge begins as a vast, disconnected, disparate mish-mash of independent concepts and theories, none of which makes sense in terms of the others, and which leaves us no recourse to any universal terms of explanation. Worse, our intuitive theories are often so disconnected that we may have only one modality of causal access to the objective reality behind any particular concept, perhaps even one so utterly unreliable as subjective introspection.

As we proceed to assemble interlocking hierarchies of theories, however, the increased connectedness of our theories allows us to spread the training information derived from experience and experiment throughout, letting us use the feature-modality behind one concept to inquire about the objective reality behind a seemingly different concept. By judicious application of Second-Level Learning, we develop an increasingly coherent, predictive, unified body of knowledge about the objective reality in which we find ourselves. We also become able to dissolve concepts that no longer make sense by showing what explains their training experiences, and sometimes come to be rationally obligated to reject concepts and theories that just no longer fit our experiences. Consilience can thus be seen as the key to truth, overcoming the exclaimed cries - “But thou must!” - of intuition or apparently-logical argumentation.

This is where Churchland feels a definite need to argue with other major philosophers of science, particularly Karl Popper’s falsificationism (still a staple of many methodology and philosophy-of-science lessons given to grad students everywhere):

Popper’s story of the proper relation between science and experience was also too simple-minded. Formally speaking, we can always conjure up an ‘auxiliary premise’ that will put any lunatic metaphysical hypothesis into the required logical contact with a possible refuting observation statement.

The supposedly possible refutation of a scientific hypothesis “H” at the hands of “if H then not-O” and “O” can be only as certain as one’s confidence in the truth of “O.”… Unfortunately, given the theory-laden character of all concepts, and the contextual contingencies surrounding all perceptions, no observation statement is ever known with certainty, a point Popper himself acknowledges. So no hypothesis, even a legitimately scientific one, can be refuted with certainty – not ever. One might shrug one’s shoulders and acquiesce in this welcome consequence, resting content with the requirement that possible observations can at least contradict a genuinely scientific hypothesis, if not refute it with certainty.

Heavy and contentious words already, but well in line with the basic facts about learning and inference discovered by the pioneers of statistical learning theory: as long as one’s theory remains fully deterministic and one’s reasoning fully deductive, one must place absolute faith in experience (which, to wit, experience tells us is unreliable) and can meaningfully eliminate hypotheses only slowly, if ever. Abductive inference, not deductive, forms the core of real-world scientific reasoning, and one is reminded of Broad calling inductive reasoning “the glory of Science” and yet “the scandal of Philosophy”. Having adopted abduction of inferred models, subject to revision, we can now justify those inferences much better than we could when philosophers talked of inductive reasoning about the certain truth or falsity of propositions. Churchland continues into territory even surer to arouse controversy, among the public if not among professional scientists or philosophers:

But this [revision to Popper given above] won’t draw the required distinction either, even if we let go of the requirement of decisive refutability for generalized hypotheses. The problem is that presumptive metaphysics can also creep into our habits of perceptual judgment, as when an unquestioning devout sincerely avers, “I feel God’s disapproval” or “I see God’s happiness,” when the rest of us would simply say, “I feel guilty” or “I see a glorious sunset.” This possibility is not just a philosopher’s a priori complaint: millions of religious people reflexively approach the perceivable world with precisely the sorts of metaphysical concepts just cited.

Throughout this latter portion of the book, Churchland takes numerous other shots at superstition, religion, model-theoretic philosophical theories of semantics, non-natural normativity, and various other forms of belief in the spooky and weird (whatever joke I may appear to be making here is paraphrased straight from Churchland’s own views). Regarding the last item on the list in particular, Churchland does indeed take an explicit stand in favor of naturalizing normative rationality via Second-Level Learning:

Since we cannot derive an “ought” from an “is,” continues the objection, any descriptive account of the de facto operations of a brain must be strictly irrelevant to the question of how our representational states can be justified, and to the question of how a rational brain ought to conduct its cognitive affairs. … An immediate riposte points out that our normative convictions in any domain always have systematic factual presuppositions about the nature of that domain. … A second riposte points out that a deeper descriptive appreciation of how the cognitive machinery of a normal or typical brain actually functions, so as to represent the world, is likely to give us a much deeper insight into the manifold ways in which it can occasionally fail to function to our representational advantage, and a deeper insight into what optimal functioning might amount to.

This objection to the “is-ought gap” should be happily received by cognitive scientists everywhere: it is certainly impossible to prove that an algorithm solves a given problem optimally, or even approximately, when we do not know what the problem is. What certain schools of thinking about rationality tend to fail to appreciate is that, particularly when dealing with highly constrained problems of abductive reasoning, we also cannot prove that a certain algorithm is very bad (in failing to approximate or approach an optimal solution, even in the limit of increasing resources) without knowing what the problem to be solved actually is.

Churchland backs up these ideas with a cogent analogy:

Imagine now a possible eighteenth century complaint, raised just as microbiology and biochemistry were getting started, that such descriptive scientific undertakings were strictly speaking a waste of our time, at least where normative matters such as Health are concerned, a complaint based on the ‘principle’ that “you can’t derive an ought from an is.” … Our subsequent appreciation of the various viral and bacteriological origins of the pantheon of diseases that plague us, of the operations of the immune system, and of the endless sorts of degenerative conditions that undermine our normal metabolic functions, gave us an unprecedented insight into the underlying nature of Health and its many manipulable dimensions. Our normative wisdom increased a thousand-fold, and not just concerning means-to-ends, but concerning the identity and nature of the ’ultimate’ ends themselves.

… The nature of Rationality, in sum, is something we humans have only just begun to penetrate, and the cognitive neurosciences are sure to play a central role in advancing our normative as well as our descriptive understanding, just as in the prior case of Health.

 

5 Hierarchies of Theories and Reductionism

How, then, does Second-Level Learning proceed in the actual, physical brain?

Here the issue is whether the acquired structure of one of our maps mirrors in some way (that is, whether it is homomorphic with) some substructure of the second map under consideration. Is the first map, perhaps, simply a more familiar and more parochial version of a smallish part of the larger and more encompassing second map?

Churchland has, earlier in the book, already proposed an algorithm for inferring the degree to which two maps seem to portray the same domain, and he is deploying it here to explain how the brain can perform inter-theoretic reductions. The only problem, to my eyes, is that as stated above, this algorithm proposes to solve an undecidable problem when we begin to deal with the Turing-complete hypothesis-space represented by recurrent neural networks (and considering finite recurrent networks as learning deterministic finite-state automata just reduces our problem from undecidable to EXPTIME-complete).

On the question of how we come to intertheoretic reductions, Churchland opined that they occur more-or-less randomly, or at least unpredictably:

Most importantly, such singular events are flatly unpredictable, being the expression of the occasionally turbulent transitions, from one stable regime to another, of a highly nonlinear dynamical system: the brain.

Thanks to later work, we know that Churchland erred at least somewhat on this point, but that doesn’t make Churchland’s view of intertheoretic reductions irredeemable. Quite to the contrary, later work has ridden to the rescue of Churchland’s Second-Level Learning, presenting us with a map of the landscape of scientific hierarchies. The statistical nature of this map of maps is worth quoting directly for its elegance[16]:

Recent studies of nonlinear, multiparameter models drawn from disparate areas in science have shown that predictions from these models largely depend only on a few ’stiff’ combinations of parameters [6, 8, 9]. This recurring characteristic (termed ’sloppiness’) appears to be an inherent property of these models and may be a manifestation of an underlying universality [11]. Indeed, many of the practical and philosophical implications of sloppiness are identical to those of the renormalization group (RG) and continuum limit methods of statistical physics: models show weak dependence of macroscopic observables (defined at long length and time scales) on microscopic details. They thus have a smaller effective model dimensionality than their microscopic parameter space [12].

The objective reality we confront on a daily basis not only can be modelled at multiple levels of abstraction, but in order to utilize our experiential data as efficiently as possible, we must model it at multiple levels of abstraction. Macroscopic models explain more of the variation in observable data with fewer parameters, while microscopic models successfully explain a larger portion of the total available data by including even the “sloppier” parameters. How large is the trade-off between these models, in terms of necessary data and generalization power? Extremely large:

Eigenvalues [of the Fisher Information Matrix] are normalized to unit stiffest value; only the first 10 decades are shown. This means that inferring the parameter combination whose eigenvalue is smallest shown would require ~10^10 times more data than the stiffest parameter combination. Conversely, this means that the least important parameter combination is √(10^10) times less important for understanding system behavior.

The amounts of variation explained by expanding combinations of parameters are distributed exponentially: the plurality of variation can usually be captured with very few parameters (as with intuitive theories that are “fuzzy” even on the mesoscopic scale), the majority with relatively few parameters (as with macroscopically accurate models that ignore microscopic reality), and the whole of the variation only by recourse to increasingly many parameters (as in microscopic models). Note that this exponential distribution of variance explanation adds weight to the Platonism of optimal compressions advocated above, and to Churchland’s Platonism: in order to make efficient use of available experiential data to explain variance and predict well in varying environments, we must form certain abstract concepts, and we must organize them into hierarchies (or, to take a term from mathematical logic, entailment preorders of probabilistic conditioning). An embodied mind most likely cannot feasibly function in real-time without modelling what Churchland calls “the timeless landscape of abstract universals that collectively structure the universe” (even if one doesn’t accord those abstracts any vaunted metaphysical status).
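This exponentially spread eigenvalue spectrum is easy to see in miniature. The sketch below is my own toy example, not drawn from the cited paper: it builds the Fisher Information Matrix of a sum-of-decaying-exponentials model (a standard illustrative "sloppy" model) and prints its eigenvalues in decades relative to the stiffest parameter combination.

```python
import numpy as np

# Toy "sloppy" model: y(t) = sum_i exp(-theta_i * t).
# The rates and time grid are illustrative choices of mine.
theta = np.array([0.5, 1.0, 2.0, 4.0])
t = np.linspace(0.1, 5.0, 50)

# Jacobian of the model outputs with respect to the parameters:
# d y(t) / d theta_i = -t * exp(-theta_i * t)
jac = np.stack([-t * np.exp(-th * t) for th in theta], axis=1)

# Fisher Information Matrix for unit Gaussian observation noise: J^T J
fim = jac.T @ jac
eigvals = np.sort(np.linalg.eigvalsh(fim))[::-1]

# Eigenvalue decades below the stiffest combination, as in the quoted passage:
# 0 for the stiffest, then increasingly negative for the sloppy directions.
decades = np.log10(eigvals / eigvals[0])
print(decades)
```

Even this four-parameter model spreads its eigenvalues over several decades, so most of the predictive power lives in a couple of stiff parameter combinations.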

What, then, can we call an intertheoretic reduction, on a modelling level? The perfect answer would be: a deterministic, continuous function from the high-dimensional parameter space of a microscopic model (which has a simple deterministic component but vast uncertainty about parameters) to the low-dimensional parameter space of a macroscopic model (which makes less precise, more stochastic predictions, but allows for more certainty about parameters). In a rare few cases, we can even construct such a function: consider temperature as the average kinetic energy, derived from the mean squared speed, of a body of particles. Even though we cannot feasibly obtain the sample data to know the individual velocity of tens of millions of particles in a jar of air, our microscopic model tells us that averaging over those tens of millions of parameters will give us the single macroscopic parameter we call temperature, which is as directly observable as anything via a simple thermometer (whose usage is just another model for the human scientist to learn and employ). Churchland even gives us an example of how these connections between theories aid a nonhuman creature in its everyday cognition:
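As a toy illustration of that reduction (my own sketch, not Churchland's), one can simulate a gas at the microscopic level and recover the single macroscopic parameter by averaging: each Cartesian velocity component of an ideal-gas particle at temperature T is Gaussian with variance k_B·T/m, and inverting the average kinetic energy gives the temperature back. Particle count, molecule mass, and target temperature are all illustrative choices.

```python
import numpy as np

# Microscopic model: a million N2-like particles at 300 K; each velocity
# component is Gaussian with variance k_B*T/m (Maxwell-Boltzmann statistics).
rng = np.random.default_rng(0)
k_B = 1.380649e-23   # Boltzmann constant, J/K
m = 4.65e-26         # approximate mass of an N2 molecule, kg
T_true = 300.0       # kelvin
n = 1_000_000

v = rng.normal(0.0, np.sqrt(k_B * T_true / m), size=(n, 3))

# Macroscopic reduction: T = m * <|v|^2> / (3 * k_B). The single parameter
# "temperature" is an average over 3n microscopic parameters.
T_est = m * np.mean(np.sum(v**2, axis=1)) / (3 * k_B)
print(T_est)  # very close to 300 K
```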

Who would have felt that the local speed of sound was something which could be felt? But it can, and quite accurately, too. Of what earthly use might that be? Well, suppose you are a bat, for example. The echo-return time of a probing squeak, to which bats have accurate access, gives you the exact distance to an edible target moth, if you have running access to the local speed of sound.
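The bat's range-finding in this passage is a one-line reduction. As arithmetic (with made-up but plausible numbers):

```python
# Distance is half the echo round-trip time multiplied by the local speed of
# sound; the squeak travels to the moth and back. Values are illustrative.
speed_of_sound = 343.0    # m/s in air at roughly 20 °C
echo_return_time = 0.02   # seconds between squeak and returning echo

distance_to_moth = speed_of_sound * echo_return_time / 2
print(distance_to_moth)  # 3.43 m
```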

Usually, intertheoretic reductions are more probabilistic than this, though. Newton generalized his Laws of Motion and calculated the motion of the planets under his laws of gravitation for himself, rather than possessing a function that would construct Kepler’s equations from his. This looks more like evaluating a likelihood function and selecting as his “microscopic” theory the one which gave a higher likelihood to the available data while having a larger support set, as in probabilistic interpretations of scientific reasoning.
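A schematic of that kind of likelihood comparison (my own sketch, not Newton's actual procedure, with invented data): fit a coarser and a finer model to the same noisy observations, then compare the Gaussian log-likelihood each assigns to the data. The model with the larger support set can only do better on the training data; the interesting scientific question is whether the gain is worth the extra parameters.

```python
import numpy as np

# Invented observations from a mildly curved law plus noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.05, size=x.size)

def gaussian_loglik(residuals, sigma=0.05):
    # Sum of log N(0, sigma) densities over the residuals
    return np.sum(-0.5 * (residuals / sigma) ** 2
                  - np.log(sigma * np.sqrt(2.0 * np.pi)))

# "Macroscopic" candidate: straight line (fewer parameters, coarser)
line = np.polyval(np.polyfit(x, y, 1), x)
# "Microscopic" candidate: quadratic (more parameters, finer)
quad = np.polyval(np.polyfit(x, y, 2), x)

ll_line = gaussian_loglik(y - line)
ll_quad = gaussian_loglik(y - quad)
print(ll_line, ll_quad)  # the finer model assigns higher likelihood
```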

 

6 Naturalization and the Progress of the Sciences

We face a substantial difficulty in employing hierarchies of theories to explain the natural world around us: our meso-scale observable variables are very distantly abstracted from the microscopic phenomena that, under our best scientific theories, form the foundations of reality. On the one hand, this is reassuring: our microscopic theories require huge numbers of free parameters precisely because they reduce large, complex things to aggregations of smaller, simpler things. Since we need many small things to make a large thing, we should expect that thinking of the large thing in terms of its constituent small things requires huge amounts of information. On the other hand, this also implies that our descriptions of fundamental reality are far more theory-laden than our descriptions of our everyday surroundings. We suffer from a polarization in which humanly intuitive theories and theories of the fundamentals of reality come to occupy opposite ends of our hierarchy. Thus:

The process presents itself as a meaningless historical meander, without compass or convergence. Except, it would seem, in the domain of the natural sciences.

We might call it a symptom of that very polarization that human beings require strict intellectual training to successfully think in a naturalistic, scientific way – Churchland has really switched to philosophy of science instead of mind in this part of the book. Our intuitive theories tend to explain most of the variance visible in our observables, but nonetheless don’t predict all that well. As a result, we tend to just intuitively accept that we can’t entirely understand the world. Modern science, in fact, has won much of its success by seeking out additional observables that let us get accurate data about the (usually) less influential, smaller-scale structure and parameters of reality. As Churchland describes it:

Such experimental indexings can also be evaluated for their consistency with distinct experimental indexings made within distinct but partially overlapping interpretive practices, as when one uses distinct measuring instruments and distinct but overlapping conceptual maps to try to ‘get at’ one and the same phenomenon.

“Naturalization” of concepts thus turns out to come in two kinds of inference rather than one. “Upwards” naturalizations, let us say, string a connection from more microscopic theories to more macroscopic concepts. “Downwards” naturalizations, the traditional mode of intertheoretic reduction, connect existing macroscopic concepts and theories to more microscopic theories, exploiting the thoroughness and simplicity of the microscopic theory to provide a well-informed inductive bias to the more macroscopic theory. This inductive bias embodies what we learned, as we developed the microscopic theory, about all the observables we used to learn that theory. We can thus see that both kinds of naturalizations connect our concepts and theories to additional observable variables, thus enabling quicker and more accurate inductive training.

In combination with causal-role concepts and theories thereof, this all comes back to Churchland’s defense of the thesis that abstract objects and properties are both real and natural. The greater the degree of unity we attain in our hierarchical forests of abstract concepts and theories, the more we can justify those abstractions by reference to their role in successful causal description of concrete observations, rather than by abstracted argumentation. The more we naturalize our concepts, the more we feel licensed by Indispensability Arguments to call them real abstract universals (or at least, real abstract generalities of the neighborhood of reality we happen to live in), despite their being mere inferred theories bound ultimately to empirical data[15].

Certain naive forms of scientific realism would say that we are thus, through our scientific progress, coming to understand reality on a single, supreme, fundamental level. Churchland disagrees, and I concur with his disagreement.

That our sundry overlapping maps frequently enjoy adjustments that bring them into increasing conformity with one another (even as their predictive accuracy continues to increase) need not mean that there is some Ur map toward which all are tending.

To the contrary, a single Ur-map would be an extremely high-dimensional model, would require an extremely large amount of data to train, and would carry an extraordinarily large chance of overfitting after we had trained it. Entailment preorders of maps compress and represent experiential data far more efficiently than a single Ur-map, even if we know there exists a single underlying objective reality. In fact, we might often possess multiple maps of similar, or even identical, objective domains:

Two maps can differ substantially from each other, and yet still be, both of them, highly accurate maps of the same objective reality. For they may be focusing on distinct aspects or orthogonal dimensions of that shared reality. Reality, after all, is spectacularly complex, and it is asking too much of any given map that it capture all of reality (see, e.g., Giere 2006).

Churchland emphasizes that the final emphasis must be on empiricism and (sometimes counterfactual) observability:

What is important, for any map to be taken seriously as a representation of reality, is that somehow or other, however indirectly, it is possible to index it. …So long as every aspect of reality is somehow in causal interaction with the rest of reality, then every aspect of reality is, in principle, at least accessible to critical cognitive activity. Nothing guarantees that we will succeed in getting a grip on any given aspect. But nothing precludes it, either.

Churchland is, of course, reciting the naturalist creed by stating that “every aspect of reality is somehow in causal interaction with the rest of reality” (or at least, it was in its past or will be in its future). This is a bullet both he and I can gladly bite, however. I can also add that since Second-Level Learning enables us to knit our concepts into vast, inter-related preorders over time, it also enables us to gain increasing certainty about which conceptual maps refer to real abstract objects (optimal generalizations of properties of other maps), real concrete objects (which participate directly in causality), and apparent objects actually derived from erroneous inferences. As we learn more and integrate our concepts, real concrete and abstract objects come to be tied together, whereas unreal concrete objects (like superstitions) or abstract objects (like false philosophical frameworks) come to be increasingly isolated in our framework of maps of the world. A more integrated, naturalistic explanation for the experiential phenomena which originally gave birth to a model of unreal concrete or abstract objects can, if we allow ourselves to admit it into our worldview, clear up the experiential confusion and clear away the “zombie concepts”.

 

Part IV
Third-Level Learning: Cultural Progress

In the third and shortest major part of the book, we finally arrive at the domain of learning and thought in which we deal exclusively with human beings communicating via language. Churchland opens the chapter almost apologetically:

The reader will have noticed, in all of the preceding chapters, a firm skepticism concerning the role or significance of linguaformal structures in the business of both learning and deploying a conceptual framework. This skepticism goes back almost four decades …. In the intervening period, my skepticism on this point has only expanded and deepened, as the developments - positive and negative - in cognitive psychology, classical AI, and the several neurosciences gave both empirical and theoretical substance to those skeptical worries. As I saw these developments, they signaled the need to jettison the traditional epistemological playing field of accepted or rejected sentences, and the dynamics of logical or probabilistic inference that typically went with it.

Unfortunately, this statement appears to ignore the close links between probabilistic inference and the entire rest of statistical learning theory, including the neural networks that form the foundation for Churchland’s theory of cognition in the First-Level Learning chapters. Alas.

Still, Churchland’s skepticism regarding the “language of thought” hypothesis makes a great deal of intuitive sense. It takes thorough study to learn the difference between formal systems (sets of axioms demonstrated to have a model), a notion from the foundations of mathematics, and formal languages (notations for computations), a notion from the science of computing – although Douglas Hofstadter did write the world’s premier “pop comp-sci” text on exactly that matter[11]. Furthermore, any given spoken or written sentence, in formal or informal language, contains fairly little communicable information relative to the size of an entire mental model of a relevant domain, as Churchland has spotted:

We must doubt this [sentential] perspective, indeed, we must reject it, because theories themselves are not sets of sentences at all. Sentences are just the comparatively low-dimensional public stand-ins that allow us to make rough mutual coordinations of our endlessly idiosyncratic conceptual frameworks or theories, so that we can more effectively apply them and evaluate them.

Unlike in much of analytic philosophy, the science of computing takes programs and programming languages to be simply different ways of writing down calculations, to the point that the field of denotational semantics for programming remains small relative to the study of proving which computations a program carries out. A hypothesis regarding neurocomputation that can explain how learning and commonsense reasoning take place would apply, via the Church-Turing Thesis, to neural nets as well as to Turing machines.

Third-Level Learning is perhaps a misnomer, since as far as I know, it does not actually come third in any particular causal or historical ordering. After all, humans communicated ideas, and thus carried out Third-Level Learning, long before we ever engaged seriously in reductionist science, and if standardized test scores show anything at all, they surely show that our societies have invented sophisticated systems devoted to ensuring that existing ideas are passed down to children as-is. In fact, the educational system often performs quite reliably, in the sense that the children consistently pass their exams, even if we all ritually lament the failure to pass down the true understanding and clarity once achieved by discoverers, inventors, and teachers. Such true understanding, Churchland would say, involves a high-dimensional conceptual map sculpted by large amounts of experiential data. Perhaps we indeed ought to pessimistically expect that such high-dimensional understanding cannot be passed down accurately, even though teaching is a well-developed science (albeit one prone to fads, whose occasional serious results are also often ignored in favor of “how it’s always been done” or “the strong students will survive”). After all, as Churchland says:

[W]e have no public access to those raw sensory activation patterns [which sculpted our conceptual frameworks], as such.

Third-Level Learning, then, consists in using a Map-Portrayal Semantics for language (and other forms of human communication) to pass down maps that, according to the Domain Portrayal Semantics Churchland posits, accurately portray some piece of local reality. It may come before or after Second-Level Learning in our history, but it surely occurs. By means of evocative and descriptive language, human beings can index each other’s maps and even, through carefully chosen series of evocations, describe their conceptual maps to each other. Although other vocalizing species - such as wolves, nonhuman great apes, and some marine mammals - display the former ability to signal to each other with sound, humans are unique in having the latter ability: to systematically educate each other, passing on whole conceptual frameworks from their original discoverers to vast social peer-groups. By this means, human intellectual life surpasses the individual human:

While the collective cognitive process steadily loses some of its participants to old age, and adds fresh participants in the form of newborns, the collective process itself is now effectively immortal. It faces no inevitable termination.

One might think that little can be said about education by someone other than a professional expert on education, but Churchland does have an important point to make in describing Third-Level Learning: it is a form of learning, not a form of something other than learning. In particular, he explicitly criticizes the “memetic” theory of cultural “evolution”, for attempting to ground culture in Darwinist principles without making any reference to such obvious participants in culture as the mind and brain:

The dynamical parallels between a virus-type and a theory-type are pretty thin. …Dawkins’ story, though novel and agreeably naturalistic, once again attempts, like so many other accounts before it, to characterize the essential nature of the scientific enterprise without making any reference to the unique kinematics and dynamics of the brain.

Similarly, no account of science or rationality that confines itself to social-level mechanisms alone will ever get to the heart of that matter. For that, the microstructure of the brain and the nature of its microactivities are also uniquely essential.

Churchland also notes that reasoning can work, even when individual reasoners don’t quite understand how or why they reason, as in the case of scientists with too little knowledge of methodology:

For the scientists themselves may indeed be confabulating their explanations within a methodological framework that positively misrepresents the real causal factors and the real dynamics of their own cognitive behaviors.

In fact, he even demands that we account for the Third-Level Learning and reasoning of others in such “unclean” fields as politics:

For better or for worse, the moral convictions of those agents will play a major role in determining their voting behavior. To be sure, one may be deeply skeptical of the moral convictions of the citizens, or the senators, involved. Indeed, one may reject those convictions entirely, on the grounds that they presuppose some irrational religion, for example. But it would be foolish to make a policy of systematically ignoring those assembled moral convictions (even if they are dubious), if one wants to understand the voting behavior of the individuals involved.

Churchland also notes how successful Third-Level Learning ultimately requires engaging, sometimes, in successful Second-Level Learning, as in Kuhnian “paradigm shifts”:

As we have seen, Kuhn describes such periods of turmoil as ‘crisis science,’ and he explains in some illustrative detail how the normal pursuit of scientific inquiry is rather well-designed to produce such crises, sooner or later. I am compelled to agree with his portrayal, for, on the present account, ‘normal science,’ as discussed at length by Kuhn, just is the process of trying to navigate new territory under the guidance of an existing map, however rough, and of trying to articulate its often vague outlines and to fill in its missing details as the exploration proceeds.

He then ends the book on a positive note:

All told, the metabolisms of humans are wrapped in the benign embrace of an interlocking system of mechanisms that help to sustain, regulate, and amplify their (one hopes) healthy activities, just as the cognitive organs of humans are wrapped in the benign embrace of an interlocking system of mechanisms that help to sustain, regulate, and amplify their (one hopes) rational activities.

Unfortunately, I do feel that this upbeat ending opens Churchland to a substantive criticism, namely: he has failed to address anything outside the sciences. Since most actually existing humans are neither scientists nor science hobbyists, one would think that a book about the brain would address the vast domains of human life outside the halls of academic science, lest one be reminded of Professor Smith in Piled Higher and Deeper justifying the professorial career pyramid by making everything outside academic science sound scary.

I suppose that Churchland’s own career and position as a philosopher of mind and science led him to write as chiefly addressing domains he thoroughly understands, but I, at least, think his core thesis draws strength from its potential applications outside those domains. If Churchland, and much other literature, can explain a naturalistic theory of how the brain comes to understand abstract, immaterial objects and properties in such domains as science and mathematics, then why not in, say, aesthetics, ethics, or the emotional life? Among the first abstract properties posited at the beginnings of any human culture are beauty and goodness, among the first abstract objects, the soul. It may sound suddenly religious to speak of the soul when talking about science and statistical modelling, but eliminativism on these “soulful” objects and properties has always stood as the largest bullet for naturalists to bite. Having a constructive-naturalist theory to apply to “soulful” subjects of inquiry could turn the bitter bullet into a harmless sugar pill.

Churchland also spent an entire book talking about the brain without ever once mentioning subjective consciousness or experience, for reasons, I suspect, of the same sort of greedy eliminativism.

However, that might just mean I need to put Churchland’s earlier work - Matter and Consciousness[4] and The Engine of Reason, the Seat of the Soul[5] - along with Patricia Churchland’s Braintrust[2], on my reading list to see what they have to say on such subjects.

 

References

 

[1]   Alonzo Church. An unsolvable problem of elementary number theory. American Journal of Mathematics, 58(2):345–363, April 1936.

[2]   Patricia Smith Churchland. Braintrust: What Neuroscience Tells Us about Morality. Princeton University Press, Princeton, N.J., 2011.

[3]   Paul Churchland. Plato’s Camera: How the Physical Brain Captures a Landscape of Abstract Universals. MIT Press, 2012.

[4]   Paul Churchland. Matter and Consciousness. MIT Press, Cambridge, 2013.

[5]   Paul M. Churchland. The Engine of Reason, the Seat of the Soul: A Philosophical Journey into the Brain. MIT Press, Cambridge, 1995.

[6]   C. E. Freer, D. M. Roy, and J. B. Tenenbaum. Towards common-sense reasoning via conditional simulation: Legacies of Turing in Artificial Intelligence. Turing’s Legacy (ASL Lecture Notes in Logic), 2012.

[7]   N. D. Goodman, T. D. Ullman, and J. B. Tenenbaum. Learning a theory of causality. Psychological Review, 2011.

[8]   Noah D Goodman, Joshua B Tenenbaum, and T Gerstenberg. Concepts in a probabilistic language of thought. MIT Press, 2015.

[9]   T. L. Griffiths, F. Lieder, and N. D. Goodman. Rational use of cognitive resources: Levels of analysis between the computational and the algorithmic. Topics in Cognitive Science, To appear.

[10]   Peter D. Grünwald and Paul M. B. Vitányi. Algorithmic information theory. CoRR, abs/0809.2754, 2008.

[11]   Douglas R. Hofstadter. Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, Inc., New York, NY, USA, 1979.

[12]   Marcus Hutter. Universal algorithmic intelligence: A mathematical top-down approach. In B. Goertzel and C. Pennachin, editors, Artificial General Intelligence, Cognitive Technologies, pages 227–290. Springer, Berlin, 2007.

[13]   Milad Kharratzadeh and Thomas Shultz. Neural implementation of probabilistic models of cognition.

[14]   John C. Kieffer. A tutorial on hierarchical lossless data compression. In Moshe Dror, Pierre L’Ecuyer, and Ferenc Szidarovszky, editors, Modeling Uncertainty, volume 46 of International Series in Operations Research & Management Science, pages 711–733. Springer US, 2005.

[15]   David Liggins. Quine, Putnam, and the ‘Quine-Putnam’ indispensability argument. Erkenntnis (1975-), 68(1):113–127, 2008.

[16]   Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, and James P. Sethna. Parameter space compression underlies emergent theories and predictive models. Science, 342(6158):604–607, 2013.

[17]   Noah D. Goodman, Joshua B. Tenenbaum, Thomas L. Griffiths, and Jacob Feldman. Compositionality in rational analysis: Grammar-based induction for concept learning. In Nick Chater and Mike Oaksford, editors, The Probabilistic Mind: Prospects for Bayesian Cognitive Science. Oxford University Press, 2008.

[18]   Ruslan Salakhutdinov, Joshua B. Tenenbaum, and Antonio Torralba. One-shot learning with a hierarchical nonparametric bayesian model. Journal of Machine Learning Research - Proceedings Track, 27:195–206, 2012.

[19]   Lei Shi and Thomas L. Griffiths. Neural implementation of hierarchical bayesian inference by importance sampling. In Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1669–1677. Curran Associates, Inc., 2009.

Rationality Reading Group: Part D: Mysterious Answers

Gram_Stone 02 July 2015 01:55AM

This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.


Welcome to the Rationality reading group. This week we discuss Part D: Mysterious Answers (pp. 117-191). This post summarizes each article of the sequence, linking to the original LessWrong post where available.

D. Mysterious Answers

30. Fake Explanations - People think that fake explanations use words like "magic," while real explanations use scientific words like "heat conduction." But being a real explanation isn't a matter of literary genre. Scientific-sounding words aren't enough. Real explanations constrain anticipation. Ideally, you could explain only the observations that actually happened. Fake explanations could just as well "explain" the opposite of what you observed.

31. Guessing the Teacher's Password - In schools, "education" often consists of having students memorize answers to specific questions (i.e., the "teacher's password"), rather than learning a predictive model that says what is and isn't likely to happen. Thus, students incorrectly learn to guess at passwords in the face of strange observations rather than admit their confusion. Don't do that: any explanation you give should have a predictive model behind it. If your explanation lacks such a model, start from a recognition of your own confusion and surprise at seeing the result.

32. Science as Attire - You don't understand the phrase "because of evolution" unless it constrains your anticipations. Otherwise, you are using it as attire to identify yourself with the "scientific" tribe. Similarly, it isn't scientific to reject strongly superhuman AI only because it sounds like science fiction. A scientific rejection would require a theoretical model that bounds possible intelligences. If your proud beliefs don't constrain anticipation, they are probably just passwords or attire.

33. Fake Causality - It is very easy for a human being to think that a theory predicts a phenomenon, when in fact it was fitted to a phenomenon. Properly designed reasoning systems (GAIs) could avoid this mistake using our knowledge of probability theory, but humans have to write down a prediction in advance in order to ensure that our reasoning about causality is correct.

34. Semantic Stopsigns - There are certain words and phrases that act as "stopsigns" to thinking. They aren't actually explanations, and they don't help to resolve the issue at hand; they merely act as a marker saying "don't ask any questions."

35. Mysterious Answers to Mysterious Questions - The theory of vitalism was developed before the idea of biochemistry. It stated that the mysterious properties of living matter, compared to nonliving matter, were due to an "élan vital". This explanation acts as a curiosity-stopper, and leaves the phenomenon just as mysterious and inexplicable as it was before the answer was given. It feels like an explanation, though it fails to constrain anticipation.

36. The Futility of Emergence - The theory of "emergence" has become very popular, but it is just a mysterious answer to a mysterious question. After learning that a property is emergent, you aren't able to make any new predictions.

37. Say Not "Complexity" - The concept of complexity isn't meaningless, but too often people assume that adding complexity to a system they don't understand will improve it. If you don't know how to solve a problem, adding complexity won't help; better to say "I have no idea" than to say "complexity" and think you've reached an answer.

38. Positive Bias: Look into the Dark - Positive bias is the tendency to look for evidence that confirms a hypothesis, rather than disconfirming evidence.

39. Lawful Uncertainty - When facing a random scenario, the correct response is not to behave randomly. Faced with an irrational universe, throwing away your rationality won't help.

40. My Wild and Reckless Youth - Traditional rationality (without Bayes' Theorem) allows you to formulate hypotheses without a reason to prefer them to the status quo, as long as they are falsifiable. Even following all the rules of traditional rationality, you can waste a lot of time. It takes a lot of rationality to avoid making mistakes; a moderate level of rationality will just lead you to make new and different mistakes.

41. Failing to Learn from History - There are no inherently mysterious phenomena, but every phenomenon seems mysterious, right up until the moment that science explains it. It seems to us now that biology, chemistry, and astronomy are naturally the realm of science, but if we had lived through their discoveries, and watched them reduced from mysterious to mundane, we would be more reluctant to believe the next phenomenon is inherently mysterious.

42. Making History Available - It's easy not to take the lessons of history seriously; our brains aren't well-equipped to translate dry facts into experiences. But imagine living through the whole of human history - imagine watching mysteries be explained, watching civilizations rise and fall, being surprised over and over again - and you'll be less shocked by the strangeness of the next era.

43. Explain/Worship/Ignore? - When you encounter something you don't understand, you have three options: to seek an explanation, knowing that that explanation will itself require an explanation; to avoid thinking about the mystery at all; or to embrace the mysteriousness of the world and worship your confusion.

44. "Science" as Curiosity-Stopper - Although science does have explanations for phenomena, it is not enough to simply say that "Science!" is responsible for how something works -- nor is it enough to appeal to something more specific like "electricity" or "conduction". Yet for many people, simply noting that "Science has an answer" is enough to make them no longer curious about how it works. In that respect, "Science" is no different from more blatant curiosity-stoppers like "God did it!" But you shouldn't let your interest die simply because someone else knows the answer (which is a rather strange heuristic anyway): You should only be satisfied with a predictive model, and how a given phenomenon fits into that model.

45. Truly Part of You - Any time you believe you've learned something, you should ask yourself, "Could I re-generate this knowledge if it were somehow deleted from my mind, and how would I do so?" If the supposed knowledge is just empty buzzwords, you will recognize that you can't, and therefore that you haven't learned anything. But if it's an actual model of reality, this method will reinforce how the knowledge is entangled with the rest of the world, enabling you to apply it to other domains, and know when you need to update those beliefs. It will have become "truly part of you", growing and changing with the rest of your knowledge.

Interlude: The Simple Truth



This has been a collection of notes on the assigned sequence for this week. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

The next reading will cover Part E: Overly Convenient Excuses (pp. 211-252). The discussion will go live on Wednesday, 15 July 2015 at or around 6 p.m. PDT, right here on the discussion forum of LessWrong.
