Related to: Humans are not automatically strategic, The mystery of the haunted rationalist, Striving to accept, Taking ideas seriously
I argue that many techniques for epistemic rationality, as taught on LW, amount to techniques for reducing compartmentalization. I argue further that when these same techniques are extended to a larger portion of the mind, they boost instrumental, as well as epistemic, rationality.
Imagine trying to design an intelligent mind.
One problem you’d face is designing its goal.
Every time you designed a goal-indicator, the mind would increase action patterns that hit that indicator[1]. Amongst these reinforced actions would be “wireheading patterns” that fooled the indicator but did not hit your intended goal. For example, if your creature gains reward from internal indicators of status, it will increase those indicators -- including by such methods as surrounding itself with people who agree with it, or convincing itself that it understood important matters others had missed. It would be hard-wired to act as though “believing makes it so”.
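To make the wireheading failure concrete, here is a minimal toy sketch in Python (my own illustration, not anything from the post; the actions, numbers, and learning rule are all hypothetical). A simple reinforcement learner whose updates are driven by a felt status indicator, rather than by the designer's intended goal, ends up preferring indicator-inflating actions:

```python
# A minimal sketch (hypothetical throughout) of how a reward proxy gets "wireheaded".
import random

# The designer's intended goal vs. the indicator the creature actually feels.
actions = {
    # action: (change to real competence, change to felt "I have status" indicator)
    "practice_hard_skill":      (+1.0, +0.2),   # really helps, but feels slow
    "seek_agreeable_company":   ( 0.0, +1.0),   # pure indicator inflation
    "rehearse_own_brilliance":  ( 0.0, +0.8),   # pure indicator inflation
}

# Simple reinforcement: action values are updated from the *indicator*, not the goal.
values = {a: 0.0 for a in actions}
alpha, epsilon = 0.1, 0.1
real_total, felt_total = 0.0, 0.0

random.seed(0)
for step in range(5000):
    # epsilon-greedy choice over learned values
    if random.random() < epsilon:
        a = random.choice(list(actions))
    else:
        a = max(values, key=values.get)
    real, felt = actions[a]
    real_total += real
    felt_total += felt
    # reinforcement hits the indicator the creature can sense, not the true goal
    values[a] += alpha * (felt - values[a])

print(values)  # the indicator-inflating actions win out
print("felt status:", felt_total, " real competence:", real_total)
```

Nothing in the learner is "lying to itself"; it simply strengthens whatever reliably moves the indicator, which is exactly the "believing makes it so" behavior described above.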
A second problem you’d face is propagating evidence. Whenever your creature encounters some new evidence E, you’ll want it to update its model of “events like E”. But how do you tell which events are “like E”? The soup of hypotheses, intuition-fragments, and other pieces of world-model is too large, and its processing too limited, to update each belief after each piece of evidence. Even absent wireheading-driven tendencies to keep rewarding beliefs isolated from threatening evidence, you’ll probably have trouble with accidental compartmentalization (where the creature doesn’t update relevant beliefs simply because your heuristics for what to update were imperfect).
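As a toy illustration of accidental compartmentalization (again my own sketch, with hypothetical beliefs and an invented similarity heuristic), consider a belief store that updates only the entries a cheap keyword match flags as "like E"; logically related beliefs that happen to be worded differently never get touched:

```python
# A toy illustration of accidental compartmentalization: evidence updates only the
# beliefs a crude heuristic says are "like E", so related beliefs stay stale.
beliefs = {
    "ghosts exist":                       0.30,
    "haunted houses are dangerous":       0.40,  # logically tied to the first belief
    "energy is conserved":                0.95,
    "a rolling bowling ball can leap up": 0.20,
}

def similar(evidence_tags, belief):
    """Crude heuristic: update only beliefs sharing a keyword with the evidence."""
    return any(tag in belief for tag in evidence_tags)

def update(evidence_tags, direction, strength=0.5):
    for b, p in beliefs.items():
        if similar(evidence_tags, b):
            beliefs[b] = p + direction * strength * ((1 - p) if direction > 0 else p)

# Strong evidence against ghosts, tagged only with the word "ghosts":
update(["ghosts"], direction=-1)

print(beliefs)
# "ghosts exist" drops, but "haunted houses are dangerous" is untouched, because
# the heuristic never noticed that the two beliefs should move together.
```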
Evolution, AFAICT, faced just these problems. The result is a familiar set of rationality gaps:
I. Accidental compartmentalization
a. Belief compartmentalization: We often fail to propagate changes to our abstract beliefs (and we often make predictions using un-updated, specialized components of our soup of world-model). Thus, learning modus tollens in the abstract doesn’t automatically change your answer to the Wason selection task. Learning about conservation of energy doesn’t automatically change your fear when a bowling ball is hurtling toward you. Understanding there aren’t ghosts doesn’t automatically change your anticipations in a haunted house. (See Will's excellent post Taking ideas seriously for further discussion).
b. Goal compartmentalization: We often fail to propagate information about what “losing weight”, “being a skilled thinker”, or other goals would concretely do for us. We also fail to propagate information about what specific actions could further these goals. Thus (absent the concrete visualizations recommended in many self-help books) our goals fail to pull our behavior, because although we verbally know the consequences of our actions, we don’t visualize those consequences on the “near-mode” level that prompts emotions and actions.
c. Failure to flush garbage: We often continue to work toward a subgoal that no longer serves our actual goal (creating what Eliezer calls a lost purpose). Similarly, we often continue to discuss, and care about, concepts that have lost all their moorings in anticipated sense-experience.
II. Reinforced compartmentalization:
Type 1: Distorted reward signals. If X is a reinforced goal-indicator (“I have status”; “my mother approves of me”[2]), thinking patterns that bias us toward X will be reinforced. We will learn to compartmentalize away anti-X information.
The problem is not just conscious wishful thinking; it is a sphexish, half-alien mind that distorts your beliefs by reinforcing motives, angles of approach or analysis, choices of reading material or discussion partners, etc., so as to bias you toward X and to compartmentalize away anti-X information.
Impairment to epistemic rationality:
- “[complex reasoning]... and so my past views are correct!” (if I value “having accurate views”, and so I’m reinforced for believing my views accurate)
- “... and so my latest original theory is important and worth focusing my career on!” (if I value “doing high-quality research”)
- “... and so the optimal way to contribute to the world, is for me to continue in exactly my present career...” (if I value both my present career and “being a utilitarian”)
- “... and so my friends’ politics are correct.” (if I value both “telling the truth” and “being liked by my friends”)
Impairment to instrumental rationality:
- “... and so the two-fingered typing method I’ve used all my life is effective, and isn’t worth changing” (if I value “using effective methods” and/or avoiding difficulty)
- “... and so the argument was all his fault, and I was blameless” (if I value “treating my friends ethically”)
- “... and so it’s because they’re rotten people that they don’t like me, and there’s nothing I might want to change in my social habits.”
- “... and so I don’t care about dating anyhow, and I have no reason to risk approaching someone.”
Type 2: “Ugh fields”, or “no thought zones”. If we have a large amount of anti-X information cluttering up our brains, we may avoid thinking about X at all, since considering X tends to reduce compartmentalization and send us pain signals. Sometimes, this involves not-acting in entire domains of our lives, lest we be reminded of X.
Impairment to epistemic rationality:
- We find ourselves just not-thinking about our belief’s real weak points, until we’re worse at such thinking than an unbiased child.
- If we notice inconvenient possibilities, we just somehow don’t get around to following them up;
- If a subject is unusually difficult and confusing, we may either avoid thinking about it at all, or rush rapidly to a fake “solution”. (And the more pain we feel around not understanding it, e.g. because the subject is important to us, the more we avoid thoughts that would make our non-knowledge salient.)
Impairment to instrumental rationality:
- Many of us avoid learning new skills (e.g., taking a dance class, or practicing social banter), because practicing them reminds us of our non-competence, and sends pain signals.
- The longer we’ve avoided paying a bill, starting a piece of writing, cleaning out the garage, etc., the harder it may be to think about the task at all (if we feel pain about having avoided it);
- The more we care about our performance on a high-risk task, the harder it may be to start working on it (so that the highest value tasks, with the most uncertain outcomes, are those we leave to the last minute despite the expected impact of such procrastination);
- We may avoid making plans for death, disease, break-up, unemployment, or other unpleasant contingencies.
Type 3: Wireheading patterns that fill our lives, and prevent other thoughts and actions.[3]
Impairment to epistemic rationality:
- We often spend our thinking time rehearsing reasons why our beliefs are correct, or why our theories are interesting, instead of thinking new thoughts.
Impairment to instrumental rationality:
- We often take actions to signal to ourselves that we have particular goals, instead of acting to achieve those goals. For example, we may go through the motions of studying or working, and feel good about our diligence, while paying little attention to the results.
- We often take actions to signal to ourselves that we already have particular skills, instead of acting to acquire those skills. For example, we may prefer to play games against folks we often beat, request critiques from those likely to praise our abilities, rehearse yet more projects in our domains of existing strength, etc.
Strategies for reducing compartmentalization:
A huge portion of both Less Wrong and the self-help and business literatures amounts to techniques for integrating your thoughts -- for bringing your whole mind, with all your intelligence and energy, to bear on your problems. Many fall into the following categories, each of which boosts both epistemic and instrumental rationality:
1. Something to protect (or, as Napoleon Hill has it, definite major purpose[4]): Find an external goal that you care deeply about. Visualize the goal; remind yourself of what it can do for you; integrate the desire across your mind. Then, use your desire to achieve this goal, and your knowledge that actual inquiry and effective actions can help you achieve it, to reduce wireheading temptations.
2. Translate evidence, and goals, into terms that are easy to understand. It’s more painful to remember “Aunt Jane is dead” than “Aunt Jane passed away” because more of your brain understands the first sentence. Therefore use simple, concrete terms, whether you’re saying “Aunt Jane is dead” or “Damn, I don’t know calculus” or “Light bends when it hits water” or “I will earn a million dollars”. Work to update your whole web of beliefs and goals.
3. Reduce the emotional gradients that fuel wireheading. Leave yourself lines of retreat. Recite the litanies of Gendlin and Tarski; visualize their meaning, concretely, for the task or ugh field bending your thoughts. Think through the painful information; notice the expected update, so that you need not fear further thought. On your to-do list, write concrete "next actions", rather than vague goals with no clear steps, to make the list less scary.
4. Be aware of common patterns of wireheading or compartmentalization, such as failure to acknowledge sunk costs. Build habits, and perhaps identity, around correcting these patterns.
I suspect that if we follow up on these parallels, and learn strategies for decompartmentalizing not only our far-mode beliefs, but also our near-mode beliefs, our models of ourselves, our curiosity, and our near- and far-mode goals and emotions, we can create a more powerful rationality -- a rationality for the whole mind.
[1] Assuming it's a reinforcement learner, temporal difference learner, perceptual control system, or similar.
[2] We receive reward/pain not only from "primitive reinforcers" such as smiles, sugar, warmth, and the like, but also from many long-term predictors of those reinforcers (or predictors of predictors of those reinforcers, or...), such as one's LW karma score, one's number theory prowess, or a specific person's esteem. We probably wish to regard some of these learned reinforcers as part of our real preferences.
[3] Arguably, wireheading gives us fewer long-term reward signals than we would achieve from its absence. Why does it persist, then? I would guess that the answer is not so much hyperbolic discounting (although this does play a role) as local hill-climbing behavior; the simple, parallel systems that fuel most of our learning can't see how to get from "avoid thinking about my bill" to "genuinely relax, after paying my bill". You, though, can see such paths -- and if you search for such improvements and visualize the rewards, it may be easier to reduce wireheading.
[4] I'm not recommending Napoleon Hill. But even this unusually LW-unfriendly self-help book seems to get most points right, at least in the linked summary. You might try reading the summary as an exercise in recognizing mostly-accurate statements when expressed in the enemy's vocabulary.