Related to: Leave a line of retreat; Categorizing has consequences.

There’s a story I like, about this little kid who wants to be a writer. So she writes a story and shows it to her teacher.

“You misspelt the word ‘ocean’”, says the teacher.

“No I didn’t!”, says the kid.

The teacher looks a bit apologetic, but persists: “‘Ocean’ is spelt with a ‘c’ rather than an ‘sh’; this makes sense, because the ‘e’ after the ‘c’ changes its sound…”

“No I didn’t!” interrupts the kid.

“Look,” says the teacher, “I get it that it hurts to notice mistakes. But that which can be destroyed by the truth should be! You did, in fact, misspell the word ‘ocean’.”

“I did not!” says the kid, whereupon she bursts into tears, and runs away and hides in the closet, repeating again and again: “I did not misspell the word! I can too be a writer!”.

I like to imagine the inside of the kid’s head as containing a single bucket that houses three different variables that are initially all stuck together:

Original state of the kid's head:

The goal, if one is seeking actual true beliefs, is to separate out each of these variables into its own separate bucket, so that the “is ‘oshun’ spelt correctly?” variable can update to the accurate state of "no", without simultaneously forcing the "Am I allowed to pursue my writing ambition?" variable to update to the inaccurate state of "no".

Desirable state (requires somehow acquiring more buckets):

The trouble is, the kid won’t necessarily acquire enough buckets by trying to “grit her teeth and look at the painful thing”. A naive attempt to "just refrain from flinching away, and form true beliefs, however painful" risks introducing a more important error than her current spelling error: mistakenly believing she must stop working toward being a writer, since the bitter truth is that she spelled 'oshun' incorrectly.

State the kid might accidentally land in, if she naively tries to "face the truth":

(You might take a moment, right now, to name the cognitive ritual the kid in the story *should* do (if only she knew the ritual). Or to name what you think you'd do if you found yourself in the kid's situation -- and how you would notice that you were at risk of a "buckets error".)

More examples:

It seems to me that bucket errors are actually pretty common, and that many (most?) mental flinches are in some sense attempts to avoid bucket errors. The following examples are slightly-fictionalized composites of things I suspect happen a lot (except the "me" ones; those are just literally real):

Diet: Adam is on a diet with the intent to lose weight. Betty starts to tell him about some studies suggesting that the diet he is on may cause health problems. Adam complains: “Don’t tell me this! I need to stay motivated!”

One interpretation, as diagramed above: Adam is at risk of accidentally equating the two variables, and accidentally *assuming* that the studies imply that the diet must stop being viscerally motivating. He semi-consciously perceives that this risks error, and so objects to having the information come in and potentially force the error.

Pizza purchase: I was trying to save money. But I also wanted pizza. So I found myself tempted to buy the pizza *really quickly* so that I wouldn't be able to notice that it would cost money (and, thus, so I would be able to buy the pizza):

On this narration: It wasn't *necessarily* a mistake to buy pizza today. Part of me correctly perceived this "not necessarily a mistake to buy pizza" state. Part of me also expected that the rest of me wouldn't perceive this, and that, if I started thinking it through, I might get locked into the no-pizza state even if pizza was better. So it tried to 'help' by buying the pizza *really quickly, before I could think and get it wrong*. [1]

On the particular occasion about the pizza (which happened in 2008, around the time I began reading Eliezer's LW Sequences), I actually managed to notice that the "rush to buy the pizza before I could think" process was going on. So I tried promising myself that, if I still wanted the pizza after thinking it through, I would get the pizza. My resistance to thinking it through vanished immediately. [2]

To briefly give several more examples, without diagrams (you might see if you can visualize how a buckets diagram might go in these):

  • Carol is afraid to notice a potential flaw in her startup, lest she lose the ability to try full force on it.
  • Don finds himself reluctant to question his belief in God, lest he be forced to conclude that there's no point to morality.
  • As a child, I was afraid to allow myself to actually consider giving some of my allowance to poor people, even though part of me wanted to do so. My fear was that if I allowed the "maybe you should give away your money, because maybe everyone matters evenly and you should be consequentialist" theory to fully boot up in my head, I would end up having to give away *all* my money, which seemed bad.
  • Eleanore believes there is no important existential risk, and is reluctant to think through whether that might not be true, in case it ends up hijacking her whole life.
  • Fred does not want to notice how much smarter he is than most of his classmates, lest he stop respecting them and treating them well.
  • Gina has mixed feelings about pursuing money -- she mostly avoids it -- because she wants to remain a "caring person", and she has a feeling that becoming strategic about money would somehow involve giving up on that.

It seems to me that in each of these cases, the person has an arguably worthwhile goal that they might somehow lose track of (or might accidentally lose the ability to act on) if they think some *other* matter through -- arguably because of a deficiency of mental "buckets".

Moreover, "buckets errors" aren't just thingies that affect thinking in prospect -- they also get actually made in real life. It seems to me that one rather often runs into adults who decided they weren't allowed to like math after failing a quiz in 2nd grade; or who gave up on meaning for a couple years after losing their religion; or who otherwise make some sort of vital "buckets error" that distorts a good chunk of their lives. Although of course this is mostly guesswork, and it is hard to know actual causality.

How I try to avoid "buckets errors":

I basically just try to do the "obvious" thing: when I notice I'm averse to taking in "accurate" information, I ask myself what would be bad about taking in that information.[3] Usually, I get a concrete answer, like "If I noticed I could've saved all that time, I'll have to feel bad", or "if AI timelines are maybe-near, then I'd have to rethink all my plans", or what have you.

Then, I remember that I can consider each variable separately. For example, I can think about whether AI timelines are maybe-near; and if they are, I can always decide to not-rethink my plans anyhow, if that's actually better. I mentally list out all the decisions that *don't* need to be simultaneously forced by the info; and I promise myself that I can take the time to get these other decisions not-wrong, even after considering the new info.

Finally, I check to see if taking in the information is still aversive. If it is, I keep trying to disassemble the aversiveness into component lego blocks until it isn't. Once it isn't aversive, I go ahead and think it through bit by bit, like with the pizza.

This is a change from how I used to think about flinches: I used to be moralistic, and to feel disapproval when I noticed a flinch, and to assume the flinch had no positive purpose. I therefore used to try to just grit my teeth and think about the painful thing, without first "factoring" the "purposes" of the flinch, as I do now. But I think my new ritual is better, at least now that I have enough introspective skill that I can generally finish this procedure in finite time, and can still end up going forth and taking in the info a few minutes later.

(Eliezer once described what I take to be a similar ritual for avoiding bucket errors, as follows: When deciding which apartment to rent (he said), one should first do out the math, and estimate the number of dollars each would cost, the number of minutes of commute time times the rate at which one values one's time, and so on. But at the end of the day, if the math says the wrong thing, one should do the right thing anyway.)


[1]: As an analogy: sometimes, while programming, I've had the experience of:

  1. Writing a program I think is maybe-correct;
  2. Inputting 0 as a test-case, and knowing ahead of time that the output should be, say, “7”;
  3. Seeing instead that the output was “5”; and
  4. Being really tempted to just add a “+2” into the program, so that this case will be right.

This edit is the wrong move, but not because of what it does to MyProgram(0) — MyProgram(0) really is right. It’s the wrong move because it maybe messes up the program’s *other* outputs.
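As a concrete illustration of that temptation, here is a minimal sketch; the function, its intended spec of 7 * (x + 1), and the "+2" patch are all hypothetical, invented for the example:

```python
def my_program(x):
    # Intended behavior: 7 * (x + 1), so my_program(0) should be 7.
    # A bug crept in: the multiplier is wrong.
    return 5 * (x + 1)

print(my_program(0))  # 5 -- the test case that should have printed 7

def my_program_patched(x):
    # The tempting "fix": bolt a +2 onto the output so this one case passes.
    return 5 * (x + 1) + 2

print(my_program_patched(0))  # 7 -- the visible test case now looks right...
print(my_program_patched(1))  # 12 -- ...but other outputs are now wrong (intended: 14)
```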

Similarly, changing up my beliefs about how my finances should work in order to get a pizza on a day when I want one *might* help with getting the right answer today about the pizza — it isn’t clear — but it’d risk messing up other, future decisions.

The problem with rationalization and mental flinches, IMO, isn’t so much the “intended” action that the rationalization or flinch accomplishes in the moment, but the mess it leaves of the code afterward.

[2] To be a bit more nitpicky about this: the principle I go for in such cases isn’t actually “after thinking it through, do the best thing”. It’s more like “after thinking it through, do the thing that, if reliably allowed to be the decision-criterion, will allow information to flow freely within my head”.

The idea here is that my brain is sometimes motivated to achieve certain things; and if I don’t allow that attempted achievement to occur in plain sight, I incentivize my brain to sneak around behind my back and twist up my code base in an attempt to achieve those things. So, I try not to do that.

This is one reason it seems bad to me when people try to take “maximize all human well-being, added evenly across people, without taking myself or my loved ones as special” as their goal. (Or any other fake utility function.)

[3] To describe this "asking" process more concretely: I sometimes do this as follows: I concretely visualize a 'magic button' that will cause me to take in the information. I reach toward the button, and tell my brain I'm really going to press it when I finish counting down, unless there are any objections ("3... 2... no objections, right?... 1..."). Usually I then get a bit of an answer — a brief flash of worry, or a word or image or association.

Sometimes the thing I get is already clear, like “if I actually did the forms wrong, and I notice, I’ll have to redo them”. Then all I need to do is separate it into buckets (“How about if I figure out whether I did them wrong, and then, if I don’t want to redo them, I can always just not?”).

Other times, what I get is more like a quick nonverbal flash, or a feeling of aversion without knowing why. In such cases, I try to keep “feeling near” the aversion. I might for example try thinking of different guesses (“Is it that I’d have to redo the forms?… no… Is it that it’d be embarrassing?… no…”). The idea here is to see if any of the guesses “resonate” a bit, or cause the feeling of aversiveness to become temporarily a bit more vivid-feeling.

For a more detailed version of these instructions, and more thoughts on how to avoid bucket errors in general (under different terminology), you might want to check out Eugene Gendlin’s audiobook “Focusing”.

Comments:

The bucket diagrams don't feel to me like the right diagrams to draw. I would be drawing causal diagrams (of aliefs); in the first example, something like "spelled oshun wrong -> I can't write -> I can't be a writer." Once I notice that I feel like these arrows are there I can then ask myself whether they're really there and how I could falsify that hypothesis, etc.

The causal chain feels like a post-justification and not what actually goes on in the child's brain. I expect this to be computed using a vaguer sense of similarity that often ends up agreeing with causal chains (at least good enough in domains with good feedback loops). I agree that causal chains are more useful models of how you should think explicitly about things, but it seems to me that the purpose of these diagrams is to give a memorable symbol for the bug described here (use case: recognize and remember the applicability of the technique).

I had a similar thought while reading this post, but I'm not sure invoking causality is necessary (having a direction still seems necessary). Just in terms of propositional logic, I would explain this post as follows:

1. Initially, one has the implication A → B stored in one's mind.

2. Someone asserts A.

3. Now one's mind (perhaps subconsciously) does a modus ponens, and obtains B.

4. However, B is an undesirable belief, so one wants to deny it.

5. Instead of rejecting the implication A → B, one adamantly denies A.

The "buckets error" is the implication A → B, and "flinching away" is the denial of A. Flinching away is about protecting one's epistemology because denying A is still better than accepting B. Of course, it would be best to reject the implication A → B, but since one can't do this (by assumption, one makes the buckets error), it is preferable to "flinch away" from A.

ETA (2019-02-01): It occurred to me that this is basically the same thing as "one man's modus ponens is another man's modus tollens" (see e.g. this post) but with some extra emotional connotations.
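To make that bookkeeping explicit, here is a toy truth-functional sketch of the five steps above; the Python encoding and the variable labels are my own, not the commenter's:

```python
# A = "I misspelled 'oshun'", B = "I can't be a writer".

def is_consistent(a: bool, b: bool, a_implies_b: bool) -> bool:
    # The only inconsistent combination: holding A -> B, accepting A, and denying B.
    return not (a_implies_b and a and not b)

# The teacher's evidence supports A. The consistent ways to respond:
print(is_consistent(a=True,  b=True,  a_implies_b=True))   # True: accept B (modus ponens)
print(is_consistent(a=False, b=False, a_implies_b=True))   # True: deny A despite the evidence (the "flinch")
print(is_consistent(a=True,  b=False, a_implies_b=False))  # True: drop A -> B (split the buckets)

# The buckets error is what makes the third option feel unavailable, leaving only the first two.
print(is_consistent(a=True,  b=False, a_implies_b=True))   # False: the state the kid is trying to avoid
```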

In my head, it feels mostly like a tree, e.g:

"I must have spelled oshun right"

–Otherwise I can't write well

– –If I can't write well, I can't be a writer

–Only stupid people misspell common words

– –If I'm stupid, people won't like me

etc. For me, to unravel an irrational alief, I generally have to solve every node below it–e.g., by making sure that I get the benefit from some other alief.

I think they're equivalent in a sense, but that bucket diagrams are still useful. A bucket can also occur when you conflate multiple causal nodes. So in the first example, the kid might not even have a conscious idea that there are three distinct causal nodes ("spelled oshun wrong", "I can't write", "I can't be a writer"), but instead treats them as a single node. If you're able to catch the flinch, introspect, and notice that there are actually three nodes, you're already a big part of the way there.

The bucket diagrams are too coarse, I think; they don't keep track of what's causing what and in what direction. That makes it harder to know what causal aliefs to inspect. And when you ask yourself questions like "what would be bad about knowing X?" you usually already get the answer in the form of a causal alief: "because then Y." So the information's already there; why not encode it in your diagram?

Fair point.

Agreed -- this sort of "bucket error" can be generalized to "invisible uninspected background assumption". But those don't necessarily need to be biconditionals.

Does anyone know whether something like buckets/causal diagram nodes might have an analogue at the neural level?

A common bucket error for me: Idea X is a potentially very important research idea that is, as far as I know, original to me. It would really suck to discover that this wasn't original to me. Thus, I don't want to find out if this is already in the literature.

This is a change from how I used to think about flinches: I used to be moralistic, and to feel disapproval when I noticed a flinch, and to assume the flinch had no positive purpose. I therefore used to try to just grit my teeth and think about the painful thing, without first "factoring" the "purposes" of the flinch, as I do now.

This is key. Any habit that involves "gritting your teeth" is not durable.

Also, Focusing should easily be part of the LW "required reading".

I'm reading Gendlin's book Focusing and struggling with it -- it's hard for me to understand why you and Anna think so highly of this book. It's hard to get past all the mystic woo about knowledge "in the body"; Gendlin seems to think that anything not in the conscious mind is somehow stored/processed out there in the muscles and bones. Even taking that as metaphorical -- which Gendlin clearly does not -- I find his description of the process very unclear.

Gendlin seems to think that anything not in the conscious mind is somehow stored/processed out there in the muscles and bones

That's an uncharitable reading of a metaphorical version of the Somatic Marker Hypothesis. Which in turn is just a statement of something fairly obvious: there are physiological indicators of mental and emotional function. That's not the same thing as saying that these things are actually stored in the body, just that one can use physiological state as clues to find out what's going on in your head, or to identify that "something is bothering me", and then try to puzzle out what that is.

An example: suppose I have something I want to say in an article or post. You could describe this "wanting to say something" as my felt sense of what it is I want to say. It is preverbal, because I haven't said it yet. It won't be words until I write it down or say it in my head.

Words, however, aren't always precise, and one's first attempt at stating a thing -- even in one's head -- is often "not quite right". On hearing or reading something back, I get the felt sense that what I've said is not quite right, and that it needs something else. I then attempt new phrasings, until I get the -- wait for it -- felt sense that this is correct.

Gendlin's term "felt sense" is a way to describe this knowing-without-knowing aspect of consciousness: we can know something nonverbally, and teasing it out requires trial and error that reflects back and forth between the verbal and the nonverbal in order to fully comprehend and express it.

So, the essential idea of Gendlin's focusing is that if a person in psychotherapy is not doing the above process -- that is, attempting to express felt, but as yet unformed and disorganized concepts and feelings -- they will not achieve change or even true insight, because it is not the act of self-expression but the act of searching for the meanings to be expressed that brings about such change. If they are simply verbalizing without ever looking for the words, then they are wasting their time having a social chat, rather than actually reflecting on their experience.

Meanwhile, those bits of felt sense we're not even trying to explore, represent untapped opportunity for improving our quality of life.

[Edited to add: I'm not 100% in agreement with the Somatic Marker Hypothesis, personally: I think the idea of somatic markers being fed back to the brain as a feedback mechanism is one possible way of doing things, but I doubt that all reinforcement involving emotions work that way. Evolution kludges lots of things, but it doesn't necessarily kludge them consistently. :) That being said, somatic markers are an awesome tool for conscious reflection and feedback, whether they are an input to the brain's core decisionmaking process, or "merely" an output of it.]

That's an uncharitable reading of a metaphorical version of the Somatic Marker Hypothesis. Which in turn is just a statement of something fairly obvious: there are physiological indicators of mental and emotional function. That's not the same thing as saying that these things are actually stored in the body, just that one can use physiological state as clues to find out what's going on in your head, or to identify that "something is bothering me", and then try to puzzle out what that is.

I'm not sure that Gendlin doesn't believe in something stronger. There's bodywork literature that suggests that you won't solve a deep problem like a depression without changes on the myofascial level.

Let me attempt to explain it in my own words.

You have a thought, and then you have some kind of emotional reaction to it, and that emotional reaction should be felt in your body. Indeed, it is hard to have an emotion that doesn't have a physical component.

Say you think that you should call your mom, but then you feel a heaviness or a sinking in your gut, or a tightness in your neck or throat or jaw. These physical sensations are one of the main ways your subconscious tries to communicate with you. Let's further say that you don't know why you feel this way, and you can't say why you don't want to call your mom. You just find that you know you should call your mom but some part of you is giving you a really bad feeling about it. If you don't make an effort to untangle this mess, you'll probably just not call your mom, meaning whatever subconscious process originated those bad feelings in the first place will continue sitting under the surface and probably recapitulate the same reaction in similar situations.

If you gingerly try to "fit" the feeling with some words, as Gendlin says, the mind will either give you no feedback or it will give you a "yes, that's right" in the form of a further physical shift. This physical shift can be interpreted as the subconscious module acknowledging that its signal has been heard and ceasing to broadcast it.

I really don't think Gendlin is saying that the origin of your emotions about calling your mom is stored in your muscles. I think he's saying that when you have certain thoughts or parts of yourself that you have squashed out of consciousness with consistent suppression, these parts make themselves known through physical sensations, so it feels like it's in your body. And the best way to figure out what those feelings are is to be very attentive to your body, because that's the channel through which you're able to tentatively communicate with that part of yourself.

OR, it may not be that you did anything to suppress the thoughts, it may just be that the mind is structured in such a way that certain parts of the mind have no vocabulary with which to just inject a simple verbal thought into awareness. There's no reason a priori to assume that all parts of the mind have equal access to the phonological loop.

Maybe Gendlin's stuff is easier to swallow if you happen to already have this view of the conscious mind as the tip of the iceberg, with most of your beliefs and habits and thoughts being dominated by the vast but unreflective subconscious. If you get into meditation in any serious way, you can really consistently see that these unarticulated mental constructs are always lurking there, dominating behavior, pushing and pulling. To me, it's not woo at all, it's very concrete and actionable, but I understand that Gendlin's way of wording things may serve as a barrier to entry.

I appreciate your explanation, and it makes sense to me. But I still can't find any hint in Gendlin's writing that he's speaking metaphorically.

Gendlin seems to think that anything not in the conscious mind is somehow stored/processed out there in the muscles and bones.

There is a subjective experience that suggests that feelings are located inside the body. Even if the information is actually stored in the motor cortex, it's practically useful for certain interventions to use a mental model that locates the feelings inside the body.

A month ago I had a tense neck. Even when the neck relaxed a bit, after a night of sleep it was again completely tense. After the problem went on for a week I used Gendlin's Focusing on the tense neck. I found a feeling of confusion that was associated with the tense neck. I processed the feeling. I felt the shift. My neck got more relaxed and it didn't get tense again.

It's reasonable to say that the feeling of confusion wasn't located in my neck but somewhere in my brain, and that a neural pattern in my brain resulted in my brain sending signals to my neck to tense up. At the same time, the mental model of Focusing that includes connecting to the feeling in the neck helped me resolve my problem.

The interesting thing about noticing things like this, to me, is that once you can start to see "irrational" choices as the (potentially) better choices of limited sets (flinch away+"can be a writer", correct spelling of ocean+"can't"), then you'll start describing the situation as "doesn't see that you can misspell words as a kid and grow up to be a writer" instead of "irrational", and the solution there recommends itself.

In general, the word "irrational" is a stand-in for "I don't understand why this person is doing this," plus the assumption that it's caused by motivations that cannot be reasoned with. The problem with that is that they can be reasoned with, once you understand what they actually are.

In general, the word "irrational" is a stand-in for "I don't understand why this person is doing this"

Not necessarily. Quite often "irrational" means "I understand why she's doing this, but she's not going to achieve her goals this way".

Sorta. There's two ways of using it though. If you ask me "surely you don't think the rational response is to flinch away like she's doing!?" I'd shrug and say "nah". If you put me on the spot and asked "would you say she's being 'irrational'? Yes or no?", I'd say "sure, you can say that I guess". It can be functional short hand sometimes, if you make sure that you don't try to import connotations and use it to mean anything but "suboptimal", but the term "suboptimal" captures all of that without hinting at false implications. Normally when you actually understand why someone is doing something but think it won't work, you use other words. For example, Bob went to the grocery store to buy food because he thinks they're open. They're not. Do you say he's "irrational" or just wrong. I dunno about you, but using "irrational" there doesn't seem to fit.

Often, however, it's used when you think you understand why they're doing something - or when you understand the first layer but use it as a stop sign instead of extending your curiosity until you have a functional explanation. Example: "She's screaming at her kids even though it isn't going to help anything. I understand why she does it, she's just angry. It's irrational though". Okay, so given that it's not working, why is she letting her anger control her? That's the part that needs explaining, because that's the part that can actually change something. If the way you're using "irrational" leads you to even the temptation to say "you're being irrational", then (with a few cool exceptions) what it really means is that you think you understand all that there is to understand, but that you're wrong.

That help clarify?

For example, Bob went to the grocery store to buy food because he thinks they're open. They're not. Do you say he's "irrational" or just wrong.

"Irrational" implies making a bad choice when a good choice is available. If Bob was mistaken, he was just mistaken. If he knew he could easily check the store hours on his phone but decided not to and spent 15 minutes driving to the store, he was irrational.

Example: "She's screaming at her kids even though it isn't going to help anything. I understand why she does it, she's just angry. It's irrational though". Okay, so given that it's not working, why is she letting her anger control her?

Because she is dumb and unable to exercise self-control.

It seems to me you just don't like the word "irrational". Are there situations where you think it applies? In what cases would you use this word?

"Irrational" implies making a bad choice when a good choice is available. If Bob was mistaken, he was just mistaken. If he knew he could easily check the store hours on his phone but decided not to and spent 15 minutes driving to the store, he was irrational.

It seems like you’re burying a lot of the work in the word “available”. Is it “available” if it weren’t on his mind even if he could answer “yes, it would be easy to check” when asked? Is it “available” when it’s not on his mind but reminding him wouldn’t change his decision, but he has other reasons for it? If he doesn’t have other reasons, but would do things differently if you taught him? If a different path were taken on any of those forks?

I can think of a lot of different ways for someone to "know he could easily check store hours" and then not do it, and I would describe them all differently - and none of them seem best described as “irrational”, except perhaps as sloppy shorthand for “suboptimal decision algorithm”.

Because she is dumb and unable to exercise self-control.

That’s certainly one explanation, and useful for some things, but less useful for many others. Again, shorthand is fine if seen for what it is. In other cases though, I might want a more detailed answer that explains why she is “unable” to exercise self control - say, for example, if I wanted to change it. The word “irrational” makes perfect sense if you think changing things like this is impossible. If you see it as a matter of disentangling the puzzle, it makes less sense.

It seems to me you just don't like the word "irrational". Are there situations where you think it applies? In what cases would you use this word?

It’s not that I “don’t like” the word - I don't “try not to use it” or anything. It’s just that I’ve noticed that it has left my vocabulary on its own once I started trying to change behaviors that seemed irrational to me instead of letting it function as a mental stop sign. It just seems that the only thing “irrational” means, beyond “suboptimal”, is an implicit claim that there are no further answers - and that is empirically false (and other bad things). So in that sense, no, I’d never use the word because I think that the picture it tries to paint is fundamentally incoherent.

If that connotation is disclaimed and you want to use it to mean more than “suboptimal”, it seems like “driven by motivated cognition” is probably one of the closer things to the feeling I get from the word “irrational”, but as this post by Anna shows, even that can have actual reasons behind it, and I usually want the extra precision of actually spelling out what I think is happening.

If I were to use the word myself (as opposed to running with it when someone else uses the word), it would only be in a case where the person I’m talking to understands the implicit “[but there are reasons for this, and there’s more than could be learned/done if this case were to become important. It’s not]”

EDIT: I also could conceivably use it in describing someone's behavior to them if I anticipated that they'd agree and change their behavior if I did.

instead of letting it function as a mental stop sign

I don't know why you let it function as a stop sign in the first place. "Irrational" means neither "random" nor "inexplicable" -- to me it certainly does not imply that "there are no further answers". As I mentioned upthread, I can consider someone's behaviour irrational and at the same time understand why that someone is doing this and see the levers to change him.

The difference that I see from "suboptimal" is that suboptimal implies that you'll still get to your goal, but inefficiently, using more resources in the process. "Irrational", on the other hand, implies that you just won't reach your goal. But it can be a fuzzy distinction.

As I mentioned upthread, I can consider someone's behaviour irrational and at the same time understand why that someone is doing this and see the levers to change him.

If "irrational" doesn't feel like an explanation in itself, and you're going to dig further and try to figure out why they're being irrational, then why stop to declare it irrational in the first place? I don't mean it in a rhetorical sense and I'm not saying "you shouldn't" - I really don't understand what could motivate you to do it, and don't feel any reason to myself. What does the diagnosis "irrational" do for you? It kinda feels to me like saying "fire works because phlogistons!" and then getting to work on how phlogistons work. What's the middle man doing for you here?

With regard to "suboptimal" vs "irrational", I read it completely differently. If someone is beating their head against the door to open it instead of using the handle, I wouldn't call it any more "rational" if the door does eventually give way. Similarly, I like to use "suboptimal" to mean strictly "less than optimal" (including but not limited to the cases where the effectiveness is zero or negative) rather than using it to mean "less than optimal but better than nothing".

why stop to declare it irrational in the first place?

Because for me there are basically three ways to evaluate some course of action. You can say that it's perfectly fine and that's that (let's call it "rational"). You can say that it's crazy and you don't have a clue why someone is doing this (let's call it "inexplicable"). And finally, you can say that it's a mistaken course of action: you see the goal, but the road chosen doesn't lead there. I would call this "irrational".

Within this framework, calling something "irrational" is the only way to "dig further and try to figure out why".

With regard to "suboptimal" vs "irrational", I read it completely differently.

So we have a difference in terminology. That's not unheard of :-)

Interesting. I dig into plenty of things before concluding that I know what their goal is and that they will fail, and I don’t see what is supposed to be stopping me from doing this. I’m not sure why “I don’t [yet] have a clue why” gets rounded to “inexplicable”.

That isn't the distinction I get between suboptimal and irrational. They're focused on different things.

Irrational to me would mean that the process by which the strategy was chosen was not one that would reliably yield good strategies in varying circumstances.

Suboptimal is just an outcome measurement.

Outcome? I was going to say that suboptimal could refer to a case where we don't know if you'll reach your goal, but we can show (by common assumptions, let's say) that the action has lower expected value than some other. "Irrational" does not have such a precise technical meaning, though we often use it for more extreme suboptimality.

Yes, outcome. Look at what each word is actually describing. Irrationality is about process. Suboptimal is about outcome -- if you inefficiently but reliably calculate good strategies for action, that's being slow, not suboptimal in the way we're talking about, so it's not about process.

That's really broadening the term 'irrational.' Irrational is not a synonym for 'not good' or 'not preferred'; it just means not rational or not logical. There may be lots of rational choices, some of which may be better or worse than others - but all rational. Irrational MIGHT BE (loosely) short for 'that doesn't make sense,' or better, 'that's not logical.'

The bucket analogy as illustrated seems to me to be pointing more at a faulty basis than at irrational thinking. The budding author clearly linked spelling with being allowed to pursue writing that ends up in a successful career, and she has a point. An author cannot be successful without an audience; a piece where one continually has to stop and interpret badly spelled words is likely not going to have a good audience. There is a clear rationale. The faulty basis is more related to the student having a picture of being a successful author (1) at this stage in life, and (2) without discipline and development. The false basis in the picture is linking 'am I allowed to pursue my writing ambition' with misspelling a word.

WRT flinches: there seems to be a general pattern across many forms of psychotherapy that goes something like this. The standard frame is a 'fixing' frame. Your mental phenomena react about as well to this as other people in your life do: not very well. Best practices with others also apply internally, namely: assume positive intent -- the patterns you observe are attempts to accomplish some positive goal. Seek to understand what goals those patterns are oriented towards. Ask about ways you might be able to help that process do a better job or get the resources it needs to get you the good thing it wants for you.

The fix it frame causes both people and mental phenomena to run away from you.

Moved to main and promoted.

Very interesting article!

I'm incidentally re-reading "Feeling Good" and parts of it deal with situations exactly like the ones Oshun-Kid is in.

From Chapter 6 ("Verbal Judo: How to talk back when you're under the fire of criticism"), I quote:

Here’s how it works. When another person criticizes you, certain negative thoughts are automatically triggered in your head. Your emotional reaction will be created by these thoughts and not by what the other person says. The thoughts which upset you will invariably contain the same types of mental errors described in Chapter 3: overgeneralization, all-or-nothing thinking, the mental filter, labeling, etc. For example, let’s take a look at Art’s thoughts. His panic was the result of his catastrophic interpretation: “This criticism shows how worthless I am.” What mental errors is he making? In the first place, Art is jumping to conclusions when he arbitrarily concludes the patient’s criticism is valid and reasonable. This may or may not be the case. Furthermore, he is exaggerating the importance of whatever he actually said to the patient that may have been undiplomatic (magnification), and he is assuming he could do nothing to correct any errors in his behavior (the fortune teller error). He unrealistically predicted he would be rejected and ruined professionally because he would repeat endlessly whatever error he made with this one patient (overgeneralization). He focused exclusively on his error (the mental filter) and over-looked his numerous other therapeutic successes (disqualifying or overlooking the positive). He identified with his erroneous behavior and concluded he was a “worthless and insensitive human being” (labeling). The first step in overcoming your fear of criticism concerns your own mental processes: Learn to identify the negative thoughts you have when you are being criticized. It will be most helpful to write them down using the double-column technique described in the two previous chapters. This will enable you to analyze your thoughts and recognize where your thinking is illogical or wrong. Finally, write down rational responses that are more reasonable and less upsetting.

And quoting your article:

(You might take a moment, right now, to name the cognitive ritual the kid in the story should do (if only she knew the ritual). Or to name what you think you'd do if you found yourself in the kid's situation -- and how you would notice that you were at risk of a "buckets error".)

I would encourage Oshun-Kid to cultivate the following habit:

  1. Notice when you feel certain (negative) emotions. (E.g. anxiety, sadness, fear, frustration, boredom, stressed, depressed, self-critical, etc.) Recognizing these (sometimes fleeting) moments is a skill that you get better at as you practice.
  2. Try putting down in words (write it down!) why you feel that emotion in this situation. This too, you will get better at as you practice. These are your Automatic Thoughts. E.g. "I'm always late!".
  3. Identify the cognitive distortions present in your automatic thought. E.g. Overgeneralization, all-or-nothing thinking, catastrophizing, etc.
  4. Write down a Rational Response that is absolutely true (don't try to deceive yourself --- it doesn't work!) and also less upsetting. E.g.: I'm not literally always late! I'm sometimes late and sometimes on time. If I'm going to beat myself up for the times I'm late, I might as well feel good about myself for the times I'm on time. Etc.

Write steps 2., 3., and 4., in three columns, where you add a new row each time you notice a negative emotion.
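If it helps to keep such a log digitally, here is a minimal sketch of one row of that three-column table; the field names are my own, not from "Feeling Good":

```python
from dataclasses import dataclass

@dataclass
class ThoughtRecord:
    automatic_thought: str   # step 2: what went through your head, e.g. "I'm always late!"
    distortions: list[str]   # step 3: e.g. ["overgeneralization", "all-or-nothing thinking"]
    rational_response: str   # step 4: absolutely true AND less upsetting

log: list[ThoughtRecord] = []
log.append(ThoughtRecord(
    automatic_thought="I'm always late!",
    distortions=["overgeneralization"],
    rational_response="I'm sometimes late and sometimes on time; one late day doesn't define me.",
))
```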

I'm actually surprised that Cognitive Biases are focused on to a greater degree than Cognitive Distortions are in the rational community (based on google-phrase search on site:lesswrong.com), especially when Kahneman writes more or less in Thinking: Fast and Slow that being aware of cognitive biases has not made him that much better at countering them (IIRC) while CBT techniques are regularly used in therapy sessions to alleviate depression, anxiety, etc. Sometimes as effectively as in a single session.

I also have some objections as to how the teacher behaves. I think the teacher would be more effective if he said stuff like: "Wow! I really like the story! You must have worked really hard to make it! Tell me how you worked at it: did you think up the story first and then write it down, or did you think it up as you were writing it, or did you do it a different way? Do you think there are authors who do it a different way from you or in a similar way to you? Do you think it's possible to become a better writer, just like a runner becomes a faster runner or like a basketball player becomes better at basketball? How would you go about doing that to become a better author? If a basketball player makes a mistake in a game, does it always make him a bad basketball player? Do the best players always do everything perfectly, or do they sometimes make mistakes? Should you expect of yourself to always be a perfect author, or is it okay for you to sometimes make mistakes? What can you do if you discover a mistake in your writing? Is it useful to sometimes search through your writings to find mistakes you can fix? Etc."

Edit: I personally find that when tutoring someone and you notice in real time that they are making a mistake or are just about to make a mistake, it's more effective to correct them in the form of a question rather than outright saying "that's wrong" or "that's incorrect" or similar.

E.g.:

Pupil, saying: "... and then I multiply nine by eight and get fifty-four ..." Here, I wouldn't say: "that's a mistake." I would rather say, "hmm... is that the case?" or "is that so?" or "wait a second, what did you say that was again?" or "hold on, can you repeat that for me?". It's a bit difficult for me to translate my question-phrases from Norwegian to English, because a lot of the effect is in the tone of voice. My theory for why this works is that when you say "that's wrong" or similar, you are more likely to express the emotion of disapproval at the student's actions or the student herself (and the student is more likely to read that emotion into you whether or not you express it). Whereas when you put it in the form of a question, the emotions you express are more of the form: mild surprise, puzzlement, uncertainty, curiosity, interest, etc., which are not directly rejecting or disapproving emotions on your part and therefore don't make the student feel bad.

After you do this a couple of times, the student becomes aware that every time you put a question to them, they are expected to double check that something is correct and to justify their conclusion.

I'm actually surprised that Cognitive Biases are focused on to a greater degree than Cognitive Distortions are in the rational community (based on google-phrase search on site:lesswrong.com), especially when Kahneman writes more or less in Thinking: Fast and Slow that being aware of cognitive biases has not made him that much better at countering them (IIRC) while CBT techniques are regularly used in therapy sessions to alleviate depression, anxiety, etc. Sometimes as effectively as in a single session.

The concept of cognitive biases is sort of like training wheels; I continue teaching people about them (at SPARC, say) as a first step on the path to getting them to recognize that they can question the outputs of their brain processes. It helps make things feel a lot less woo to be able to point to a bunch of studies clearly confirming that some cognitive bias exists, at first. And once you've internalized that things like cognitive biases exist I think it's a lot easier to then move on to other more helpful things, at least for a certain kind of a person (like me; this is the path I took historically).

There is a group (not CFAR) that allegedly uses the following tactics:

1) They teach their students (among other things) that consistency is good, and compartmentalization is bad and stupid.
2) They make the students admit explicitly that the seminar was useful for them.
3) They make the students admit explicitly that one of their important desires is to help their friends.
...and then...
4) They create strong pressure on the students to tell all their friends about the seminar, and to make them sign up for one.

The official reasoning is that if you want to be consistent, and if you want good things to happen to your friends, and if the seminar is a good thing... then logically you should want to make your friends attend the seminar. And if you want to make your friends attend the seminar, you should immediately take an action that increases the probability of that, especially if all it takes is to take your phone and make a few calls!

If there is anything stopping you, then you are inconsistent -- which means stupid! -- and you have failed at the essential lesson that was taught to you during the previous hours -- which means you will keep failing at life, because you are a compartmentalizing loser, and you can't stop being one even after the whole process was explained to you in great detail, and you even paid a lot of money to learn this lesson! Come on, don't throw away everything, pick up the damned phone and start calling; it is not that difficult, and your first experience with overcoming compartmentalization will feel really great afterwards, trust me!

So, what exactly is wrong about this reasoning?

First, when someone says "A implies B", that doesn't mean you need to immediately jump and start doing B. There is still an option that A is false; and an option that "A implies B" is actually a lie. Or maybe "A implies B" only in some situation, or only with certain probability. Probabilistic thinking and paying attention to detail are not the opposite of consistency.

Second, just because something is good, it is not necessarily the best available option. Maybe you should spend some time thinking about even better options.

Third, there is a difference between trying to be consistent, and believing in your own infallibility. You are allowed to have probabilistic beliefs, and to admit openly that those beliefs are probabilistic. That you believe that with probability 80% A is true, but you also admit the possibility that A is false. That is not an opposite of consistency. Furthermore, you are allowed to take an outside view, and admit that with certain probability you are wrong. That is especially important in calculating expected utility of actions that strongly depend on whether you are right or wrong.

Fourth, the most important consistency is internal. Just because you are internally consistent, it doesn't mean you have to explain all your beliefs truthfully and meaningfully to everyone, especially to people who are obviously trying to manipulate you.

...but if you learned about the concept of consistency just a few minutes ago, you probably don't realize all this.

I would describe the problem as a combination of privileging the hypothesis and privileging the question. First, even granted that you want to both be consistent and help your friends, it's not clear that telling them about the seminar is the most helpful thing you can do for your friends; there are lots of other hypotheses you could try generating if you were given the time. Second, there are lots of other things you might want and do something about wanting, and someone's privileging the question by bringing these particular things to your attention in this particular way.

This objection applies pretty much verbatim to most things strangers might try to persuade you to do, e.g. donate money to their charity.

Interesting article. Here is the problem I have: In the first example, "spelling ocean correctly" and "I'll be a successful writer" clearly have nothing to do with each other, so they shouldn't be in a bucket together and the kid is just being stupid. At least on first glance, that's totally different from Carol's situation. I'm tempted to say that "I should not try full force on the startup" and "there is a fatal flaw in the startup" should be in a bucket, because I believe "if there is a fatal flaw in the startup, I should not try it". As long as I believe that, how can I separate these two and not flinch?

Do you think one should allow oneself to be less consistent in order to become more accurate? Suppose you are a smoker and you don't want to look into the health risks of smoking, because you don't want to quit. I think you should allow yourself in some situations to both believe "I should not smoke because it is bad for my health" and to continue smoking, because then you'll flinch less. But I'm fuzzy on when. If you completely give up on having your actions be determined by your beliefs about what you should do, that seems obviously crazy and there won't be any reason to look into the health risks of smoking anyway.

Maybe you should model yourself as two people. One person is rationality. It's responsible for determining what to believe and what to do. The other person is the one that queries rationality and acts on its recommendations. Since rationality is a consequentialist with integrity, it might not recommend quitting smoking, because then the other person will stop acting on its advice and stop giving it queries.

the kid is just being stupid

"Just being stupid" and "just doing the wrong thing" are rarely helpful views, because those errors are produced by specific bugs. Those bugs have pointers to how to fix them, whereas "just being stupid" doesn't.

I think you should allow yourself in some situations to both believe "I should not smoke because it is bad for my health" and to continue smoking, because then you'll flinch less.

I think this misses the point, and damages your "should" center. You want to get into a state where if you think "I should X," then you do X. The set of beliefs that allows this is "Smoking is bad for my health," "On net I think smoking is worth it," and "I should do things that I think are on net worth doing." (You can see how updating the first one from "Smoking isn't that bad for my health" to its current state could flip the second belief, but that is determined by a trusted process instead of health getting an undeserved veto.)

"Just being stupid" and "just doing the wrong thing" are rarely helpful views, because those errors are produced by specific bugs. Those bugs have pointers to how to fix them, whereas "just being stupid" doesn't.

I'm guessing you're alluding to "Errors vs. Bugs and the End of Stupidity" here, which seems to have disappeared along with the rest of LiveJournal. Here's the Google cached version, though.

I'm guessing you're alluding to "Errors vs. Bugs and the End of Stupidity" here

I was, and I couldn't find it; thanks for doing that!

"Just being stupid" and "just doing the wrong thing" are rarely helpful views

I agree. What I meant was something like: If the OP describes a skill, then the first problem (the kid that wants to be a writer) is so very easy to solve that I feel I'm not learning much about how that skill works. The second problem (Carol) seems too hard for me. I doubt it's actually solvable using the described skill.

I think this misses the point, and damages your "should" center

Potentially, yes. I'm deliberately proposing something that might be a little dangerous. I feel my should center is already broken and/or doing more harm to me than the other way around.

"Smoking is bad for my health," "On net I think smoking is worth it," and "I should do things that I think are on net worth doing."

That's definitely not good enough for me. I never smoked in my life. I don't think smoking is worth it. And if I were smoking, I don't think I would stop just because I think it's a net harm. And I do think that, because I don't want to think about the harm of smoking or the difficulty of quitting, I'd avoid learning about either of those two.

ADDED: First meaning of "I should-1 do X" is "a rational agent would do X". Second meaning (idiosyncratic to me) of "I should-2 do X" is that "do X" is the advice I need to hear. Should-2 is based on my (mis-)understanding of Consequentialist-Recommendation Consequentialism. The problem with should-1 is that I interpret "I should-1 do X" to mean that I should feel guilty if I don't do X, which is definitely not helpful.

In the first example, "spelling ocean correctly" and "I'll be a successful writer" clearly have nothing to do with each other,

If you think that successful writers are talented, and that talent means fewer misspellings, then misspelling things is evidence of you not going to be a successful writer. (No, I don't think this is a very plausible model, but it's one that I'd imagine could be plausible to a kid with a fixed mindset and who didn't yet know what really distinguishes good writers from the bad.)

Monolithic goal buckets cause a cluster of failure modes I haven't fully explored yet. Let's say you have a common goal like 'exercise.' The single goal bucket causes means-substitution that keeps you from fully covering the dimensions of the space that are relevant to you: e.g., you run and then morally license not lifting weights because you already made progress towards the exercise bucket. Because the bucket is large and contains a bunch of dimensions of value, it induces a mental flinch, makes ugh fields easier to develop, and makes catastrophizing and moralizing more likely. The single goal also causes fewer potential means to be explored or brainstormed in the first place (there seems to be some setting people have that tells them how many options a goal-like thing needs, regardless of goal complexity).

Lower-resolution feedback from conceptualizing it as one thing makes it significantly harder to train moving towards the thing (deliberate practice could be viewed as the process by which feedback resolution is increased). Monolithic goals also generally seem to be farther in construal than finer-grained goals, which induces thinking about them in that farther (more abstract) construal mode, which hides important details relevant to making, say, TAPs about the thing, since those require awareness of near-mode stumbling blocks. Since they tend towards simplicity, they also discourage exploration: e.g. 'exercise -> run' feels like matching construal levels, while 'increase VO2 max -> go find out more about how to increase VO2 max' also matches construal levels, and the second looks closer to a construct that results in actions towards the thing.

I think some cleaner handles around this cluster would be useful, interested in ideas on making it more crisp.

Meta: better tools/handles around talking about ontology problems would greatly reduce a large class of errors IMO. Programmers deal with this most frequently/concretely so figuring out what works for them and seeing if there is a port of some kind seems valuable. To start with a terrible version to iterate on: UML diagrams were about trying to capture ontology more cleanly. What is better?

I basically just try to do the "obvious" thing: when I notice I'm averse to taking in "accurate" information, I ask myself what would be bad about taking in that information.

Interestingly enough this is a common step in Connection Theory charting.

I like this post, and the general idea that flinching away has a purpose is a good one, but rather than saying it is about protecting "the epistemology," I think it would be better simply to say that it has a goal, and that goal could be any number of things. That sounds bad because if the goal isn't truth, then you are flinching away from the truth for the sake of something other than truth. But what you're actually recognizing here is that this isn't always bad, because the goal that is preventing you from accepting the truth is actually something positive, and changing your mind at the moment might prevent you from reaching that goal. Whereas if you wait to think about it more, at some later time you may be able to both change your mind and achieve your goal.

It sounds like your technique is to try to figure out a way to do precisely that, both accept the truth and still achieve the goal which was impeding it. And this will often be possible, but not always; sometimes the truth really does have bad consequences that you cannot prevent. For example, losing my religion was especially difficult for me because I knew that I would lose most of my social life and a lot of respect from most people who know me, as well as (for concrete reasons concerning particulars that I won't go into) making large changes in my external form of life which I did not particularly desire. And those consequences simply could not be prevented.

I've noticed that sometimes, my System 2 starts falsely believing there are fewer buckets when I'm being socially confronted about a crony belief I hold, and that my System 2 will snap back to believing that there are more buckets once the confrontation is over. I'd normally expect my System 1 to make this flavor of error, but whenever my brain has done this sort of thing during the past few years, it's actually been my gut that has told me that I'm engaging in motivated reasoning.

The teacher looks a bit apologetic, but persists: “‘Ocean’ is spelt with a ‘c’ rather than an ‘sh’; this makes sense, because the ‘e’ after the ‘c’ changes its sound…”

I like how true-to-life this is. In fact it doesn't make sense, as 'ce' is normally pronounced with 's', not 'sh', so the teacher is unwittingly making this hard for the child. Many such cases. (But also many cases where the teacher's reasoning is flawless and beautiful and instantly rejected.)

This post seems to be about Conflation Fallacies (especially subconscious ones) rather than a new concept involving buckets, so I'm not a big fan of the terminology, but the discussion is important & worthwhile so +1 for that, though it seems like a better title would be '"Flinching away from truth" is often caused by internal conflation' or "bucket errors" if you like.

I agree, '"Flinching away from truth" is often caused by internal conflation' would be a much better title -- a more potent short take-away. (Or at least one I more agree with after some years of reflection.) Thanks!

There's also sometimes an element of "I don't care about that stuff, I want you to deal with this thing over here instead" - e.g., "Don't worry about spelling, I'll clean that up later. What do you think of the plot?". Even if the criticism is correct, irrelevant criticism can reduce the relevant information available. This can actually make the bucket problem worse in some cases, such as if you spend so long editing spelling that you forget to talk about the things that they did right.

The best way to split someone else's buckets is often to give explicitly different comments on different parts of the bucket, to encourage a division in their head. You can even do that yourself, though it takes a lot more self-awareness.

It seems to me like it's extremely hard to think about sociology, especially relating to policies and social justice without falling into this trap. When you consider a statistic about a group of people, "is this statistic accurate?" is put in the same bucket as "does this mean discriminating against this group is justified?" or even "are these people worth less?" almost instinctively. Especially if you are a part of that group yourself. Now that you've explained it that way, it seems that understanding that this is what going on is a good strategy to avoid being mindkilled by such discussions.

Though, in this case, it can still be a valid concern that others may be affected by this fallacy if you publish or spread the original statistic, so if it can pose a threat to a large number of people it may still be more ethical to avoid publicizing it. However that is an ethical issue and not an epistemic one.

In the spirit of Epistemic Effort could you tell us how long it took you to form these ideas and to write this post?


I've been doing this with an interpersonal issue. I guess that's getting resolved this week.

Did it get resolved? :)

I wonder if this could be formalized in terms of Bayesian belief networks. Being able to split nodes, or delete deprecated ones, or add entirely new ones, in order to represent improved ontologies would seem like it'd have extremely valuable applications.
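As a toy illustration of the node operations mentioned here -- splitting a conflated node into finer-grained ones -- the following sketch is my own, not an existing belief-network library or an established formalism:

```python
from copy import deepcopy

# Beliefs as a graph: node -> list of parent nodes it is believed to depend on.
# The kid's conflated ontology: one bucket doing the work of three variables.
conflated = {
    "spelling / can-write / allowed-to-pursue-writing": [],
}

def split_node(graph, old, new_nodes, new_edges):
    """Return a copy of the graph with `old` replaced by `new_nodes`,
    wired up according to `new_edges` (child -> list of parents)."""
    g = deepcopy(graph)
    del g[old]
    for node in new_nodes:
        g.setdefault(node, [])
    for child, parents in new_edges.items():
        g[child] = parents
    return g

improved = split_node(
    conflated,
    old="spelling / can-write / allowed-to-pursue-writing",
    new_nodes=["spelled 'oshun' right", "I can write well", "allowed to pursue writing"],
    new_edges={
        # Each alleged arrow is now a separate claim that can be inspected or deleted.
        "I can write well": ["spelled 'oshun' right"],
        "allowed to pursue writing": ["I can write well"],
    },
)
print(improved)
```

Deleting a deprecated link is then just removing one parent from one list, rather than overwriting the whole bucket.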

Hmm. I don't think it's not useful to practice looking at the truth even when it hurts. For instance with the paperwork situation, it could be that not fixing the paperwork even if you recognize errors in it is something you would see as a moral failing in yourself, something you would be averse to recognizing even if you allowed yourself to not go through the arduous task of fixing those mistakes. Because sometimes the terminal result of a self-evaluation is reducing one's opinion of oneself, being able to see painful truths is a necessary tool to make this method work properly.

That said, I do think this is a much more actionable ritual than just "look at the painful thing". It also serves better as a description of reality, encompassing not just why certain truths are painful, but also how they become painful. It establishes not just a method for coping with painful truths and forcing confrontation with them, but also for establishing mental housekeeping routines which can prevent truths from becoming painful in the first place.

This has been a topic I started thinking about on my own some months ago (I even started with the same basic observation about children and why they sometimes violently reject seemingly benign statements). But I think my progress will be much improved with a written document from someone else's perspective which I can look at and evaluate. Thank you very much for writing this up. I really appreciate it.

and that many (most?) mental flinches are in some sense attempts to avoid bucket errors

maybe better as

and that many (most?) mental flinches are in some sense attempts to avoid imagined consequences of bad reasoning due to bucket errors

emphasizing "avoiding consequences" vs "avoiding bucket errors"

Interesting article. But I do not see how the article supports the claim its title makes.

I think there's a connection between bucket errors and Obsessive Compulsive Disorder.

Well, it applies to the article... but also to cases in which one variable is actually related to the theory, not falsely related this time. You do reject the new information to protect your theory.

To the second point: What makes you think that? And on which point do you think it accesses? Do you think OCD prevents people from incorporating new information in general, or does it increase the chance of two variables ending up in "one bucket" that are not actually related (probably not in general, but in one aspect, like cleanliness or such)?
