(Brainstem, Neocortex) ≠ (Base Motivations, Honorable Motivations)

The neocortex (well, the telencephalon and thalamus, but let’s just call it “neocortex” for short) holds all of our world-knowledge, consciousness, intelligence, planning, reasoning, etc.
The brainstem (well, brainstem & hypothalamus, but I’ll just call it “brainstem” for short) is full of lots of circuits that say “eating chocolate is good”, “leaning over a precipice is scary”, etc. (It also regulates your heart-rate and so on.)

This idea has gotten a bad rap from its association with discredited “triune brain theory”. Relatedly, people persist in describing the brainstem with scientifically-inaccurate nicknames like “old brain”, or “lizard brain”, or “reptilian brain”, etc. (See my discussion here.) I’ve even heard the brainstem called “monkey brain”—as if monkeys didn’t have a neocortex?! (Elon Musk is guilty of this, see this interview - 14:45.)

Still, despite its bad rap, I very much like this idea, and am trying to salvage its reputation.

The trope I don’t like

There’s a trope that goes along with this idea in the popular imagination. It goes something like this:

The brainstem is the source of base and dishonorable goals like “I want to eat candy and watch TV”
The neocortex is the source of respectable and honorable goals like “I want to unravel the mysteries of the universe”, “I want to get off my lazy butt and go to the gym”, etc.

For example, a couple places where I’ve recently seen this trope are Jeff Hawkins’s recent book and an old LessWrong post The AI Alignment Problem Has Already Been Solved(?) Once.

I don't like this trope. I propose to throw it into the garbage, right next to the “lizard brain” terminology.

That said, it's not pulled out of thin air. People got the idea from somewhere—it’s gesturing towards a real thing, and I will talk about what I think that is below.

Why I think this trope is basically wrong

I’m a big believer in within-lifetime reinforcement learning as one of the key drivers of cognition (see Big Picture of Phasic Dopamine). So maybe 5 times per second, you think a little thought or make a little plan or whatever (“I’m going to pick up this pencil”, “I’m going to rewrite this sentence”, etc.). That thought / plan gets evaluated by various other parts of your brain, culminating in the brainstem, which then issues a reward pertaining to that thought. That reward drives both decision-making (e.g. you won't pick up the pencil if that plan is judged as bad) and gradual learning to think better thoughts in the future.

There’s no room in this RL-type process for “motivations that come from inside the learning algorithm”. Like, that’s just not where motivations come from! Motivations come from rewards. Rewards are calculated in the brainstem. It’s as simple as that.
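
To make that loop concrete, here is a toy sketch in Python. Everything in it is an illustrative assumption rather than a claim about real circuitry: the names (`INNATE_REWARD`, `brainstem_reward`, `step`), the hand-picked reward numbers, the 0.1 learning rate, and the rule that a thought gets acted on only if its reward is positive. The one thing it is meant to show is that the learner holds no reward table of its own.

```python
# Toy sketch of the propose/evaluate loop. All names and numbers are
# made up for illustration; nothing here is measured from a brain.

# Hypothetical innate reward table: the only place "motivation" lives.
INNATE_REWARD = {
    "pick up pencil": 0.5,
    "rewrite sentence": 0.2,
    "lean over precipice": -2.0,
}

def brainstem_reward(thought: str) -> float:
    """Return the reward for a proposed thought (0.0 if unknown)."""
    return INNATE_REWARD.get(thought, 0.0)

def step(thought: str, learned_value: dict) -> bool:
    """Evaluate one thought; return True if it gets acted on."""
    reward = brainstem_reward(thought)
    # Gradual learning: nudge the stored value toward the reward.
    old = learned_value.get(thought, 0.0)
    learned_value[thought] = old + 0.1 * (reward - old)
    # Decision-making: only positively rewarded thoughts are executed.
    return reward > 0

learned_value = {}
executed = [t for t in INNATE_REWARD if step(t, learned_value)]
```

In this sketch the learner's stored values only ever move in response to `brainstem_reward`, which is the sense in which there are no motivations that originate inside the learning algorithm.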

The kernel of truth

I think this wrong idea is pointing at a real phenomenon, and I'm going to try to explain it.

1. The neocortex can issue “the same plan” (same sequence of motor actions) framed in many different ways, and those different framings will get different rewards from the brainstem.

The brainstem is judging thoughts / plans, not actual "futures". It’s not omniscient! It's judging a map, not a territory! So you can have different ways to think about the same plan which result in different brainstem rewards.

For example:

You can have a thought "I will go to the gym" which is negative-reward, and so you don't do it.
Or you can have a thought "I will go to the gym and thus be healthy" which mixes negative and positive aspects, so maybe it's positive-reward on net, and so you do it!

There are within-neocortex dynamics that determine which of those two options gets proposed to the brainstem.

…And therefore there’s a sense in which the neocortex “gets credit” for the fact that you do in fact go to the gym.

…As long as we don’t forget that it’s the brainstem that judges “I will go to the gym” as bad, and it’s the brainstem that judges “I will be healthy” as good, and it’s the brainstem that judges the combined thought “I will go to the gym and thus be healthy” as slightly-good, and thus it’s the brainstem that enables that plan to actually get executed.
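
Here is a minimal sketch of that gym example, under the loud assumption that a thought's reward is just the sum of the rewards of its pieces. The real combination rule is surely messier, and the component names and numbers are invented.

```python
# Toy sketch: the brainstem scores the thought it is shown, not the
# motor plan behind it. Component values are invented for illustration,
# and simple addition is an assumed combination rule.

COMPONENT_REWARD = {
    "go to the gym": -3.0,   # exertion is aversive
    "be healthy": 4.0,       # attractive
}

def brainstem_reward(thought: frozenset) -> float:
    """Sum the rewards of the components present in the thought."""
    return sum(COMPONENT_REWARD[c] for c in thought)

bare = frozenset({"go to the gym"})
framed = frozenset({"go to the gym", "be healthy"})

# Same motor actions, different framing, different verdict.
bare_score = brainstem_reward(bare)      # -3.0
framed_score = brainstem_reward(framed)  # 1.0
```

Note that every number in `COMPONENT_REWARD` belongs to the brainstem side of the sketch. The only thing the "neocortex" side contributes is the choice of which set to propose.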

2. Some plans are rejected by the brainstem, but they have a desirable aspect / consequence, and then the neocortex may be rewarded for dwelling on the desirable aspect, and may run a search algorithm for ways to make that aspect actually happen

Continuing with our example, let’s say the brainstem is getting signals that the body is tired, so any plan that involves exertion loses a lot of points. It loses so many points that the trick from above no longer works; the “I will go to the gym and thus be healthy” thought is now net-negative (compared to its alternatives). So I’m not going to the gym.

Nevertheless, the thought “I will be healthy” continues to be appealing. So the neocortex—ever the goal-seeking algorithm—keeps searching, trying to construct a plan that will result in “I will be healthy” actually happening.

Hmm, is there a way that I can get healthy while continuing to sit on the couch? Nope.

OK, what about piling more positive-reward thought pieces into the mix? How about this plan: “I will go to the gym, and thus be healthy, and also I’ll be the kind of person who follows through on my commitments, and also I’ll impress my friends with my rock-solid abs, etc. etc.” Again, it’s the same sequence of motor actions, but this thought is framed to make it even more appealing to the brainstem. Nope, the brainstem still says “all things considered, this is still a bad plan”. And I notice that I am still sitting on the couch.

Anyway, that neocortex search process that I'm talking about here gets mapped intuitively into “my neocortex wants to be healthy, but my brainstem wants to stay on the couch”.

That’s not literally true: my brainstem wants both things! My brainstem’s endorsement of “I want to be healthy” is critical here—it’s powering the ongoing neocortex search algorithm activity! (Again, in my view, the neocortex won’t take any actions or even think any thoughts without the brainstem rewarding it for doing so.) But it’s also my brainstem’s dislike of getting off the couch that is causing the search to come up empty.
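
The tired-on-the-couch story can be sketched as a greedy search over framings. This is only a cartoon: the component values, the size of `TIRED_PENALTY`, and the greedy strategy are all assumptions I am making up for illustration.

```python
# Toy sketch of the framing search when tired. All numbers are invented.

EXTRAS = {
    "be healthy": 4.0,
    "follow through on commitments": 2.0,
    "impress friends": 1.5,
}
GYM = -3.0             # baseline aversiveness of going to the gym
TIRED_PENALTY = -6.0   # extra cost on any plan that involves exertion

def search_for_framing(tired: bool):
    """Greedily add appealing components; return (framing, score)."""
    score = GYM + (TIRED_PENALTY if tired else 0.0)
    framing = ["go to the gym"]
    for extra, value in EXTRAS.items():
        if score > 0:
            break  # brainstem already approves; stop searching
        framing.append(extra)
        score += value
    return framing, score

rested_framing, rested_score = search_for_framing(tired=False)
tired_framing, tired_score = search_for_framing(tired=True)
```

When rested, one extra component is enough to tip the plan positive. When tired, the search uses up every component it has and the plan is still net-negative. Both the positive values that keep the search going and the penalty that defeats it sit on the brainstem side.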

3. This search algorithm tends to immediately “snuff itself out” unless the desirable aspect is “honorable” / appealing upon reflection / leading to desirable follow-on consequences

So far this seems to be unrelated to base vs noble motivations. Where does that come from?

Well, let’s try flipping it around. The reverse would be as follows:

My neocortex proposes “I will go to the gym and be healthy”, the brainstem endorses it and that’s now the plan. But the gym doesn’t open for 10 minutes, so I’m sitting and waiting. Now my neocortex thinks the thought “I won't go to the gym, I will stay on the couch instead”. That’s an appealing thought! But the brainstem rejects it as a plan, because its appeal is not strong enough to outweigh the appeal of “I will be healthy”. So the neocortex search algorithm whirrs into action! Is there a way to frame the thought “I will stay on the couch instead” so as to make it more appealing to the brainstem? How about this: “I will stay on the couch, and thus avoid the risk of dropping a heavy weight on my foot, and also I won’t have to talk to Tony at the gym entrance, man I hate that guy.” Nope, not good enough, the brainstem says I’m sticking with the original plan of going to the gym.

So this neocortex search algorithm should correspondingly get mapped intuitively into “my neocortex wants to stay on the couch, but my brainstem wants to be healthy”!! As above, that’s not a technically correct description, just an intuition. But clearly, it’s not impossible for this kind of thinking process to happen.

Still, I do think this case I just walked through is an unusual case. We are more likely to have our neocortical search algorithm searching for ways to do honorable things like go to the gym, not searching for ways to stay on the couch.

Why isn't it symmetric?

I think the difference is: some things just get more appealing when you think about them more, and others get less appealing.

More mechanistically, the neocortex works on a principle of “thoughts tend to trigger other thoughts”, and those “other thoughts” can be positive-reward or negative-reward. Remember, the brainstem is powering this search process; we’ll only keep searching as long as the brainstem is liking what it’s seeing. If the search algorithm winds up spawning a bunch of secondary thoughts that the brainstem hates, those secondary thoughts will snuff out the search algorithm itself. And we’ll start thinking about something else instead.

That’s a bit abstract. Let’s try an example.

Let’s say I’m in a hurry to leave for an appointment, but my neocortex entertains a glimmer of the idea “I will stop and eat candy before I go”. The brainstem says “Nope, candy is good, but being late is really bad, plan rejected.” So the neocortex search algorithm spins up: is there a way to eat candy without being late? What does that search entail? Well, it immediately brings to the forefront of your mind the idea “I will eat candy”, and then it tries to build an acceptable and plausible plan that incorporates that idea—like filling in the missing pieces of a puzzle. However, holding “I will eat candy” at the forefront of your mind spawns a bunch of negative follow-on thoughts like “…and thus I will break my promise to myself”, “…and thus I won’t fit in my pants”, etc. The brainstem gets a whiff of those thoughts and it snuffs out the search process itself. The brainstem no longer sees any benefit in thoughts that involve searching for a way to eat candy, so the neocortex stops doing that search.

By contrast, if your search algorithm is looking for a way to do something that’s appealing on reflection, i.e. something that's desirable and whose follow-on consequences are also desirable, then the search algorithm won’t immediately snuff itself out. It can keep running until success, or until frustration sets in.
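
A cartoon of the "snuffing out" dynamic, assuming (purely for illustration) that the search survives only while the running total of the thoughts it spawns stays non-negative. The follow-on thoughts and their values are invented.

```python
# Toy sketch: a search keeps running only while the thoughts it spawns
# stay net-positive. Follow-on thoughts and values are invented.

FOLLOW_ONS = {
    "eat candy": [("tastes good", 2.0),
                  ("break promise to myself", -3.0),
                  ("won't fit in my pants", -2.0)],
    "be healthy": [("feel energetic", 1.0),
                   ("friends are impressed", 1.0),
                   ("live longer", 2.0)],
}

def search_steps(goal: str) -> int:
    """Count how many steps the search survives before being snuffed out."""
    running_total = 0.0
    steps = 0
    for _, value in FOLLOW_ONS[goal]:
        running_total += value
        steps += 1
        if running_total < 0:
            break  # the brainstem stops rewarding the search itself
    return steps

candy_steps = search_steps("eat candy")     # stops after the 2nd thought
healthy_steps = search_steps("be healthy")  # survives all 3 thoughts
```

The candy search dies almost immediately, while the health search is still alive when its list runs out. In the terms used above, it would keep going until success or frustration.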

So I think that when people introspect about times that they’ve been searching for a way to “make themselves do something”, they’ll typically come up with examples where the thing was honorable / appealing upon reflection / having desirable consequences, since those are the searches that last for more than a fraction of a second.

3A. An important special case: self-reflective thoughts

When I talk about plans with “desirable consequences”, you’ll probably think of normal, causal consequences, like “I will go to the gym” → “I will be healthy”. That's one possibility, but another important case is "consequences for how we see ourselves, how others see us, how we present ourselves, etc.". For example, if you’re committed to dieting, then it’s possible that:

The idea of “eating candy” is appealing
The idea of “myself eating candy” is aversive

Get it? The first one involves taking certain actions, tasting certain tastes, etc. The second one involves the abstract idea that I will have eaten candy, and I’ll remember having done it, and when people ask me I’ll have to tell them about it, or lie.

The second thing is a consequence of the first thing. If I eat candy, then I will have eaten candy! So just as above, the desirability of self-reflective thoughts can determine whether the neocortex search algorithm works long and hard on finding a way to make a plan that passes muster with the brainstem, or whether the search algorithm immediately snuffs itself out.

I think that we have a lot of strong motivations about the kind of person we want to be. (…And the brainstem turns these motivations into thoughts and actions, just like every other motivation.) I think this is an important component of our social instincts.

So this gets back to “honorable motivations”, here in the sense of “motivations that we’re proud to have”. When we want something that we associate with an “honorable motivation”, it’s much likelier that the neocortex will search for a way to make it happen despite internal (motivational) obstacles, rather than the search algorithm running for a fraction of a second and then snuffing itself out. Again, this leads to the (misleading) intuition that the neocortex is trying to make us "act honorably" over the protests of our brainstem.

4. Willpower, akrasia, and “guilt by association”

Let’s go back to the example from earlier:

A: “I will go to the gym” is aversive (negative-reward)

B: “I will be healthy” is attractive (positive-reward)

A+B: “I will go to the gym and thus be healthy” is net slightly attractive (slightly positive reward)

One aspect of this is that, by yoking together A+B, A has become more appealing. That’s what I was talking about above.

But there’s another aspect: by yoking together A+B, B has become less appealing. (“Ugh, I’m not so sure about ‘being healthy’, it’s pretty exhausting!”)

I think the supervised learning parts of our brain literally learn the pattern “being healthy leads to exertion”. (Related: this Scott Alexander post.)

This is a factor that constrains the ability of the neocortex search algorithm to successfully find strategies to frame inherently-unpleasant plans in a way that the brainstem finds net attractive.

Specifically, it’s not as easy as taking one strongly-motivating thought like “I want to follow through on my commitments”, attaching that one thought to 500 different inherently-unpleasant tasks, one after another, and then actually doing all 500 of those things. Instead, each of those tasks drags down the concept of “I want to follow through on my commitments”, until that concept is so saddled with negative associations that it's no longer even able to escort itself through the brainstem, let alone anything else!
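
A toy version of this drag-down effect, assuming a simple incremental update in which the concept's learned value moves a fraction `lr` of the way toward the value of the combined experience. The update rule, the learning rate of 0.25, and the starting values are all hypothetical.

```python
# Toy sketch of "guilt by association": each time a motivating concept
# is yoked to an unpleasant task, its learned value drifts toward the
# value of the combined experience. Learning rate and values are invented.

def yoke(concept_value: float, task_value: float, lr: float = 0.25) -> float:
    """Return the concept's value after one pairing with a task."""
    combined = concept_value + task_value
    return concept_value + lr * (combined - concept_value)

commitment = 5.0      # "I want to follow through on my commitments"
unpleasant = -2.0     # each inherently-unpleasant task
tasks_done = 0

# Keep attaching the concept to tasks while the pairing is net-positive.
while commitment + unpleasant > 0 and tasks_done < 500:
    commitment = yoke(commitment, unpleasant)
    tasks_done += 1
```

With these made-up numbers the concept gets through 6 tasks, not 500, before the combined thought stops being net-positive.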

Obligatory side-notes

What about “the brainstem is stupid”?

I often talk about how the brainstem is stupid, it doesn’t know what’s going on in the world, etc. That seems inconsistent with me breezily talking about how the brainstem sees “going to the gym” as bad and “being healthy” as good. Aren’t those kinda complex concepts? Well, my answer is that certain parts of the brain (agranular prefrontal and insular and cingulate cortex, ventral striatum, amygdala, hippocampus) use supervised learning to distill an arbitrary thought into a vector with maybe dozens of dimensions, which the brainstem (and hypothalamus) can interpret. We can interpret this vector as answering questions like “If I do this plan, how appropriate would it be to cringe? To salivate? To release cortisol? To laugh? Etc.” This enables the brainstem to understand what’s going on well enough to issue appropriate rewards. See Big Picture of Phasic Dopamine. I glossed over this stuff in this post, in order to keep things simple.
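
As a rough illustration of that division of labor, here is a sketch in which a handful of learned "assessor" outputs form a short vector, and a fixed rule turns the vector plus body state into a reward. The four axes, the scorecard numbers, the weights, and the hunger adjustment are all invented. The only claim is structural: the complicated part is learned upstream, and the final rule is simple.

```python
# Toy sketch of the "scorecard" idea: learned assessors map a thought to
# a short vector, and a fixed brainstem rule turns that vector plus body
# state into a reward. Axes, weights, and numbers are all invented.

AXES = ("cringe", "salivate", "cortisol", "laugh")

# Hypothetical outputs of the learned assessors for two thoughts.
SCORECARDS = {
    "go to the gym": (0.1, 0.0, 0.6, 0.0),
    "eat chocolate": (0.0, 0.9, 0.0, 0.1),
}

def brainstem_reward(scorecard, hungry: bool) -> float:
    """Hardcoded weighting of the scorecard, modulated by body state."""
    weights = {"cringe": -1.0, "salivate": 1.0, "cortisol": -1.0, "laugh": 1.0}
    if hungry:
        weights["salivate"] = 2.0  # food-related plans get extra points
    return sum(weights[axis] * x for axis, x in zip(AXES, scorecard))

reward_sated = brainstem_reward(SCORECARDS["eat chocolate"], hungry=False)
reward_hungry = brainstem_reward(SCORECARDS["eat chocolate"], hungry=True)
reward_gym = brainstem_reward(SCORECARDS["go to the gym"], hungry=False)
```

Here `brainstem_reward` never sees the words "gym" or "chocolate", only the vector. That is how a "stupid" scorer can still hand out sensible rewards for complex thoughts.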

What about “The Monkey And The Machine”?

Many readers here have seen Paul Christiano’s post The Monkey And The Machine. I don’t have any particular complaints there. I think of the post as being mostly about stuff that happens within the neocortex, where “monkey” is the trained model learned by the cortex, and “deliberator” is a subset of those learned processes that implements things like "explicit reasoning using language". (Somewhat related: Kaj Sotala’s post System 2 as working-memory augmented System 1 reasoning.) This is a reasonable and useful way to think about certain things. I don't think it has anything to do with what I'm talking about here.

Scott Alexander (4y)

Can you link to an explanation of why you're thinking of the brainstem as plan-evaluator? I always thought it was the basal ganglia.

Steven Byrnes (4y)

Good question!

Yes I can link it but it's very long, sorry: Big Picture Of Phasic Dopamine.
The midbrain dopamine centers (VTA, SNc) are traditionally "part of the basal ganglia" AND "part of the brainstem". I think that these regions are where you find the "final answer" about whether a plan is good or bad, and that the dopamine signals from these regions can just directly shut down bad ideas.
But of course a lot of processing happens before you get to the "final answer"…
Specifically, I think there are basically three layers of "plan evaluation":
- First, you start with a bubbly soup of partially-formed thoughts in the neocortex, and the (dorsal) striatum does a quick rough guess of how promising the different pieces look, and gently suppresses the less-promising bits / enhances the more-promising bits, so that when you get a fully-formed thought, it's at least reasonably promising.
- Second, once you have a stable fully-formed thought, I think various parts of the brain (parts of prefrontal & cingulate & insular cortex, ventral striatum, amygdala, hippocampus (sorta)) score that thought along maybe dozens of genetically-hardcoded axes like “If I'm gonna do this plan, how appropriate would it be to cringe? To salivate? To release cortisol? To laugh? How much salt would I wind up eating? How much umami? Etc. etc.” (They learn to do these evaluations through experience, a.k.a. supervised learning.) And they send all those "scores" down to the hypothalamus and brainstem.
- Finally, we're at the hypothalamus & brainstem. They look at the information from the previous bullet point, and they combine that information with other information streams, like metabolic status information (if I'm hungry, a plan that involves eating gets extra points), and whether the superior colliculus thinks there's a snake in the field-of-view, and so on. Taking all that information into account, they make the final decision as to whether the plan is good or bad, using a genetically-hardcoded algorithm.

Happy to discuss more; also, I'm still reading about this / talking to experts / etc., and I reserve the right to change my mind :)

Scott Alexander (4y)

Thanks, I read that, and while I wouldn't say I'm completely enlightened, I feel like I have a good basis for reading it a few more times until it sinks in.

I interpret you as saying in this post: there is no fundamental difference between base and noble motivations, they're just two different kinds of plans we can come up with and evaluate, and we resolve conflicts between them by trying to find frames in which one or the other seems better. Noble motivations seem to "require more willpower" only because we often spend more time working on coming up with positive frames for them, because this activity flatters our ego and so is inherently rewarding.

I'm still not sure I agree with this. My own base motivation here is that I posted a somewhat different model of willpower at https://astralcodexten.substack.com/p/towards-a-bayesian-theory-of-willpower, which is similar to yours except that it does keep a role for the difference between "base" and "noble" urges. I'm trying to figure out if I still want to defend it against this one, but my thoughts are something like:

- It feels like on stimulants, I have more "willpower": it's easy to take the "noble" choice when it might otherwise be hard. Likewise, when I'm drunk I have less ability to override base motivations with noble ones, and (although I guess I can't prove it) this doesn't seem like a purely cognitive effect where it's harder for me to "remember" the important benefits of my noble motivations. The same is true of various low-energy states, eg tired, sick, stressed - I'm less likely to choose the noble motivation in all of them. This suggests to me that baser and nobler motivations are coming from different places, and stimulants strengthen (in your model) the connection between the noble-motivation-place and the striatum relative to the connection between the base-motivation-place and the striatum, and alcohol/stress/etc weaken it.

- I'm skeptical of your explanation for the "asymmetry" of noble vs. base thoughts. Are thoughts about why I should stay home really less rewarding than thoughts about why I should go to the gym? I'm imagining the opposite - I imagine staying home in my nice warm bed, and this is a very pleasant thought, and accords with what I currently really want (to not go to the gym). On the other hand, thoughts about why I should go to the gym, if I were to verbalize them, would sound like "Ugh, I guess I have to consider the fact that I'll be a fat slob if I don't go, even though I wish I could just never have to think about that".

- Base thoughts seem like literally animalistic desires - hunger seems basically built on top of the same kind of hunger a lizard or nematode feels. We know there are a bunch of brain areas in the hypothalamus etc that control hunger. So why shouldn't this be ontologically different from nobler motivations that are different from lizards'? It seems perfectly sensible that eg stimulants strengthen something about the neocortex relative to whatever part of the hypothalamus is involved in hunger. I guess I'm realizing now how little I understand about hunger - surely the plan to eat must originate in the cortex like every other plan, but it sure feels like it's tied into the hypothalamus in some really important way. I guess maybe hunger could have a plan-generator exactly like every other, which is modulated by hypothalamic connections? It still seems like "plans that need outside justification" vs. "plans that the hypothalamus will just keep active even if they're stupid" is a potentially important dichotomy.

- Base motivations also seem like things which have a more concrete connection to reinforcement learning. There's a really short reinforcement loop between "want to eat candy" and "wow, that was reinforcing", and a really long (sometimes nonexistent) loop between going to the gym and anything good happening. Again, this makes me suspicious that the base motivations are "encoded" in some way that's different from the nobler motivations and which explains why different substances can preferentially reinforce one relative to the other.

- The reasons for thinking of base motivations as more like priors, discussed in that post.

- Kind of a dumb objection, but this feels analogous to other problems where a conscious/intellectual knowledge fails to percolate to emotional centers of the brain, for example someone who knows planes are very safe but is scared of flying anyway. I'm not sure how to use your theory here to account for this situation, whereas if I had a theory that explained the plane phobia problem I feel like it would have to involve a concept of lower-level vs. higher-level systems that would be easy to plug into this problem.

- Another dumb anecdotal objection, but this isn't how I consciously experience weakness of will. The example that comes to mind most easily is wanting to scratch an itch while meditating, even though I'm supposed to stay completely still. When I imagine my thought process while worrying about this, it doesn't feel like trying to think up new reframings of the plan. It feels like some sensory region of the brain saying "HEY! ITCH! YOU SHOULD SCRATCH IT!" and my conscious brain trying to exert some effort to overcome that. The effort doesn't feel like thinking of new framings, and the need for the effort persists long after every plausible new framing has been thought of. And it does seem relevant that "scratch itch" has no logical justification (it's just a basic animal urge that would persist even if someone told you there was no biological cause of the itch and no way that not scratching it could hurt you), whereas wanting to meditate well has a long chain of logical explanations.

Steven Byrnes (4y)

Thanks for your helpful comments!!! :)

“surely the plan to eat must originate in the cortex like every other plan, but it sure feels like it's tied into the hypothalamus in some really important way”

One thing is: I think you’re assuming a parallel model of decision-making—all plans are proposed in parallel, and the striatum picks a winner.

My scheme does have that, but then it also has a serial part: you consider one plan, then the next plan, etc. And each time you switch plans, there’s a dopamine signal that says whether this new plan is better or worse than the status quo / previous plan.

I think there’s good evidence for partially-serial consideration of options, at least in primates (e.g. Fig. 2b here). I mean, that’s obvious from introspection. My hunch is that partially-serial decision-making is universal in vertebrates.

Like, imagine the lamprey is swimming towards place A, and it gets to a fork where it could instead turn and go to place B. I think "the idea of going to place B" pops into the lamprey's brain (pallium), displacing the old plan, at least for a moment. Then a dopamine signal promptly appears that says whether this new plan is better or worse than the old plan. If it's worse (dopamine pause), the lamprey continues along its original trajectory without missing a beat. This is partially-serial decision-making. I don't know how else the system could possibly work. Different pallium location memories are (at least partially) made out of the activations of different sparse subsets of neurons from the same pool of neurons, I think. You just can't activate a bunch of them at once, it wouldn't work, they would interfere with each other, AFAICT.

Anyway, if options are considered serially, things become simpler. All you really need is a mechanism for the hypothalamus to guess “if we do the current plan, how much and what type of food will I eat?”. (Such a mechanism does seem to exist AFAICT—in fact, I think mammals have two such mechanisms!)

OK, so then imagine a back-and-forth dialog.

The neocortex proposes a plan.
The hypothalamus & brainstem say “I’m hungry, but I notice that this plan won’t lead to eating any food. Bzzzz, Rejected! Try again.”
The neocortex proposes a different plan.

…Etc. etc.

(And eventually, the cortex learns that when the body is hungry, maybe don’t even bother proposing plans that won’t involve eating!)
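
The back-and-forth above can be sketched as a serial loop in which each newly proposed plan is compared against the current one. The plan names, the food estimates, and the rule "food only matters when hungry" are invented simplifications. The point is only that nothing needs to be evaluated in parallel.

```python
# Toy sketch of partially-serial plan evaluation: plans are considered
# one at a time, and each is compared against the current plan.
# Plan names and food estimates are invented for illustration.

PLANS = [
    ("keep reading", 0.0),       # (plan, expected food eaten)
    ("go for a walk", 0.0),
    ("make a sandwich", 1.0),
]

def plan_value(expected_food: float, hungry: bool) -> float:
    """Hypothalamus-style check: food only matters when hungry."""
    return expected_food if hungry else 0.0

def choose(hungry: bool) -> str:
    """Consider plans serially; switch only when the new one is better."""
    current, current_value = PLANS[0][0], plan_value(PLANS[0][1], hungry)
    for plan, food in PLANS[1:]:
        value = plan_value(food, hungry)
        if value > current_value:   # positive dopamine signal: switch
            current, current_value = plan, value
        # otherwise (dopamine pause) the old plan simply continues
    return current
```

Calling `choose(hungry=True)` ends up on the sandwich plan, while `choose(hungry=False)` never leaves the first plan.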

“Base thoughts seem like literally animalistic desires”

I’m happy for you to intuitively think of “the desire to eat” as an “animalistic instinct”. I guess I’m encouraging you to also intuitively think of things like “the desire to be well-respected by the people whom I myself most respect” as being also an “animalistic instinct”.

The thing is, everything that comes out of the latter instinct is highly ego-syntonic, so we tend to unthinkingly identify with them, instead of externalizing them, I think.

For example, if I binge-eat, I’m happy to say to myself “my brainstem made me do it”. Whereas if I do something that delights all my favorite in-group people, I would find it deeply threatening and insulting to say to myself “my brainstem made me do it”.

“It feels like on stimulants, I have more "willpower": it's easy to take the "noble" choice when it might otherwise be hard. Likewise, when I'm drunk...”

I think that the brainstem always penalizes (subtracts points from) plans that are anticipated to entail mental concentration. I have no idea why this is the case (evolutionarily)—maybe it’s something about energy, or opportunity cost, or the tendency to not notice lions sneaking up behind us because we’re so lost in thought. Beats me.

And I think that the more tired you are, the bigger the penalty that the brainstem applies to plans that entail mental concentration. (Just like physical exertion.)

(BTW this is my version of what you described as "motionlessness prior".)

I think this impacts willpower in two ways:

First, we are often “applying willpower” towards things that entail mental concentration and/or physical exertion, like homework and exercise. So the more tired we are, the more brainstem skepticism we have to overcome. So we're more likely to fail.

Second, the very process of trying to “apply willpower” (= reframing plans to pass muster with the brainstem while still satisfying various constraints) itself requires mental concentration. So if you’re sufficiently tired, the brainstem will veto even the idea of trying to “apply willpower”, and also be quicker to shut down the process if it’s not immediately successful.

(Same for feeling sick or stressed. And the opposite for being on stimulants.)

(As for being drunk, we have both those items plus a third item, which is that “applying willpower” requires mental concentration, and maybe a sufficiently drunk neocortex is just plain incapable of concentrating on anything in particular, for reasons unrelated to motivation.)

“Are thoughts about why I should stay home really less rewarding than thoughts about why I should go to the gym? I'm imagining the opposite”

Let’s suppose that you do in fact find it more pleasant to stay home in bed than go to the gym. And let’s further suppose that you eventually wind up going to the gym anyway. As an onlooker, I want to ask the question: Why did you do that? There must have been something appealing to you about going to the gym, or else you wouldn’t have done it. People don’t do unpleasant things for no reason whatsoever.

So if “the act of exercising” is less pleasant to you than “the act of staying home” … and yet I just saw you drive off to the gym … I think that the only explanation is: the thought of “I will have exercised” is much more appealing to you than “I will have stayed home”—so much so that it more-than-compensates for the immediate aversiveness.

(Or if the motivation is “if I stay home I’ll think of myself as a fat slob”, we can say “much less unappealing” instead of “much more appealing”.)

So I stand by the asymmetry. In terms of immediately salient aspects, staying at home is more desirable. In terms of less salient aspects, like all the consequences and implications that you think of when you hold each of the plans in your mind, going to the gym is more desirable (or less undesirable), at least for the person who actually winds up going.

“Base motivations also seem like things which have a more concrete connection to reinforcement learning.”

I think “noble motivations” are often ultimately rooted in our social instincts (“everyone I respect is talking about how great X is, I want to do X too.”)

I haven’t seen a satisfying (to me) gears-level explanation of social instincts that relates them to RL and everything else we know about the brain. I aspire to rectify this someday… I have vague ideas, but it's a work in progress. :-)

In terms of how long or short the RL loop is, things like imagination and memory can bridge gaps in time. Like, if all the cool kids talk about how getting ripped is awesome, then going to the gym can be immediately rewarding, because while you’re there, you’re imagining the cool kids being impressed by your muscles two months from now. Or in the other direction: two months later the cool kids are impressed by your muscles, and you remember how you’re ripped because you went to the gym, and then your brain flags the concept of “going to the gym” as a thing that leads to praise. I think the brain’s RL system can handle those kinds of things.

“someone who knows planes are very safe but is scared of flying anyway.”

I think one of the “assessment calculators” (probably in the amygdala) is guessing which plans will lead to mortal danger, and this calculator is giving very high scores to any plan that involves flying in planes. We (= high-level planning cortex) don’t have conscious control over any of these assessment calculators. (Evolution designed it like that for good reason, to avoid wireheading.) The best we can do is try to nudge things on the margin by framing plans in different ways, attending to different aspects of them, etc. (And of course we can change the assessment calculators through experience—e.g. exposure therapy.)

wanting to scratch an itch while meditating

I think there’s a special thing for itches—along with getting poked on the shoulder, flashing lights, sudden sounds, pains, etc. I think the brainstem detects these things directly (superior colliculus etc.) and then "forces" the high-level-planning part of the neocortex (global neuronal workspace or whatever) to pay attention to them.

So if you get poked, the brainstem forces the high-level planner to direct attention towards the relevant part of somatosensory cortex. If there’s a flashing light, the brainstem forces attention towards the corresponding part of visual cortex. If there's an itch, I think it’s interoceptive cortex (in the insula), etc.

I'm not sure, but I think the mechanism for this involves the PPN (pedunculopontine nucleus) sending acetylcholine to the corresponding sensory cortex.

Basically, it doesn't matter whether the current top-down model is "trying" to make strong predictions or weak predictions about that part of the sensory field. The exogenous injection of acetylcholine simply overrules it. So you're more-or-less forced to keep thinking about the itch. Any other thoughts tend to get destabilized. (Dammit, evil brainstem!)

The effort doesn't feel like thinking of new framings

I don’t think it has to…

What is "effort", objectively? Maybe something like (1) you’re doing something, (2) it entails mental concentration or physical exertion (which as mentioned above are more penalized when you’re tired), (3) doing it is causing unpleasant feelings.

Well, that fits the itch example! The thing I’m doing is “thinking a certain thought” (namely, a thought that involves not scratching my itch). It entails a great feat of mental concentration—thanks to that acetylcholine thing I mentioned above, constantly messing with my attention. And every second that I continue to think that thought causes an unpleasant itching sensation to continue. So, maybe that’s all the ingredients we need for it to feel like “effort”.

(A plan can get a high brainstem reward while also feeling unpleasant.)

wanting to meditate well has a long chain of logical explanations.

Hmm, I don't think that's relevant. Imagine you had an itch where every time you scratch it, it makes your whole body hurt really bad for 30 seconds. And then it starts itching again. You would be in the same place as the meditation example, “exerting willpower” not to scratch the itch. Right? Sorry if I'm missing your point here.

Steven Byrnes, 4y

Since this keeps coming up—Big Picture of Phasic Dopamine is still the best resource, but I just summarized this aspect of it in 20× fewer words: A model of decision-making in the brain (the short version). It's pretty similar to what I wrote in my other reply comment though.

Vanilla_cabs, 4y

I don't know how common or obvious it is, but I noticed music works for me the same way as concepts do in your model. When I listen to a song during an experience that arouses positive affect, some of that positive affect gets loaded into the song. Then, when I want to do something hard like study or work, listening to the song makes it more doable by tapping into its positive charge. This works for a time, until the song feels 'discharged', and even starts feeling as unpleasant as the hard situation it's now associated with.

Nicholas / Heather Kross, 4y

Possible idea here of a "motivation treadmill", where we can only motivate ourselves to do hard things by "using up" the supply of good things to link them to. (Relatedly: the demotivating ideas of "nothing motivates me" or "nothing I try works").

Steven Byrnes, 4y

I'm not particularly knowledgeable about self-help and whatnot, but I generally like Nate Soares's advice on this kind of stuff, e.g. finding things you really care about, avoiding weak generic motivations like "I will do X because I'm supposed to", etc.

MaxRa, 4y

Very interesting. This reminded me of Keith Stanovich's idea of the master rationality motive, which he defines as a desire to integrate higher-order preferences with first-order preferences. He gives an example of wanting to smoke and not wanting to want to smoke, which sounds like something you would consider as two conflicting preferences, health vs. the short-term reward from smoking. His idea of how these conflicts are resolved is to have a "decoupled" simulation in which we can simulate adapting our first-order desires (I guess 'wanting to smoke' should rather be thought of as a strategy to quench the discomfortful craving than a desire?) and finding better solutions.

The master rationality motive seems to aim at something slightly different, though, e.g. given the questionnaire items Stanovich envisions to measure it, for example

I am only confident of decisions that are made after careful analysis of all available information.
I don’t feel I have to have reasons for what I do. (R)

https://www.researchgate.net/publication/220041090_Higher-order_preferences_and_the_Master_Rationality_Motive

Regarding the asymmetry, I have the intuition that the asymmetry of honorability comes through a different weighing of desires, e.g. you'd expect some things to be more important for our survival and reproduction, e.g. food, sex, not freezing, avoiding danger > honesty, caring for non-kin, right?

Steven Byrnes, 4y

Thanks!

I guess 'wanting to smoke' should rather be thought of as a strategy to quench the discomfortful craving than a desire?

I'm not sure exactly what you mean ... I guess I would say "wanting to smoke" is being in a state where plans-that-will-not-lead-to-smoking get docked a ton of points by the brainstem, plus maybe there's some mechanism that is incessantly forcing high-level attention to be focused on the (unpleasant) sensations related to not-smoking-right-now (see my comment here), and so on.

food, sex, not freezing, avoiding danger > honesty, caring for non-kin

I want to push back on that. Humans are an intensely social species, and an individual's prospects for survival and reproduction are extremely dependent on their winning and maintaining allies, being popular and well-respected, etc. I think this is reflected in our behavior: e.g. among humanity's favorite collective activities are talking about people, thinking about people, talking to people, etc. Social experiences are probably well-represented among the best and worst experiences of most people's lives. I mean, there are lots of non-social species, they just don't do those kinds of things. This book says that if an early human made a lot of enemies, the enemies would probably gang up on that person and kill them. This book is about how practically every thought we think gets distorted by our social instincts, etc. etc. I think I read somewhere that in small tribes, the most charismatic and popular and high-status people are likelier to have multiple wives (for men) and lots of children etc.

It would be weird for two desires to have a strict hierarchical relationship. When given a choice between food and water, sometimes we choose food, sometimes we choose water, depending on our current metabolic state, how much food or water is at stake, etc. It's definitely not the case that "water always automatically trumps food, regardless of context"; that would be weird.

So by the same token, if your friend is holding a drink, you probably won't stab them in the back and steal their drink, as you would under a "quenching thirst >> social instincts" model. But if you're super-duper-desperately thirsty, then maybe you would stab them in the back and steal their drink. So to me this looks very similar to the thirst vs hunger situation: there's a tradeoff between satisfying competing desires, a desire to be kind to your friends (an evolved desire which ultimately serves the purpose of having allies / being popular / etc.) and a desire to drink when you're thirsty.
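The contrast between a context-dependent tradeoff and a strict hierarchy can be sketched in a few lines. This is only a toy illustration under stated assumptions: the function names, the quadratic thirst curve, and the constant `FRIENDSHIP_DRIVE` are all hypothetical, chosen just to make the two models behave differently.

```python
# Toy sketch: context-dependent tradeoff vs. strict hierarchy between two desires.
# The strength curve and constants are hypothetical, for illustration only.

FRIENDSHIP_DRIVE = 50.0  # assumed strength of "be kind to your friend"


def thirst_drive(hours_without_water: float) -> float:
    """Assumed thirst strength: negligible at first, large when desperate."""
    return hours_without_water ** 2 / 10.0


def choose_tradeoff(hours_without_water: float) -> str:
    """Pick whichever desire is stronger in the current state."""
    scores = {
        "leave the drink alone": FRIENDSHIP_DRIVE,
        "steal the drink": thirst_drive(hours_without_water),
    }
    return max(scores, key=scores.get)


def choose_strict_hierarchy(hours_without_water: float) -> str:
    """'Quenching thirst >> social instincts': any thirst at all wins."""
    if thirst_drive(hours_without_water) > 0:
        return "steal the drink"
    return "leave the drink alone"


if __name__ == "__main__":
    for hours in (2, 48):
        print(hours, choose_tradeoff(hours), "|", choose_strict_hierarchy(hours))
```

Under these assumed numbers, the tradeoff model leaves the drink alone when mildly thirsty and steals it only when desperately thirsty, while the strict-hierarchy model steals it in both cases, which is the behavior the comment argues we do not actually see.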

MaxRa, 4y

It would be weird for two desires to have a strict hierarchical relationship.

I agree, I didn't mean to imply a strict hierarchical relationship, and I think you don't need a strict relationship to explain at least some part of the asymmetry. You just would need less honorable desires on average having more power over the default, e.g.

taking care of hunger
thirst
breath
looking at aesthetically pleasing things
removing discomforts

versus

taking care of long-term health
clean surroundings
expressing gratitude

And then we can try to optimize the default by searching for good compromises, which more often involve more honorable desires such as self-actualization or social relationships. (I expect all of this to vary across individuals and probably also cultures.)

there's a tradeoff between satisfying competing desires

I agree it depends on the current state, e.g. of course if you're satiated you won't care much about food. But, similar to your example, could you make somebody stab their friend by starving them in their need for showing gratitude, or the desire for having fun? I suspect not. But could you do it by starving them in their need of breathing oxygen, or making them super-duper-desperately thirsty? I (also) suspect more often yes. That seems to imply some more general weighing?

> I guess 'wanting to smoke' should rather be thought of as a strategy to quench the discomfortful craving than a desire?
I'm not sure exactly what you mean ...

What you replied makes sense to me, thanks.

Steven Byrnes, 4y

You just would need less honorable desires on average having more power over the default

I guess I sort of have a different way of thinking about it. From my perspective, if someone takes an action to satisfy a social-related desire at the expense of a food-related desire, then that means the social-related desire was the more powerful desire at that particular time.

So if Alice in point of fact chooses an action that advances friendship over an action that would satisfy her mild hunger right now, I would say the straightforward thing: "Well, I guess Alice's desire to advance friendship was a more powerful desire for her, at this particular moment, than her desire to satisfy her mild hunger". Or at least, in my mind, this is the straightforward and obvious way to think about it. I guess you would disagree, but I'm not quite sure what you would say instead.

What do weak desires look like? Here's an example. I have a very weak desire that, when sitting down, I prefer to put my legs up, other things equal. I wouldn't even bother to walk across a room to get an ottoman, and I don't think about it at all. The only effect of this weak desire on my behavior is that, if I happen to be in a situation where putting my legs up is super easy and convenient and has essentially no costs whatsoever, I'll go ahead and put my legs up. In my model, the mark of a weak desire is that it has very little influence on my thoughts and behaviors.

…And in particular, my model does not have a thing where weak desires heroically fight David-vs-Goliath battles to overcome stronger desires.

See also the thing I wrote about internalizing ego-syntonic desires here, maybe that will help.

could you make somebody stab their friend by starving them in their need for showing gratitude, or the desire for having fun? I suspect not. But could you do it by starving them in their need of breathing oxygen, or making them super-duper-desperately thirsty? I (also) suspect more often yes. That seems to imply some more general weighing?

I guess I would say, any given desire has some range of how strong it can be in different situations, and if you tell me that the very strongest possible air-hunger-related desire is stronger than the very strongest possible social-instinct-related desire, I would say "OK sure, that's plausible." But it doesn't seem particularly relevant to me. The relevant thing to me is how strong the desires are at the particular time that you're making a decision or thinking a thought.

That said, I'm not sure that it is true that the very strongest possible air hunger desire is definitely stronger than the very strongest possible social-instinct-related desire. My impression is that some people will not betray their friends even while being tortured, even if the torture involves inducing extreme air-hunger, thirst, etc.

Also, social instincts can prompt people to take premeditated actions that they know will lead to instant death, or extreme pain, or spending the rest of their lives in prison, etc. It's powerful stuff. :-P

MaxRa, 4y

Thanks for elaborating!

I guess I would say, any given desire has some range of how strong it can be in different situations, and if you tell me that the very strongest possible air-hunger-related desire is stronger than the very strongest possible social-instinct-related desire, I would say "OK sure, that's plausible." But it doesn't seem particularly relevant to me. The relevant thing to me is how strong the desires are at the particular time that you're making a decision or thinking a thought.

I think that almost captures what I was thinking, only that I expect the average intensity within these ranges to differ, e.g. for some individuals the desire for social interaction is usually very strong or for others rather weak (which I expect you to agree with). And this should explain which desires more often supply the default plan, and for which additional "secondary" desires the neocortex has to work to find an overall better compromise.

For example, you come home and your body feels tired and the desire that is strongest at this moment is the desire for rest, and the plan that suits this desire most is lying in bed and watching TV. But then another desire for feeling productive pushes for more plan suggestions and the neocortex comes up with lying on the couch and reading a book. And then the desire for being social pushes a bit and the revised plan is for reading the book your mum got you as a present.

Steven Byrnes, 4y

Hmm, when I think "default plan", I think something like "what's the first thing I think to do, based on what's most salient in my mind right now?". So this can be related to the acetylcholine dynamic I mentioned here, where things like itches and annoying car alarms are salient in my mind even if I don't want them to be. Hunger is definitely capable of forcibly pulling attention. But I do also think you can get a similar dynamic from social instincts. Like if someone shouts your name "Hey MaxRa!!", your "default plan" is to immediately pay attention to that person. Or a more pleasant example is: if you're snuggling under the blanket with your significant other, then the associated pleasant feelings are very salient in your mind, and the "default plan" is to remain under the blanket.

That acetylcholine dynamic is just one example; there can be other reasons for things to be more or less salient. Like, maybe I'm thinking: "I could go to the party…", but then I immediately think: "…my ex might be at the party and oh geez I don't want to see them and have to talk to them". That's an example where there are social instincts on both sides of the dilemma, but still, the downsides of going to the party (seeing my ex) pop right out immediately to the forefront of my mind when I think of the party, whereas the benefits of going to the party (I'll be really glad I did etc.) are strong but less salient. So the latter can spawn very powerful desires if I'm actively thinking of them, but they're comparatively easy to overlook.
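The distinction between how strong a consideration is and how salient it is can be made concrete with a toy scoring function. This is a rough sketch under loud assumptions: the `Consideration` class, the `evaluate` function, the attention threshold, and every number below are hypothetical and only meant to mirror the party example.

```python
# Toy sketch: a plan's score depends on which considerations come to mind.
# All values, saliences, and thresholds are hypothetical illustrations.
from dataclasses import dataclass


@dataclass(frozen=True)
class Consideration:
    description: str
    value: float     # how good (+) or bad (-) this aspect feels when attended to
    salience: float  # how readily it pops into mind, from 0 to 1


def evaluate(considerations, attention_threshold: float) -> float:
    """Sum the values of only those considerations salient enough to be noticed."""
    return sum(c.value for c in considerations if c.salience >= attention_threshold)


PARTY = [
    Consideration("my ex might be there", value=-5.0, salience=0.9),
    Consideration("I'll be really glad I went", value=8.0, salience=0.2),
]

if __name__ == "__main__":
    print("first glance:", evaluate(PARTY, attention_threshold=0.5))
    print("thinking it through:", evaluate(PARTY, attention_threshold=0.0))
```

With these assumed numbers, the plan scores negatively at first glance, because only the highly salient downside is counted, and positively once the less salient but stronger benefit is actively brought to mind.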
