Thanks for writing this! I am very excited that this post exists. I think what this model suggests about procrastination and addiction alone (namely, that they're things that managers and firefighters are doing to protect exiles) are already huge, and resonate strongly with my experience.
In the beginning of 2018 I experienced a dramatic shift that I still don't quite understand; my sense of it at the time was that there was this crippling fear / shame that had been preventing me from doing almost anything, that suddenly lifted (for several reasons, it's a long story). That had many dramatic effects, and one of the most noticeable ones was that I almost completely stopped wanting to watch TV, read manga, play video games, or any of my other addiction / procrastination behaviors. It became very clear that the purpose of all of those behaviors was numbing and distraction ("general purpose feeling obliterators" used by firefighters, as waveman says in another comment) from how shitty I felt all the time, and after the shift I basically felt so good that I didn't want or need to do that anymore.
(This lasted for a while but not forever; I crashed hard in Se...
things that had been too scary for me to think about became thinkable (e.g. regrettable dynamics in my romantic relationships), and I think this is a crucial observation for the rationality project. When you have exile-manager-firefighter dynamics going on and you don't know how to unblend from them, you cannot think clearly about anything that triggers the exile, and trying to make yourself do it anyway will generate tremendous internal resistance in one form or another (getting angry, getting bored, getting sleepy, getting confused, all sorts of crap), first from managers trying to block the thoughts and then from firefighters trying to distract you from the thoughts. Top priority is noticing that this is happening and then attending to the underlying emotional dynamics.
Yes!
Valentine has also written some good stuff on this, in e.g. The Art of Grieving Well:
I think the first three so-called “stages of grief” — denial, anger, and bargaining — are avoidance behaviors. They’re attempts to distract oneself from the painful emotional update. Denial is like trying to focus on anything other than the hurt foot, anger is like clutching and yelling and getting mad at the situation,...
Curated.
The internal family systems model has seen a lot of discussion in various rationalist and rationalist-adjacent places, but:
a) usually among people who were already familiar with it,
b) usually with a vague disclaimer of being a fake-framework, without delving into the details of where the limits of the framework lay or how to contextualize it in a broader reductionist worldview.
I think it's been a long time coming for someone to write up a comprehensive case for why the model is worth taking seriously, placing it in terms that can be concretely reasoned about, built off of and/or falsified.
Really what I want is for Kaj's entire sequence to be made into a book. Barring that, I'll settle for nominating this post.
Have you read Minsky's _Society of Mind_? It is an AI-flavored psychological model of subagents that draws heavily on psychotherapeutic ideas. It seems quite similar in flavor to what you propose here. It inspired generations of students at the MIT AI Lab (although attempts to code it never worked out).
I've been attempting to use IFS for years without having read much more than brief summaries of it. This post put me on a much firmer footing with it and I was able to much more clearly categorize a bunch of things that have been happening over the past six months or so. Then over the weekend I had a low-level background internal screaming going on, and while my first couple rounds of attempts at resolving it only helped a little, I was finally able to isolate the issue and fix what turned out to be a massive misalignment. I have not felt this aligned in years.
So thank you very, very much for writing this.
My 2 cents:
1 cent: It seems that sub-personalities do not actually exist, but are created by the human mind at the moment of query. The best way to explain this is to look at improvisation theatre, as described in the post by Valentine Intelligent social web. The consequence of this non-actual existence of the subpersonalities is that we could have different expectations about types of personalities, and still get therapeutically useful and consistently sounding results. For example, some people try to cure psychological problems by making a person to rem...
It doesn't really much matter whether this is true or not.
I think it matters from the perspective that if subagents are simulated at query time, then a non-subagent model should be able to produce similar results to IFS, with fewer complications.
In my own experience comparing subagent-oriented approaches (e.g. IFS, Core Transformation) with non-subagent ones, the non-subagent ones generally require less work to figure out what is going on, because simulating parts that want to hide or deflect stuff is more energy-intensive and frustrating than just helping someone notice that they are hiding or deflecting things.
For example, when I segregate my own desires into parts, it increases the odds of an argument or of parts withholding information or motives, vs. presupposing that all my desires are mine and that I have good reasons even for doing apparently self-destructive things.
That being said, I can think of all kinds of situations where IFS as a metaphor would be superior to more direct approaches... but they all involve people for whom the subagent metaphor is an easier introduction to metacognition, and/or the stuff being dealt with is traumatic enough that you really want to keep
...I'm actually kind of surprised that IFS seems so popular in rationalist-space, as I would've thought rationalists more likely to bite the bullet and accept the existence of their unendorsed desires as a simple matter of fact.
Some reasons for the popularity of IFS which seem true to me, and independent of whether you accept your desires:
(adding to my other comment)
dividing people into lots of mini-people isn't a reduction.
And like, the post you're responding to just spent several thousand words building up a version of IFS which explicitly doesn't have "mini-people" and where the subagents are much closer to something like reinforcement learning agents which just try to prevent/achieve something by sending different objects to consciousness, and learn based on their success in doing so...
For example, firefighters are called that because “they are willing to let the house burn down to contain the fire”; that is, when they are triggered, they typically act to make the pain stop, without any regard for consequences (such as loss of social standing). At the same time, managers tend to be terrified of exactly the kind of lack of control that’s involved with a typical firefighter response. This makes firefighters and managers typically polarized - mutually opposed - with each other.
In my experience, this distinction merely looks like normal reinforcement: you can be short-term reinforced to do things that are against your interests in the long-term. This happens with virtually every addictive behavior; in fact, Dodes’ theory of addiction is that people feel better the moment they decide to drink, gamble, etc., and it is that decision that is immediately reinforced, while the downsides of the action are still distant. (Indeed, he notes that people often make that decision hours in advance of the actual behavior.)
...If we only talked about various behaviors getting reinforced, we wouldn’t predict that the system simultaneously considers a loss of a social standing to b
The content of this and the other comment thread seems to be overlapping, so I'll consolidate (pun intended) my responses to this one. Before we go on, let me check that I've correctly understood what I take to be your points.
Does the following seem like a fair summary of what you are saying?
Re: IFS as a reductionist model:
I don't know what you mean by this at all. Can you give (or maybe point to) an example?
So, let's take the example of my mother stressing over deadlines. Until I reconsolidated that belief structure... or hell, since UTEB seems to call it a "schema", let's just call it that. I had a schema that said I needed to be stressed out if the goal was serious. I wasn't aware of that, though: it just seemed like "serious projects are super stressful and I never know what to do", except wail and grind my teeth (figuratively speaking) until stuff gets done.
Now, I was aware I was stressed, and knew this wasn't helpful, so I did all sorts of things to calm down. People (like my wife) would tell me everything was fine, I was doing great, go easier/don't be so hard on yourself, etc. I would try practicing self-compassion, but it didn't do anything, except maybe momentarily, because structurally, being not-stressed was incompatible with my schema.
In fact, a rather weird thing happened: the more I managed to let go of judgments I had about how well I was doing, and the better I got at being self-compassionate, the worse I felt. It wasn't the same kind of stress, but it was actually worse, d
...It seems to me that the emotional schemas that Unlocking the Emotional Brain talks about, are basically the same as what IFS calls parts. You didn't seem to object to the description of schemas; does your objection also apply to them?
AFAICT, there's a huge difference between UTEB's "schema" (a "mental model of how the world functions", in their words) and IFS' notion of "agent" or "part". A "model" is passive: it merely outputs predictions or evaluations, which are then acted on by other parts of the brain. It doesn't have any goals, it just blindly maps situations to "things that might be good to do or avoid". An "agent" is implicitly active and goal-seeking, whereas a model is not. "Model" implies a thing that one might change, whereas an "agent" might be required to change itself, if a change is to happen.
UTEB also describes the schema as "wordlessly [defining] how the world is" -- which is quite coherent (no pun intended) with my own models of mindhacking. I'm actually looking forward to reading UTEB in full, as the introduction makes it sound like the models I've developed of how this stuff works, are quite similar to theirs.
(Indeed, my own approach is specifically tar
...Wow. So glad I ended up on a Goodreads review for the IFS main book and this article was recommended. Just wanted to say thank you for the metaphor presented, really helpful.
So I finally read up on it, and have been successfully applying it ever since.
Could you give some examples of where you've been applying IFS and how it's been helpful in those situations?
So I find IFS, Focusing, IDC, and some aspects of TMI-style meditation to basically have blended together into one big hybrid technique for me; they all feel like different aspects of what's essentially the same skill of "listening to what your subagents want and bringing their desires into alignment with each other"; IFS has been the thing that gave me the biggest recent boost, but it's not clear to me that I'm always doing "entirely pure IFS", even though I think there's nearly always a substantial IFS component. (Probably most important has been the part about getting into Self, which wasn't a concept I explicitly had before this.)
That said, a few examples. I already mentioned a few in an earlier post:
My experience is that usually if I have an unpleasant emotion, I will try to do one of two things: either reject it entirely and push it out of my mind, or buy into the story that it’s telling and act accordingly. Once I learned the techniques for getting into Self, I got the ability to sort of… just hang out with the emotion, neither believing it to be absolutely true nor needing to show it to be false. And then if I e.g. had feelings o...
I am not OP but I can give an example.
As background there are some activities that are general purpose feeling obliterators and thus are commonly used by firefighters: binge-eating, drinking alcohol, drugs, sex, TV, video games...
I have been fighting with my weight for many (26!) years. I did lose a lot of weight but still at BMI 26 and could not get off that last 7kg. Using the IFS process I identified the firefighters which used eating to make various feelings go away:
Social stress, anxiety about food being available (from when I was young = "Jimmi"), feelings of emotional deprivation (childhood situation), feelings of frustration when I could not understand something, feeling tired, feeling frightened (childhood situation)
Once I connected with these protectors and made friends with them, connected (with their permission) with the original exiles, and established that the problems have solutions, I have been able to stick to my diet for 50 days straight and lose 2.5kg in less than two months. This takes me almost half way to my target.
As an example how much has changed I have had a packet of chocolate biscuits in my refrigerator for the last few weeks with no drama at...
The back-and-forth (here and elsewhere) between Kaj & pjeby was an unusually good, rich, productive discussion, and it would be cool if the book could capture some of that. Not sure how feasible that is, given the sprawling nature of the discussion.
Nomination for 2019 review:
I originally tried to read Self-Therapy, but bounced off of it because it was aimed too much at people with major life-impacting traumas. This post was much more approachable, and I liked the robot metaphor. Since reading it, I started to notice the ways in which my own mind is behaving like a manager or firefighter with respect to embarrassing incidents in the past.
I came back to this post because I was thinking about Scott's criticism of subminds where he complains about "little people who make you drink beer because they like beer".
I'd already been considering how your robot model is nice for seeing why something submind-y would be going on. However, I was still confused about thinking about these various systems as basically people who have feelings and should be negotiated with, using basically the same techniques I'd use to negotiate with people.
Revisiting, the "Personalized characters" section was pretty useful
...I really enjoyed this post and starting with the plausible robot design was really helpful for me accessing the IFS model. I also enjoyed reflecting on your previous objections as a structure for the second part.
The part with repeated unblending sounds reminiscent of the "Clearing a space" stage of Focusing, in which one acknowledges and sets slightly to the side the problems in one's life. Importantly, you don't "go inside" the problems (I take 'going inside' to be more-or-less experiencing the affect associated with the problems). This seems pretty simil
...I've read a lot of books in the self-help/therapy/psychology cluster, but this is the first which gives a clear and plausible model of why the mental structure they're all working with (IFS exiles, EMDR unprocessed memories, trauma) has enough fitness-enhancing value to evolve despite the obvious costs.
I'm a little late to the party, but I just read through and did the exercises of Self-Therapy last week and am feeling very excited about how many components of the model "clicked" with me. Reading this post gave me insights into why those components resonated with me, so thank you very much for taking the time to write up this supremely helpful post!
The one aspect of the model that I've been having a lot of trouble with, which I view as problematic since the entire model essentially hinges on this practice, is to have an "organic...
Seems like directly entering a Catastrophic situation (burning hand on hot stove) without going through Distress would lead to a more severe Manager (or Exile) like PTSD. I.e., a soldier walking into a firefight and being shot vs. being shot by a sniper. Related: losing a limb suddenly vs. having it amputated (with advance warning) seems to make it more likely you'd have Phantom Limb pain b/c your mind never registered the limb was missing.
I'm finding it fruitful to consider the "exiles" discussion in this post alongside Hunting the Shadow.
This is a great post; particularly in how you narrate bouncing off of it and then building a model by which it or something like it is plausible.
I actually had the luck of having an in-person demonstration of this (IFS-style therapy) from someone in the LW/rationalist community years ago and I've been discussing it and recommending it to others ever since.
Wow, this is all very interesting.
I have been using this framework for a bit and I think I have found some important clues about some exile-manager-firefighter dynamics in myself. Although I'm just starting and I still have to clarify my next steps, I feel hopeful that this is the right direction.
There are some things which I would like to know more about. Feel free to answer any.
Which agent should the sympathetic listener be talking to? The manager, the exile, or both?
Assuming that one correctly identifies which thoughts (and ultimately, which situat...
Really enjoyed the post, thanks!
I started the Earley book and it's definitely a struggle. I usually can handle "soft skills" books like this one without getting frustrated by the vague, hand-wavy models (I really enjoyed Gendlin's Focusing, for example), but this one's been especially hard. That said, having your model in mind while I'm reading has kept me going, as I'm using it as a sort of Rosetta Stone for some of Earley's claims.
When I first read the post, I expected that "family systems" would be related to Hellinger's family constellations: this is a different method of psychotherapy which assumes a completely different set of "subagents" to define the human mind and its problems. Hellinger's constellation method assumes that a person's actual family relations have the biggest impact on that person's wellbeing (and motivation), and that the family structure is somehow internalised. This family structure could be invoked by a group of people (assign...
A visceral, real world example:
Workers who are killed because they can't let go of their tools, since the tools are part of their identity. I suspect there is a Part (in IFS parlance) that tells them "this is your identity".
From the book Range (highly recommended):
In four separate fires in the 1990s, twenty-three elite wildland firefighters refused orders to drop their tools and perished beside them. Even when Rhoades eventually dropped his chainsaw, he felt like he was doing something unnatural. Weick found similar phenomena in Navy seamen who ignore...
Gensler is a practical/applied framework of Freud, whose influence continues to grow in the humanities (outside of the psychology department, wherever that chimera sits). Most of the commentary above would benefit from a basic understanding of primary Freud (Interpretation of Dreams, Ego and Id, Basic Introduction, Civilization and its Discontents). The key to Freud is his dogged insistence on the importance of non-empirical structures (metaphor, analogy) to human thought. My personal belief is that these are incidental artifacts of the development of lang...
This is very similar to the Lifespan Integration Therapy which I had in April 2020. The logic of this therapy is to connect you with your memories and dissolve the past traumas. I think I greatly benefited from it because I have stopped being afraid of certain moments of my life associated with having depression.
In general, I am reading this sequence because one of my dreams is to understand what consciousness and enlightenment are. There are few gears in my current models of these phenomena.
A psychologist told me that the newer "version" of this is Coherence Therapy. I've only just started to read up on this.
I've gotten enormous benefit just from being aware of my "parts" without even distinguishing b/t what roles they play. Just realizing that they aren't having the effect they THINK they are.
Introduction
Internal Family Systems (IFS) is a psychotherapy school/technique/model which lends itself particularly well to being used alone or with a peer. For years, I had noticed that many of the kinds of people who put a lot of work into developing their emotional and communication skills, some within the rationalist community and some outside it, kept mentioning IFS.
So I looked at the Wikipedia page about the IFS model, and bounced off, since it sounded like nonsense to me. Then someone brought it up again, and I thought that maybe I should reconsider. So I looked at the WP page again, thought “nah, still nonsense”, and continued to ignore it.
This continued until I participated in CFAR mentorship training last September, and we had a class on CFAR’s Internal Double Crux (IDC) technique. IDC clicked really well for me, so I started using it a lot and also facilitating it to some friends. However, once we started using it on more emotional issues (as opposed to just things with empirical facts pointing in different directions), we started running into some weird things, which it felt like IDC couldn’t quite handle… things which reminded me of how people had been describing IFS. So I finally read up on it, and have been successfully applying it ever since.
In this post, I’ll try to describe and motivate IFS in terms which are less likely to give people in this audience the same kind of a “no, that’s nonsense” reaction as I initially had.
Epistemic status
This post is intended to give an argument for why something like the IFS model could be true and a thing that works. It’s not really an argument that IFS is correct. My reason for thinking in terms of IFS is simply that I was initially super-skeptical of it (more on the reasons of my skepticism later), but then started encountering things which it turned out IFS predicted - and I only found out about IFS predicting those things after I familiarized myself with it.
Additionally, I now feel that IFS gives me significantly more gears for understanding the behavior of both other people and myself, and it has been significantly transformative in addressing my own emotional issues. Several other people who I know report it having been similarly powerful for them. On the other hand, aside from a few isolated papers with titles like “proof-of-concept” or “pilot study”, there seems to be conspicuously little peer-reviewed evidence in favor of IFS, meaning that we should probably exercise some caution.
I think that, even if not completely correct, IFS is currently the best model that I have for explaining the observations that it’s pointing at. I encourage you to read this post in the style of learning soft skills - trying on this perspective, and seeing if there’s anything in the description which feels like it resonates with your experiences.
But before we talk about IFS, let’s first talk about building robots. It turns out that if we put together some existing ideas from machine learning and neuroscience, we can end up with a robot design that pretty closely resembles IFS’s model of the human mind.
What follows is an intentionally simplified story, which is simpler than either the full IFS model or a full account that would incorporate everything that I know about human brains. Its intent is to demonstrate that an agent architecture with IFS-style subagents might easily emerge from basic machine learning principles, without claiming that all the details of that toy model would exactly match human brains. A discussion of what exactly IFS does claim in the context of human brains follows after the robot story.
Wanted: a robot which avoids catastrophes
Suppose that we’re building a robot that we want to be generally intelligent. The hot thing these days seems to be deep reinforcement learning, so we decide to use that. The robot will explore its environment, try out various things, and gradually develop habits and preferences as it accumulates experience. (Just like those human babies.)
Now, there are some problems we need to address. For one, deep reinforcement learning works fine in simulated environments where you’re safe to explore for an indefinite duration. However, it runs into problems if the robot is supposed to learn in a real life environment. Some actions which the robot might take will result in catastrophic consequences, such as it being damaged. If the robot is just doing things at random, it might end up damaging itself. Even worse, if the robot does something which could have been catastrophic but narrowly avoids harm, it might then forget about it and end up doing the same thing again!
How could we deal with this? Well, let’s look at the existing literature. Lipton et al. (2016) proposed what seems like a promising idea for addressing the part about forgetting. Their approach is to explicitly maintain a memory of danger states - situations which are not the catastrophic outcome itself, but from which the learner has previously ended up in a catastrophe. For instance, if “being burned by a hot stove” is a catastrophe, then “being about to poke your finger in the stove” is a danger state. Depending on how cautious we want to be and how many preceding states we want to include in our list of danger states, “going near the stove” and “seeing the stove” can also be danger states, though then we might end up with a seriously stove-phobic robot.
In any case, we maintain a separate storage of danger states, in such a way that the learner never forgets about them. We use this storage of danger states to train a fear model: a model which is trying to predict the probability of ending up in a catastrophe from some given novel situation. For example, maybe our robot poked its robot finger at the stove in our kitchen, but poking its robot finger at stoves in other kitchens might be dangerous too. So we want the fear model to generalize from our stove to other stoves. On the other hand, we don’t want it to be stove-phobic and run away at the mere sight of a stove. The task of our fear model is to predict exactly how likely it is for the robot to end up in a catastrophe, given some situation it is in, and then make it increasingly disinclined to end up in the kinds of situations which might lead to a catastrophe.
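To make the danger-state idea concrete, here is a minimal sketch. It is not Lipton et al.'s implementation: the names `DangerMemory` and `fear` are hypothetical, situations are assumed to be sets of feature labels, and the "fear model" is a simple feature-overlap score standing in for a trained probability estimate.

```python
from collections import deque


class DangerMemory:
    """Stores the states that preceded each catastrophe and never discards them."""

    def __init__(self, lookback=2):
        self.recent = deque(maxlen=lookback)  # rolling window of recent states
        self.danger_states = []               # grows only; nothing is forgotten

    def observe(self, state, catastrophe=False):
        if catastrophe:
            # The states leading up to the catastrophe become danger states.
            self.danger_states.extend(self.recent)
            self.recent.clear()
        else:
            self.recent.append(frozenset(state))


def fear(state, danger_states):
    """Toy fear score in [0, 1]: best feature overlap with any stored danger state."""
    state = frozenset(state)
    best = 0.0
    for danger in danger_states:
        union = state | danger
        if union:
            best = max(best, len(state & danger) / len(union))
    return best


memory = DangerMemory(lookback=2)
memory.observe({"our_kitchen", "stove"})
memory.observe({"our_kitchen", "stove", "finger_extended"})
memory.observe({"our_kitchen", "burned"}, catastrophe=True)

# A stove in a different kitchen shares features with a stored danger state.
other_kitchen = fear({"other_kitchen", "stove", "finger_extended"}, memory.danger_states)
garden = fear({"garden", "flowers"}, memory.danger_states)
```

The `lookback` parameter is the knob for how cautious the robot is: a larger value stores more preceding states, which is how one would end up with the stove-phobic robot mentioned above.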
This sounds nice in theory. On the other hand, Lipton et al. are still assuming that they can train their learner in a simulated environment, and that they can label catastrophic states ahead of time. We don’t know in advance every possible catastrophe our robot might end up in - it might walk off a cliff, shoot itself in the foot with a laser gun, be beaten up by activists protesting technological unemployment, or any number of other possibilities.
So let’s take inspiration from humans. We can’t know beforehand every bad thing that might happen to our robot, but we can identify some classes of things which are correlated with catastrophe. For instance, being beaten or shooting itself in the foot will cause physical damage, so we can install sensors which indicate when the robot has taken physical damage. If these sensors - let’s call them “pain” sensors - register a high amount of damage, we consider the situation to have been catastrophic. When they do, we save that situation and the situations preceding it to our list of dangerous situations. Assuming that our robot has managed to make it out of that situation intact and can do anything in the first place, we use that list of dangerous situations to train up a fear model.
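As a rough sketch of that labeling step, assuming the robot logs a time-ordered trace of (situation, pain reading) pairs: the function name, the threshold of 0.8 and the lookback of 2 are all illustrative choices, not anything specified above.

```python
def danger_states_from_trace(trace, pain_threshold=0.8, lookback=2):
    """Return the catastrophic situations plus the situations just before them.

    trace: list of (situation, pain_reading) pairs in time order.
    """
    dangerous = []
    for i, (_, pain) in enumerate(trace):
        if pain >= pain_threshold:
            # Save the catastrophic situation and up to `lookback` preceding ones.
            start = max(0, i - lookback)
            for situation, _ in trace[start:i + 1]:
                if situation not in dangerous:
                    dangerous.append(situation)
    return dangerous


trace = [
    ("walking", 0.0),
    ("near cliff edge", 0.0),
    ("stepping off edge", 0.1),
    ("hitting ground", 0.95),
    ("lying still", 0.3),
]
dangerous = danger_states_from_trace(trace)
```

The resulting list is what would then be fed to the fear model as training data.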
At this point, we notice that this is starting to remind us of our experience with humans. Take, for example, the infamous Little Albert experiment: a human baby was allowed to play with a laboratory rat, but each time that he saw the rat, a researcher made a loud scary sound behind his back. Soon Albert started getting scared whenever he saw the rat - and then he got scared of furry things in general.
Something like Albert’s behavior could be implemented very simply using something like Hebbian conditioning to get a learning algorithm which picks up on some features of the situation, and then triggers a panic reaction whenever it re-encounters those same features. For instance, it registers that the sight of fur and loud sounds tend to coincide, and then it triggers a fear reaction whenever it sees fur. This would be a basic fear model, and a “danger state” would be “seeing fur”.
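A toy version of that learning rule might look like the following. This is only a sketch under simplifying assumptions: the class name `HebbianFear`, the learning rate and the threshold are made up for illustration, and there is no weight decay or extinction.

```python
class HebbianFear:
    """Links features to a fear response when they co-occur with an aversive event."""

    def __init__(self, learning_rate=0.25, threshold=0.5):
        self.learning_rate = learning_rate
        self.threshold = threshold
        self.weights = {}  # feature -> strength of its link to the fear response

    def experience(self, features, aversive):
        # Hebbian rule: strengthen a link only when the feature and the
        # aversive stimulus are active at the same time.
        if aversive:
            for feature in features:
                self.weights[feature] = self.weights.get(feature, 0.0) + self.learning_rate

    def afraid(self, features):
        activation = sum(self.weights.get(feature, 0.0) for feature in features)
        return activation >= self.threshold


albert = HebbianFear()
afraid_before = albert.afraid({"fur", "small", "white"})

for _ in range(3):  # the rat is repeatedly paired with a loud sound
    albert.experience({"fur", "small", "white"}, aversive=True)

afraid_of_rat = albert.afraid({"fur", "small", "white"})
afraid_of_big_dog = albert.afraid({"fur", "large", "brown"})  # generalizes via "fur"
afraid_of_blocks = albert.afraid({"wooden", "square"})
```

Because the rule cannot tell which feature mattered, every feature present at the time gets linked to fear, which is why the fear spreads to other furry things.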
Wanting to keep things simple, we decide to use this kind of an approach as the fear model of our robot. Also, having read Consciousness and the Brain, we remember a few basic principles about how those human brains work, which we decide to copy because we’re lazy and don’t want to come up with entirely new principles:
So here is our design:
So if the robot sees things which remind it of poking at a hot stove, it will be inclined to go somewhere else; if it imagines doing something which would cause it to poke at the hot stove, then it will be inclined to imagine doing something else.
Introducing managers
But is this actually enough? We've now basically set up an algorithm which warns the robot when it sees things which have previously preceded a bad outcome. This might be enough for dealing with static tasks, such as not burning yourself at a stove. But it seems insufficient for dealing with things like predators or technological unemployment protesters, who might show up in a wide variety of places and actively try to hunt you down. By the time you see a sign of them, you're already in danger. It would be better if we could learn to avoid them entirely, so that the fear model would never even be triggered.
As we ponder this dilemma, we surf the web and run across this blog post summarizing Saunders, Sastry, Stuhlmüller & Evans (2017). They are also concerned with preventing reinforcement learning agents from running into catastrophes, but have a somewhat different approach. In their approach, a reinforcement learner is allowed to do different kinds of things, which a human overseer then allows or blocks. A separate “blocker” model is trained to predict which actions the human overseer would block. In the future, if the robot would ever take an action which the “blocker” predicts the human overseer would disallow, it will block that action. In effect, the system consists of two separate subagents, one subagent trying to maximize rewards and the other subagent trying to block non-approved actions.
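The two-subagent arrangement can be sketched roughly as follows. This is not the classifier from the Saunders et al. paper: the hypothetical `Blocker` below just takes a majority vote over the overseer's past decisions for each action, and the action names are invented.

```python
class Blocker:
    """Predicts whether the human overseer would block an action."""

    def __init__(self):
        self.counts = {}  # action -> (times allowed, times blocked)

    def record(self, action, overseer_blocked):
        allowed, blocked = self.counts.get(action, (0, 0))
        if overseer_blocked:
            blocked += 1
        else:
            allowed += 1
        self.counts[action] = (allowed, blocked)

    def would_block(self, action):
        allowed, blocked = self.counts.get(action, (0, 0))
        return blocked > allowed  # unseen actions are allowed by default


def act(proposed, fallback, blocker):
    """The reward-seeking subagent proposes; the blocker can veto."""
    return fallback if blocker.would_block(proposed) else proposed


blocker = Blocker()
blocker.record("jump_off_ledge", overseer_blocked=True)
blocker.record("jump_off_ledge", overseer_blocked=True)
blocker.record("walk_forward", overseer_blocked=False)

vetoed = act("jump_off_ledge", "stay_put", blocker)
permitted = act("walk_forward", "stay_put", blocker)
```

Once the blocker has seen enough decisions, the human overseer no longer needs to be present for the veto to happen.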
Since our robot has a nice modular architecture into which we can add various subagents which are listening in and taking actions, we decide to take inspiration from this idea. We create a system for spawning dedicated subprograms which try to predict and block actions which would cause the fear model to be triggered. In theory, this is unnecessary: given enough time, even standard reinforcement learning should learn to avoid the situations which trigger the fear model. But again, trial-and-error can take a very long time to learn exactly which situations trigger fear, so we dedicate a separate subprogram to the task of pre-emptively figuring it out.
Each fear model is paired with a subagent that we’ll call a manager. While the fear model has associated a bunch of cues with the notion of an impending catastrophe, the manager learns to predict which situations would cause the fear model to trigger. Despite sounding similar, these are not the same thing: one indicates when you are already in danger, the other is trying to figure out what you can do to never end up in danger in the first place. A fear model might learn to recognize signs which technological unemployment protesters commonly wear, whereas a manager might learn the kinds of environments where the fear model has noticed protesters before: for instance, near the protester HQ.
Then, if a manager predicts that a given action (such as going to the protester HQ) would eventually trigger the fear model, it will block that action and promote some other action. We can use the interaction of these subsystems to try to ensure that the robot only feels fear in situations which already resemble the catastrophic situation so much as to actually be dangerous. At the same time, the robot will be unafraid to take safe actions in situations from which it could end up in a danger zone, but which are themselves safe to be in.
As an added benefit, we can recycle the manager component to also do the same thing as the blocker component in the Saunders et al. paper originally did. That is, if the robot has a human overseer telling it in strict terms not to do some things, it can create a manager subprogram which models that overseer and likewise blocks the robot from doing things which the model predicts that the overseer would disapprove of.
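To make the manager idea a bit more concrete, here is a minimal sketch of one way such a subprogram could work. Everything in it is an assumption made for illustration: the `Manager` class, the simple frequency-counting estimate, and the `threshold` parameter are not taken from Saunders et al. or from any real system. The manager just keeps track of how often each action was followed by the fear model triggering, and swaps a risky-looking action for a fallback.

```python
from collections import defaultdict


class Manager:
    """Hypothetical manager: learns which actions tend to precede a fear trigger."""

    def __init__(self, threshold=0.5):
        # Assumed cutoff: block actions whose observed trigger rate reaches this value.
        self.threshold = threshold
        self.tried = defaultdict(int)      # how many times each action was observed
        self.triggered = defaultdict(int)  # how many times fear fired afterwards

    def observe(self, action, fear_fired):
        """Record one observation of an action and whether the fear model fired."""
        self.tried[action] += 1
        if fear_fired:
            self.triggered[action] += 1

    def predicted_risk(self, action):
        """Estimated chance that this action leads to the fear model triggering."""
        if self.tried[action] == 0:
            return 0.0  # never seen: this toy version treats it as safe
        return self.triggered[action] / self.tried[action]

    def filter(self, proposed, fallback):
        """Block the proposed action and promote the fallback if it looks risky."""
        if self.predicted_risk(proposed) >= self.threshold:
            return fallback
        return proposed
```

The same structure would cover the overseer case: instead of recording whether the fear model fired, the manager would record whether the overseer blocked the action.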
Putting together a toy model
If the robot does end up in a situation where the fear model is sounding an alarm, then we want to get it out of the situation as quickly as possible. It may be worth spawning a specialized subroutine just for this purpose. Technological unemployment activists could, among other things, use flamethrowers that set the robot on fire. So let’s call these types of subprograms dedicated to escaping from the danger zone, firefighters.
So how does the system as a whole work? First, the different subagents act by sending into the consciousness workspace various mental objects, such as an emotion of fear, or an intent to e.g. make breakfast. If several subagents are submitting identical mental objects, we say that they are voting for the same object. On each time-step, one of the submitted objects is chosen at random to become the contents of the workspace, with each object having a chance to be selected that’s proportional to its number of votes. If a mental object describing a physical action (an “intention”) ends up in the workspace and stays chosen for several time-steps, then that action gets executed by a motor subsystem.
Depending on the situation, some subagents will have more votes than others. E.g. a fear model submitting a fear object gets a number of votes proportional to how strongly it is activated. Besides the specialized subagents we’ve discussed, there’s also a default planning subagent, which is just taking whatever actions (that is, sending to the workspace whatever mental objects) it thinks will produce the greatest reward. This subagent only has a small number of votes.
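The voting scheme described so far can be sketched in a few lines. This is only a toy illustration under stated assumptions: the function names, the `hold_steps` parameter (standing in for "several time-steps"), and the convention of marking intentions with an `"intent:"` prefix are all made up for this example. It also assumes that every submitted object has a positive number of votes.

```python
import random
from collections import Counter


def choose_object(votes, rng):
    """Pick one mental object with probability proportional to its votes."""
    objects = list(votes)
    weights = [votes[obj] for obj in objects]
    return rng.choices(objects, weights=weights, k=1)[0]


def run_workspace(submissions_per_step, hold_steps, rng):
    """Simulate the workspace and return the intentions that got executed.

    submissions_per_step: one list per time-step of (subagent, object, votes)
    tuples. Identical objects submitted by different subagents pool their votes.
    """
    executed = []
    current, streak = None, 0
    for submissions in submissions_per_step:
        votes = Counter()
        for _subagent, obj, n_votes in submissions:
            votes[obj] += n_votes
        chosen = choose_object(votes, rng)
        # Count how many consecutive steps the same object has held the workspace.
        streak = streak + 1 if chosen == current else 1
        current = chosen
        # An intention is executed once it has stayed chosen for hold_steps steps.
        if chosen.startswith("intent:") and streak == hold_steps:
            executed.append(chosen)
    return executed
```

For example, if the default planner is the only subagent submitting anything, `run_workspace([[("planner", "intent:make_breakfast", 1)]] * 3, 3, random.Random(0))` executes the breakfast intention on the third step. Add a strongly activated fear model submitting a fear object with many votes, and the intention will rarely hold the workspace long enough to be executed.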
Finally, there’s a self-narrative agent which is constructing a narrative of the robot’s actions as if it was a unified agent, for social purposes and for doing reasoning afterwards. After the motor system has taken an action, the self-narrative agent records this as something like “I, Robby the Robot, made breakfast by cooking eggs and bacon”, transmitting this statement to the workspace and saving it to an episodic memory store for future reference.
Consequences of the model
Is this design any good? Let’s consider a few of its implications.
First, in order for the robot to take physical actions, the intent to do so has to be in its consciousness for a long enough time for the action to be taken. If there are any subagents that wish to prevent this from happening, they must muster enough votes to bring into consciousness some other mental object replacing that intention before it’s been around for enough time-steps to be executed by the motor system. (This is analogous to the concept of the final veto in humans, where consciousness is the last place to block pre-consciously initiated actions before they are taken.)
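A bit of arithmetic shows how strong this veto is. As a rough sketch, assume that the vote counts stay constant and that each time-step's draw is independent (both simplifications of mine, as are the function and parameter names). Then an intention holding a share p of the votes survives a given window of k consecutive steps with probability p^k.

```python
def survival_probability(intent_votes, opposing_votes, hold_steps):
    """Chance that an intention is chosen on hold_steps consecutive draws.

    Assumes constant vote counts and independent draws on each time-step.
    """
    share = intent_votes / (intent_votes + opposing_votes)
    return share ** hold_steps
```

With equal votes on both sides and three steps required, `survival_probability(1, 1, 3)` is 0.5 ** 3 = 0.125. So under these assumptions, a subagent does not need an overwhelming majority to block an action most of the time: it only needs to win one draw somewhere in the window.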
Second, the different subagents do not see each other directly: they only see the consequences of each other’s actions, as that’s what’s reflected in the contents of the workspace. In particular, the self-narrative agent has no access to information about which subagents were responsible for generating which physical action. It only sees the intentions which preceded the various actions, and the actions themselves. Thus it might easily end up constructing a narrative which creates the internal appearance of a single agent, even though the system is actually composed of multiple subagents.
Third, even if the subagents can’t directly see each other, they might still end up forming alliances. For example, if the robot is standing near the stove, a curiosity-driven subagent might propose poking at the stove (“I want to see if this causes us to burn ourselves again!”), while the default planning system might propose cooking dinner, since that’s what it predicts will please the human owner. Now, a manager trying to prevent a fear model agent from being activated, will eventually learn that if it votes for the default planning system’s intentions to cook dinner (which it saw earlier), then the curiosity-driven agent is less likely to get its intentions into consciousness. Thus, no poking at the stove, and the manager’s and the default planning system’s goals end up aligned.
Fourth, this design can make it really difficult for the robot to even become aware of the existence of some managers. A manager may learn to support any other mental processes which block the robot from taking specific actions. It does it by voting in favor of mental objects which orient behavior towards anything else. This might manifest as something subtle, such as a mysterious lack of interest towards something that sounds like a good idea in principle, or just repeatedly forgetting to do something, as the robot always seems to get distracted by something else. The self-narrative agent, not having any idea of what’s going on, might just explain this as “Robby the Robot is forgetful sometimes” in its internal narrative.
Fifth, the default planning subagent here is doing something like rational planning, but given its weak voting power, it’s likely to be overruled if other subagents disagree with it (unless some subagents also agree with it). If some actions seem worth doing, but there are managers which are blocking them and the default planning subagent doesn’t have an explicit representation of those managers, this can manifest as all kinds of procrastinating behaviors and numerous failed attempts by the default planning system to “try to get itself to do something”, using various strategies. But as long as the managers keep blocking those actions, the system is likely to remain stuck.
Sixth, the purpose of both managers and firefighters is to keep the robot out of a situation that has been previously designated as dangerous. Managers do this by trying to pre-emptively block actions that would cause the fear model agent to activate; firefighters do this by trying to take actions which shut down the fear model agent after it has activated. But the fear model agent activating is not actually the same thing as being in a dangerous situation. Thus, both managers and firefighters may fall victim to Goodhart’s law, doing things which block the fear model while being irrelevant for escaping catastrophic situations.
For example, “thinking about the consequences of going to the activist HQ” is something that might activate the fear model agent, so a manager might try to block just thinking about it. This has the obvious consequence that the robot can’t think clearly about that issue. Similarly, once the fear model has already activated, a firefighter might Goodhart by supporting any action which helps activate an agent with a lot of voting power that’s going to think about something entirely different. This could result in compulsive behaviors which were effective at pushing the fear aside, but useless for achieving any of the robot’s actual aims.
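The gap between the proxy and the real target can be shown with a tiny sketch. The action names and all of the numbers below are made up purely for illustration, as are the function names. The firefighter picks whichever action leaves the fear signal lowest, which is not necessarily the action that leaves the robot in the least danger.

```python
# Made-up illustrative numbers, not data from any source.
ACTIONS = {
    "leave_danger_zone": {"fear_after": 0.2, "danger_after": 0.1},
    "look_at_engrossing_pictures": {"fear_after": 0.05, "danger_after": 0.9},
}


def firefighter_choice(actions):
    """Pick the action that leaves the fear signal lowest (the proxy)."""
    return min(actions, key=lambda name: actions[name]["fear_after"])


def safest_choice(actions):
    """Pick the action that leaves the actual danger lowest (the real target)."""
    return min(actions, key=lambda name: actions[name]["danger_after"])
```

With these numbers, the firefighter prefers the distraction, since it suppresses the fear signal best, while the action that would actually reduce the danger is leaving the danger zone.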
At worst, this could cause loops of mutually activating subagents pushing in opposite directions. First, a stove-phobic robot runs away from the stove as it was about to make breakfast. Then a firefighter trying to suppress that fear, causes the robot to get stuck looking at pictures of beautiful naked robots, which is engrossing and thus great for removing the fear of the stove. Then another fear model starts to activate, this one afraid of failure and of spending so much time looking at pictures of beautiful naked robots that the robot won’t accomplish its goal of making breakfast. A separate firefighter associated with this second fear model has learned that focusing the robot’s attention on the pictures of beautiful naked robots even more is the most effective action for keeping this new fear temporarily subdued. So the two firefighters are allied and temporarily successful at their goal, but then the first one - seeing that the original stove fear has disappeared - turns off. Without the first firefighter’s votes supporting the second firefighter, the fear manages to overwhelm the second firefighter, causing the robot to rush into making breakfast. This again activates its fear of the stove, but if the fear of failure remains strong enough, it might overpower its fear of the stove so that the robot manages to make breakfast in time...
Hmm. Maybe this design isn’t so great after all. Good thing we noticed these failure modes, so that there aren’t any mind architectures like this going around being vulnerable to them!
The Internal Family Systems model
But enough hypothetical robot design; let’s get to the topic of IFS. The IFS model hypothesizes the existence of three kinds of “extreme parts” in the human mind:
Exiles are not limited to being created from the kinds of situations that we would commonly consider seriously traumatic. They can also be created from things like relatively minor childhood upsets, as long as the child didn’t feel like they could handle the situation.
IFS further claims that you can treat these parts as something like independent subpersonalities. You can communicate with them, consider their worries, and gradually persuade managers and firefighters to give you access to the exiles that have been kept away from consciousness. When you do this, you can show them that you are no longer in the situation which was catastrophic before, and now have the resources to handle it if something similar was to happen again. This heals the exile, and also lets the managers and firefighters assume better, healthier roles.
As I mentioned in the beginning, when I first heard about IFS, I was turned off by it for several different reasons. For instance, here were some of my thoughts at the time:
Hopefully, I’ve already answered my past self’s concerns about the first point. The model itself talks in terms of managers protecting the mind from pain, exiles being exiled from consciousness in order for their pain to remain suppressed, etc. Which is a reasonable description of the subjective experience of what happens. But the evolutionary logic - as far as I can guess - is slightly different: to keep us out of dangerous situations.
The story of the robot describes the actual “design rationale”. Exiles are in fact subagents which are “frozen in the time of a traumatic event”, but they didn’t split off to protect the rest of the mind from damage. Rather, they were created as an isolated memory block to ensure that the memory of the event wouldn’t be forgotten. Managers then exist to keep the person away from such catastrophic situations, and firefighters exist to help escape them. Unfortunately, this setup is vulnerable to various failure modes, similar to those that the robot is vulnerable to.
With that said, let’s tackle the remaining problems that I had with IFS.
Personalized characters
IFS suggests that you can experience the exiles, managers and firefighters in your mind as something akin to subpersonalities - entities with their own names, visual appearances, preferences, beliefs, and so on. Furthermore, this isn’t inherently dysfunctional, nor indicative of something like Dissociative Identity Disorder. Rather, even people who are entirely healthy and normal may experience this kind of “multiplicity”.
Now, it’s important to note right off that not everyone has this to a major extent: you don’t need to experience multiplicity in order for the IFS process to work. For instance, my parts feel more like bodily sensations and shards of desire than subpersonalities, but IFS still works super-well for me.
In the book Internal Family Systems Therapy, Richard Schwartz, the developer of IFS, notes that if a person’s subagents play well together, then that person is likely to feel mostly internally unified. On the other hand, if a person has lots of internal conflict, then they are more likely to experience themselves as having multiple parts with conflicting desires.
I think that this makes a lot of sense, assuming the existence of something like a self-narrative subagent. If you remember, this is the part of the mind which looks at the actions that the mind-system has taken, and then constructs an explanation for why those actions were taken. (See e.g. the posts on the limits of introspection and on the Apologist and the Revolutionary for previous evidence for the existence of such a confabulating subagent with limited access to our true motivations.) As long as all the exiles, managers and firefighters are functioning in a unified fashion, the most parsimonious model that the self-narrative subagent might construct is simply that of a unified self. But if the system keeps being driven into strongly conflicting behaviors, then it can’t necessarily make sense of them from a single-agent perspective. Then it might naturally settle on something like a multiagent approach and experience itself as being split into parts.
Kevin Simler, in Neurons Gone Wild, notes how people with strong addictions seem particularly prone to developing multi-agent narratives:
This doesn’t seem like it explains all of it, though. I’ve frequently been very dysfunctional, and have always found very intuitive the notion of the mind being split into parts. Yet I mostly still don’t seem to experience my subagents anywhere near as person-like as some others clearly do. I know at least one person who ended up finding IFS because of having all of these talking characters in their head, and who was looking for something that would help them make sense of it. Nothing like that has ever been the case for me: I did experience strongly conflicting desires, but they were just that, strongly conflicting desires.
I can only surmise that it has something to do with the same kinds of differences which cause some people to think mainly verbally, others mainly visually, and others yet in some other hard-to-describe modality. Some fiction writers spontaneously experience their characters as real people who speak to them and will even bother the writer when at the supermarket, and some others don’t.
It’s been noted that the mechanisms which we use to model ourselves and other people overlap - not very surprisingly, since both we and other people are (presumably) humans. So it seems reasonable that some of the mechanisms for representing other people would sometimes also end up spontaneously recruited for representing internal subagents or coalitions of them.
Why should this technique be useful for psychological healing?
Okay, suppose it’s possible to access our subagents somehow. Why would just talking with these entities in your own head, help you fix psychological issues?
Let’s consider that a person having exiles, managers and firefighters is costly in the sense of constraining that person’s options. If you never want to do anything that would cause you to see a stove, that limits quite a bit of what you can do. I strongly suspect that many forms of procrastination and failure to do things we’d like to do are mostly a manifestation of overactive managers. So it’s important not to create those kinds of entities unless the situation really is one which should be designated as categorically unacceptable to end up in.
The theory for IFS mentions that not all painful situations turn into trauma: just ones in which we felt helpless and like we didn’t have the necessary resources for dealing with it. This makes sense, since if we were capable of dealing with it, then the situation can’t have been that catastrophic. The aftermath of the immediate event is important as well: a child who ends up in a painful situation doesn’t necessarily end up traumatized, if they have an adult who can put the event in a reassuring context afterwards.
But situations which used to be catastrophic and impossible for us to handle before, aren’t necessarily that any more. It seems important to have a mechanism for updating that cache of catastrophic events and for disassembling the protections around it, if the protections turn out to be unnecessary.
How does that process usually happen, without IFS or any other specialized form of therapy?
Often, by talking about your experiences with someone you trust. Or writing about them in private or in a blog.
In my post about Consciousness and the Brain, I mentioned that once a mental object becomes conscious, many different brain systems synchronize their processing around it. I suspect that the reason why many people have such a powerful urge to discuss their traumatic experiences with someone else, is that doing so is a way of bringing those memories into consciousness in detail. And once you’ve dug up your traumatic memories from their cache, their content can be re-processed and re-evaluated. If your brain judges that you now do have the resources to handle that event if you ever end up in it again, or if it’s something that simply can’t happen anymore, then the memory can be removed from the cache and you no longer need to avoid it.
I think it’s also significant that, while something like just writing about a traumatic event is sometimes enough to heal, often it’s more effective if you have a sympathetic listener who you trust. Traumas often involve some amount of shame: maybe you were called lazy as a kid and are still afraid of others thinking that you are lazy. Here, having friends who accept you and are willing to nonjudgmentally listen while you talk about your issues, is by itself an indication that the thing that you used to be afraid of isn’t a danger anymore: there exist people who will stay by your side despite knowing your secret.
Now, when you are talking to a friend about your traumatic memory, you will be going through cached memories that have been stored in an exile subagent. A specific memory circuit - one of several circuits specialized for the act of holding painful memories - is active and outputting its contents into the global workspace, from which they are being turned into words.
Meaning that, in a sense, your friend is talking directly to your exile.
Could you hack this process, so that you wouldn’t even need a friend, and could carry this process out entirely internally?
In my earlier post, I remarked that you could view language as a way of joining two people’s brains together. A subagent in your brain outputs something that appears in your consciousness, you communicate it to a friend, it appears in their consciousness, subagents in your friend’s brain manipulate the information somehow, and then they send it back to your consciousness.
If you are telling your friend about your trauma, you are in a sense joining your workspaces together, and letting some subagents in your workspace, communicate with the “sympathetic listener” subagents in your friend’s workspace.
So why not let a “sympathetic listener” subagent in your workspace, hook up directly with the traumatized subagents that are also in your own workspace?
I think that something like this happens when you do IFS. You are using a technique designed to activate the relevant subagents in a very specific way, which allows for this kind of a “hooking up” without needing another person.
For instance, suppose that you are talking to a manager subagent which wants to hide the fact that you’re bad at something, and starts reacting defensively whenever the topic is brought up. Now, one way by which its activation could manifest, is feeding those defensive thoughts and reactions directly into your workspace. In such a case, you would experience them as your own thoughts, and possibly as objectively real. IFS calls this “blending”; I’ve also previously used the term “cognitive fusion” for what’s essentially the same thing.
Instead of remaining blended, you then use various unblending / cognitive defusion techniques that highlight the way by which these thoughts and emotions are coming from a specific part of your mind. You could think of this as wrapping extra content around the thoughts and emotions, and then seeing them through the wrapper (which is obviously not-you), rather than experiencing the thoughts and emotions directly (which you might experience as your own). For example, the IFS book Self-Therapy suggests this unblending technique (among others):
I think of this as something like, you are taking the subagent in question, routing its responses through a visualization subsystem, and then you see a talking fox or whatever. And this is then a representation that your internal subsystems for talking with other people can respond to. You can then have a dialogue with the part (verbally or otherwise) in a way where its responses are clearly labeled as coming from it, rather than being mixed together with all the other thoughts in the workspace. This lets the content coming from the sympathetic-listener subagent and the exile/manager/firefighter subagent be kept clearly apart, allowing you to consider the emotional content as you would as an external listener, preventing you from drowning in it. You’re hacking your brain so as to work as both the therapist and the client at the same time.
The Self
IFS claims that, below all the various parts and subagents, there exists a “true self” which you can learn to access. When you are in this Self, you exhibit the qualities of “calmness, curiosity, clarity, compassion, confidence, creativity, courage, and connectedness”. Being at least partially in Self is said to be a prerequisite for working with your parts: if you are not, then you are not able to evaluate their models objectively. The parts will sense this, and as a result, they will not share their models properly, preventing the kind of global re-evaluation of their contents that would update them.
This was the part that I was initially the most skeptical of, and which made me most frequently decide that IFS was not worth looking at. I could easily conceptualize the mind as being made up of various subagents. But then it would just be numerous subagents all the way down, without any single one that could be designated the “true” self.
But let’s look at IFS’s description of how exactly to get into Self. You check whether you seem to be blended with any part. If you are, you unblend from it. Then you check whether you might also be blended with some other part. If you are, you unblend from it also. You then keep doing this until you can find no part that you might be blended with. All that’s left are those “eight Cs”, which just seem to be a kind of a global state, with no particular part that they would be coming from.
I now think that “being in Self” represents a state where no particular subagent is getting a disproportionate share of voting power, and everything is processed by the system as a whole. Remember that in the robot story, catastrophic states were situations in which the organism should never end up. A subagent kicking in to prevent that from happening is a kind of a priority override to normal thinking. It blocks you from being open and calm and curious because some subagent thinks that doing so would be dangerous. If you then turn off or suspend all those priority overrides, then the mind’s default state absent any override seems to be one with the qualities of the Self.
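In terms of the robot's voting scheme, this reading of "being in Self" could be sketched as a check on how concentrated the votes are. This is a loose illustration of my interpretation rather than anything IFS itself specifies: the function names and the `max_share` cutoff are arbitrary assumptions.

```python
def dominant_share(votes):
    """Fraction of all votes held by the single strongest subagent."""
    total = sum(votes.values())
    return max(votes.values()) / total if total else 0.0


def in_self(votes, max_share=0.5):
    """True if no subagent holds more than max_share of the votes.

    max_share is an arbitrary cutoff chosen for illustration.
    """
    return dominant_share(votes) <= max_share
```

For instance, `in_self({"planner": 1, "curiosity": 1, "manager": 1})` holds, since each subagent has a third of the votes. With `{"fear_model": 8, "planner": 1}` it does not, since the fear model's priority override dominates the workspace.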
This actually fits at least one model of the function of positive emotions pretty well. Fredrickson (1998) suggests that an important function of positive emotions is to make us engage in activities such as play, exploration, and savoring the company of other people. Doing these things has the effect of building up skills, knowledge, social connections, and other kinds of resources which might be useful for us in the future. If there are no active ongoing threats, then that implies that the situation is pretty safe for the time being, making it reasonable to revert to a positive state of being open to exploration.
The Internal Family Systems Therapy book makes a somewhat big deal out of the fact that everyone, even the most traumatized people, ultimately has a Self which they can access. It explains this in terms of the mind being organized to protect against damage, and with parts always splitting off from the Self when it would otherwise be damaged. I think the real explanation is much simpler: the mind is not accumulating damage, it is just accumulating a longer and longer list of situations not considered safe.
As an aside, this model feels like it makes me less confused about confidence. It seems like people are really attracted to confident people, and that to some extent it’s also possible to fake confidence until it becomes genuine. But if confidence is so attractive and we can fake it, why hasn’t evolution just made everyone confident by default?
Turns out that it has. The reason why faked confidence gradually turns into genuine confidence is that by forcing yourself to act in confident ways which felt dangerous before, your mind gets information indicating that this behavior is not as dangerous as you originally thought. That gradually turns off those priority overrides that kept you out of Self originally, until you get there naturally.
The reason why being in Self is a requirement for doing IFS, is the existence of conflicts between parts. For instance, recall the stove-phobic robot having a firefighter subagent that caused it to retreat from the stove into watching pictures of beautiful naked robots. This triggered a subagent which was afraid of the naked-robot-watching preventing the robot from achieving its goals. If the robot now tried to do IFS and talk with the firefighter subagent that caused it to run away from stoves, this might bring to mind content which activated the exile that was afraid of not achieving things. Then that exile would keep flooding the mind with negative memories, trying to achieve its priority override of “we need to get out of this situation”, and preventing the process from proceeding. Thus, all of the subagents that have strong opinions about the situation need to be unblended from, before integration can proceed.
IFS also has a separate concept of “Self-Leadership”. This is a process where various subagents eventually come to trust the Self, so that they allow the person to increasingly remain in Self even in various emergencies. IFS views this as a positive development, not only because it feels nice, but because doing so means that the person will have more cognitive resources available for actually dealing with the emergency in question.
I think that this ties back to the original notion of subagents being generated to invoke priority overrides for situations which the person originally didn’t have the resources to handle. Many of the subagents IFS talks about seem to emerge from childhood experiences. A child has many fewer cognitive, social, and emotional resources for dealing with bad situations, in which case it makes sense to just categorically avoid them, and invoke special overrides to ensure that this happens. A child’s cognitive capacities, models of the world, and abilities to self-regulate are also less developed, so she may have a harder time staying out of dangerous situations without having some priority overrides built in. An adult, however, typically has many more resources than a child does. Even when faced with an emergency situation, it can be much better to be able to remain calm and analyze the situation using all of one’s subagents, rather than having a few of them take over all the decision-making. Thus, it seems to me - both theoretically and practically - that developing Self-Leadership is really valuable.
That said, I do not wish to imply that it would be a good goal to never have negative emotions. Sometimes blending with a subagent, and experiencing resulting negative emotions, is the right thing to do in that situation. Rather than suppressing negative emotions entirely, Self-Leadership aims to get to a state where any emotional reaction tends to be endorsed by the mind-system as a whole. Thus, if feeling angry or sad or bitter or whatever feels appropriate to the situation, you can let yourself feel so, and then give yourself to that emotion without resisting it. As a result, negative emotions become less unpleasant to experience, since there are fewer subagents trying to fight against them. Also, if it turns out that being in a negative emotional state is no longer useful, the system as a whole can just choose to move back into Self.
Final words
I’ve now given a brief summary of the IFS model, and explained why I think it makes sense. This is of course not enough to establish the model as true. But it might help in making the model plausible enough to at least try out.
I think that most people could benefit from learning and doing IFS on themselves, either alone or together with a friend. I’ve been saying that exiles/managers/firefighters tend to be generated from trauma, but it’s important to realize that these events don’t need to be anything immensely traumatic. The kinds of ordinary, normal childhood upsets that everyone has had can generate these kinds of subagents. Remember, just because you think of a childhood event as trivial now, doesn’t mean that it felt trivial to you as a child. Doing IFS work, I’ve found exiles related to memories and events which I thought left no negative traces, but actually did.
Remember also that it can be really hard to notice the presence of some managers: if they are doing their job effectively, then you might never become aware of them directly. “I don’t have any trauma so I wouldn’t benefit from doing IFS” isn’t necessarily correct. Rather, the cues that I use for detecting a need to do internal work are:
If not, there is often some internal conflict which needs to be addressed, and IFS, combined with some other practices such as Focusing and meditation, has been very useful in learning to solve those internal conflicts.
Even if you don’t feel convinced that doing IFS personally would be a good idea, I think adopting its framework of exiles, managers and firefighters is useful for better understanding the behavior of other people. Their dynamics will be easier to recognize in other people if you’ve had some experience recognizing them in yourself, however.
If you want to learn more about IFS, I would recommend starting with Self-Therapy by Jay Earley. In terms of What/How/Why books, my current suggestions would be:
This post was written as part of research supported by the Foundational Research Institute. Thank you to everyone who provided feedback on earlier drafts of this article: Eli Tyre, Elizabeth Van Nostrand, Jan Kulveit, Juha Törmänen, Lumi Pakkanen, Maija Haavisto, Marcello Herreshoff, Qiaochu Yuan, and Steve Omohundro.