My essay continues the line of reasoning from Yudkowsky’s article "The Mind Projection Fallacy" and attempts to expand on it by providing more gears to explain how the fallacy operates. When I understood the mechanism he described (not on the first try), it was a huge leap. Here, I want to present my perspective on how the mind projection fallacy influences people's motivations, distorting their mental maps.
I went through a phase of analyzing the shifts in my motivations during reading and conversations, noting the specific phrases that triggered spikes in motivation. I did this while reading Yudkowsky’s articles as well. I noticed a pattern that repeated too frequently to ignore: spikes in motivation when encountering words like "useful," "wrong," "right," "good," "bad," "should," "important," and others that from the inside seemed to me like 1-place functions (properties of the object alone) rather than 2-place functions (relations between an object and an observer).
Here’s how I understood Yudkowsky’s "mind projection fallacy":
What is the mind projection fallacy? It’s a cognitive bias that occurs all around us. "This thing is good, you’re a sweetheart, you’re a bad person, this movie is terrible, you’re a monster, the woman is sexy, this is harmful, this is useful, this is important, this is unimportant, this is right"—all of these phrases may or may not involve the mind projection fallacy. How can we tell if a phrase falls into the pattern indicated by the label?
The mind projection fallacy is the perception of one’s feelings about an object not as personal feelings about the object but as intrinsic properties of the object, inherently tied to it and independent of the observer.
You look at a cute cat, and it seems to you that the cat possesses qualities like “cute,” “beautiful,” and “pleasant.” You experience certain sensations, which you verbalize as “pleasant.” From the inside, it feels as though the cat possesses some quality like "beautiful" and "pleasant," similar to how the sky possesses the quality of "blue," or a stone's surface has the "property" of smoothness.
Let me define “property” here as the stable patterns of behavior we observe in an object.
You see the sky as blue, so it seems like the sky has a certain "blue" property, directly "attached" to the sky, and it seems you can verify it somehow. The smoothness of a stone can be verified in certain ways as well. And so a habit forms: if you notice a certain pattern in a piece of reality, and other people confirm that they see it the same way, you start generalizing this pattern beyond the specific instance where you observed it. For example, if you see that the grass is green in several parts of the forest, you’ll expect it to be roughly the same color in the unexplored parts of the forest as well, and you won’t be wrong.
The mind projection fallacy arises as a result of this habit.
The explanation might be difficult to grasp at first, so I'll offer an example you can always refer to in order to catch the intuition.
This example comes from Yudkowsky’s article "The Mind Projection Fallacy". Yudkowsky borrowed the term from the mathematician E. T. Jaynes, although Jaynes originally used it to describe a mistake in probability interpretation.
In the early days of science fiction, alien invaders might occasionally abduct a girl in a torn dress and drag her away with the intent to assault her, as depicted on many old magazine covers. Strangely, aliens never seemed to go after men in torn shirts.
Sometimes people assume that all minds work similarly because they lack access to how an alien experiences the world from within. You simply don’t know how to model it, so there’s a temptation to take the easy path and model it based on your own experience. I feel this way—so others likely feel the same.
From the inside, it might seem that sexuality is an inherent, direct attribute of the woman, rather than a term the alien uses to describe its own feelings when looking at the woman. The woman is attractive, so the alien will see the attribute "sexuality" and feel drawn to her—logical, right?
Now imagine that capybaras suddenly became sentient, learned human language, and started claiming that female capybaras are universally sexy, possessing the inherent property of “sexuality” that every intelligent species must notice. Humans, whose brains are not wired to feel what they call sexuality when looking at female capybaras, would start arguing with the capybaras, saying that this "sexuality" doesn’t exist. They’d argue that the capybaras are mistaking their own feelings for an objective property of female capybaras.
Even a child could see this mistake if a capybara started pushing this idea.
But when people talk among themselves and say, "this building is beautiful," for some reason the analogy with the capybaras becomes less obvious. With the capybaras, you don’t see the property of “sexuality” in the female capybara from any angle. But when a person points out the “beautiful” quality of a building, you might, if you try hard, be able to imagine how this supposedly existing property of “beauty” feels from the inside. The desire to argue depends on whether you feel this supposed property inside you and how strongly, so if you do feel it, you might even accept “the building is beautiful” as a true statement.
How do false properties (false patterns that create false predictions) emerge? Let’s say your friend guessed what you were doing last night based on a couple of sentences. And since you think it takes a “genius” to guess so precisely, you attach the property “genius” to your friend.
From the inside, this is experienced as, “My friend has some kind of powerful and impressive trait, they know something smart, and it’s too much effort to understand, so I’ll bow to the mystery and feel like a fan of this vague trait of genius, which I generalized from one instance. And to save cognitive resources, I’ll lump all my intuitions into the word ‘genius.’” Now your friend possesses the false property of “genius,” which generates the false prediction that this property will manifest similarly in the future: for example, that your friend will continue to guess what you did with amazing accuracy. But in fact you had forgotten to turn off your microphone, and your friend overheard, which is why they knew. They don’t have the "genius" property, but you can project it onto them, and then feel disappointment and confusion when your friend makes a dumb mistake. How could it happen? Weren’t they a “genius”?
Similarly, you can project the false property of “good,” “harmful,” or “disgusting” onto someone or something. And forget that you were using those words to describe your own feelings. After all, if your friend possesses the property of “genius,” what does it matter what you feel? This property is internally experienced as objective—it’s a pattern.
Given that people, when they speak, usually optimize their words to evoke specific associations and feelings in their listeners, meaning they generally avoid using unfamiliar language or words you won’t clearly understand, it follows that the mind projection fallacy flourishes everywhere.
If your beloved wife or husband says, “the building is beautiful,” and you look at the building and experience feelings that you could describe with the word "beautiful," why waste extra cognitive resources adding a layer of indirection like, “I’m labeling my feelings toward this building with the word ‘beautiful’”? People usually don’t speak like this, and you won’t automatically add this layer of indirection (the indication of how many perceptions the information is filtered through) to the words of people you trust. When people quickly name an object or strategy as good, there’s a temptation to simply activate your familiar feelings for the word “good,” specifically—calm, loyalty, a sense of value, and stress over the fear of losing it.
If an alien crystal says that this purple crystal is sexy, but you don’t feel any sexuality toward that crystal, why would you attempt to activate the same feelings of sexuality you’re familiar with for this crystal? The crystal doesn’t have the property of "sexuality," so you won’t even try to mirror those feelings to avoid arguing with the alien. Or will you? Do people often train their brains to be aroused by crystals? It seems unlikely.
But if another person, someone similar to you, says that a girl is attractive, your brain is already wired such that you labeled specific feelings toward some girls with the word "attractive." You try to activate those feelings, and it works. And to avoid arguing with this person, it’s easier for you to agree with the misconception that the girl has the property of attractiveness, attached solely to her (a 1-place word), rather than to her plus your perception (a 2-place word).
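The difference between a 1-place and a 2-place word can be sketched in code. This is only a toy model: the observers, targets, and scores below are invented for illustration, and the names `attraction` and `fix_observer` are hypothetical. The point is that the 1-place version only appears once an observer has been silently fixed.

```python
from typing import Callable

# Invented reaction scores, for illustration only.
# Outer key: who is looking. Inner key: what they are looking at.
REACTIONS = {
    "human": {"woman": 0.9, "female capybara": 0.0, "purple crystal": 0.0},
    "capybara": {"woman": 0.0, "female capybara": 0.9, "purple crystal": 0.0},
    "alien crystal": {"woman": 0.0, "female capybara": 0.0, "purple crystal": 0.9},
}


def attraction(observer: str, target: str) -> float:
    """2-place function: the answer depends on who is looking."""
    return REACTIONS[observer][target]


def fix_observer(observer: str) -> Callable[[str], float]:
    """Hide the observer argument, producing what looks like a 1-place function."""
    def attractive(target: str) -> float:
        return attraction(observer, target)
    return attractive


# From the inside, each mind only ever calls its own 1-place version,
# so the hidden observer argument is easy to forget.
human_view = fix_observer("human")
capybara_view = fix_observer("capybara")

print(human_view("woman"))              # 0.9
print(capybara_view("woman"))           # 0.0
print(capybara_view("female capybara")) # 0.9
```

In this sketch the human and the capybara are not disagreeing about a fact about the target. They are calling two different 1-place functions derived from the same 2-place one.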
If people chatter away, spewing dozens of words per minute, with a projection fallacy in every sentence, then to avoid cognitive overload from adding one layer of indirection after another 20 times per minute, you just relax and begin automatically interpreting words as sensations about the object.
The mind-projection fallacy has been deeply ingrained in humanity for centuries and shows no signs of fading. If you try to completely rid yourself of it, your speech may begin to sound like mine (I speak the same way I write in this article), which people often describe as "robotic," "weird," "cringe," or "lacking in emotion." Moreover, the mind-projection fallacy is arguably one of the main sources of emotion in human speech. Emotions are often generated by our mental models and expectations. Perceiving the word "cool" with and without the fallacy will evoke different emotional responses. With the fallacy in place, if a friend compliments you by saying, "You’re so cool," you might experience a rush of pleasant emotions because it exceeds your expectations in a favorable way. It feels like your friend is testifying to a quality within you, "coolness," that exists independently of their perception, much like how a woman might be said to possess the "objective" trait of "sexiness," only in this case, you are the one being admired.
If you associate the word "cool" with certain emotions, like admiration and exceeding expectations, and if you start to internalize this as a constant trait, it may seem like this "built-in quality" of coolness is something that others will notice as well, just like they notice a building’s beauty or a woman’s sexiness. In your day-to-day life, you haven't found much evidence that this supposed trait exists within you, as you walk down the street and people aren’t in awe of you. But if this quality of "coolness" were truly present, they certainly would be.
Then someone tells you, "You’re actually cool." You didn’t think you had the trait, but now someone sees it in you. This feels like confirmation of the hypothesis "I am cool." You fall into the trap of the mind-projection fallacy and deceive yourself, but at the sensory level, you don’t even need to engage any analytical thought to fall for it. And in return for this self-deception, you receive the reward of pleasant feelings from validated expectations.
And since this carousel of feelings provides you with a rich and pleasurable sensory experience, why would you want to give it up? People haven’t for centuries.
But what’s the alternative? What happens if you try to fully reject the mind-projection fallacy? For six months now, I’ve been rejecting it in my internal judgments, but I still sometimes use it to quickly trigger pleasant emotions in someone else, like when I say, "You’re a cutie." Here, I’m exploiting the mind-projection fallacy for my own purposes, such as reinforcing behavior so the person is more likely to do me another favor or subscribe to a channel for a new hit of oxytocin that I predict they’ll feel after hearing "You’re a cutie."
In theory, rejecting the mind-projection fallacy should completely change your worldview. What you once thought was "objectively good, right, or important" would be replaced with "I feel a certain way about this."
Likewise, what you previously considered "disgusting or bad" would be reframed as "I feel unpleasant about this for some reason." After rejecting the mind-projection fallacy, you would no longer deceive yourself into thinking that a woman possesses an objective property of "sexiness" that everyone around her will notice.
The process of rejecting this fallacy, if it exists in you (and I expect it exists in everyone who hasn’t actively fought against it), will inevitably destroy expectations formed by standard thought patterns, with corresponding side emotions. If you know that even minor disruptions of your expectations cause emotional drama for you, then rejecting the mind-projection fallacy will cause far more drama, along with a variety of secondary unpleasant emotions from being unable to comfortably experience the familiar feeling attached to that mental cluster. But your brain will adapt if you do this often enough.
If you want to experiment with rejecting the mind-projection fallacy, my method is the "add a level of indirection" technique (acknowledging the perceiver). Every time you notice yourself making statements like "this thing is good, bad, or whatever," and there’s no indication or intuition that this is just your perception, you add the phrase "I verbalize my feelings toward this thing as..." because that’s what you’re really doing. You have certain feelings toward something, and you’re naming them—there’s no deception in that. But the way you perceive that thing may suddenly change. I’ve tested this on several friends, and they reported that adding a level of indirection altered their perception.
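As a toy illustration of the technique, here is a minimal sketch that rewrites a simple "<thing> is <label>" statement so that the perceiver becomes explicit. The function name `add_indirection` and the assumption that the statement has this simple shape are mine, not part of any established method. Real sentences are messier, and the actual exercise happens in your head rather than in a string.

```python
def add_indirection(statement: str, speaker: str = "I") -> str:
    """Rewrite '<thing> is <label>' so the perceiver is named explicitly.

    Assumes the simple shape '<thing> is <label>'. Anything else is
    returned unchanged.
    """
    thing, sep, label = statement.partition(" is ")
    if not sep:
        return statement  # not in the expected shape, leave it alone
    label = label.rstrip(".")
    verb = "verbalize" if speaker == "I" else "verbalizes"
    possessive = "my" if speaker == "I" else "their"
    return f'{speaker} {verb} {possessive} feelings toward {thing.lower()} as "{label}".'


print(add_indirection("This building is beautiful"))
# I verbalize my feelings toward this building as "beautiful".

print(add_indirection("The movie is garbage", speaker="This person"))
# This person verbalizes their feelings toward the movie as "garbage".
```

The rewrite adds no information about the building or the movie. It only restores the second argument, the perceiver, that the original sentence left out.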
If you used to feel the urge to argue with people who called your favorite movie "garbage," you can just apply the indirection: "This person is verbalizing their feelings toward the movie as ‘garbage.’" Usually, after that, the urge to prove to them that the movie possesses some inherent "goodness" they’re missing will simply fade. You’ll lose interest because those debates were based on a false belief: the mind-projection fallacy.
Finally, people are often hooked on familiar emotions toward things, and if those emotions weaken or disappear after rejecting the mind-projection fallacy, it may feel like "life loses its meaning, magic, and enjoyment." During the transition phase, I experienced something similar and was puzzled as to why my happiness levels suddenly dropped.
But no magic is leaving the world — because there was never any magic to begin with. Magic is simply a verbalization of your own feelings. What’s disappearing are the familiar reactions, which were based on the false belief that objects have certain inherent properties.
The brain's machinery is capable of attaching any feelings to any object, and you can regain those feelings simply by training them. The reasons you used to call a building beautiful or cats cute are still there; these mechanisms are part of the laws of physics. If your feelings were grounded in reality, they’ll remain. What will be affected are the feelings that can be shattered by the truth. But remember, the human brain is capable of feeling emotions toward non-existent things. You can bring back almost any feeling if you want, except those emotions that were tied to a false belief—you can only bring those back if you force yourself to believe the falsehood again.
Now, I experience a high level of happiness even after rejecting the fallacy — you can take this as a testimony. But during the first three months of adaptation, it was painful. In the text of this article, I’ve tried to minimize the mind-projection fallacy. Perhaps you felt some emotions despite my efforts to remove it, which serves as evidence that the absence of the fallacy doesn’t destroy your emotions.