Or, how to recognize Bayes' theorem when you meet one making small talk at a cocktail party.

Knowing the theory of rationality is good, but it is of little use unless we know how to apply it. Unfortunately, humans tend to be poor at applying raw theory, and usually need several examples before it becomes instinctive. I found some very useful examples in the book Reading People: How to Understand People and Predict Their Behavior - Anytime, Anyplace. While I didn't think it communicated the skill of actually reading people very well, I did notice that one chapter (titled "Discovering Patterns: Learning to See the Forest, Not Just the Trees") could almost have been a collection of Less Wrong posts. It also serves as an excellent example of applying Bayes' theorem in everyday life.

In "What is Bayesianism?" I said that the first core tenet of Bayesianism is "Any given observation has many different possible causes". Reading People says:

If this book could deliver but one message, it would be that to read people effectively you must gather enough information about them to establish a consistent pattern. Without that pattern, your conclusions will be about as reliable as a tarot card reading.

In fact, the author is saying that Bayes' theorem applies when you're trying to read people (if this is not immediately obvious, just keep reading). Any particular piece of evidence about a person could have various causes. For example, in a later chapter we are offered a list of possible reasons for why someone may have dressed inappropriately for an occasion. They might (1) be seeking attention, (2) lack common sense, (3) be self-centered and insensitive to others, (4) be trying to show that they are spontaneous, rebellious, or nonconformist and don't care what other people think, (5) not have been taught how to dress and act appropriately, (6) be trying to imitate someone they admire, (7) value comfort and convenience over all else, or (8) simply not have the right attire for the occasion.

Similarly, very short hair on a man might indicate that he (1) is in the military, or was at some point in his life, (2) works for an organization that demands very short hair, such as a police force or fire department, (3) is trendy, artistic or rebellious, (4) is conservative, (5) is undergoing or recovering from a medical treatment, (6) thinks he looks better with short hair, (7) plays sports, or (8) keeps his hair short for practical reasons.

So much for reading people being easy. This, again, is the essence of Bayes' theorem: even though somebody being in the military might almost certainly mean that they have short hair, their having short hair does not necessarily mean that they are in the military. On the other hand, if someone has short hair, is clearly knowledgeable about weapons and tactics, displays a no-nonsense attitude, is in good shape, and has a very Spartan home... well, though it's still not certain, it seems likely to me that of all the people having all of these attributes, quite a few are in the military or in similar occupations.
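To make that "many weak cues" picture concrete, here is a minimal sketch of the update in odds form. Every number in it is an illustrative guess of mine rather than anything from the book, and it (unrealistically) treats the cues as independent given the hypothesis:

```python
# Odds-form Bayes update over several weak cues.
# All numbers are made-up guesses, and the cues are (unrealistically)
# treated as independent given the hypothesis.

prior_odds = 0.01 / 0.99  # guess: ~1% of people are in military or similar work

# Likelihood ratios P(cue | military-ish) / P(cue | not military-ish).
likelihood_ratios = {
    "very short hair": 3.0,
    "knows weapons and tactics": 10.0,
    "no-nonsense attitude": 2.0,
    "in good shape": 2.0,
    "Spartan home": 2.0,
}

posterior_odds = prior_odds
for cue, ratio in likelihood_ratios.items():
    posterior_odds *= ratio

posterior = posterior_odds / (1 + posterior_odds)
print(f"{posterior:.2f}")  # ~0.71 with these made-up numbers
```

Each cue on its own moves the needle only a little, but together they lift a 1% prior to around 70% - which is the qualitative point of the paragraph above.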

The book offers a seven-step guide for finding patterns in people. I'll go through them one at a time, pointing out what they say in Bayesian and heuristic/bias terms. Note that this is not a definitive list: if you can come up with more Bayesian angles to the book, post them in the comments.

1. Start with the person's most striking traits, and as you gather more information see if his other traits are consistent or inconsistent.

As computationally bounded agents, we can't simply take in all the available data at once: we have to start off from some particularly striking traits and build a picture from there. However, humans are notorious for anchoring too much (Anchoring and Adjustment), so we are reminded to actively seek disconfirmation of any initial theory we have.

I constantly test additional information against my first impression, always watching for patterns to develop. Each piece of the puzzle - a person's appearance, her tone of voice, hygiene and so on - may validate my first impression, disprove it, or have little impact on it. If most of the new information points in a different direction than my first impression did, I revise that impression. Then I consider whether my revised impression holds up as even more clues are revealed - and revise it again, if need be.

Here, the author is keeping in mind Conservation of Expected Evidence. If you could anticipate in advance the direction of any update, you should just update now. You should not expect to be able to get the right answer right away and never need to seriously update it. Nor should you expect to suddenly encounter some piece of evidence that, on its own, would make you switch to becoming confident in something completely different. An ideal Bayesian agent will expect their beliefs to be in a constant state of gradual revision as the evidence comes in, and people with human cognitive architectures should also make an explicit effort to update their impressions as fluidly as possible.

Another thing that's said about first impressions also bears noting:

People often try hard to make a good first impression. The challenge is to continue to examine your first impression of someone with an open mind as you have more time, information, and opportunity.

Filtered evidence, in its original formulation, was a set of evidence that had been chosen for the specific purpose of persuading you of something. Here I am widening the definition somewhat, also applying it to cases where the other person cannot exclude all the evidence they dislike, but is nevertheless capable of biasing it in a direction of their choice. The evidence presented at a first meeting is usually filtered evidence. (Such situations are actually complicated signaling games, and a full Bayesian analysis would take into account all the broader game-theoretic implications. Filtered evidence is just one part of it.)

Evidence is an event tangled by links of cause and effect with whatever you want to know about. On a first meeting, a person might be doing their best to appear friendly, say. Usually being a friendly person will lead them to behave in specific ways which are characteristic of friendly people. But if they are seeking to convey a good impression of themselves, their behavior may not be caused by an inherent friendliness anymore. The behavior is not tangled with friendliness, but with a desire to appear friendly.

2. Consider each characteristic in light of the circumstances, not in isolation.

The second core tenet in "What is Bayesianism?" was "How we interpret any event, and the new information we get from anything, depends on information we already had."

If you told me simply that a young man wears a large hoop earring, you couldn't expect me to tell you what that entails. It might make a great parlor game, but in real life I would never hazard a guess based on so little information. If the man is from a culture in which most young men wear large earrings, it might mean that he's a conformist. If, on the other hand, he is the son of a Philadelphia lawyer, he may be rebellious. If he plays in a rock band, he may be trendy.

A Bayesian translation of this might read roughly as follows. "Suppose you told me simply that a young man wears a large hoop earring. You are asking me to suggest some personality trait that's causing him to wear them, but there is not enough evidence to locate a hypothesis. If we knew that the man is from a culture where most young men wear large earrings, we might know that conformists would be even more likely to wear earrings. If the number of conformists was sufficiently large, then a young man from that culture, chosen randomly on the basis of wearing earrings, might very likely be a conformist, simply because conformist earring-wearers make up such a large part of the earring-wearer population.

(Or to say that in a more mathy way, say we started with a .4 chance of a young man being a conformist, a .6 chance for a young man to be wearing earrings, and a .9 chance for the conformists to be wearing earrings. Then we'd calculate (0.9 * 0.4) / (0.6) and get a 0.6 chance for the man in question to be conformist. We don't have exact numbers like these in our heads, of course, but we do have a rough idea.)

But then, he might also be the son of a Philadelphia lawyer, say, and then we'd get a good chance for him being rebellious. Or if he were a rock band member, he might be trendy. We don't know which of these reference classes we should use; whether we should think we're picking a young man at random from a group of earring-wearing young men from an earring-wearing culture or from all the sons of lawyers. We could try to take a prior over his membership in any of the relevant reference classes, saying for instance that there was a .05 chance of him being a member of an earring culture, or a .004 chance of him being the son of a lawyer and so on. In other words, we'd think that we're picking a young earring-wearing man from the group of all earring-wearing men on Earth. Then we'd have a (0.05 * 0.6 =) 0.03 chance of him being a conformist due to being from an earring culture, et cetera. But then we'd distribute our probability mass over such a large amount of hypotheses that they'd all be very unlikely: the group of all earring-wearing men is so big that drawing at random could produce pretty much any personality trait. Figuring out the most likely alternative of all those countless alternatives might make a great parlor game, but in real life it'd be nothing you'd like to bet on.

If you told me that he was also carrying an electric guitar... well, that still wouldn't be enough to get a very high probability on any of those alternatives, but it sure would help increase the initial probability of the "plays in a rock band" hypothesis. Of course, he could play in a rock band and be from a culture where people usually wore earrings."
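To spell out the arithmetic in the parenthetical above - and to make explicit that the .6 chance of wearing earrings is an average over conformists and non-conformists alike - here is a minimal sketch using the same made-up numbers:

```python
# The made-up numbers from the parenthetical above.
p_conformist = 0.4           # P(conformist)
p_earrings_given_conf = 0.9  # P(earrings | conformist)
p_earrings = 0.6             # P(earrings) overall

# Bayes' theorem: P(conformist | earrings) = P(earrings | conformist) * P(conformist) / P(earrings)
p_conf_given_earrings = p_earrings_given_conf * p_conformist / p_earrings
print(round(p_conf_given_earrings, 3))  # 0.6

# P(earrings) is itself a mixture over conformists and non-conformists:
#   0.6 = 0.9 * 0.4 + P(earrings | non-conformist) * 0.6
# so for these numbers to be consistent, non-conformists must wear earrings with probability 0.4.
p_earrings_given_nonconf = (p_earrings - p_earrings_given_conf * p_conformist) / (1 - p_conformist)
print(round(p_earrings_given_nonconf, 3))  # 0.4
```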

3. Look for extremes. The importance of a trait or characteristic may be a matter of degree.

This is basically just a reformulation of the above points, with an emphasis on the fact that extreme traits are easier to notice. But again, extreme signs don't tell us much in isolation, so we need to look for the broader pattern.

The significance of any trait, however extreme, usually will not become clear until you learn enough about someone to see a pattern develop. As you look for the pattern, give special attention to any other traits consistent with the most extreme ones. They're usually like a beacon in the night, leading you in the right direction.

4. Identify deviations from the pattern.

(I'll skip this one.)

5. Ask yourself if what you're seeing reflects a temporary state of mind or a permanent quality.

Again, any given observation has many different possible causes. Sometimes a behavior is caused not by any particular personality trait, but by the person simply happening to be in a particular mood, one which might be rare for them.

This is possibly old hat by now, but just to be sure: the probability that behavior X is caused by cause A, sayeth Bayes' theorem, is the probability that A happened in the first place, times (since both must be true) the probability that A would cause X at all. That's divided by the summed chance of all the ways X could have come about - A itself and every alternative cause included.
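In symbols, with A_1, A_2, ... running over all the mutually exclusive candidate causes (A itself among them), this is just the standard form of Bayes' theorem:

$$P(A \mid X) \;=\; \frac{P(A)\,P(X \mid A)}{\sum_i P(A_i)\,P(X \mid A_i)}$$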

A pseudo-frequentist interpretation might compare this to the probability of drawing the ace of hearts out of a deck of cards. (I'm not sure if the following analogy is useful or makes sense to anyone besides me, but let's give it a shot.) Suppose you get to draw cards from a deck, but even after drawing them you're never allowed to look at them, and can only guess whether you're holding the most valuable ones. The chance that you'll draw a particular card is one divided by the total number of cards, and you'd have a better chance of drawing it if you got to draw more cards. Imagine the probability of "(A happened) * (A would cause X)" as the number of cards you get to draw from the deck of all hypotheses. You need to divide that by the combined probability of all the hypotheses, alternative explanations included, so think of the probability of the alternative hypotheses as the number of other cards in the deck. Then your chance of drawing the ace of hearts (the correct hypothesis) is maximized if you get to draw as many cards as possible and the alternative hypotheses have as little probability (as few non-ace-of-hearts cards in the deck) as possible. Not considering the alternative hypotheses is like thinking you have a good chance of drawing the right card when you don't even know how many cards there are in the deck.

If you're hoping to draw the correct hypothesis about the reasons for someone's behavior, then consider carefully how much weight to give the "this is a permanent quality" explanation versus the "this is just a transient mood" one. Frequently, the "this is just a transient mood" cards make up a bigger share of the deck than we instinctively assume, and giving them their due will improve your shot at ending up holding the correct hypothesis.

See also Correspondence Bias.


6. Distinguish between elective and nonelective traits. Some things you control; other things control you.

As noted in the discussion about first impressions, people have an interest in manipulating the impression that they give others. The easier an impression is to manipulate, and the more common it is for people to have an interest in biasing it, the less reliable a guide it is. Elective traits such as clothing, jewelry and accessories can be altered almost at will, and are therefore relatively weak evidence.

Nonelective traits offer stronger evidence, particularly if they're extreme: things such as extreme overweight, physical handicaps, mental disorders and debilitating diseases often have a deep-rooted effect on personality and behavior. Nonelective traits that are not very unusual, such as ordinary height or facial features, don't usually merit special consideration - unless the person has invested significant resources in permanently altering them.


7. Give special attention to certain highly predictive traits.

Left as an exercise for readers.

Comments (26)

A good Bayesian still needs to imagine all the explanations that might be elevated by the evidence. This post did a good job emphasizing that.

I picked up this book six months ago, and noted that the paradigm was quite rational. For example, she did talk about spending a lot of time practicing how you read people and then testing your hypotheses. The author chose people for juries, so to be successful she needed to be objective about whether she was reading people correctly.

I just skimmed it, but one idea stood out that I carried away with me: she wrote that there are many questions you can ask a person for which the response will depend upon their socioeconomic background and other factors. However, there was one characteristic that she found to be robust: whether they were compassionate or not (or something akin). Apparently, this is one personality trait people don't falsely signal.

(Later edit: I'm not sure the book actually said it was straightforward to identify whether someone is compassionate; I may have mis-remembered. Instead, it seems the main message was that IF someone is compassionate, that is a good predictor of their behavior. Thanks to wedifred for motivating me to double-check.)

However, there was one characteristic that she found to be robust: whether they were compassionate or not (or something akin). Apparently, this is one personality trait people don't signal falsely.

Really? I've dealt with sociopaths who signaled that falsely.

I've dealt with sociopaths who signaled that falsely.

I'd be interested in hearing more about this, as I am highly skeptical of the existence of sociopaths (in the popular, overdramatized sense of people constantly manipulating others). There are people who have conduct disorders with no self-control, and who can't stop themselves from impulsively committing crimes and acts of violence, but that's a far cry from the stereotype of always lying, being devious and cunning, and signaling falsely.

Here are some personal impressions from my own experience and that of friends (which should be taken with a grain of salt.)

* Sociopaths make up 1-2% of the population, so the large majority of people one meets are not sociopaths.

* Of those sociopaths one does meet, most conceal their sociopathy.

* Combining the two factors above, it's quite possible to go through life without ever seeing evidence of sociopaths first hand. Thus, just because you've never seen firsthand evidence doesn't mean that you should rule out their existence.

* Many sociopaths do constantly manipulate others, but sometimes this is not a conscious choice. It can happen that (at least locally) when they tell lies, they believe them. What's so sad about their situation is that they're often as much victims of their condition as they are perpetrators.

* If you want stories of people's experiences with sociopaths, check out emotional abuse forums online. Of course, in principle the people there could be deluded as to the nature of their experiences, but by going to such forums you can at least get some idea of the sort of sociopath people report encountering in real life, as opposed to Hollywood's version.

She might have been more specific. I'll look it up.

I'll be interested to hear. I'm getting the impression that there is an underlying insight to what she is saying that does match my observations.

I read an older, very different version of the book, but in this version she writes,

If I peg someone as either very compassionate or unusually cold and harsh, I already know more about them and how they are likely to behave than their age, educational background, employment, physical appearance and sex combined could ever tell me.

Earlier on the same page she wrote,

I am not alone in my belief that an individual's level of compassion is a very good predictor of how he will think and act.

So the main difference between what I remembered and what she wrote is that while I had taken away that compassion is a good predictor of behavior, she doesn't claim it's necessarily easy to measure - just that it's reliable information about a person once you have measured it.

For the record, in the most recent version she also mentions socio-economic background and satisfaction with life as predictors that are nearly as powerful as compassion.


Or, how to recognize Bayes' theorem

Kaj, bad link


Thanks, fixed.

Here, the author is keeping in mind Conservation of Expected Evidence. If you could anticipate in advance the direction of any update, you should just update now. You should not expect to be able to get the right answer right away and never need to seriously update it.

There has to be a better way to put this.

The problem is that sometimes you can anticipate the direction. For example, if someone's flipping a coin, and you think it might have two heads. This is a simple example because a heads is always evidence in favor of the two-heads hypothesis, and a tails is always evidence in favor of the normal-coin hypothesis. We can see you become sure of the direction of evidence in this scenario: If the prior prob of two heads is 1/2, then after about ten heads you're 99% sure the eleventh is also going to be heads.

However, I do think that this is just because of very artificial features of the example that would never hold when making first impressions of people. Specifically, what's going on in the coin example is a hypothesis that we're very sure of, that makes very specific predictions. I can't prove it, but I think that's what allows you to be very sure of the update direction.

This never happens in social situations where you've just recently met someone--you're never sure of a hypothesis that makes very specific predictions, are you?

I don't know. I do know that there's some element of the situation besides conservation of probability going into this. It takes more than just that to derive that updates will be gradual and in an unpredictable direction.

(EDIT: I didn't emphasize this but updates aren't necessarily gradual in the coin example--a tails leads to an extreme update. I think that might be related--an extreme update in an unexpected direction balancing a small one in a known direction?)
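A quick sketch of the coin arithmetic bears this out: the expected posterior after the next flip equals the current posterior, because the rare tails outcome would cause a large drop that exactly balances the small, near-certain rise from another heads. (The 1/2 prior and the ten heads are from the comment above; the code is just my check.)

```python
# Posterior on "the coin is two-headed" after n heads in a row, starting
# from a 1/2 prior, plus the probability that the next flip is heads.

def posterior_after_heads(n, prior=0.5):
    # P(two-headed | n heads) = prior / (prior + (1 - prior) * 0.5**n)
    return prior / (prior + (1 - prior) * 0.5 ** n)

p = posterior_after_heads(10)
p_next_heads = p * 1.0 + (1 - p) * 0.5
print(p, p_next_heads)  # ~0.999 and ~0.9995: over 99% sure the 11th flip is heads

# Conservation of expected evidence still holds: the expected posterior
# after the 11th flip equals the current posterior. A heads nudges it up a
# tiny bit; a tails (probability ~0.0005) drops it all the way to zero.
expected = p_next_heads * posterior_after_heads(11) + (1 - p_next_heads) * 0.0
print(expected)  # ~0.999, equal to p
```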

(Or to say that in a more mathy way, say we started with a .4 chance of a young man being a conformist, a .6 chance for a young man to be wearing earrings, and a .9 chance for the conformists to be wearing earrings. Then we'd calculate (0.9 * 0.4) / (0.6) and get a 0.6 chance for the man in question to be conformist. We don't have exact numbers like these in our heads, of course, but we do have a rough idea.)

Love this example. I have been an HPMOR fan and a LWer for a while now, but I'm just taking my first steps into solid Bayesian reasoning. Just out of curiosity: the 0.6 probability assigned to earring-wearing includes both "conformists" and "non-conformists", right? I think I'm pretty much right, but I just feel that I need some validation to make sure I haven't completely misunderstood what's going on.

Where would you put the idea of checking the quality of your source of hypothesized patterns?

I admit this list of possibilities is just what seems reasonable to me...

* Patterns you've observed yourself in a culture you've spent a lot of time in, and that you've needed to update now and then
* Patterns you've heard about from members of a culture
* Patterns you've heard about from people you observed updating on new evidence
* Patterns you've observed with no surprises
* Patterns you've heard about from people who mostly seem to talk to each other about a culture they're not part of
* Patterns you've seen in fiction

http://www.pokeroffice.com/ does this for online poker to a level of arbitrary perfection, by predicting your opponents' likely play in the current situation given all of their previous play.

If you are playing online poker for moderate to high amounts of money and you are not using software to do ideal Bayesian predictions of your opponents' play in realtime, you are the sucker.

"ideal Bayesian predictions of your opponents play in realtime"?

This sounds to me like you're exaggerating what the software you use does. Are you involved with the people making money from selling that software?

Personally, I use a competing software suite (http://www.pokertracker.com/), do you mean to claim that PokerOffice does significantly more than this one does?

I haven't actively played poker online for many years and I was probably exaggerating, maybe confusing the data analysis of the past with the ability to make predictions for future play.

Reading this makes me really wish I had access to some data: p(personality type | trait). That would make reading people as easy as counting cards in blackjack! Surely this kind of data is out there... and if not why not?!

and if not why not?!

Changing trends? Take hippy clothes: in the '60s they were a probable indicator of promiscuity, whereas nowadays the wearer is more likely to be a green (and greens are not known for their free love).

Also, if such data were collated and published, humans would become somewhat anti-inductive where possible. For example, if liking cats rarely co-occurred with sociopathy, and this was widely known, then sociopaths would pretend to like cats in order to avoid detection.

Do you really think studies putting numbers on trends that are already widely known would make a difference? Most people pick up this kind of data intuitively but would never consider memorizing a table buried in the middle of a research paper. Having the figures just makes it easier to learn empathy in a systematic way. (Which, incidentally, few people would even be capable of.)

When conducting surveillance across a diverse population, having this information would certainly be useful. What proportion of shoplifters carry large bags? What proportion of bag carriers are shoplifters?

Come to think of it, perhaps this is how airline safety ought to work?

Any sort of predictive field of individual behavior ought to be able to make use of this data. Especially useful if you can tie in some computer assisted image tagging.
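As a toy illustration of why both of those proportions matter, with every number invented purely for the example:

```python
# Invented toy numbers - not real surveillance statistics.
p_shoplifter = 0.01           # base rate of shoplifters among shoppers
p_bag_given_shoplifter = 0.6  # P(carries a large bag | shoplifter)
p_bag_given_honest = 0.2      # P(carries a large bag | honest shopper)

p_bag = (p_bag_given_shoplifter * p_shoplifter
         + p_bag_given_honest * (1 - p_shoplifter))
p_shoplifter_given_bag = p_bag_given_shoplifter * p_shoplifter / p_bag
print(f"{p_shoplifter_given_bag:.3f}")  # ~0.029: most bag carriers are still honest
```

Even if most shoplifters carry large bags, the low base rate means that most large-bag carriers are not shoplifters - which is exactly why you would want both tables before acting on the cue.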

I suspect not everyone knows every trend. Lots of high-class people might not know about straight-edge punks. I also suspect that someone will write a pop-sci book about it if it's interesting enough.

You might find something like this in market research. Certainly the sort of analysis that predicts which advertisements are relevant to a user on sites like Facebook would be similar to this. Trying to answer a question like "Which advertisement will the user be most receptive to given this cluster of traits?", where the traits are your likes / dislikes / music / etc.

This isn't exactly what you're asking for, but I doubt there is a P(personality type | trait) table anywhere. You're talking about a high-dimensional space and a single trait does not have much predictive power in isolation.

This isn't exactly what you're asking for, but I doubt there is a P(personality type | trait) table anywhere. You're talking about a high-dimensional space and a single trait does not have much predictive power in isolation.

If I had enough data points of people's personality traits, I could stick it in something like Weka, look for empirical clusters (using something like k-means or hierarchical clustering, and so forth), then train a number of classifiers to sort individual people into these clusters given a limited number of personality trait observations.

There are all sorts of forms these classifiers could take. You could do the same sort of thing wedrifed is thinking of: assume that traits are independent and use the p(personality type | trait) values that have the most predictive power to classify a person. This would be a naive Bayes classifier, of the sort that's fantastically effective at spam filtering.

If you wanted to make something simpler -- perhaps something you could print out as a handy pocket guide to classifying people -- you could use a decision tree. That's like a precomputed strategy for playing 20 questions, where you only ask questions whose answers pay rent. It's approximate, but it can work surprisingly well. A related method is to build several randomized decision trees and have them vote.

Of course, once you build a classifier, that's a hypothesis about some structure in reality. You need to test that hypothesis before you rush forth and start putting your trust in it. For that, you can hold some of the data in reserve, and see how a classifier built from the rest of the data performs on it. If you break your data up into n groups and take turns letting each group be the testing data set, this can tell you if your general method for generating classifiers is working for this data set.

Of course this is all terribly ad-hoc, but the Bayesian ideal approach is hard to compute here, and often these hacks work surprisingly well.
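For concreteness, here is a minimal sketch of the naive Bayes idea described above, run on invented toy data; the traits, the "types" and the counts are all made up, and a real attempt would need actual survey data plus the kind of held-out testing described a couple of paragraphs up:

```python
from collections import Counter, defaultdict

# Invented toy data: (observed traits, personality "type") pairs.
data = [
    ({"short_hair", "fit", "punctual"}, "disciplined"),
    ({"short_hair", "fit"}, "disciplined"),
    ({"punctual", "band_tshirt"}, "disciplined"),
    ({"earring", "band_tshirt"}, "trendy"),
    ({"earring", "band_tshirt", "fit"}, "trendy"),
    ({"earring"}, "trendy"),
]
all_traits = {trait for traits, _ in data for trait in traits}

# "Training" is just counting: class frequencies and per-class trait frequencies,
# with add-one (Laplace) smoothing so unseen traits don't zero everything out.
class_counts = Counter(label for _, label in data)
trait_counts = defaultdict(Counter)
for traits, label in data:
    trait_counts[label].update(traits)

def classify(observed):
    """Return P(type | observed traits) under the naive independence assumption."""
    scores = {}
    for label, n in class_counts.items():
        score = n / len(data)  # prior P(type)
        for trait in all_traits:
            p = (trait_counts[label][trait] + 1) / (n + 2)  # smoothed P(trait | type)
            score *= p if trait in observed else (1 - p)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

print(classify({"earring", "fit"}))  # leans "trendy" on this toy data
```

The same skeleton scales to real data: swap in survey responses, hold some of them out, and check whether the classifier's guesses beat the base rates before trusting it.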

This isn't exactly what you're asking for, but I doubt there is a P(personality type | trait) table anywhere. You're talking about a high-dimensional space and a single trait does not have much predictive power in isolation.

And yet, this is exactly what personality tests must rely on and the sort of thing that we are doing when we follow the advice in the post. Access to even the raw data used when creating the 'big five' would be useful.

No, the article specifically warns against using a single trait. It gives specific examples of how a single trait can mean very different things. It takes a cluster of traits to establish something useful.

If you want to pursue getting the data, though, you could try to derive something like a table of probabilities from a self scored 'Big Five' test, like the one in the appendix of this review paper. From that same review paper you can also find the papers and data sets that gave rise to five factor personality analysis.

edit: fixed the link.