this likely understates the magnitudes of differences in underlying traits across cities, owing to people anchoring on the people who they know when answering the questions
Sure, if there really are differences, this method probably underestimates them. But there are other phenomena that could create false differences, such as varying social desirability bias.
Yes, this is something that I've wondered about quite a bit specifically in connection with the variation in conscientiousness and agreeableness by religion. I plan on partially addressing this issue by discussing some objective behavioral proxies to the personality traits in later posts.
Five Factor Model (FFM) ... the model is founded on the lexical hypothesis:
I notice I am confused. I was sure that the FFM came out of doing the following simple procedure:
How wrong is this? How important is the "lexical hypothesis" part?
That's right. The lexical hypothesis only comes in at step 1 by including questions like "I am [adjective]." We start with a vague theory in the questionnaire and apply dimension reduction. The lexical hypothesis is that language gives us a vague theory. We want as broad a theory as possible, so it is useful to combine questionnaires. Some sources claim that the original questionnaire was generated from language without questions from explicit theories, but I don't think that's correct.
Thanks for writing this! I really think people should be doing this (applying well-known algorithms to interesting datasets and seeing what happens) a lot more often overall, and it's on my list of skills I'd really like to learn personally. So I'd be interested to hear a little more info on methodology - what programming language(s) you used, how you generated the graphs, etc.
I'm pretty skeptical of making any connections to the Bay Area rationalist community based on Berkeley's conscientiousness score (which I think is interesting but not for this reason). There are 100,000 people living in Berkeley, and most of them aren't rationalists. And depending on how far back most of this data was collected, plausibly most of the Berkeley respondents were high school or college students (UC Berkeley alone has over 35,000 students), since for a while that was the main demographic of Facebook users, and probably for a while longer that was the main demographic of Facebook users willing to take personality tests. (Edit: But see Douglas_Knight's comment below.) In general I'd think more about selection effects like this before drawing any conclusions.
Glad you liked it :-).
So I'd be interested to hear a little more info on methodology - what programming language(s) you used, how you generated the graphs, etc.
I used R for this analysis. Some resources that you might find relevant:
And depending on how far back most of this data was collected, plausibly most of the Berkeley respondents were high school or college students (UC Berkeley alone has over 35,000 students), since for a while that was the main demographic of Facebook users, and probably for a while longer that was the main demographic of Facebook users willing to take personality tests.
Douglas_Knight is correct – the average age of users is quite low, at ~26 years old both for the high conscientiousness cities and the low conscientiousness cities.
I think you have the causality flipped around. Jonah is suggesting that something about Berkeley contributes to the prevalence of low conscientiousness among rationalists.
What I had in mind was that the apparent low average conscientiousness in the Bay Area might have been one of the cultural factors that drew rationalists who are involved in the in-person community to the location. But of course the interpretation that you raise is also a possibility.
Ah, I spoke imprecisely. I meant what you said, as opposed to things of the form "there's something in the water".
Actually, two of your complaints cancel out. You should expect that the population living in Berkeley has a very young personality, but if all the data is from college students, then there's nothing special about Berkeley (except that it is large and thus small effects are statistically significant — but the claim is that it has a large effect).
I think you are correct that the data is all college students (or at least fairly young people). I believe this because the cities being discussed are the hometown, not the current residence, which is the kind of thing you'd do with college students. In any event, studying hometown controls for the age demographics of Berkeley. But Jonah should have explicitly controlled for age.
Added: poking around the website I don't see a clear answer to how old the data is. Most of it seems to have been collected by 2011, but I'm not sure because there are lots of variations. Each big5 score is labeled with the date taken.
I think you are correct that the data is all college students (or at least fairly young people). I believe this because the cities being discussed are the hometown, not the current residence, which is the kind of thing you'd do with college students. In any event, studying hometown controls for the age demographics of Berkeley. But Jonah should have explicitly controlled for age.
Good point, I missed this.
Fascinating that the high-extroversion cities are places I know well, while the low-extroversion cities are places I've never heard of.
However, this likely understates the magnitudes of differences in underlying traits across cities, owing to people anchoring on the people who they know when answering the questions rather than anchoring on the national population
I think this is a major problem. This is mainly based on taking a brief look at this study a while back and being very suspicious of it explicitly contradicting so many of my models (e.g. South America having lower Extraversion than North America and East Asia being the least Conscientious region).
In 2007, psychology researchers Michal Kosinski and David Stillwell released a personality testing app on Facebook called myPersonality. The app ended up being used by 4 million Facebook users, most of whom consented to their personality question answers and some information from their Facebook profiles being used for research purposes.
The very large sample size and matching data from Facebook profiles make it possible to investigate many questions about personality differences that were previously inaccessible. Kosinski and Stillwell have used it in a number of interesting publications, which I highly recommend (e.g. [1], [2], [3]).
In this post, I focus on what the dataset tells us about how big five personality traits vary by geographic region in the United States.
The Five Factor Model of Personality
The Five Factor Model (FFM) or Big Five personality trait model is currently the dominant paradigm in personality research. The model is founded on the lexical hypothesis:
When people are asked questions about whether various adjectives describe them (or describe someone who they know), their answers are pairwise correlated with one another. Applying factor analysis to the responses yields a small number of underlying factors that explain a large fraction of the variance common to the answers.
Empirically, it's been found that a model with 5 factors often fits the data well (though some researchers claim that one gets 6 or 7 factors if one uses a question battery that fully exhausts descriptive adjectives, see e.g. the HEXACO model of personality and The Big Seven Model of Personality and Its Relevance to Personality Pathology for more information).
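To make the dimension-reduction idea concrete, here is a minimal Python sketch on simulated data. It is not the procedure used to derive the Big Five: it uses an eigendecomposition of the correlation matrix (a principal-components stand-in for factor analysis), and the function names, the number of latent traits (2), the items per trait (5) and the loading (0.7) are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_answers(n_people=5000, n_factors=2, items_per_factor=5,
                     loading=0.7, seed=0):
    """Simulate standardized questionnaire answers driven by a few latent traits.

    All parameters are illustrative assumptions, not values from the dataset.
    """
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal((n_people, n_factors))
    columns = []
    for f in range(n_factors):
        for _ in range(items_per_factor):
            noise = rng.standard_normal(n_people)
            # Each item is part latent trait, part item-specific noise,
            # scaled so that the item has unit variance.
            columns.append(loading * latent[:, f]
                           + np.sqrt(1 - loading ** 2) * noise)
    return np.column_stack(columns)

def variance_shares(answers):
    """Share of total variance captured by each component, largest first."""
    corr = np.corrcoef(answers, rowvar=False)    # pairwise item correlations
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigenvalues / eigenvalues.sum()

answers = simulate_answers()
shares = variance_shares(answers)
print("variance share of first four components:", np.round(shares[:4], 3))
```

With two simulated traits, the first two components should account for most of the variance and the rest should be small and roughly equal. The gap between them is the kind of signal used to decide how many factors to keep.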
The five factors referred to as the "Big Five" are labelled extraversion, neuroticism, agreeableness, conscientiousness and openness. I will describe these more below.
It's likely that the Big Five personality model falls far short of carving reality at its joints, and I'm in broad agreement with the stance that Jack Block expresses in A contrarian view of the five-factor approach to personality description. Nevertheless, the five factors in the model satisfy some desirable criteria, such as
and much of the data that's available uses Big 5 personality questionnaires, so it's often what we have to work with.
Data, methodology and high level results
There were ~680k Americans who both answered 20+ questions on a Big Five Personality Test and made their hometown available to researchers. After excluding hometowns with <= 30 users, about 3,500 hometowns were represented. Questions were answered on a scale from 1 (strongly disagree) to 5 (strongly agree).
I estimated personality trait averages for each city using Bayesian hierarchical modeling in order to account for regression to the mean when sample sizes are small. This results in relatively large cities being more prominently represented at the extremes of the estimates, on account of the larger sample sizes making it possible to have greater confidence in city averages deviating substantially from the mean. A CSV file with all estimates of city averages is available on Dropbox.
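The effect of sample size on the estimates can be illustrated with a simple normal-normal partial-pooling formula. This is a sketch in Python, not the model or code used for this analysis (which was done in R): the function name `shrink_city_means`, the within-city standard deviation of 1.0 and the between-city standard deviation of 0.1 are assumptions made for illustration.

```python
import numpy as np

def shrink_city_means(raw_means, sizes, grand_mean=0.0,
                      within_sd=1.0, between_sd=0.1):
    """Shrink raw city averages toward the grand mean (normal-normal model).

    within_sd and between_sd are assumed here; a full hierarchical model
    would estimate them from the data.
    """
    raw_means = np.asarray(raw_means, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    sampling_var = within_sd ** 2 / sizes  # noise in each raw city average
    # Weight on the raw average: close to 1 for large cities, small for small ones.
    weight = between_sd ** 2 / (between_sd ** 2 + sampling_var)
    return grand_mean + weight * (raw_means - grand_mean)

# Two hypothetical cities with the same raw average but very different sample sizes.
shrunk = shrink_city_means(raw_means=[0.3, 0.3], sizes=[31, 5000])
print("small city:", shrunk[0], "large city:", shrunk[1])
```

Both hypothetical cities have the same raw average, but the small one is pulled most of the way back toward the mean while the large one barely moves, which is why large cities dominate the extremes of the estimates.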
The units in the graphs below are standard deviations away from the mean of the entire sample. Roughly speaking, average self-reported personality by city varies from -0.2 to 0.2 standard deviations from the mean. However, this likely understates the magnitudes of differences in underlying traits across cities, owing to people anchoring on the people who they know when answering the questions rather than anchoring on the national population, as described in Birds of a feather do flock together: behavior and language-based personality assessment reveal personality homophily among couples and friends:
Extraversion
Representative questions:
Party cities have high average extraversion
The appearance of New Orleans, Miami, Hollywood, Beverly Hills and Newport Beach among the cities with the highest average extraversion is consistent with the cities' reputations for a high prevalence of partying and socializing. New Orleans and Miami have the highest average extraversion in the data, and are 2 of the 3 American cities on this list of top 20 party cities in the world.
The Seattle Freeze
Andrew J. Ho comments that the high frequency of cities in Washington state reminds him of the Seattle Freeze:
Neuroticism
Representative questions:
Ethnicity as an underlying factor
Washington DC and Atlanta stand out as having unusually large African American populations, constituting roughly 50% of the population. From Wikipedia:
The researchers behind the myPersonality app labelled the Facebook profile photos of a subset of the users by their race, so we can stratify by race. The numbers of people for whom we have labelled photos are given below, by race.
The people in cities with low average neuroticism are heavily disproportionately African-American:
This is not a coincidence. In fact, for the sample as a whole, African Americans' self-reported neuroticism is a full 0.2 standard deviations lower than the rest of the population. This remains true even if we restrict attention to a particular city, like Washington DC:
The finding of African Americans being relatively low on neuroticism is consistent with the literature on national differences in personality. The figure below is from The Geographic Distribution of Big Five Personality Traits: Patterns and Profiles of Human Self-Description Across 56 Nations. It depicts estimates of average neuroticism by continent, showing that Africans are as a group noticeably lower in neuroticism than people from other continents.
Agreeableness
Representative questions:
Agreeableness and Mormonism?
Seven of the 10 cities with highest average agreeableness are in Utah. This corresponds to Utah residents being almost 60% Mormon: as a group, Mormons have exceptionally high average agreeableness. One can do an analysis similar to the one that I did with race and neuroticism. I'll return to this later in the context of a more systematic discussion of agreeableness and religion.
New Yorkers really are unusually disagreeable
The fact that 8 of the 10 cities listed correspond to some borough of New York City accords with stereotypes of New Yorkers as unfriendly / mean / aggressive / rude (cf. New York City Ranked Sixth Most Unfriendly City in the World, Survey Finds).
Conscientiousness
Representative questions:
Low conscientiousness in the Bay Area
It's striking that Berkeley, San Francisco, San Jose, Hayward and Cupertino all make the list of 10 cities with lowest average conscientiousness, and that all of them are in the Bay Area.
Connection with the in person rationalist community?
The finding that Bay Area residents skew toward unusually low conscientiousness should be of especially strong interest to the rationalist community in light of the fact that the Bay Area has become the central hub of community activity.
Slightly shifting the subject, in the 2016 Less Wrong Diaspora Survey, respondents who reported involvement with the in-person community reported being clinically diagnosed with ADHD at a frequency of ~20%, roughly 2x the rate among those who reported no involvement with the in-person community. Low conscientiousness is known to be associated with ADHD, with people who have been diagnosed with ADHD scoring an average of 1 standard deviation below the population mean. In light of these things, it seems possible that there's some connection between the high rate of clinical ADHD diagnosis among people involved with the in-person community and Bay Area residents being unusually low in conscientiousness.
As with low extraversion, I'd welcome any ideas on what differentiates the cities with high average conscientiousness from others...
Openness
Representative questions:
Artsy cities and openness
Openness is associated with artistic interests. Hollywood is the center of cinema in the United States. Santa Fe and New Orleans are considered two of the ten most artistic cities in America. So their appearance near the top of the list is in line with expectations.
Political Liberalism and Openness
Openness is known to be strongly predictive of liberal political affiliation (cf. The Secret Lives of Liberals and Conservatives: Personality Profiles, Interaction Styles, and the Things They Leave Behind). So the appearance of many coastal California cities is also in line with expectations.
To Be Continued...
There's much more to say about personality and demographics, and I plan on writing more along these lines.