The Role of Attractiveness in Mate Selection: Individual Variation

JonahS

This post reports on a portion of my analysis of Fisman and Iyengar's speed dating dataset which bears on the question of how people select romantic partners.

Note: I made very substantial edits to the second to last section of this post after having posted it, addressing questions of generalizability. I've also cross-posted to my blog.

Summary

Participants rated one another on several dimensions. The majority of variation in the ratings is captured by the average of the different rating types: some people were regarded as good overall, and others were regarded as not good overall.
The second most important source of variation in the ratings given to participants is that some were regarded as more attractive and fun than they were intelligent/sincere, and for others, the situation was reversed.
Broadly, when people had to choose between partners who were seen as attractive and fun and partners who were seen as intelligent and sincere, they had a moderately strong preference for partners who were seen as attractive and fun.
Individuals varied substantially in how they responded to the tradeoff, with some showing a very strong preference for people who were seen as attractive and fun, and others showing virtually no such preference.

The speed dating context may be unusual in that people make a decision on whether or not to see somebody again after only 4 minutes of interaction. On the other hand, some people do meet their partners in contexts such as bars and speed dating events where decisions are made based on brief interactions. To this extent, the empirical phenomena in data from the study are relevant to understanding mate selection in general.

The Predictive Power of Attractiveness

In How Subjective Is Attractiveness? I described how the group consensus on somebody's attractiveness explained 60% of the variance in people's perceptions of attractiveness. My original purpose in writing it was as background for a discussion of how much attractiveness influenced people's decisions as to whether or not to see their partners again.

I touched on this in Predictors of Selectivity and Desirability at Speed Dating Events. The group consensus on attractiveness is highly predictive of how often people wanted to see somebody again. I remember being slightly shocked upon first viewing the graphs below:

If we average over all participants, we find that participants of above average attractiveness had twice as many suitors as participants of below average attractiveness.

There are questions of how the group consensus on attractiveness should be interpreted: for example, how much it's determined by physical appearance as opposed to other characteristics. But up to that ambiguity, the question of whether the connection between attractiveness and desirability was causal is a semantic one — the group consensus on attractiveness picked up on some characteristic that resulted in certain people having many more suitors than others. If we define attractiveness to be whatever that characteristic is, then the connection is causal by definition.

Despite the strong predictive power of the group consensus on attractiveness, there was substantial variability in how much people's decisions were influenced by attractiveness, whether measured by group consensus or by their own assessment. While 98% of participants had perceptions of attractiveness that overlapped with those of the others in the group, only 93% of participants made decisions that were correlated with the consensus of others on their partners' attractiveness.

Individual responsiveness to attractiveness

To visualize the distribution of the degree to which people's decisions were influenced by their partners' attractiveness, for each individual we form the angle between two vectors, the participant's decisions and the average attractiveness of his or her partners, and then plot these angles. The two vectors are in some ways qualitatively different, so the angles don't give a good sense of how much somebody's decisions were influenced by attractiveness in absolute terms, but they're helpful for thinking about how influenced people were relative to others.

An angle of 0 degrees represents perfect correlation while an angle of 90 degrees represents the person's decisions being orthogonal to the group's consensus on his or her partners' attractiveness. Angles greater than 90 degrees represent negative correlation. One can see that the angle was about 90 degrees for a small but significant fraction of participants, while for others the angle is very small, approaching 0 degrees.
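As a rough illustration of the angle computation, here is a minimal sketch. It assumes both vectors are mean-centered before the angle is taken, so that the cosine of the angle equals the Pearson correlation (which is what the "0 degrees means perfect correlation" reading suggests); the post does not spell this out. The function name and the example numbers are hypothetical.

```python
import numpy as np

def decision_angle_degrees(decisions, partner_attractiveness):
    """Angle (in degrees) between mean-centered decision and attractiveness vectors.

    Assumption: centering is applied, so cos(angle) is the Pearson correlation.
    """
    d = np.asarray(decisions, dtype=float)
    a = np.asarray(partner_attractiveness, dtype=float)
    d = d - d.mean()
    a = a - a.mean()
    denom = np.linalg.norm(d) * np.linalg.norm(a)
    if denom == 0:
        # All-yes or all-no decisions (or constant attractiveness): angle undefined.
        return float("nan")
    cos_theta = np.clip(np.dot(d, a) / denom, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

# Hypothetical participant: 1 = yes, 0 = no, with made-up consensus ratings.
decisions = [1, 0, 1, 0, 1, 0]
attractiveness = [8.0, 4.0, 7.5, 5.0, 6.5, 3.0]
angle = decision_angle_degrees(decisions, attractiveness)
print(f"angle = {angle:.1f} degrees")
```

With only a handful of dates per person, angles computed this way are noisy, which is one reason the spread in the plot overstates the true variation.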

The actual preferences of the participants surely vary less than the above graph suggests if it's taken at face value: the difference between those at the extremes and those in the middle would shrink with:

A larger sample of dates per person.
Better estimates of group consensus (based on ratings from a larger number of raters).

Still, the graph renders it plausible that the weight that people gave to attractiveness varied a lot, even if the variation was smaller than it is in the graph.

We could proceed to make "best guess" estimates of what the true distribution is, but we can get greater insight into what's going on by first adopting a shift in perspective.

Overall desirability and the tradeoffs

Participants rated each other on attractiveness, fun, ambition, intelligence and sincerity, as well as overall likeability. Ratings on the different dimensions were all correlated, sometimes strongly. (More here). This is partially explained by perceptions of somebody on one dimension influencing perceptions of the person on other dimensions (the Halo Effect). It could be partially explained by actual correlations between the underlying traits being measured. I'll explore possible explanations in greater detail in the future. From the point of view of understanding how people's preferences vary, the main point is that though we have 6 rating types, we have fewer than 6 independent pieces of information: ratings of intelligence aren't just ratings of intelligence, ratings of ambition aren't just ratings of ambition, etc.

We would like to throw out the redundant information so that we can focus on the essentials. A method that facilitates this is principal component analysis (PCA), an automated procedure that takes the 6 ratings as inputs and returns an output of 6 weighted combinations of the ratings (called "principal components") that are uncorrelated with one another. The key point is that it's often the case that the procedure compresses much of the information present in all of the variables into the first few principal components (something that the procedure is designed to do), and that we can discard the other principal components with little cost, reducing the number of variables that we need to consider.

If we apply PCA to the 6 ratings, the first combination that the procedure gives is a weighted average where each rating gets almost equal weight:

good = 4*(Attractiveness) + 5*(Like) + 4*(Fun) + 4*(Intelligence) + 4*(Ambition) + 3*(Sincerity)

This can be thought of as corresponding to overall favorable impressions of somebody, so I named it "good." It captures roughly 60% of the information that was in the original ratings.

The second weighted average that PCA gives is not nearly as symmetric:

tradeoff = 4.5*(Attractiveness) + 3*(Like) + 3*(Fun) - 6*(Intelligence) - 2*(Ambition) - 5*(Sincerity)

This principal component reflects the fact that, after the variation captured by the first principal component, the second largest source of variation is where those who were rated fall on a spectrum between two poles:

attractive, fun and likable <-------------> sincere, intelligent and ambitious

The first cluster of traits is more closely connected with mainstream romance than the second cluster of traits, which are thought of as positive, but less relevant.

The "tradeoff" combination captures roughly 20% of the information in the original ratings. So together, the first two principal components capture 80% of the information in the original ratings. We could look at the rest of the combinations that PCA gives us, but doing so would complicate the analysis without telling us much more.
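To make the PCA step concrete, here is a small sketch on synthetic data. The ratings below are generated at random from a hypothetical "overall impression" factor plus a hypothetical "contrast" factor; they are not the speed dating data, and the resulting weights and percentages will not match the ones quoted above. The sketch computes PCA via a singular value decomposition of the standardized ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
names = ["attractiveness", "like", "fun", "intelligence", "ambition", "sincerity"]

# Synthetic stand-in data (an assumption, not the real dataset):
# a shared "overall impression" factor plus a contrast factor plus noise.
overall = rng.normal(size=n)
contrast = rng.normal(size=n)
contrast_sign = np.array([1.0, 0.5, 0.5, -1.0, -0.5, -1.0])
ratings = (overall[:, None]
           + 0.6 * contrast[:, None] * contrast_sign
           + 0.5 * rng.normal(size=(n, 6)))

# PCA via SVD of the standardized ratings.
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
_, singular_values, components = np.linalg.svd(z, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Rows of `components` are the weight vectors; their overall sign is arbitrary.
for k in range(2):
    weights = ", ".join(f"{nm}={w:+.2f}" for nm, w in zip(names, components[k]))
    print(f"PC{k + 1}: {explained[k]:.0%} of variance; {weights}")
```

In data built this way, the first component weights all six ratings with the same sign (an overall "good" score), while the second contrasts attractiveness-type ratings with intelligence-type ratings, mirroring the qualitative pattern described above.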

Individual differences in romantic preferences

Having extracted the two principal components "good" and "tradeoff", we can examine how participants vary with respect to how their decisions depend on their partners' levels of each. Participants didn't vary very much with respect to their responsiveness to the "good" dimension. It's more interesting to examine how people differed with respect to preferences on the "tradeoff" dimension.

As background context, if we're content not to take into account differences in romantic preferences, we can model the probability of a participant's decision being yes by using a linear model for the log odds ratio:

LOR ~ 2*good + tradeoff + (general willingness to see partners again)

The fact that we're adding the tradeoff term rather than subtracting it corresponds to people tending to favor attractive and fun partners over intelligent and sincere partners, when forced to choose.
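For readers who want to turn the log odds into a probability, here is a minimal sketch. It assumes natural logarithms (the usual convention for logistic models; the post does not state the base here) and treats "good", "tradeoff" and the willingness term as already being on the scale the coefficients expect. The function name and inputs are hypothetical.

```python
import math

def yes_probability(good, tradeoff, willingness):
    """Probability of a 'yes' under LOR = 2*good + tradeoff + willingness.

    Assumption: LOR is a natural-log odds, so the inverse is the logistic function.
    """
    log_odds = 2.0 * good + tradeoff + willingness
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two hypothetical partners with the same "good" score but opposite tradeoff scores.
p_fun = yes_probability(good=0.0, tradeoff=0.5, willingness=-0.5)
p_sincere = yes_probability(good=0.0, tradeoff=-0.5, willingness=-0.5)
print(p_fun, p_sincere)
```

Because the tradeoff term enters with a positive sign, the partner who leans toward attractive and fun gets the higher predicted probability when the "good" scores are equal.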

To individualize the model while attempting to correct for the variation that one would expect by chance, I followed Andrew Gelman's suggestion and used Bayesian hierarchical modeling. We replace the equation above with

LOR ~ 2*good + (personal tradeoff coefficient)*tradeoff + (general willingness to see partners again)

where "personal tradeoff coefficient" is a constant that depends on the individual making the decision.
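The core idea behind hierarchical modeling here is partial pooling: noisy individual estimates are pulled toward the group average, more strongly when they are less certain. The sketch below is not the model used for this post. It is a simplified empirical-Bayes normal approximation, with hypothetical inputs, meant only to show the shrinkage formula at work.

```python
import numpy as np

def partially_pool(raw_coefs, std_errors):
    """Shrink per-person coefficient estimates toward the group mean.

    Simplified normal-normal approximation (an assumption for illustration):
    shrunk = mean + tau^2 / (tau^2 + se^2) * (raw - mean)
    """
    raw = np.asarray(raw_coefs, dtype=float)
    se2 = np.asarray(std_errors, dtype=float) ** 2
    mu = raw.mean()
    # Method-of-moments estimate of the between-person variance, floored at zero.
    tau2 = max(raw.var(ddof=1) - se2.mean(), 0.0)
    if tau2 == 0.0:
        return np.full_like(raw, mu)
    weight = tau2 / (tau2 + se2)
    return mu + weight * (raw - mu)

# Hypothetical raw estimates; the third person has few dates, hence a large standard error.
raw = np.array([0.0, 1.0, 2.0, 3.0])
se = np.array([0.2, 0.2, 1.0, 0.2])
shrunk = partially_pool(raw, se)
print(shrunk)
```

A full Bayesian hierarchical model estimates the group mean, the between-person spread and the individual coefficients jointly, but the qualitative effect is the same: extreme estimates based on little data move furthest toward the middle.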

The plot below shows the distribution of best guess estimates for the personal tradeoff coefficients. The title of the plot is a loose description of the "tradeoff" principal component, the precise definition of which I gave above.

The lefthand tail corresponds to some people having exhibited virtually no preference for attractive and fun partners over intelligent and sincere partners. The righthand tail corresponds to some people's preference being almost twice as strong as average.

What this means in tangible terms

In my first draft of this post, I postponed discussion of statistical significance until later, but I subsequently realized that I could address it succinctly.

I formed the graphs below by:

Estimating participants' coefficients based on the first 65% of the dates that they went on. These dates are the train set for our model.
Forming "high" and "low" groups of participants according to whether their coefficients were in the top or bottom third.
Restricting consideration to those dates that were not in the first 65% of dates. These dates are the test set for our model.

Thus, the dates that I used to estimate the coefficients are completely disjoint from the dates that I used to form the graphs, so that we get unbiased estimates for the romantic preferences that the two groups of people would show in contexts similar to those of the study.
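The procedure above can be sketched as follows. This is an illustration under stated assumptions: dates are taken to be ordered chronologically within each participant, the split is done per participant, and the function names and example coefficients are hypothetical.

```python
import numpy as np

def split_dates(n_dates, train_fraction=0.65):
    """Indices of a participant's first 65% of dates (train) and the rest (test)."""
    cutoff = int(round(n_dates * train_fraction))
    idx = np.arange(n_dates)
    return idx[:cutoff], idx[cutoff:]

def tercile_groups(coefs):
    """Label each participant 'low', 'middle' or 'high' by coefficient tercile."""
    coefs = np.asarray(coefs, dtype=float)
    lo, hi = np.quantile(coefs, [1 / 3, 2 / 3])
    return np.where(coefs <= lo, "low", np.where(coefs >= hi, "high", "middle"))

# A hypothetical participant with 20 dates.
train_idx, test_idx = split_dates(20)

# Hypothetical coefficients estimated from the train dates only.
groups = tercile_groups([0.1, 0.5, 0.9, 1.3, 1.7, 2.1])
print(len(train_idx), len(test_idx), list(groups))
```

The important design point is that group membership is fixed using train dates alone, and the graphs are then drawn from test dates alone, so any difference between the groups that shows up in the graphs cannot be an artifact of fitting to the same dates.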

The first graph shows the frequency with which people's decision was 'yes' as a function of their partners' attractiveness level.

The slope is slightly larger for the group with the high coefficient: you can see that the initial difference between the two groups in selectivity shrinks as one passes from partners with low attractiveness to high attractiveness.

The visual appearance of the graph understates the difference between the two groups: the high group virtually never expressed interest in people at the lowest part of the attractiveness spectrum, whereas people in the low group were several times more likely to. This comes across more clearly if we replace the percentage on the y-axis with the corresponding log odds ratio. Here "odds" has the same meaning that it does in gambling (e.g. roulette) and "log" refers to "logarithm." In the graph below, the 0 on the y-axis corresponds to decisions being yes 50% of the time, and an increase of 1 along the y-axis corresponds to the odds of a yes decision increasing by 2x:

From this, one sees that while the high group was ~4x more selective than the low group when it came to partners at the low end of the attractiveness spectrum, it was only ~1.5x as selective as the low group when it came to partners at the high end of the attractiveness spectrum.
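To make the odds scale concrete, here is a small sketch. It assumes the logarithm is base 2, which is what "an increase of 1 corresponds to the odds increasing by 2x" implies, and it reads "Nx more selective" as an odds ratio; both are interpretations rather than something the post states outright. The yes-rates in the example are made up.

```python
import math

def odds(p):
    """Odds of 'yes' for a yes-probability p, as in gambling odds."""
    return p / (1.0 - p)

def log2_odds(p):
    """Base-2 log odds: 0 at p = 0.5, and +1 for each doubling of the odds."""
    return math.log2(odds(p))

def odds_ratio(p_a, p_b):
    """How many times larger the odds of 'yes' are at p_a than at p_b."""
    return odds(p_a) / odds(p_b)

# Hypothetical yes-rates toward the least attractive partners.
low_group_rate = 0.20      # odds of 1 to 4
high_group_rate = 1 / 17   # odds of 1 to 16
print(log2_odds(0.5), log2_odds(2 / 3))
print(odds_ratio(low_group_rate, high_group_rate))
```

On a percentage axis the gap between 20% and roughly 6% looks modest, but on the odds scale it is a fourfold difference, which is why the log odds plot shows the contrast between the groups more clearly.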

The corresponding graphs with attractiveness replaced by intelligence and sincerity are:

(Note the difference in scales on the axes: there was much less variation in perceptions of sincerity and intelligence than there was in perceptions of attractiveness.)

One sees that past a certain point, the high group is not responsive to increasing sincerity and intelligence, whereas the low group is.

Of course, the high group and the low group don't differ most with respect to their responsiveness to attractiveness, or intelligence, or sincerity as individual traits. They differ the most in how they respond to a tradeoff between attractiveness/fun and intelligence/sincerity. The graph that depicts this is:

In passing from partners for whom the tradeoff term is lowest to partners for whom it's highest, the odds of being selected by members of the low group increase by 5.5x, whereas the odds of being selected by the members of the high group increase by only 1.4x.

The differences between the groups correspond to generalizable phenomena. In fact, I knew that the differences are statistically robust and generalizable before even doing a train/test split as I did above. What made it obvious to me is that the tradeoff coefficient correlates with many other features of the participants that were collected prior to the events...

To Be Continued...

The question now arises: who are the people who lie at the two ends of the continuum between relative preference for attractiveness/fun and relative preference for intelligence/sincerity? How did they spend their time? What career paths did they pursue? How did members of the opposite sex view them?

I'll offer partial answers to these questions in my next post. Readers who are intrigued can take a look at the survey instrument for a list of features present in the dataset, and guess which features correlated with the personal tradeoff coefficient.


Online dating offers a platform for obtaining large samples from a naturalistic setting, which should confer high external validity. In speed dating, by contrast, all the interactions are enforced: they are not the result of courtship. There are other issues too, such as low population density (the number of daters) and operational sex ratios that are artificially kept near 1:1. Speed dating eliminates the pre-selection component of human mating, whereas in other systems, such as natural human leks (nightclubs, bars, etc.) and online dating, attracting attention is the first goal. Attention is elicited through the display of signals that excite the interest of possible mates. In the mating field, non-verbal solicitation is mainly done by the female, as a basis for the male's decision to approach her.

Most studies tell us that physical attractiveness (the independent variable) is the limiting factor for both sexes, with other attributes acting as dependent variables, so I'm going to focus on this parameter to address other issues.

Attractiveness ratings:

In this article you illustrate how revealed preferences (i.e., preferences inferred through a speed dating event) can be used to investigate the nature of mate preferences. You describe how revealed preferences can be estimated and how the reliability of these estimates can be established. The revealed preference estimates are then used to explore the level of consensus in judgments of who is and is not attractive, and whether revealed preferences are systematically related to self-reported mate preferences and personality traits.

Some of the graphs are pretty opaque to me. Would it be possible to present other types of graphs that make the data clearer?

Participants of both genders showed substantial consensus in judgments of whom they found attractive and unattractive, but which sex showed higher consensus? In your speed dating study, is the standard deviation of attractiveness ratings for a given opposite-sex face smaller, on average, for one gender?

It seems that in most studies women have a higher variance in ratings of sex-objects than men (Jankowiak et al. 1992; Townsend & Wasserman 1997). But this should be taken with a grain of salt because attractiveness rankings have much higher variation when ranking males as opposed to females.

Schulman & Hoskins (1986) found that ratings of female photos had statistically significantly lower variance than ratings of male photos, for both male and female raters. Thus, the effect could partly be that both sexes are worse at judging the attractiveness of males.

There is never going to be rigorous agreement on any kind of informal attractiveness metric, so the subjective discussions are missing the point. And here’s where something interesting begins to happen with this whole rating system. The more imbalanced the mating dynamic becomes, the more asymmetric – in terms of their distribution between the male and female populations – these rankings become.

In their study there is a notable absence of individuals at the extremes of attractiveness. Future work might best reveal decision rules by manipulating the distribution of quality among potential mates; such manipulations would show whether people, mainly females, are using sample-based or threshold-based decision rules. This brings me to a point that I've usually observed in my own experiments: male ‘ratings’ are bottom heavy in distribution, while female ‘ratings’ are top heavy (meaning there are more female 7s than male 7s, by virtue of the fact that a female 7 has a greater probability of attracting a male 7 than the reverse). This does not seem to be the case in your study, though.

Furthermore, it would also be interesting to know the assortative/disassortative mating coefficients. How do perceptions of male attractiveness differ from perceptions of female attractiveness? I know that a speed dating event does not represent a potentially robust source of attractiveness data. It is also clear that the site’s audience may not be very representative of the population as a whole. Still, I’d like you to address one aspect of that problem by attempting to determine whether and how the distribution of male attractiveness in your speed dating sample differs from the distribution of female attractiveness, i.e. the female/male population distribution. It seems your graphs (which do not support the claim that females are more selective, given that rating skew is a corollary of selectivity, and which pose too many confounders to rely upon too strongly) differ substantially from those found here:

http://onlinelibrary.wiley.com/doi/10.1111/jomf.12072/full

http://blog.okcupid.com/index.php/your-looks-and-online-dating/

And what are the distributions of the yes/no decisions? Intuitively, several answers to this question seem plausible. On one hand, it seems anecdotally to be true that in our study there are almost no extremely attractive people, many average-looking people, and few extremely unattractive people. Such logic could lead one to predict a normal attractiveness distribution, as in your findings.

But the OkCupid blog and the Kreager/Cavanagh study, for example, find this Gaussian distribution only in the male population distribution, since women rate 80% of guys as worse-looking than medium.

I'd like to know whether participants’ ratings of hypothetical partners, for example, reflect whom they would actually choose to date (yes/no). I don’t understand the distribution of decisions/attractiveness angles chart. What's the relationship between individuals’ own physical attractiveness (as rated by other users) and the attractiveness of the people they wanted to meet?

“men’s decision were yes for 48% of the dates in the sample, and women’s decisions were yes for 33% of the dates in the sample.”

More specifically, how is the percentage (or number) of acceptances distributed across each attractiveness range of males and females in this system? For example, for a woman rated 4 in attractiveness, what is the total number of yeses she gave across all male daters? And for a woman rated 6? And of those yeses, what percentage went to men rated 1…4, 5, 6, and so on? In other words, how are their offers distributed across the spectrum of quality, given that the optimal threshold depends on the attributes of prospective mates (and on her own quality) and on the distribution of their quality?

On the other hand, it would be interesting to try to find out here whether there is a genetically determined threshold (threshold-based decisions) or some other unlearned threshold (sample-based decisions). These considerations would also reveal a simple algorithm by which the threshold could be learned. Peter M. Todd et al. tried to make that test. See http://141.14.165.6/CogSci09/papers/547/paper547.pdf

First, it's important to know, by analyzing yes/no rates, whether less attractive people accept less attractive dates (my own analysis of data from online dating suggests that this is not the case) or focus on the most desirable opposite-sex individuals. It's also important to know, by analyzing the attractiveness data, whether less attractive individuals give higher ratings than the most attractive ones do (i.e. whether or not less attractive people delude themselves into thinking that their dates are more physically attractive than others perceive them to be). Admittedly, the absence of highly attractive individuals (the top of the beauty scale) in the study sample could be a constraining factor here.

It would be necessary to introduce several highly attractive daters (>8 points) into the sample to see whether these data remain constant or, conversely, betray a predictable pattern: a near-universal preference for this very narrow range of male/female physical phenotypes.

The absence of subjects who can be classified as highly attractive (above 8) is a mistake. A speed dating event does not represent a potentially robust source of attractiveness data (small sample size). It is also clear that the site’s audience may not be very representative of the population as a whole. Most people are within the medium spectrum, and only a handful are good-looking.

I would say that real mate choice (in broader mating leks) is concentrated in a narrow range of the population, especially in the case of female choice, since the most reliable data and investigations (online dating and field courtship) agree on this observation. What this tells us is that, since physical attractiveness was a limiting factor for BOTH sexes and women are MORE selective in assessing attractive males, women are MORE likely than men to cull prospects according to assessments of physical attractiveness. Women tend to fixate on the top ~10-20% of males (see: Freakonomics data, http://jonmillward.com/blog/attraction-dating/cupid-on-trial-a-4-month-online-dating-experiment/, or my own experiment: https://sirtyrionlannister.wordpress.com/2014/02/23/female-mating-skew-ii-supported-by-online-dating-experiment/). Since they consider the bottom 80% of males to be, inexplicably, less than average (see the OkCupid data), the variance in that top 10-20% tends to split a lot of trivial hairs (making the differences harder to quantify with respect to an attractiveness ranking system).
