Students who go to more selective colleges make more money later in life. The 2013-2014 PayScale College Salary Report gives starting salaries of ~$60k/year for graduates of Ivy League schools vs. ~$45k/year for graduates of mid-tier state schools, and mid-career salaries of ~$110k/year vs. ~$80k/year, so Ivy League graduates earn roughly a third more.

This correlation partly reflects the fact that students who go to more selective colleges are more able and ambitious to begin with, rather than attendance itself boosting income. How much does attending boost income? The famous paper Estimating the Return to College Selectivity Over the Career Using Administrative Earnings Data (2011) by Dale and Krueger raises the possibility that on average, attending a more selective college doesn't raise earnings at all. Specifically, controlling for

  • Student high school GPA
  • Student SAT score
  • The average SAT score of the colleges that a student applied to [1]
  • The number of applications that the student submitted
  • Various demographic factors (race, sex, parental education)

they found that as a group, there was no statistically significant difference in income later in life between students who went to more selective colleges and students who went to less selective colleges. Their finding is somewhat robust: it's based on a large (~10k) sample, it holds both for the class of 1976 and for the class of 1989, it holds for the class of 1976 from age 25 through age 50, and it holds both for men and for women. [2][3]
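To make the "controlling for" step concrete, here is a minimal sketch in Python of the kind of regression involved. The variable names and synthetic data are mine, not Dale and Krueger's, and their actual "self-revelation" model is more elaborate; the sketch just shows why controlling for where a student applied can make the coefficient on college selectivity vanish when selectivity has no causal effect.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 10_000

    # Unobserved ability/ambition drives everything in this toy world.
    ability = rng.normal(size=n)
    sat = 1200 + 150 * ability + rng.normal(0, 50, size=n)
    gpa = np.clip(3.0 + 0.4 * ability + rng.normal(0, 0.3, size=n), 0, 4)
    # "Self-revelation": more ambitious students apply to more selective schools.
    avg_sat_applied = 1150 + 120 * ability + rng.normal(0, 60, size=n)
    n_applications = rng.poisson(4, size=n)
    # The selectivity of the college attended tracks where the student applied,
    # but has *no* causal effect on earnings in this simulation.
    college_sat = avg_sat_applied + rng.normal(0, 40, size=n)
    log_earnings = 10.5 + 0.25 * ability + rng.normal(0, 0.4, size=n)

    controls = pd.DataFrame({"college_sat": college_sat, "gpa": gpa, "sat": sat,
                             "avg_sat_applied": avg_sat_applied,
                             "n_applications": n_applications})
    naive = sm.OLS(log_earnings, sm.add_constant(controls[["college_sat"]])).fit()
    full = sm.OLS(log_earnings, sm.add_constant(controls)).fit()
    # The naive coefficient is clearly positive; with the controls added,
    # it shrinks to roughly zero.
    print(naive.params["college_sat"], full.params["college_sat"])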

A couple of caveats:

  • Black students, Hispanic students, and students whose parents don't have college degrees earned more if they attended more selective schools.
  • The 27 universities that the students were drawn from are highly selective: if one were to look at less selective universities, one might get different results.

Dale and Krueger's central finding has been taken to be evidence that going to a more selective college does not increase one's earnings, contrary to conventional wisdom.

There are many factors that could give rise to a difference in income between those who attended more selective colleges and those who attended less selective ones. Some of these factors favor those who attended less selective colleges, and could counterbalance others that increase earnings. I list some potential contributing factors below (some of which were pointed out by Dale and Krueger). In each case it's unclear whether the effect is present, and where it is present its direction is sometimes unclear, but each could give rise to a difference between the two groups.

Selection effects

These have no bearing on whether going to a more selective college increases expected earnings.

Cost

The cost of attending a more selective college could (after taking a student's ability and merit-based financial aid into account) be higher or lower than the cost of attending a less selective college. So the choice to attend a more selective school could reflect having a family that's willing to pay more for college (and/or the student's willingness to have his or her family pay).

Future career plans

Students who choose to attend more selective schools could be more or less likely to go into academia than students who don't. Academics make less money than other people do (after controlling for factors such as GPA, SAT scores and conscientiousness). Similarly, those who attend more selective schools may have other career preferences that feed into expected earnings.

For students who go on to professional school (law, medicine or business), the college attended doesn't matter as much for later life prospects. So attending a more selective college could reflect a lower intent to go on to professional school. The earnings of people who go on to professional school are generally higher than those of people who don't, so this points in the direction of expected earnings being higher for those who go to less selective colleges.

Concern for prestige

Students who choose to attend more selective colleges plausibly care more about prestige. This desire for prestige could correspond to a stronger desire to make money later in life, which points in the direction of expected earnings being higher for those who go to more selective colleges.

Signaling

The impact of prestige of college attended on hiring decisions

According to research by Lauren Rivera, high-paying elite professional service firms (investment banks, law firms, and management consulting firms) give a lot of weight to whether job applicants attended one of the top four universities when making their hiring decisions.

More broadly, employers give weight to the prestige of college attended. However, the effect size is smaller than it might seem. In a 2013 Gallup Poll, 9% of business leaders said that the college a job applicant attended is "very important" to managers making hiring decisions, and 37% said that it's "somewhat important." Notably,

(i) employers listed college attended as the least important of the 4 factors that they were asked about

(ii) the American public believes employers weight college attended more heavily than employers themselves report: 30% think it's "very important" and 50% think it's "somewhat important."

(The 2013 data is much more recent than the Dale-Krueger data, so it doesn't bear directly on those cohorts, but it still provides relevant evidence.)

Once one is hired, where one went to college won't play a role in professional advancement at that firm (except to the extent that it built relevant skills). 

Influence on grades

Carl Shulman suggested that going to a more selective school reduces one's expected GPA, because of higher grading standards. It's unclear to me whether this is true, and the effect could even cut the other way, but there's plausibly an effect in some direction. Reduced GPA reduces one's prospects for getting into medical or law school. Perusing forums for applicants, one finds many people saying that GPA is a dominant factor in admissions, one that's far more important than prestige of college attended.

Influence on major choice

Attending a more selective school reduces one's relative standing amongst students in a given major. This can lead to students at more selective schools choosing a less demanding major than they otherwise would have (whether because the major requires more effort than it would have, because one finds it intolerable to be one of the weaker students majoring in a given subject, or for some other reason). In the 2013 Gallup Poll cited above, business leaders surveyed listed the subject that a student majored in as significantly more important than college attended, in the context of hiring decisions.

Treatment effects

Peer group

Having a more capable peer group can lead to better learning opportunities and higher expected earnings. Ben Kuhn wrote:

By watching how more competent people work and think, you can often pick up useful study habits and better techniques for the subject you're studying. I've found this especially true in CS classes, where I've had this experience from both sides, e.g. teaching classmates how to use Git and picking up C coding style and tricks from better programmers. 

It can also give one access to better advice; Ben Kuhn also wrote:

Knowing talented students has given me info about several excellent courses, as well as summer opportunities, I wouldn't otherwise have known about. 

These considerations generally favor more selective schools, but not as strongly as might meet the eye: less selective schools often have honors courses and honors programs, where one might be able to meet students as capable as those one would interact with at more selective colleges (though the best students at more selective colleges will generally be stronger than the best students at less selective colleges).

Networking benefits

Going to a more selective college will generally expose one to people who will be in higher places later in life, and who will correspondingly be able to connect one with influential people in one's professional field, who may get one a high-paying job and so forth. Such people may also serve as professional collaborators, for example if one wants to do a startup right out of college.

As above, the effect here is smaller than might initially meet the eye, because one might be able to get similar benefits by interacting with the most capable students at a less selective college.

Confidence

Some people have suggested that being a "small fish in a big pond" reduces students' confidence relative to being a "big fish in a small pond." Assuming this, to the extent that confidence increases later life earnings, all else being equal, attending a more selective school will reduce expected earnings.

Better learning due to attention from professors

Some people have found that being a "big fish in a small pond" is conducive to getting more attention from the professors in one's classes, on account of standing out. This can increase the amount that a student learns, because of professors' greater willingness to spend time on personalized instruction.

Self-sufficiency

Some people I know report having a subjective sense that being in a less elite environment was helpful to them, because it forced them to be independent (on account of being different from their peers), whereas had they been in an elite environment, they would have "gone with the flow" and uncritically based their decisions on what their peers were doing. All else being equal, this factor would increase the expected earnings of those who attend less selective colleges.

Influence on major choice

As above, attending a more selective college can reduce the probability that one will major in a demanding subject. Aside from having signaling implications, this also reduces the chances that one will acquire technical skills required for higher paying jobs.

Influence on career choice

Going to a more selective college could nudge one toward or away from academia, going into the non-profit world, getting a professional degree, etc. Earnings vary across these fields, and this could give rise to a difference between the groups.

Implications

Before thinking seriously about the Dale-Krueger paper, I subscribed to the conventional wisdom that going to a more selective college generally boosts expected earnings. Having thought it over more carefully, it's now genuinely unclear to me whether this is the case, and going to a more selective college could even reduce expected earnings in general.

Rather than taking the Dale-Krueger finding to be definitive, one should give some weight to conventional wisdom, on the grounds that the paper might have hidden methodological errors, or have ceased to be relevant in the present day. But in view of

  1. The Dale-Krueger finding
  2. The fact that there are a number of ways in which going to a more selective college could decrease earnings
  3. The fact that the salient advantages of going to a more selective college are less significant than they might initially appear
  4. The fact that conventional wisdom is at least partially rooted in a conflation of correlation and causation

it seems reasonable to adopt a "best guess" that if going to a more selective college does increase expected earnings, the effect size isn't high.

What implications does this have for students who are trying to decide what college to go to, or how much to focus on getting into college? First some general considerations:

  • As discussed above, one shouldn't take the Dale-Krueger finding to be definitive, and should give some weight to conventional wisdom; still, a reasonable "best guess" is that if going to a more selective college does increase expected earnings, the effect size isn't high (perhaps on the order of ~2% – an intuitive guess based on the data given in the Dale-Krueger paper).
  • Even if the causal effects lead to there being no difference in expected earnings on average at the moment, that doesn't mean that there's no difference for a rational actor who wants to maximize expected earnings. For example, if attending a more selective (or less selective) college were to reduce expected earnings on account of increasing the probability that students go into academia, a student who doesn't want reduced earnings can simply buck the trend and not go into academia. However, there are certain causal effects over which one doesn't have control (such as the fact that elite firms look more favorably on students who graduated from top 4 colleges).
  • The causal effects will vary from person to person. For example, going to a college that elite finance firms look on favorably will have a greater effect for those who aspire to go into finance than for those who don't. A high school student can try to assess which causal effects will apply to him or her, but it can be hard to tell ahead of time: before entering college, one might be unclear on (or mistaken about) whether one wants to go into finance.
  • Expected earnings is not the only metric of the value of going to college. There's also the consumptive experience, as well as the impact on career success that's not reflected in earnings. It's plausible that going to a more selective college matters more for professional success for people who will be going on to academia than it does in general.
Irrespective of the considerations above:
  • The Dale-Krueger finding generally points in the direction of giving greater weight to cost differences between colleges than to differences in prestige. Depending on one's family's income and savings, going to Harvard can be cheaper than going to Berkeley, but for other families there will be a difference between the two on the order of $100k (which, if invested for 30 years, would grow to ~$800k; see the note after this list).
  • The Dale-Krueger finding points in the direction of effort spent getting into college being less cost-effective than most people think, which should shift one in the direction of spending less effort on it and more effort on building employable skills, enjoying life, and contributing social value.
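To sanity-check the ~$800k figure: assuming a ~7% average annual return (my assumption, not a number from the post), $100,000 × 1.07^30 ≈ $761,000, i.e. roughly $800k.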

Footnotes

[1] You might wonder why the authors didn't control for the average SAT score of the colleges at which the students were accepted. The authors did something like this (actually, comparing students who had been accepted at the exact same set of colleges) in a 2002 paper, and obtained similar results. From the paper: "The matched applicant model and self-revelation model yielded coefficients that were similar in size, but the self-revelation model yielded smaller standard errors. Because of the smaller sample size in the present analysis, we therefore focus on the self-revelation model."

[2] The authors say this in the text of the paper, but when I look at Table 4 of the paper on page 32 of the PDF, the data seems to indicate that for women from the 1976 cohort broken down into age groups, there is a statistically significant difference, in favor of those who attended less selective schools. But I assume that I'm misinterpreting the table: I'd welcome any help interpreting the data on this point.

[3] The authors qualify this by saying "The estimates from the selection-adjusted models are imprecise, especially for the 1989 cohort. Thus, even though the point-estimates for the return to school quality are close to zero, the upper-bound of the 95 percent confidence intervals for these estimates are sometimes sizeable."

Cross-posted from the Cognito Mentoring blog 

Comments (45)

I just don't believe in studies that "control" for things any more. I don't think that doing a linear regression, subtracting off what other variables predict, and seeing how college predicts the remainder, is something you should expect to reproduce the causal effect of a variable in a causal model where other variables are correlated with your cause. I expect that you can pick various sets of control variables and get whatever result you want, and even if you only pick a reasonable-sounding set and stand by it (though I have no way of knowing if you did) I still think this is asking a statistical question whose answer is not the answer you want. There are some problems with the randomized-test paradigm but it sure as hell beats the "control for these variables using a linear regression" paradigm.

I mean, I'd trust a study like this if Judea Pearl or one of his students endorsed the results and said it was done using correct causal modeling, but as it stands, my first response to this study is, "In general, I don't trust the kind of statistics you're using."

It is certainly true that you can get any results you want if you pick control variables based on how they change the effect estimates. This is a major problem with almost all observational research, and greatly reduces its reliability. People have suggested getting around it by requiring all investigators to publish their protocols before data collection begins, so that they have to specify at that time exactly what variables they will control for. This would be a good start, but it does not appear to be on the horizon.

The only way to obtain a valid estimate of a causal effect is to run a randomized controlled trial. However, this is not always feasible, and in situations where we really need information on causal effects but are unable to perform the experiment, the only thing we can do is to assume that nature ran a randomized controlled trial for us. This might have been a complicated trial where "nature" used one loaded die in men with high GPA, another loaded die in men with low GPA, a third die in women with high GPA and a fourth die in women with low GPA, etc.

If the data came from such a trial, it will be possible to recover an estimate of the causal effect. In a simple situation, a linear regression model that controls for sex and GPA may be sufficient; but in most realistic settings, you will need models that are able to account for the fact that "exposure" varies with time. The important thing to note is that the estimate is only as valid as your belief in the untestable assumption that the data came from a randomized trial run by nature.

What Pearl's causal framework gives you is a very powerful language for reasoning about whether it is reasonable to interpret observational data as if it came from a randomized controlled trial run by nature. Pearl's students will be able to tell you whether the framework has been applied correctly, but this is not sufficient to assess the trustworthiness of the result: for that, you need someone with subject matter expertise, i.e., someone who is willing to make untestable claims about whether the map (the DAG model) matches the territory (the data generating mechanism).

My main point is that you shouldn't just automatically discount observational studies, but instead use DAG models to reason carefully about what level of confidence you assign to the investigators' implicit claim that the data came from a natural experiment where exposure was assigned by loaded dice that differed between the groups defined by the control variables.

-

Edited to add that if you believe the main problem with observational studies is that investigators choose their set of control variables in order to get the results they want, one solution to this is to use Don Rubin's propensity score matching method, which specifically avoids this problem. See this paper. The only problem with the propensity score method is that it does not generalize to situations where exposure varies with time; in fact, students who are trained in the Rubin Causal Model (which competes with Pearl's graphical model) often become blind to a range of biases that arise because exposure varies with time.

So when Robin Hanson wants to know the real effect of health spending on health, he doesn't look for correlational control-variables studies on the effect of health spending on health, because he knows those studies will return whatever the researchers want them to say. What Robin does instead is look for studies that happen to control for health care spending, on the way to making some other point, and then look at what the correlation coefficient was in those studies, which aren't putatively about healthcare; and according to Robin the coefficient is usually zero.

This is an example of clever data, obtained in spite of the researchers, which I might be inclined to trust - perhaps too much so, for its cleverness. But the scarier moral is that correlational studies are bad enough, and by the time you add in control variables, the researchers usually get whatever result they want. If you trust a result at all in a correlational study, it should be because you think the researchers weren't thinking about that result at all and were unlikely to 'optimize' it by accident while they were optimizing the study outcome they were interested in.

But the scarier moral is that correlational studies are bad enough, and by the time you add in control variables, the researchers usually get whatever result they want.

Hmm.

Here's my thinking about this in the context of the post.

If the presence of trait A precedes the presence of trait B, and there's correlation between trait A and trait B, then this establishes a prior that trait A causes trait B. The strength of the prior depends (in some sense) on the number of traits correlated with trait A that precede the presence of trait B, and one updates from the prior based on the plausibility of causal pathways in each case.

In the case of college attended and earnings, we have two hypotheses (that constitute the bulk of the probabilistic effect sizes) as to the source of the correlation: (i) going to a more selective college increases earnings, and (ii) traits that get people into more selective colleges increase earnings.

To test for (ii), one controls for features that feed into college admissions. GPA and SAT scores are the easiest of these to obtain data on, but there are others, such as class rank, extracurricular activities, essays, whether one is a strong athlete, whether one's parents are major donors to the college, etc. To pick up on some of these, the authors control for the average SAT score of the colleges that the student applied to, and the number of applications submitted, which together measure the student's confidence that he or she can get into selective colleges (the intuition being that if a student submits only a small number of applications and applies only to top colleges, he or she has confidence that he or she will get into one).

The question is then whether there are sufficiently many other metrics (with large publicly available data sets) of the characteristics that get students into college so that the authors could have cherry picked ones that move the correlation to be statistically indistinguishable from 0. Can you name five?

If the presence of trait A precedes the presence of trait B

You mean precedes in time? What if A is my paternal grandfather's eye color (black), and B is my eye color (black)? Our eye color is correlated due to common ancestry, and A precedes B in time. But A does not cause B. There are lots of correlated things in the world due to a common cause, and generally one of them precedes another in time.

You can't talk about correlation and time like that. I think the only thing we can say is probably macroscopic retrocausation should be disallowed.


The way interventionists think about effects is that the effect of A on B in a person C is really about how B would change in a hypothetical person C' who differs from C only in that we changed their A. It's not about correlation, dependence, temporal order, or anything like that.

This approach might work sometimes, but I think it is problematic in most cases for the following reason:

Health care spending can only affect health through medical interventions (unless it is possible to extend someone's life by signalling that you care enough to spend money on health care).

If the study is designed to estimate the effect of some medical intervention, that intervention will be in the regression model. If you want to interpret the coefficient for health care spending causally, you have a major problem in that the primary causal pathway has been blocked by conditioning on whether the patient got the intervention. In such situations, the coefficient of health care spending would be expected to be zero even if it has a causal effect through the intervention.

The important thing to note is that the estimate is only as valid as your belief in the untestable assumption

Can I check that when you and others writing on statistics and causality talk about "untestable assumptions", you mean assumptions not testable by the experiment under discussion? Presumably the assumptions are based on previously acquired knowledge which may well have been tested, and had better be capable of being tested by other possible experiments; it's just that the present experiment is not capable of providing any further evidence about the assumption.

Good question! "Untestable assumption" can actually mean two different things:

In this context, you are correct to point out that I am talking about assumptions that are not testable by the data we are analyzing. I would be able to falsify my unconfoundedness assumption if I could run an experiment where I first observe what value the treatment variable would take naturally in all individuals, then intervene so that everyone is treated, and look at whether the distribution of the outcome differs between the group who naturally would have had treatment and the group who naturally would not have had treatment.

In other contexts, there are other types of untestable assumptions, which are unfalsifiable even in principle. These relate to independences between counterfactual variables from different "worlds". Basically, they assume that certain columns in your ideal dataset are independent of each other, when it is impossible even in theory to observe those two columns in the whole population at the same time.

If you refuse to make assumptions of the second type, you will still be able to estimate the effect of any intervention that is identifiable in Pearl's causal framework NPSEM, but you will not be able to analyze mediation or causal pathways. This is the difference between Pearl's model NPSEM and Robins' model FFRCISTG. The refusal to make unfalsifiable assumptions about independences between cross-world counterfactuals is also the primary motivation behind the "Single World Intervention Graph" paper by Robins and Richardson, which Ilya linked to in another comment in this thread.

Good post, thanks. FFRCISTG still assumes SUTVA, which is untestable (also, like any structural equation model, it assumes absent arrows represent absence of individual level effects, which seems like it is also untestable (?)).


. . . someone should write up an explanation and post it here. :)

I think I might be confused by the concept of testability. But with that out of the way:

no, we really mean "untestable." SUTVA (stable unit treatment value assumption) is a two part assumption:

first, it assumes that if we give the treatment to one person, this does not affect other people in the study (unclear how to check for this...)

second, it assumes that if we observed the exposure A is equal to a, then there is no difference between the observed responses for any person, and the responses for that same person under a hypothetical randomized study where we assigned A to a for that person (unclear how to check for this either... talks about hypothetical worlds).


Causal inference from observational data has to rely on untestable assumptions to link what we see with what would have happened under a hypothetical experiment. If you don't like this, you should give up on causal inference from observational data (and you would be in good company if you do -- Ronald Fisher and lots of other statisticians were extremely skeptical).

It's not clear to me how large a class of statements you're considering untestable. Are all counterfactual statements untestable (because they are about non-existent worlds)?

To take an example I just googled up, page 7 of this gives an example of a violation of the first of the SUTVA conditions. Is that violating circumstance, or its absence, untestable even outside of the particular study?

Another hypothetical example would be treatment of patients having a dangerous and infectious disease. One would presumably be keeping each one in isolation; is the belief that physical transmission of microorganisms from one person to another may result in interference between patient outcomes untestable? Surely not.

Such a general concept of untestability amounts to throwing up one's hands and saying "what can we ever know?", while looking around at the world shows that in fact we know a great deal. I cannot believe that this is what you are describing as untestable, but then it is not clear to me what the narrower bounds of the class are.

At the opposite extreme, some assumptions called untestable are described as "domain knowledge", in which case they are as testable as any other piece of knowledge -- where else does "domain knowledge" come from? -- but merely fail to be testable by the data under present consideration.

It's not clear to me how large a class of statements you're considering untestable.

As I said, I am confused about the concept of testability. While I work out a general account I am happy with (or perhaps abandon ship in a Bayeswardly direction or something) I am relying on a folk conception to get statements that, regardless of what the ultimate account of testability might be, are definitely untestable. That is, we cannot imagine an effective procedure that would, even in principle, check if the statement is true.

The standard example is Smoking -> Tar -> Cancer

The statement "the random variables I have cancer given that I was _assigned_ to smoke and I have tar in my lungs given that I was _assigned_ not to smoke are independent" is untestable.

That's because to test this independence, I have to simultaneously consider a world where I was assigned to smoke, and another world where I was assigned not to smoke, and consider a joint distribution over these two worlds. But we only can access one such world at a time, unless we can roll back time, or jump across Everett branches.


Pearl does not concern himself with testability very much, because Pearl is a computer scientist, and to Pearl the world of Newtonian physics is like a computer circuit, where it is obvious that everything stays invariant to arbitrary counterfactual alterations of wires, and in particular sources of noise stay independent. But the applications of causal inference aren't on circuits, but on much mushier problems -- like psychology or medicine. In such domains it is not clear why assumptions like my example intuitively should hold, and not clear how to test them.


Such a general concept of untestability amounts to throwing up one's hands and saying "what can we ever know?"

This is not naive skepticism, this is a careful account of assumptions, a very good habit among statisticians, in my opinion. We need more of this in statistical and causal analysis, not less.

The statement "the random variables I have cancer given that I was assigned to smoke and I have tar in my lungs given that I was assigned not to smoke are independent" is untestable.

Can you give me a larger context for that example? A pointer to a paper that uses it would be enough.

At the moment I'm not clear what the independence of these means, if they're understood as statements about non-interacting world branches. What is the mathematical formulation of the assertion that they are independent? How, in mathematical terms, would that assumption play a role in the study of whether smoking causes cancer?

From another point of view, suppose that we knew the exact mechanisms whereby smoke, tar, and everything else have effects on the body leading to cancer. Would we then be able to calculate the truth or falsity of the assumption?

Since you asked for a paper, I have to cite myself:

http://arxiv.org/pdf/1205.0241v2.pdf

(there are lots of refs in there as well, for more reading).

The "branches" are interacting because they share the past, although I was being imprecise when I was talking about Everett branches -- these hypothetical worlds are mathematical abstractions, and do not correspond directly to a part of the wave function at all. There is no developed extension of interventionist causality to quantum theory (nor is it clear whether this is a good idea -- the intervention abstraction might not make sense in that setting).

Thanks, I now have a clearer idea of what these expressions mean and why they matter. You write on page 15:

Defining the influence of A on Y for a particular unit u as Y(1,M(0,u),u) involved a seemingly impossible hypothetical situation, where the treatment given to u was 0 for the purposes of the mediator M, and 1 for the purposes of the outcome Y.

For the A/M/Y = smoking/tar/cancer situation I can imagine a simple way of creating this situation: have someone smoke cigarettes with filters that remove all of the tar but nothing else. There may be practical engineering problems in creating such a filter, and ethical considerations in having experimental subjects smoke, but it does not seem impossible in principle. This intervention sets A to 1 and M to M(0,u), allowing the measurement of Y(1,M(0,u),u).

As with the case of the word "untestable", I am wondering if "impossible" is here being understood to mean, not impossible in an absolute sense, but "impossible within some context of available means, assumed as part of the background". For example, "impossible without specific domain knowledge", or "impossible given only the causal diagram and some limited repertoire of feasible interventions and observations". The tar filter scenario goes outside those bounds by using domain knowledge to devise a way of physically erasing the arrow from A to M.

I have the same question about page 18, where you say that equation (15):

Y(1,m) _||_ M(0)

is untestable (this is the example you expressed in words upthread), even though you have shown that it mathematically follows from any SEM of a certain form relating the variables, and could be violated if it has certain different forms. The true causal relationships, whatever they are, are observable physical processes. If we could observe them all, we would observe whether Y(1,m) _||_ M(0).

Again, by "untestable" do you here mean untestable within certain limits on what experiments can be done?

Richard, thanks for your message, and for reading my paper.


At the risk of giving you more homework, I thought I would point you to the following paper, which you might find interesting:

http://www.hsph.harvard.edu/james-robins/files/2013/03/wp100.pdf

This paper is about an argument the authors are having with Judea Pearl about whether assumptions like the one we are talking about are sensible to make. Of particular relevance for us is section 5.1. If I understood the point the authors are making, whenever Judea justifies such an assumption, he tells a story that is effectively interventional (very similar to your story about a filter). That is, what really is happening is we are replacing the graph:

A -> M -> Y, A -> Y

by another graph:

A -> A1 -> Y, A -> A2 -> M -> Y

where A1 is the "non tar-producing part" of smoking, and A2 is the "tar-producing part" of smoking (the example in 5.1 was talking about nicotine instead). As long as we can tell such a story, the relevant counterfactual is implemented via interventions, and all is well. That is, Y(A=1,M(A=0)) in graph 1 is the same thing as Y(A1=1,A2=0) in graph 2.


The true causal relationships, whatever they are, are observable physical processes. If we could observe them all, we would observe whether Y(1,m) _||_ M(0).

The point of doing mediation analysis in the first place is that we are being empiricists -- using data for scientific discovery. In particular, we are trying to learn a fairly crude fact about cause-effect relationships of A, M and Y. If, as you say, we were able to observe the entire relevant DAG, and all biochemical events involved in the A -> M -> Y chain, then we would already be done, and would not need to do our analysis in the first place.

"Testability" (the concept I am confused about) comes up in the process of scientific work, which is crudely about expanding a lit circle of the known via sensible procedures. So intuitively, "testability" has to involve the resources of the lit circle itself, not of things in the darkness. This is because there is a danger of circularity otherwise.

My main point is that you shouldn't just automatically discount observational studies, but instead use DAG models to reason carefully about what level of confidence you assign to the investigators' implicit claim that the data came from a natural experiment where exposure was assigned by loaded dice that differed between the groups defined by the control variables.

I agree with this, but disagree with a number of other things you say here. I will add that sometimes the DAG is more complicated than observed exposure influenced by observed "baseline covariates" (what you call "control variables.") Sometimes causes of observed exposure are unobserved and influence observed outcomes -- but you can still get the causal effect (by a more complicated procedure than one that relies on variable adjustment).


in fact, students who are trained in the Rubin Causal Model (which competes with Pearl's graphical model) often become blind to a range of biases that arise because exposure varies with time.

Rubin competes with Pearl. The Rubin causal model does not compete with Pearl's graphical models; they are the same thing. Rubin just doesn't understand/like graphs. See here:

http://www.csss.washington.edu/Papers/wp128.pdf

Is it actually true that people using Rubin's model will not handle time-varying confounding properly? Robins expresses time-varying confounding problems using sequential ignorability, which I think ought to be quite simple to Rubin people.


one solution to this is to use Don Rubin's propensity score matching method, which specifically avoids this problem

I think you are just wrong here. With all due respect to Don Rubin, propensity score matching has nothing to do with avoiding bias; it is an estimation method for the functional you get when you adjust for confounding:

p(Y(a)) = \sum_{c} p(Y | a, c) p(c)

if conditional ignorability (Y(a) independent of A given C) and SUTVA hold.

It is just one of many methods for estimating the above functional, the others being inverse weights, the parametric g-formula, doubly robust methods, or whatever other ways people have invented. None of these estimation methods will avoid the issue if the functional above is not equal to the causal effect you want. Whether that is true depends on whether conditional ignorability holds in your problem, not on whether you use propensity score methods.
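To illustrate that these are different estimators of one and the same functional, here is a toy simulation of my own (not from the thread). It uses the true treatment probabilities -- nature's "loaded dice" -- whereas in practice the propensity score p(A | C) would itself be estimated:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    # A binary confounder C shifts the chance of exposure A and also
    # affects the outcome Y. The true effect of A on Y is 2.0.
    c = rng.binomial(1, 0.4, size=n)
    p_a = np.where(c == 1, 0.7, 0.3)
    a = rng.binomial(1, p_a)
    y = 2.0 * a + 1.5 * c + rng.normal(0, 1, size=n)

    # Estimator 1: direct adjustment, E[Y(a)] = sum_c E[Y | a, c] p(c).
    def adjusted_mean(a_val):
        return sum(y[(a == a_val) & (c == cv)].mean() * (c == cv).mean()
                   for cv in (0, 1))

    # Estimator 2: inverse probability weighting, a different estimator
    # of the same adjustment functional.
    w = np.where(a == 1, 1.0 / p_a, 1.0 / (1.0 - p_a))
    ipw_effect = (w * y * (a == 1)).mean() - (w * y * (a == 0)).mean()

    print(adjusted_mean(1) - adjusted_mean(0), ipw_effect)  # both close to 2.0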

I don't disagree with anything you are saying: there is nothing in the propensity score estimation method itself that makes the results less prone to bias, compared to other methods.

I should have been more specific: my point was rather that the specific implementation of propensity score matching which Rubin recommends in the paper I linked to allows the investigator to blind themselves to the outcome while assessing whether they have been able to create balanced groups. It is possible that you can do something similar with other estimation methods, but I haven't heard anyone talk much about it, and it is not immediately obvious to me how you would go about it.

Thanks for the quick clarification! I guess I am not following you. If you want to blind yourself, you can just do it -- you don't need to modify the estimator in any way, just write the computer program implementing your estimator in such a way that you don't see the answer. This issue seems to be completely orthogonal to both causal inference and estimation. (???)

Am I missing something?

If you use the propensity score matching method, you begin by estimating the propensity score, then you match on the propensity score to create exposed and unexposed groups within levels of the propensity score. After you create those groups, there is a step where you can look at the matched groups without the outcome data, and assess whether you have achieved balance on the baseline covariates. If I understand Rubin's students correctly, they see this as a major advantage of the estimation method.

You can obviously blind yourself to the outcome using any estimation method, but I am not sure if there is a step in the process where you look at the data without the outcome to evaluate how confident you are in your work.
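As a sketch of what that blinded balance check might look like (using stratification on the propensity score rather than matching, and entirely my own toy setup rather than Rubin's actual protocol):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 5_000
    c1 = rng.normal(size=n)
    c2 = rng.binomial(1, 0.5, size=n)
    a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * c1 + 0.5 * c2))))

    # Step 1: fit a propensity model. No outcome data is involved anywhere.
    exog = sm.add_constant(np.column_stack([c1, c2]))
    ps = sm.Logit(a, exog).fit(disp=0).predict()

    # Step 2: compare covariate means between exposed and unexposed within
    # propensity-score strata -- still completely blind to the outcome.
    df = pd.DataFrame({"a": a, "c1": c1, "c2": c2,
                       "stratum": pd.qcut(ps, 5, labels=False)})
    print(df.groupby(["stratum", "a"])[["c1", "c2"]].mean())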

In order for the propensity score method to give an unbiased estimate of the causal effect, the following is sufficient:

(a) SUTVA (this is untestable)

(b) Conditional ignorability (this is testable in principle, but only if we randomize the exposure A)

(c) The treatment assignment probability model (that is the model for p(A | C), where A is exposure and C is baseline covariates) must be correct.

It may be that the "balance property" tests a part of (b), but surely not all of it! That is, the arms might look balanced, but conditional ignorability might still not hold. We cannot test all the assumptions we need to draw causal conclusions from observational data using only observational data. Causal assumptions have to enter in somewhere!

I think I might be confused about why checking for balance without working out the effect is an advantage -- but I will think about it, because I am not an expert on propensity score methods, so there is probably something I am missing.

  • When the sample size is very large, getting whatever result you want requires using a lot of variables and/or contrived variables. The variables used don't seem particularly numerous or contrived.
  • The results of Dale and Krueger are consistent with attending a more selective college having a strong positive effect on earnings, and also consistent with attending a more selective college having a strong negative effect on earnings. But if it were true that attending a more selective college had a strong positive effect on earnings, would you expect there to be zero correlation after controlling for the variables that they do?
  • It would be better to take a randomly selected population of sufficiently large size to get statistical power and examine the causal pathways that led them to their current income level as opposed to a higher level or a lower level. I'm very interested in getting this sort of data, but it would be hard to get it at a sufficiently fine level of granularity (e.g. it's not possible to get records of how all of the hiring decisions were made) and I don't know of any such data sets that have even coarse information on the subject.

Hi,

I am not sure if I am following you, but the issue isn't sample size, the issue is whether your answer is biased or not. If your estimator is severely biased, it doesn't matter what the sample size is. That is, if the effect is 0, but your bias is -3, then you estimate -3 poorly at 100 samples, and better at 100000 samples, but you are still estimating a number that is not 0.

You seem to think that if the effect is 0, then getting it to look like it is not 0 requires very exotic scenarios or sets of variables, but that's not true at all. It is very easy to get bias. So easy that people even have word examples of it happening: "when I regressed physical fitness on the possession of an olympic gold medal, I got a strong effect, even adjusting for socioeconomic background, age, gender, and height -- so, off to the olympic medal replica store!"
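A quick simulation of my own making the same point (everything here is synthetic; in this toy world fitness causes medals and medals have no effect whatsoever on fitness):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 50_000

    # Fitness causes medals; medals do not cause fitness.
    fitness = rng.normal(size=n)
    medal = rng.binomial(1, 1.0 / (1.0 + np.exp(-(4.0 * fitness - 6.0))))
    age = rng.uniform(18, 60, size=n)  # a harmless control variable

    X = sm.add_constant(np.column_stack([medal, age]))
    fit = sm.OLS(fitness, X).fit()
    # A large, highly "significant" coefficient on medal, despite medals
    # having no causal effect on fitness in this world.
    print(fit.params[1])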


"Causal pathways" are a hard problem. You have to be very careful, especially if your data tracks people over time. The keywords here are "mediation analysis."


Pearl said that he does not deal with statistical issues, only identification. So perhaps he would not be the best person to judge a study (because statistical analysis itself is a difficult art, even if we get the causal part right). Perhaps folks at the Harvard (or Hopkins, or Berkeley, or North Carolina, or Penn) causal groups would be better.

See my response to Eliezer, as well.

Thanks. I don't have technical knowledge of statistics, and may be off base. I'll have to think about this more.

Do you disagree with my bottom line that the Dale-Krueger study points (to a nontrivial degree) in the direction of having a prior that there's no effect?

I think the Lesswrong census data is pretty nice to understand what regression does.

It turns out that US Lesswrong readers are on average smarter than non-US Lesswrong readers.

If you try to find out whether a belief correlates with IQ and simply run a regression, you sometimes get very different results depending on whether or not you control for whether the person comes from the US.
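A minimal sketch of how controlling for country can flip the answer (synthetic numbers of my own, loosely inspired by the census example rather than taken from it):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 20_000

    us = rng.binomial(1, 0.5, size=n)
    iq = 100 + 15 * us + rng.normal(0, 10, size=n)  # US readers higher-IQ here
    # Within each country group the belief gets *less* common as IQ rises,
    # but US readers hold it far more often overall.
    logit = -0.03 * (iq - 107.5) + 3.0 * us - 1.5
    belief = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

    pooled = sm.OLS(iq, sm.add_constant(belief)).fit()
    both = sm.OLS(iq, sm.add_constant(np.column_stack([belief, us]))).fit()
    print(pooled.params[1], both.params[1])  # opposite signs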

Apart from all the theoretical arguments, regression studies frequently simply don't replicate. Even if you don't see a problem with a study, if it doesn't replicate it's worthless.

If you try to find out whether a belief correlates with IQ and simply run a regression, you sometimes get very different results depending on whether or not you control for whether the person comes from the US.

Correlations of very different sizes? Or just differing signs? I would not be surprised by the latter. The former would surprise me, if it applied to a randomly selected belief.

Apart from all the theoretical arguments, regression studies frequently simply don't replicate.

The Dale-Krueger study does replicate in a sense: they got the same results for the 1989 cohort as they did for the 1976 cohort.

US LW readers are on average smarter, older, and have higher income. If I remember right they also vote more often. But it's been a while since I played with the data, so I don't want to say something wrong by being too detailed in my claims.

The Dale-Krueger study does replicate in a sense: they got the same results for the 1989 cohort as they did for the 1976 cohort.

For what value of "same"? Did they first analyse 1976, publish, and years later analyse 1989 and come to the same conclusions, or did they just throw all the data together?

Did they first analyse 1976, publish, and years later analyse 1989 and come to the same conclusions

Yes

Criticism of the traditional interpretation of Dale and Krueger study (more). I haven't read the original paper closely; has anyone else? I don't trust Half Sigma to give an unbiased interpretation.

Some people have suggested that being a "small fish in a big pond" reduces students' confidence relative to being a "big fish in a small pond." Assuming this, to the extent that confidence increases later life earnings, all else being equal, attending a more selective school will reduce expected earnings.

Paul Graham (Cornell + Harvard grad) thinks the opposite:

In addition to the power of the brand name, graduates of elite colleges have two critical qualities that plug right into the way large organizations work. They're good at doing what they're asked, since that's what it takes to please the adults who judge you at seventeen. And having been to an elite college makes them more confident.

Back in the days when people might spend their whole career at one big company, these qualities must have been very valuable. Graduates of elite colleges would have been capable, yet amenable to authority. And since individual performance is so hard to measure in large organizations, their own confidence would have been the starting point for their reputation.

I read the whole of the 2011 paper. Two comments on the Half Sigma criticism:

(a) It was written with reference to the 2002 paper, not the 2011 paper. (b) Regarding

But what it really says is that the average SAT score of a school is unimportant, what’s important is how highly “ranked” it is. I suspect that in many cases, when a student attended a school with a lower average SAT score, they did so because the school with the lower score was actually the more prestigious school.

The authors didn't just look at average SAT scores, they also looked at selectivity (as measured by the Barron's selectivity index). In the 2002 paper, they even matched students based on the exact schools they had been accepted to.

I wonder if students at the top elite schools are more likely to go into comparatively low-paying jobs like academia, philanthropy, or politics, compared to students at second-tier schools going into high-earning careers. I'd be very interested to see the breakdown by sector for differently ranked schools.

I wonder if students at the top elite schools are more likely to go into comparatively low-paying jobs like academia, philanthropy, or politics, compared to students at second-tier schools going into high-earning careers.

Politics is not a low-paying job, and academia (if you can get tenure) isn't either.

Otherwise I think that this is likely because a lot of these students come from wealthy families.

It is compared to other careers that are available to smart people who test well. The average pay of a college professor is around $81k. Congresspeople get around $174k. Junior hedge fund portfolio managers make upwards of $600k, including their bonuses. Third-year investment banking associates make $250-500k. And of course, they make more as time goes on, so these people are usually way younger than your average professor or congressperson.

Junior hedge fund portfolio managers make upwards of $600k, including their bonuses.

No, they don't. You're succumbing to huge survivorship bias. Some successful junior hedge fund portfolio managers make north of $600K and so make the news. A lot make much less or blow up and go out of business, but you don't hear about them and so assume they don't exist.

And, of course, just being smart and testing well does not automatically get you an invitation to Wall Street.

In any case, if "comparatively low-paying" means compared to Goldman Sachs managing partners, well...

There's a significant difference in income between the average high-IQ person who tries to be an investment banker vs. a politician or professor. The figure I saw was the average for people who made it that far, not people in the news, who make far more than that (the richest investment bankers have a net worth of over a billion). The other two professions are also extremely competitive at the top (most people who try never become professors or congresspeople). I would guess that becoming a member of Congress is the most competitive.

A lot of individuals' outcome variance comes out of how their particulars (mis)match their situations.

For each college seeker, I'd advise: Seek a school that feels like a good/promising cultural fit. (Which may not even exist, for a lot of us.)

Connect with people in college with whom you fit. There will be some. Might be better to go to a school with a poor overall fit, as the outgroup solidarity will likely be stronger.

You're not going to connect with everyone. Aim to connect with a bunch well.

Dale and Krueger's paper was revised and published in the Journal of Human Resources under the new title "Estimating the effects of college characteristics over the career using administrative earnings data".

This is very old but I just wanted to say that I am basically considering changing my college choice due to finding out about this research. Thanks so much for putting this post up and spreading awareness.

Thanks, I'm glad to be able to help :-).

I read in the Oregonian, Oregon's largest-circulation paper, that only 1 of 100 students who seek a community college transfer degree succeed. However, about 7 of 10 people who go to Oregon's largest university graduate. Thus university is about 70 times more effective than community college at heightening earnings.

Thus if you know anyone attending community college, urge them to immediately apply to a full university. It's worth a few hundred thousand dollars or more.

Notably, online review sites say test prep is effective, so a small amount of money spent on test prep could permit immediate university transfer.

Kudos to online reference finders.

"Oregon's college completion rate is 15 percent for community college students, 54 percent for public four-year university students and 67 percent for students attending private four-year colleges, according to Complete College" http://www.oregonlive.com/education/index.ssf/2010/03/oregon_college_students_crash.html

University of Oregon's graduation rate is 67 per 100, so going directly to university is more than four times as likely to succeed.

I strongly support the value of education simply to know things or learn new abilities. A networking class, an art or electronics course, a literature course: all of value even absent a certificate. Although I have not located the reference, I think another Oregonian article states that 8 of 100 non-transfer certificate seekers gain a certificate.

1 of 100 students who seek a community college transfer degree succeed

A few minutes of Google puts the figure at more like 60% if I'm unpacking "succeed" correctly, although I'd be happier with that number if I'd found citations not pointing to the National Student Clearinghouse. In any case, though, 1% doesn't pass the giggle test for me.

I'm a community college transfer student currently at UC Davis. Some quick googling turned up these figures from 2004, which give a graduation rate of 84% for transfer students, compared to 89% for students who enrolled as freshmen, conditional on achieving junior standing (the probability of graduation for any random student who enrolled directly out of high school is actually lower than for transfers). These numbers are a decade old, but are roughly in line with my current experiences, so I don't expect them to have changed that much.