That sounds more like arbitrage than money pumping; were you ever able to buy e.g. every book in a series from the same seller over a period of time, then sell the entire collection back at a premium to the same person?
Even then, the reason this happens might be plausibly explained by the changing information of the bookstore rather than actual intransitivity.
It has a lot more to do with how crazy my schedule has been lately.
And what the heck people, upvoting that to +4?
HEAD SCRATCH
It's a Schelling point, er, joke isn't the right word, but it's funny because the day was supposed to be a Schelling point. And you forgot about it.
Wait, the 14th? Oh crud. I... I meant to be there but I remembered it as 'my second weekend in boston' instead of an absolute date. So when my arrival was delayed by a week...
Oops.
This is simultaneously hilarious and weak evidence that the holiday isn't working as intended (though I think repeating the holiday every year will do the trick).
I'll put it this way: in the average GRE scores by intended field, education ranks below philosophy & STEM in every subtest, and various forms of education rank very low (early childhood education is, out of 50 groups, second from the bottom in 2 subtests and fifth from the bottom in the last subtest).
I notice that I'm confused: the maximum score on the Quantitative section was 800 (at that time), and Ph.D. econ programs won't even consider you if you're under a 780. The quantitative exam is actually really easy for math types. When you sign up for the GRE, you get a free CD with 2 practice exams. When I took it, I took the first practice exam without studying at all and got a 760 or so on the quantitative section (within 10 pts). After studying I got an 800 on the second practice exam, and on the actual exam I got a 790. The questions were basic algebra for the most part, with a bit of calculus and basic stats at the top end and a tricky question here and there. The exam was easy - really easy. I was a math major at a tiny / terrible liberal arts school; nothing like MIT or any self-respecting state school. So it seems like it should be easy for anyone with a halfway decent mathematics background.
Now you're telling me people intending to major in econ in grad school average a 706, and people intending to major in math average a 733? That's low. Really low relative to my expectations. I would have expected a 730 in econ and maybe a 760 in math.
Possible explanations:
1) Tons of applicants who don't want to believe that they aren't cut out for their field create a long tail on the low side while the high side is capped at 800.
2) Master's programs are, in general, more lenient and there are a large number of people who only intend to go to them, creating the same sort of long tail effect as above in 1).
3) There are way more low-tier graduate programs than I thought in both fields willing to accept the average or even below-average student.
4) Weirdness in how these fields are classified (e.g. I don't see statistics there anywhere, is that included in math?)
5) the quantitative section of the standard GRE actually doesn't matter if you're headed to a math or physics program (someone in that field care to comment?). Note: the quantitative section of the standard GRE does matter in econ, but typically only as a way to make the first cut (usually at 760 or 780, depending on the school). I don't know much of the details here though.
6) very few people actually study for the GRE like I did - i.e. buy a prep book and work through it. This depresses their scores even though they're much better quantitatively than I am.
Unsurprisingly, since these are in when-I-thought-of-them order, 1)-3) appeal to me the most, but 5) and 6) also seem plausible. I don't see why 4) would bias the scores down instead of up, so it seems unlikely a priori.
Education even ranks below Religion in every category. Also, Economics is only quantitatively better than Religion. </abuse of ranked lists of things>
Not surprising, given my experience. Most religion majors I've met were relatively smart and often made fun of the more fundamentalist/evangelical types who typically were turned off by their religion classes. Religion majors seemed like philosophy-lite majors (which is consistent with the rankings).
Edit: Also, relative to Religion, econ has a bunch of poor English speakers that pull the other two categories down. (Note: the "analytical" section is/was actually a couple of very short essays.)
Chicken-and-egg problem: Non-economics majors don't think economically enough to choose fields on the basis of their remuneration?
That seems to explain why Econ majors get a premium, but that doesn't seem to explain why econ majors don't rank higher, or am I missing something?
The fact that the authors ignored potential heterogeneity in responses IS a problem for their analysis, but their result is still evidence against heterogeneous responses.
Why do you say that? Did you look at the data?
They found F values of 0.77, 2.161, and 1.103. That means they found different behavior in the two groups. But those F-values were lower than the thresholds they had computed assuming homogeneity. They therefore said "We have rejected the hypothesis," and claimed that the evidence - which, interpreted in a Bayesian framework, might support that hypothesis - refuted it.
I didn't look at the data. I was commenting on your assessment of what they did, which showed that you didn't know how the F test works. Your post made it seem as if all they did was run an F test that compared the average response of the control and treatment groups and found no difference.
What did they do, then?
Are you saying the measurements they took make their final claim more likely, or that their analysis of the data is correct and justifies their claim?
Yes, if you arrange things moderately rationally, evidence against a homogeneous response is evidence against any response, but much weaker evidence. I think Phil agrees with that too, and is objecting to a conclusion based on that much weaker evidence pretending to have much more justification than it does.
Both the t-test and the F-test work by assuming that every subject has the same response function to the intervention:
response = effect + normally distributed error
where the effect is the same for every subject.
The F test / t test doesn't quite say that. It makes statements about population averages. More specifically, if you're comparing the mean of two groups, the t or F test says whether the average response of one group is the same as the other group. Heterogeneity just gets captured by the error term. In fact, econometricians define the error term as the difference between the true response and what their model says the mean response is (usually conditional on covariates).
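To make this concrete, here is a toy simulation with made-up data (using scipy's `ttest_ind`; nothing here is from the paper under discussion): individual responses vary a lot, but the t-test only compares group means, with the subject-level variation absorbed into the error variance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500

# Control group: no treatment effect, baseline noise only.
control = rng.normal(loc=0.0, scale=1.0, size=n)

# Treatment group: heterogeneous effects -- each subject's effect is
# drawn from N(0.5, 1.0), so individuals differ widely but the mean
# effect is 0.5.
effects = rng.normal(loc=0.5, scale=1.0, size=n)
treatment = rng.normal(loc=0.0, scale=1.0, size=n) + effects

# The t-test only sees the difference in group means; the subject-level
# heterogeneity just inflates the treatment group's error variance.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(t_stat, p_value)
```

Despite every subject responding differently, the test still detects the nonzero average effect; the heterogeneity just lives in the error term.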
The fact that the authors ignored potential heterogeneity in responses IS a problem for their analysis, but their result is still evidence against heterogeneous responses. If there really are heterogeneous responses we should see that show up in the population average unless:
- The positive and negative effects cancel each other out exactly once you average across the population. (this seems very unlikely)
- The population average effect size is nonzero but very small, possibly because the effect only occurs in a small subset of the population (even if it's large when it does occur) or something similar but more complicated. In this case, a large enough sample size would still detect the effect.
Now it might not be very strong evidence - this depends on sample size and the likely nature of the heterogeneity (or confounders, as Cyan mentions). And in general there is merit in your criticism of their conclusions. But I think you've unfairly characterized the methods they used.
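The cancellation case in the first bullet is easy to simulate (hypothetical data, not the authors'): give half the subjects a +1 response and half a -1 response, and a test of means has roughly nothing to find, while a variance test (Levene's, here) picks up the heterogeneity.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 1000

control = rng.normal(0.0, 1.0, size=n)

# Heterogeneous treatment: half the subjects respond +1, half respond
# -1, so the average effect is exactly zero by construction.
effect = np.where(rng.random(n) < 0.5, 1.0, -1.0)
treatment = rng.normal(0.0, 1.0, size=n) + effect

# Mean comparison: the +1s and -1s cancel, leaving little to detect.
_, p_mean = stats.ttest_ind(treatment, control, equal_var=False)

# Variance comparison: the heterogeneity shows up clearly as extra
# spread in the treatment group.
_, p_var = stats.levene(treatment, control)
print(p_mean, p_var)
```

This is the "very unlikely" exact-cancellation case; anything short of it leaves a nonzero population average for a large enough sample to find.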
No.
I think it's more than terminology. And if Mencius can be dismissed as someone who does not really get Bayesian inference, one can surely not say the same of Cosma Shalizi, who has made the same argument somewhere on his blog. (It was a few years ago and I can't easily find a link. It might have been in a technical report or a published paper instead.)

Suppose a Bayesian is trying to estimate the mean of a normal distribution from incoming data. He has a prior distribution over the mean, and each new observation updates that prior. But what if the data are not drawn from a normal distribution, but from a mixture of two such distributions with well-separated peaks? The Bayesian (he says) can never discover that. Instead, his estimate of the position of the single peak that he is committed to will wander up and down between the two real peaks, like the Flying Dutchman cursed never to find a port, while the posterior probability of seeing the data that he has seen plummets (on the log-odds scale) towards minus infinity. But he cannot avoid this: no evidence can let him update towards anything his prior gives zero probability to.
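Shalizi's scenario is easy to reproduce numerically. A minimal sketch (my own toy numbers, not his actual example): run the standard conjugate update for the mean of a single N(mu, 1) model on data that actually come from two well-separated peaks, and the posterior mean settles between them, in a region where almost no data lie.

```python
import numpy as np

rng = np.random.default_rng(2)

# The data secretly come from two well-separated normals at -5 and +5.
data = np.concatenate([rng.normal(-5.0, 1.0, 500),
                       rng.normal(+5.0, 1.0, 500)])
rng.shuffle(data)

# Conjugate update for the mean of a single N(mu, 1) model, starting
# from the prior mu ~ N(0, 10^2), processed one observation at a time.
prec, mean = 1.0 / 100.0, 0.0   # prior precision and prior mean
for x in data:
    mean = (prec * mean + x) / (prec + 1.0)
    prec += 1.0

# The posterior mean ends up near 0 -- between the two real peaks, in
# a region where almost none of the data actually lie.
near = np.mean(np.abs(data - mean) < 2.0)
print(mean, near)
```

The posterior concentrates ever more tightly on a value that describes the data terribly, and nothing inside the single-Gaussian family can signal the problem.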
What (he says) can save the Bayesian from this fate? Model-checking. Look at the data and see if they are actually consistent with any model in the class you are trying to fit. If not, think of a better model and fit that.
Andrew Gelman says the same; there's a chapter of his book devoted to model checking. And here's a paper by both of them on Bayesian inference and philosophy of science, in which they explicitly describe model-checking as "non-Bayesian checking of Bayesian models". My impression (not being a statistician) is that their view is currently the standard one.
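A minimal version of the kind of check Gelman describes (my own toy example, using excess kurtosis as the test statistic): fit a single normal, simulate replicated datasets from the fit, and see whether the observed statistic looks anything like the replicated ones.

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed data: secretly a well-separated two-component mixture.
data = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])

# Fitted single-normal model: sample mean and standard deviation.
mu_hat, sd_hat = data.mean(), data.std()

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

# Model check: simulate replicated datasets from the fitted model and
# compare the test statistic with the observed one.
t_obs = excess_kurtosis(data)
t_rep = np.array([excess_kurtosis(rng.normal(mu_hat, sd_hat, data.size))
                  for _ in range(200)])
p_check = (t_rep <= t_obs).mean()   # predictive p-value
print(t_obs, p_check)
```

A bimodal sample is strongly platykurtic, so the observed statistic falls far outside anything the fitted single normal generates - the check flags the model as inadequate even though the within-family posterior looked perfectly happy.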
I believe the hard-line Bayesian response to that would be that model checking should itself be a Bayesian process. (I'm distancing myself from this claim, because as a non-statistician, I don't need to have any position on this. I just want to see the position stated here.) The single-peaked prior in Shalizi's story was merely a conditional one: supposing the true distribution to be in that family, the Bayesian estimate does indeed behave in that way. But all we have to do to save the Bayesian from a fate worse than frequentism is to widen the picture. That prior was merely a subset, worked with for computational convenience, but in the true prior, that prior only accounted for some fraction p<1 of the probability mass, the remaining 1-p being assigned to "something else". Then when the data fail to conform to any single Gaussian, the "something else" alternative will eventually overshadow the Gaussian model, and will need to be expanded into more detail.
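The widened-prior story can be made concrete with a toy calculation (my own numbers): give the "something else" hypothesis - here, a two-component mixture - only 1% of the prior mass, and the data still swamp the prior odds once no single Gaussian fits.

```python
import numpy as np

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])

def log_norm_pdf(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# M1: the single-Gaussian family's best case here, N(0, 26)
# (matching the data's mean and variance).
ll_single = log_norm_pdf(data, 0.0, 26.0).sum()

# M2: the "something else" alternative -- an equal-weight mixture of
# N(-5, 1) and N(+5, 1).
ll_mix = (np.logaddexp(log_norm_pdf(data, -5.0, 1.0),
                       log_norm_pdf(data, 5.0, 1.0)).sum()
          + data.size * np.log(0.5))

# Prior odds 99:1 in favour of the single Gaussian.
log_posterior_odds = np.log(0.99 / 0.01) + (ll_single - ll_mix)
print(log_posterior_odds)
```

The log posterior odds come out hugely negative: the 1% "something else" hypothesis overwhelms the single-Gaussian model, just as the widened-prior argument predicts.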
"But," the soft Bayesians might say, "how do you expand that 'something else' into new models by Bayesian means? You would need a universal prior, a prior whose support includes every possible hypothesis. Where do you get one of those? Solomonoff? Ha! And if what you actually do when your model doesn't fit looks the same as what we do, why pretend it's Bayesian inference?"
I suppose this would be Eliezer's answer to that last question.
I am not persuaded that the harder Bayesians have any more concrete answer. Solomonoff induction is uncomputable and seems to unnaturally favour short hypotheses involving Busy-Beaver-sized numbers. And any computable approximation to it looks to me like brute-forcing an NP-hard problem.
I think a hard line needs to be drawn between statistics and epistemology. Statistics is merely a method of approximating epistemology - though a very useful one. The best statistical method in a given situation is the one that best approximates correct epistemology. (I'm not saying this is the only use for statistics, but I can't seem to make sense of it otherwise)
Now suppose Bayesian epistemology is correct - i.e. let's say Cox's theorem + Solomonoff prior. The correct answer to any induction problem is to do the true Bayesian update implied by this epistemology, but that's not computable. Statistics gives us some common ways to get around this problem. Here are a couple:
1) Bayesian statistics approach: restrict the class of possible models and put a reasonable prior over that class, then do the Bayesian update. This has exactly the same problem that Mencius and Cosma pointed out.
2) Frequentist statistics approach: restrict the class of possible models and come up with a consistent estimate of which model in that class is correct. This has all the problems that Bayesians constantly criticize frequentists for, but it typically allows for a much wider class of possible models in some sense (crucially, you often don't have to assume distributional forms)
3) Something hybrid: e.g., Bayesian statistics with model checking. Empirical Bayes (where the prior is estimated from the data). Etc.
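As a small illustration of the empirical Bayes idea in 3) (a toy normal-means example of my own, not tied to anything above): estimate the prior variance from the data itself, then plug it into the usual conjugate posterior.

```python
import numpy as np

rng = np.random.default_rng(4)

# Many unknown means theta_i ~ N(0, tau^2); we observe y_i = theta_i
# plus unit-variance noise.
tau_true = 2.0
theta = rng.normal(0.0, tau_true, size=2000)
y = theta + rng.normal(0.0, 1.0, size=2000)

# Empirical Bayes: estimate tau^2 from the data rather than fixing a
# prior. Marginally y_i ~ N(0, tau^2 + 1), so tau^2 ~= var(y) - 1.
tau2_hat = max(y.var() - 1.0, 0.0)

# Plug the estimated prior into the usual conjugate posterior mean,
# which shrinks each observation toward zero.
shrinkage = tau2_hat / (tau2_hat + 1.0)
theta_eb = shrinkage * y

# The shrunken estimates beat the raw observations on average.
mse_raw = np.mean((y - theta) ** 2)
mse_eb = np.mean((theta_eb - theta) ** 2)
print(mse_raw, mse_eb)
```

Looking at the data to build the prior is "double-dipping" by the strict Bayesian accounting, yet the resulting estimates are closer to the truth - which is the whole point about getting credit for approximating the true update rather than for looking like it.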
Now superficially, 1) looks the most like the true Bayesian update - you don't look at the data twice, and you're actually performing a Bayesian update. But you don't get points for looking like the true Bayesian update, you get points for giving the same answer as the true Bayesian update. If you do 1), there's always some chance that the class of models you've chosen is too restrictive for some reason. Theoretically you could continue to do 1) by just expanding the class of possible models and putting a prior over that class, but at some point that becomes computationally infeasible. Model checking is a computationally feasible way of approximating this process. And, a priori, I see no reason to think that some frequentist method won't give the best computationally feasible approximation in some situation.
So, basically, a "hardline Bayesian" should do model checking and sometimes even frequentist statistics. (Similarly, a "hardline frequentist" in the epistemological sense should sometimes do Bayesian statistics. And, in fact, they do this all the time in econometrics.)
See my similar comments here and here.