Punoxysm comments on Open Thread, April 27-May 4, 2014 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (200)
I have to say, I seriously don't get the Bayesian vs Frequentist holy wars. It seems to me the ratio of importance to education of its participants is ridiculously low.
Bayesian and frequentist methods are sets of statistical tools, not sacred orders to which you pledge a blood oath. Just understand the usage of each tool, and the fact that virtually any model of something that happens in the real world is going to be misspecified.
It's because Bayesian methods really do claim to be more than just a set of tools. They are supposed to be universally applicable.
This is a bit of an exaggeration.
Additionally, you are only talking about the 'sets of statistical tools', whereas in my experience the bigger disagreement often lies in whether a person accepts that probabilities can be subjective or not. And yes, this does matter.
Can you please give an example of where the possible subjectivity of probabilities matters? I mean this in earnest.
'From my point of view the probability for X is Y, but from his point of view at the time it would've been Z'. (subjective) vs 'The Probability for X is Y' ('objective').
Honestly though, frequentists use subjective probabilities all the time and you can argue that frequentism is just as subjective as Bayesianism, so even that disagreement is quite muddy.
Can you be more concrete? When would this matter for two people trying to share a model and make predictions of future events?
Part of it is that Bayesianism claims to be not just a better statistical tool, but a new and better epistemology, a replacement and improvement over Aristotelian logic.
There are a bunch of issues involved. It is hard to speak about them because the term Bayesianism encompasses a wide array of ideas, and every time it's used it might refer to a different subset of that cluster of ideas.
Part of LW is that it's a place to discuss how an AGI could be structured. As such we care about the philosophic level of how you come to know that something is true. As such there is an interest in going as basic as possible when looking at epistemology. There are issues about objective knowledge versus "subjective" Bayesian priors that are worth thinking about.
We live at a time where up to 70% of scientific research can't be replicated. Frequentism might not be to blame for all of that, but it does play its part. There are issues such as the Bem paper about porno-precognition, where frequentist techniques did suggest that porno-precognition is real but analysing Bem's data with Bayesian methods suggested it's not.
There are further issues in that a lot of additional assumptions are loaded into the word Bayesianism if you use that word on LessWrong. What Bayesianism taught me speaks about a bunch of issues that only indirectly have something to do with Bayesian tools vs. Frequentist tools.
Let's say I want to decide how much salt I should eat. I do follow the consensus that salt is bad and therefore have some prior that salt is bad. Then a new study comes along and says that low salt diets are unhealthy. If I want to make good decisions I have to ask: How much should I update? There is no good formal way of making such decisions. We lack a good framework for doing this. Bayes rule is the answer to that problem that provides the promise of a solution. The solution of waiting a few years and then reading a meta review is unsatisfying.
In the absence of a formal way to do the reasoning, many people do use informal ways of updating towards new evidence. Cognitive bias research suggests that the average person isn't good at this.
That sentence is quite easy to say but it effectively means there is no such thing as pure absolute objective truth. If you use tools A you get truth X and if you use tools B you get truth Y. Neither X nor Y is "more true". That's not an appealing conclusion to many people.
Full disclosure: I have papers using B (on structure learning using BIC, which is an approximation to a posterior of a graphical model), and using F (on estimation of causal effects). I have no horse in this race.
See, this is precisely the kind of stuff that makes me shudder, that regularly appears on LW, in an endless stream. While Scott Alexander is busy bible thumping data analysts on his blog, people here say stuff like this.
Bayes rule doesn't provide shit. Bayes rule just says that p(A | B) p(B) = p(B | A) p(A).
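To make the identity concrete, here is a minimal Python sketch of a single update step for the salt example. The prior and both likelihoods are made-up numbers chosen purely for illustration; the rule itself says nothing about where they come from, which is the hard part.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return p(H | E) using Bayes' rule: p(H | E) = p(E | H) p(H) / p(E)."""
    # Law of total probability gives the denominator p(E).
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical inputs, NOT taken from any study:
prior_salt_bad = 0.8       # belief that salt is bad before the new study
p_study_if_bad = 0.2       # chance of seeing such a study if salt really is bad
p_study_if_not_bad = 0.6   # chance of seeing such a study if salt is not bad

updated = posterior(prior_salt_bad, p_study_if_bad, p_study_if_not_bad)
print(round(updated, 3))
```

Changing either likelihood moves the answer a lot, so the formula only relocates the question "how much should I update?" into "how much do I trust this study?".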
Here's what you actually need to make use of info in this study:
(a) Read the study.
(b) See if they are actually making a causal claim.
(c) See if they are using experimental or observational data.
(d) Experimental? Do we believe the setup? Are we in a similar cohort? What about experimental design issues? Observational? Do they know what they are doing, re: causality-from-observational-data? Is their model that permits this airtight (usually it is not, see Scott's post on "adjusting for confounders". Generally to really believe that adjusting for confounders is reasonable you need a case where you know all confounders are recorded by definition of the study, for instance if doctors prescribe medicine based only on recorded info in the patient file).
(e) etc etc etc
I mean what exactly did you expect, a free lunch? Getting causal info and using it is hard.
p.s. If you are skeptical about statistics papers that adjust for confounders, you should also be skeptical about missing data papers that assume MAR (missing at random). It is literally the same assumption.
You might want to read a bit more precisely. I did choose my words when I said "promise of a solution" instead of "a solution".
In particular MetaMed speaks about wanting to produce a system of Bayesian analysis of medical papers. (Bayesian mathematical assessment of diagnosis)
You miss the point. When it comes to interviewing candidates for a job, we found out that unstructured human assessment doesn't work that well.
It could very well be that the standard unstructured way of reading papers is not optimal and that we should have Bayesian belief nets into which we plug inputs such as whether the study is experimental or observational.
Whether MetaMed or someone else succeeds at that task and provides a good improvement on the status quo isn't certain but there are ideas to explore.
Is it clear that MetaMed, as a group of self-professed Bayesians, provides a useful service? Maybe, maybe not. On the other hand the philosophy on which MetaMed operates is not the standard philosophy on which the medical establishment operates.
I don't know how Metamed works (and it's sort of their secret sauce, so they probably will not tell us without an NDA). I am guessing it is some combination of doing (a) through (e) above for someone who cannot do it themselves, and possibly some B stats. Which seems like a perfectly sensible business model to me!
I don't think the secret sauce is in the B stats part of what they are doing, though. If we had a hypothetical company called "Freqmed" that also humanwaved (a) through (e), and then used F stats I doubt they would get non-sensible answers. It's about being sensible, not your identity as a statistician.
I can be F with Bayes nets. Bayes nets are just a conditional independence model.
I don't know how successful Metamed will be, but I honestly wish them the best of luck. I certainly think there is a lot of crazy out there in data analysis, and it's a noble thing to try to make money off of making things more sensible.
The thing is, I don't know about a lot of the things that get talked about on LW. I do know about B and F a little bit, and about causality a little bit. And a huge chunk of stuff people say is just plain wrong. So I tell them it's wrong, but they keep going and don't change what they say at all. So how should I update -- that folks on this rationalist community generally don't know what they are talking about and refuse to change?
It's like wikipedia -- the first sentence in the article on confounders is wrong on wikipedia (there is a very simple 3 node example that violates that definition). The talk page on Bayesian networks is a multi-year tale of woe and ignorance. I once got into an edit war with a resident bridge troll for that article, and eventually gave up and left, because he had more time. What does that tell me about wikipedia?
But we don't. MetaMed did come out of a certain kind of thinking. The project had a motivation.
Just because you know what the people in the statistics community mean when they say "Bayesian" doesn't automatically mean that you know what someone on LW means when he says Bayesian.
If you look at "What Bayesianism taught me", there is a person who changed their beliefs through learning about Bayesianism. Do the points he makes have something to do with Frequentism vs. Bayesianism? Not directly. On the other hand he did change major beliefs about how he thinks about the world and epistemology.
That means that the term Bayesianism as used in that article isn't completely empty.
Sensiblism might be a fun name for a philosophy. On the first LW meetup where I attended one of the participants had a scooter. My first question was about his traveling speed and how much time he effectively wins by using it. On that question he gave a normal answer.
My second question was about the accident rate of scooters. He replied something along the lines of: "I really don't know, I should research the issue more in depth and get the numbers." That's not the kind of answer normal people give when faced with a question about the safety of a mode of travel.
You could say he's simply sensible while the 99% of the population out there that would answer the question differently isn't. On the other hand it's quite difficult to explain to those 99% that they aren't sensible. If you prod them a bit they might admit that knowing accident risks is useful for making a decision about one's mode of travel but they don't update on a deep level.
Then people like you come and say: "Well of course we should be sensible. There is no need to point it out explicitly or to give it a fancy name. Being sensible should go without saying."
The problem is that in practice it doesn't go without saying and speaking about it is hard. Calling it Bayesianism might be a very confusing way to speak about it but it seems to be an improvement over having no words at all. Maybe tabooing Bayesianism as a word on LW would be the right choice. Maybe the word produces more problems than it solves.
"In statistics, a confounding variable (also confounding factor, a confound, or confounder) is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable." is that sentence at the moment. How would you change the sentence? There is no reason why we shouldn't fix that issue right now.
Counterexamples to a definition (this example is under your definition but is clearly not what we mean by confounder) are easier than a definition. A lot of analytic philosophy is about this. Defining "intuitive terms" is often not as simple as it seems. See, e.g.:
http://arxiv.org/abs/1304.0564
If you think you can make a "sensible" edit based on this paper, I will be grateful if you did so!
re: the rest of your post, words mean things. B is a technical term. I think if you redefine B as internal jargon for LW you will be incomprehensible to stats/ML people, and you don't want this. Communication across fields is hard enough as it is ("academic coordination problem"), let's not make it harder by not using standard terminology.
I am 100% behind this idea (and in general taboo technical terms unless you really know a lot about it).
Speaking of, an interesting paper which distinguishes the Fisher approach to testing from the Neyman-Pearson approach and shows how you can unify/match some of that with Bayesian methods.
It seems to me that there's a bigger risk from Bayesian methods. They're more sensitive to small effect sizes (doing a frequentist meta-analysis you'd count a study that got a p=0.1 result as evidence against, doing a bayesian one it might be evidence for). If the prior isn't swamped then it's important and we don't have good best practices for choosing priors; if the prior is swamped then the bayesianism isn't terribly relevant. And simply having more statistical tools available and giving researchers more choices makes it easier for bias to creep in.
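The "swamped prior" point can be illustrated with a minimal Beta-Binomial sketch in Python. The two priors and the data counts are hypothetical, picked only to show the effect: with little data the choice of prior dominates the answer, with a lot of data it barely matters.

```python
def beta_posterior_mean(a: float, b: float, successes: int, trials: int) -> float:
    """Posterior mean of a success rate under a Beta(a, b) prior and binomial data."""
    return (a + successes) / (a + b + trials)


# Two deliberately opposed, hypothetical priors on the same success rate.
PRIORS = {"skeptical": (1.0, 9.0), "optimistic": (9.0, 1.0)}


def prior_gap(successes: int, trials: int) -> float:
    """Absolute difference between the two posterior means for the same data."""
    means = [beta_posterior_mean(a, b, successes, trials) for a, b in PRIORS.values()]
    return abs(means[0] - means[1])


small_gap = prior_gap(3, 10)          # 10 observations: prior not swamped
large_gap = prior_gap(3000, 10000)    # 10,000 observations: prior swamped
print(small_gap, large_gap)
```

With 10 observations the two analysts disagree by 0.4 on the estimated rate; with 10,000 they disagree by less than 0.001.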
Bayes' theorem is true (duh) and I'd accept that there are situations where bayesian analysis is more effective than frequentist, but I think it would do more harm than good in formal science.
Why would you do that? If I got a p=0.1 result doing a meta-analysis, I wouldn't be surprised at all since things like random-effects mean it takes a lot of data to turn in a positive result at the arbitrary threshold of 0.05. And as it happens, in some areas, an alpha of 0.1 is acceptable: for example, because of the poor power of tests for publication bias, you can find respected people like Ioannidis using that particular threshold (I believe I last saw that in his paper on the binomial test for publication bias).
If people really acted that way, we'd see odd phenomena where people saw successive meta-analyses on whether grapes cure cancer: 0.15 that grapes cure cancer (decreases belief grapes cure cancer), 0.10 (decreases), 0.07 (decreases), someone points out that random-effects is inappropriate because studies show very low heterogeneity and the better fixed-effects analysis suddenly reveals that the true p-value is now at 0.05 (everyone's beliefs radically flip as they go from 'grapes have been refuted and are quack alt medicine!' to 'grapes cure cancer! quick, let's apply to the FDA under a fast track'). Instead, we see people acting more like Bayesians...
Is that a guess, or a fact based on meta-studies showing that Bayesian-using papers cook the books more than NHST users with p-hacking etc?
Turns out I am overoptimistic and in some cases people have done just that: interpreted a failure to reject the null (due to insufficient power, despite being evidence for an effect) as disproving the alternative in a series of studies which all point the same way, only changing their minds when an individually big enough study comes out. Hauer says this is exactly what happened with a series of studies on traffic mortalities.
(As if driving didn't terrify me enough, now I realize traffic laws and road safety designs are being engineered by vulgarized NHST practitioners who apparently don't know how to patch the paradigm up with emphasis on power or meta-analysis.)
No. The most basic version of meta-analysis is, roughly, that if you have two p=0.1 studies, the combined conclusion is p=0.01.
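The exact combined value depends on the method used. As a minimal sketch, assuming independent studies and Fisher's method (one common choice, not necessarily what the commenter had in mind), two p=0.1 results combine to roughly 0.056: stronger evidence than either study alone, though weaker than the naive product of 0.01.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom. For even degrees of freedom the survival function
    has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))


combined = fisher_combined_p([0.1, 0.1])
print(round(combined, 3))  # 0.056
```

Either way the direction of the point stands: a study with p=0.1 counts as weak evidence for an effect when pooled, not as evidence against it.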
To all your points about the overloading of "Bayesian", fair enough. I guess I just don't see why that overloading is necessary.
Sure Bayes rule provides a formalization of updating beliefs based on evidence, but you can still be dead wrong. In particular, setting a prior on any given issue isn't enough. You have to be prepared to update for evidence of the form "I am really bad at setting priors". And really, priors are just a (possibly arbitrary) way of digesting existing evidence. Sometimes they can be very useful (avoiding privileging the hypothesis) but sometimes they are quite arbitrary.
According to the Slate Star Codex article Bem's results stand up to bayesian analysis quite well (that is, it has a strong Bayes factor). The only exception he mentioned was "I begin with a very low prior for psi phenomena, and a higher prior for the individual experiments and meta-analysis being subtly corrupt"; but there's nothing especially helpful about this in actually fixing the experimental design and meta-analysis.
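How a strong Bayes factor and a very low prior interact can be seen from the odds form of Bayes' rule: posterior odds = prior odds × Bayes factor. The sketch below uses invented numbers (a prior of one in a million and a Bayes factor of 1000); neither is taken from Bem's paper or from the Slate Star Codex article.

```python
def posterior_probability(prior_prob: float, bayes_factor: float) -> float:
    """Posterior probability of a hypothesis from its prior and a Bayes factor."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * bayes_factor  # odds form of Bayes' rule
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values, for illustration only.
skeptic = posterior_probability(1e-6, 1000.0)   # very low prior for psi
agnostic = posterior_probability(0.5, 1000.0)   # even prior odds
print(skeptic, agnostic)
```

The same evidence leaves the skeptic at about 0.1% and the agnostic at about 99.9%, which is why the choice of prior carries most of the weight in this dispute.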
How you get from AGI to epistemology eludes me. As long as the AGI can accurately model its interactions with the environment, that's really all it needs (or can hope) to do.
One of them is more useful for prediction and inference. They can guide you towards observing mechanisms useful for future hypothesis generation. That's all you can hope for. Especially in the case of "are low-salt diets healthy". A "Yes" or "No" to that question will never be truthful, because "health" and "for what segments of the population" and "in conjunction with what other lifestyle factors" are left underspecified. And you'll never get rid of the kernel of doubt that the low-sodium lobby has been the silent force behind all the anti-salt research this whole time.
The best you can do is provide enough evidence that anyone who points out your hypothesis is not truth can be reasonably called a pedant or conspiracy theorist, but not 100% guaranteed wrong.
As you might see, I am a fan of the idea of Dissolving epistemology.
Can you point to examples of these "holy wars"? I haven't encountered something I'd describe like that, so I don't know if we've been seeing different things, or just interpreting it differently.
To me it looks like a tension between a method that's theoretically better but not well-established, and a method that is not ideal but more widely understood so more convenient - a bit like the tension between the metric and imperial systems, or <geek warning> between flash and html5.
[etc.]
Ugh. Here is a good heuristic:
"Not in stats or machine learning? Stop talking about this."
Dude, I'm being genuinely curious about what "holy wars" he's talking about. So far I got:
... but zero actual answers, so I can't even tell if he's talking about some stupid overblown bullshit, or if he's just exaggerating what is actually a pretty low-key difference in opinion.
A "holy war" between Bayesians and frequentists exists in the modern academic literature for statistics, machine learning, econometrics, and philosophy (this is a non-exhaustive list).
Bradley Efron, who is arguably the most accomplished statistician alive, wrote the following in a commentary for Science in 2013 [1]:
In another paper published in 2013, Efron wrote [2]:
Thirty years ago, Efron was more critical of Bayesian statistics [3]:
The following bit of friendly banter in 1965 between M. S. Bartlett and John W. Pratt shows that the holy war was ongoing 50 years ago [4]:
For further reading I recommend [5], [6], [7].
[1]: Efron, Bradley. 2013. “Bayes’ Theorem in the 21st Century.” Science 340 (6137) (June 7): 1177–1178. doi:10.1126/science.1236536.
[2]: Efron, Bradley. 2013. “A 250-Year Argument: Belief, Behavior, and the Bootstrap.” Bulletin of the American Mathematical Society 50 (1) (April 25): 129–146. doi:10.1090/S0273-0979-2012-01374-5.
[3]: Efron, B. 1986. “Why Isn’t Everyone a Bayesian?” American Statistician 40 (1) (February): 1–11. doi:10.1080/00031305.1986.10475342.
[4]: Pratt, John W. 1965. “Bayesian Interpretation of Standard Inference Statements.” Journal of the Royal Statistical Society: Series B (Methodological) 27 (2): 169–203. http://www.jstor.org/stable/2984190.
[5]: Senn, Stephen. 2011. “You May Believe You Are a Bayesian but You Are Probably Wrong.” Rationality, Markets and Morals 2: 48–66. http://www.rmm-journal.com/htdocs/volume2.html.
[6]: Gelman, Andrew. 2011. “Induction and Deduction in Bayesian Data Analysis.” Rationality, Markets and Morals 2: 67–78. http://www.rmm-journal.com/htdocs/volume2.html.
[7]: Gelman, Andrew, and Christian P. Robert. 2012. “‘Not Only Defended but Also Applied’: The Perceived Absurdity of Bayesian Inference”. Statistics; Theory. arXiv (June 28).
For lots of "holy war" anecdotes, see The Theory That Would Not Die by Sharon Bertsch McGrayne.
Do you consider personal insults, accusations of fraud, or splitting academic departments along party lines to be "a pretty low-key difference in opinion"? If so, then it is "overblown bullshit," otherwise it isn't.
Ilya responded to your second paragraph, not the first one. Metric vs. imperial or flash vs. html5 are not good analogies.
The term "holy war" or "religious war" is often used to describe debates where people advocate for a side with an intensity disproportionate to the stakes (e.g. the proper pronunciation of "gif", vi vs. emacs, surrogate vs. natural primary keys in an RDBMS). That's how I read the OP, and it's fitting in context.
Sure, I'm just not sure which debates he's referring to ... is it on LessWrong? Elsewhere?
Various bits of Jaynes's "Confidence intervals vs Bayesian intervals" seem holy war-ish to me. Perhaps the juiciest bit (from pages 197-198, or pages 23-24 of the PDF):