shminux comments on Rationality Quotes September 2013 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Statements like "I reject the intelligence explosion hypothesis because it's not falsifiable."
I see. I doubt that it is as simple as naive Popperianism, however. Scientists routinely construct and screen hypotheses based on multiple factors, and they are quite good at it, compared to the general population. However, as you pointed out, many do not use or even have the language to express their rejection in a Bayesian way, as "I have estimated the probability of this hypothesis being true, and it is too low to care." I suspect that they instinctively map intelligence explosion into the Pascal mugging reference class, together with perpetual motion, cold fusion and religion, but verbalize it in the standard Popperian language instead. After all, that is how they would explain why they don't pay attention to (someone else's) religion: there is no way to falsify it. I suspect that any further discussion tends to reveal a more sensible approach.
Yeah. The problem is that most scientists seem to still be taught from textbooks that use a Popperian paradigm, or at least Popperian language, and they aren't necessarily taught probability theory very thoroughly, they're used to publishing papers that use p-value science even though they kinda know it's wrong, etc.
So maybe if we had an extended discussion about philosophy of science, they'd retract their Popperian statements and reformulate them to say something kinda related but less wrong. Maybe they're just sloppy with their philosophy of science when talking about subjects they don't put much credence in.
This does make it difficult to measure the degree to which, as Eliezer puts it, "the world is mad." Maybe the world looks mad when you take scientists' dinner party statements at face value, but looks less mad when you watch them try to solve problems they care about. On the other hand, even when looking at work they seem to care about, it often doesn't look like scientists know the basics of philosophy of science. Then again, maybe it's just an incentives problem. E.g. maybe the scientist's field basically requires you to publish with p-values, even if the scientists themselves are secretly Bayesians.
If there were a genuine philosophy of science illumination, it would be clear that, despite the shortcomings of the logical empiricist setting in which Popper found himself, there is much more of value in a sophisticated Popperian methodological falsificationism than in Bayesianism. If scientists were interested in the most probable hypotheses, they would stay as close to the data as possible. But in fact they want interesting, informative, risky theories and genuine explanations. This goes against the Bayesian probabilist ideal. Moreover, you cannot falsify with Bayes' theorem, so you'd have to start out with an exhaustive set of hypotheses that could account for the data (already silly), and then you'd never get rid of them; they could only be probabilistically disconfirmed.
Strictly speaking, one can't falsify with any method outside of deductive logic -- even your own Severity Principle only claims to warrant hypotheses, not falsify their negations. Bayesian statistical analysis is just the same in this regard.
A Bayesian analysis doesn't need to start with an exhaustive set of hypotheses to justify discarding some of them. Suppose we have a set of mutually exclusive but not exhaustive hypotheses. The posterior probability of a hypothesis under the assumption that the set is exhaustive is an upper bound for its posterior probability in an analysis with an expanded set of hypotheses. A more complete set can only make a hypothesis less likely, so if its posterior probability is already so low that it would have a negligible effect on subsequent calculations, it can safely be discarded.
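The upper-bound claim can be checked numerically. Below is a minimal sketch with made-up priors and likelihoods (every number, and the helper name `posteriors`, is an illustrative assumption, not something from the thread). It computes the posterior of a hypothesis first with the set treated as exhaustive, then after adding a third hypothesis that takes some prior mass while the original hypotheses keep their relative weights.

```python
def posteriors(priors, likelihoods):
    """Normalize prior * likelihood over exactly the hypotheses supplied."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Restricted analysis: H1 and H2 are treated as if they were exhaustive.
prior_weights = [0.5, 0.5]      # hypothetical priors
likelihoods = [0.20, 0.001]     # hypothetical P(data | H1), P(data | H2)
restricted = posteriors(prior_weights, likelihoods)

# Expanded analysis: a new hypothesis H3 takes prior mass 0.2, and the
# original hypotheses keep their relative prior weights.
extra_prior, extra_likelihood = 0.2, 0.10   # hypothetical values for H3
expanded_priors = [w * (1 - extra_prior) for w in prior_weights] + [extra_prior]
expanded = posteriors(expanded_priors, likelihoods + [extra_likelihood])

print(f"P(H2 | data), restricted set: {restricted[1]:.6f}")
print(f"P(H2 | data), expanded set:   {expanded[1]:.6f}")
```

In this toy setup the posterior of H2 drops when H3 is added, so a hypothesis that was already negligible in the restricted analysis stays negligible in the expanded one.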
I'm a Bayesian probabilist, and it doesn't go against my ideal. I think you're attacking philosophical subjective Bayesianism, but I don't think that's the kind of Bayesianism to which lukeprog is referring.
For what it's worth, I understand well the arguments in favor of Bayes, yet I don't think that scientific results should be published in a Bayesian manner. This is not to say that I don't think that frequentist statistics is frequently and grossly misused by many scientists, but I don't think Bayes is the solution to this. In fact, many of the problems with how statistics is used, such as implicitly performing many multiple comparisons without controlling for this, would be just as large a problem with Bayesian statistics.
Either the evidence is strong enough to overwhelm any reasonable prior, in which case frequentist statistics will detect the result just fine; or else the evidence is not so strong, in which case you are reduced to arguing about priors, which seems bad if the goal is to create a societal construct that reliably uncovers useful new truths.
No, the multiple comparisons problem, like optional stopping and other selection effects that alter error probabilities, is a much greater problem in Bayesian statistics, because Bayesians regard error probabilities and the sampling distributions on which they are based as irrelevant to inference once the data are in hand. That is a consequence of the likelihood principle (which follows from inference by Bayes' theorem). I find it interesting that this blog takes a great interest in human biases, but guess what methodology is relied upon to provide evidence of those biases? Frequentist methods.
Deborah, what do you think of jsteinhardt's Beyond Bayesians and Frequentists?
But why not share likelihood ratios instead of posteriors, and then choose whether or not you also want to argue very much (in your scientific paper) about the priors?
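To make the suggestion concrete, here is a small sketch of how readers with different priors could each combine the same reported likelihood ratio with their own prior odds. The function name `update` and all the numbers are hypothetical, chosen only for illustration.

```python
def update(prior_prob, likelihood_ratio):
    """Posterior probability from a prior probability and a likelihood ratio.

    Uses posterior odds = prior odds * likelihood ratio.
    """
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

reported_lr = 20.0  # hypothetical: data are 20x more likely under H than under not-H

skeptic = update(0.01, reported_lr)   # reader who starts at 1%
neutral = update(0.50, reported_lr)   # reader who starts at 50%

print(f"skeptic: {skeptic:.3f}, neutral: {neutral:.3f}")
```

The paper only has to report the ratio; the disagreement about priors stays with the readers rather than in the publication.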
What do you think "p<0.05" means?
The p-value is "the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true." It is often misinterpreted, e.g. by 68 out of 70 academic psychologists studied by Oakes (1986, pp. 79-82).
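As a worked instance of that definition, the sketch below computes an exact one-sided p-value for a coin-flip experiment. The scenario (15 heads in 20 flips, null hypothesis of a fair coin) and the function name are illustrative assumptions, not taken from Oakes or from the thread.

```python
from math import comb

def binomial_p_value(successes, trials, null_prob=0.5):
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, null_prob).

    This is the probability of a result at least as extreme as the observed
    one, computed under the assumption that the null hypothesis is true.
    """
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )

p = binomial_p_value(15, 20)
print(f"p = {p:.4f}")
```

Note what the number is conditioned on: it says how often a fair coin would produce 15 or more heads, not how probable it is that the coin is fair.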
The p-value is not the same as the Bayes factor:
I wasn't saying it was the same; my point is that reporting the data on which one can update in a Bayesian manner is the norm. (As is updating: e.g. if the null hypothesis is really plausible, at p<0.05 nobody's really going to believe you anyway.)
With regards to the Bayes factor. The issue is that there is a whole continuum of alternate hypotheses. There's no single factor between those that you can report on which could be used for combining evidence in favour of quantitatively different alternative "most supported" hypotheses. The case of the null hypothesis (vs all possible other hypotheses) is special in that regard, and so that is what a number is reported for.
With regards to the case of the ratio between evidence for two point hypotheses, as discussed in the article you link: Neyman-Pearson lemma is quite old.
With regards to the cause of experiment termination, you have to account somewhere for the fact that termination of the experiment has the potential to cherry-pick and thus bias the resulting data (if that is what he's talking about, because it's not clear to me what his point is, and it seems to me that he misunderstood the issue).
Furthermore, the relevant mathematics probably originates from particle physics, where it serves a different role: a threshold on the p-value is there to quantify the worst-case likelihood that your experimental apparatus will be sending people on a wild goose chase. It has more to do with the value of the experiment than with probabilities, given that priors for hypotheses in physics would require a well-defined hypothesis space (which is absent). And given that working on the production of stronger evidence is a more effective way to spend your time there than any debating of the priors. And given that the p-value related issues can in any case be utterly dwarfed by systematic errors and problems with the experimental setup, the probability of which changes after publication as other physicists do or do not point out potential problems in the setup.
A side note: there's a value of information issue here. I know that if I were to discuss Christian theology with you (not the atheism, but the fine points of the life of Jesus, that sort of thing, which I never really had time or inclination to look into), the expected value of information to you would be quite low. Because most of the time that I spent practising mathematics and such, you spent on the former. It would be especially the case if you entered some sort of very popular contest in any way requiring theological knowledge, and scored #10th of all time on a metric that someone else saw fit to choose in advance. The same goes for discussions of mathematics, but the other way around. This is also the case for any experts you are talking to. They're rather rational people, that's how they got to have impressive accomplishments, and a lot of practical rationality is about ignoring low expected value pursuits. Einsteins and Fermis of this world do not get to accomplish so much on so many different occasions without great innate abilities for that kind of thing. They also hold teaching positions and it is more productive for them to correct misconceptions in the eager students who are up to speed on the fundamental knowledge.
(with #10th I'm alluding to this result of mine).
Mmm. I've read a lot of dumb papers where they show that their model beats a totally stupid model, rather than that their model beats the best model in the literature. In algorithm design fields, you generally need to publish a use case where your implementation of your new algorithm beats your implementation of the best other algorithms for that problem in the field (which is still gameable, because you implement both algorithms, but harder).
Thinking about the academic controversy I learned about most recently, it seems like if authors had to say "this evidence is n:1 support for our hypothesis over the hypothesis proposed in X" instead of "the evidence is n:1 support for our hypothesis over there being nothing going on" they would have a much harder time writing papers that don't advance the literature, and you might see more scientists being convinced of other hypotheses because they have to implement them personally.
In physics a new theory has to be supported over the other theories, for example. What you're talking about would have to be something that happens in sciences that primarily find weak effects in the noise and confounders anyway, i.e. psychology, sociology, and the like.
I think you need to specifically mention what fields you are talking about, because not everyone knows that issues differ between fields.
With regards to the malemployment debate you link, there's a possibility that many of the college graduates have not actually learned anything that they could utilize in the first place, and consequently there exists nothing worth describing as 'malemployment'. Is that the alternate model you are thinking of?
(Your point is well taken but...)
Approximately it means "I have a financial or prestige incentive to find a relationship and I work in a field that doesn't take its science seriously".
Or, for instance in the case of particle physics, it means the probability you are just looking at background. You are painting with an overly broad brush. Sure, p-values are overused, but there are situations where the p-value IS the right thing to look at.
No, it's the probability that you'd see a result that extreme (or more extreme) conditioned on just looking at background. Frequentists can't evaluate unconditional probabilities, and 'probability that I see noise given that I see X' (if that's what you had in mind) is quite different from 'probability that I see X given that I see noise'.
(Incidentally, the fact that this kind of conflation is so common is one of the strongest arguments against defaulting to p-values.)
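The gap between the two conditional probabilities is easy to see with numbers. The sketch below applies Bayes' theorem to purely hypothetical values (a 5% chance of seeing X from background alone, an 80% chance of seeing X if there is a real signal, and a 1% prior that a signal exists); none of these figures comes from the discussion.

```python
def prob_noise_given_x(p_x_given_noise, p_x_given_signal, prior_signal):
    """P(noise | X) via Bayes' theorem, with two exclusive cases: noise or signal."""
    prior_noise = 1 - prior_signal
    noise_and_x = p_x_given_noise * prior_noise
    signal_and_x = p_x_given_signal * prior_signal
    return noise_and_x / (noise_and_x + signal_and_x)

p_x_given_noise = 0.05   # hypothetical: the p-value-like quantity
posterior = prob_noise_given_x(p_x_given_noise, p_x_given_signal=0.80, prior_signal=0.01)

print(f"P(X | noise) = {p_x_given_noise:.2f}")
print(f"P(noise | X) = {posterior:.3f}")
```

With these assumed inputs, a result that background alone would produce only 5% of the time is still most likely background, because real signals were assumed to be rare to begin with.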
Keep in mind that he and other physicists do not generally consider "probability that it is noise, given an observation X" to even be a statement about the world (it's a statement about one's personal beliefs, after all, one's confidence in the engineering of an experimental apparatus, and so on and so forth), so they are perhaps conflating much less than it would appear under very literal reading. This is why I like the idea of using the word "plausibility" to describe beliefs, and "probability" to describe things such as the probability of an event rigorously calculated using a specific model.
edit: note by the way that physicists can consider a very strong result - e.g. those superluminal neutrinos - extremely implausible on the basis of a prior - and correctly conclude that there is most likely a problem with their machinery, on the basis of ratio between the likelihood of seeing that via noise to likelihood of seeing that via hardware fault. How's that even possible without actually performing Bayesian inference?
edit2: also note that there is a fundamental difference as with plausibilities you will have to be careful to avoid vicious cycles in the collective reasoning. Plausibility, as needed for combining it with other plausibilities, is not a real number, it is a real number with attached description of how exactly it was made, so that evidence would not be double-counted. The number itself is of little use to communication for this reason.
Well, technically, the probability that you will end up with a result given that you are just looking at background. I.e. the probability that after the experiment you will end up looking at background thinking it is not background, assuming it is all background.
It's really awkward to describe that in English, though, and I just assume that this is what you mean (while Bayesianists assume that you are conflating the two).
Note that the 'brush' I am using is essentially painting the picture "0.05 is for sissies", not a rejection of p-values (which I may do elsewhere but with less contempt). The physics reference was to illustrate the contrast of standards between fields and why physics papers can be trusted more than medical papers.
That's what multiple testing correction is for.
With the thresholds from physics, we'd still be figuring out if penicillin really, actually kills certain bacteria (somewhat hyperbolic, 5 sigma ~ 1 in 3.5 million).
0.05 is a practical tradeoff; for supposed Bayesians, it is still much too strict, not too lax.
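The "5 sigma ~ 1 in 3.5 million" figure can be reproduced from the standard normal distribution, assuming the one-sided tail convention. A quick sketch using only the standard library (the function name is an illustrative choice):

```python
from math import erfc, sqrt

def one_sided_tail(sigma):
    """P(Z >= sigma) for a standard normal Z, using the complementary error function."""
    return 0.5 * erfc(sigma / sqrt(2))

p5 = one_sided_tail(5.0)
print(f"5 sigma: p = {p5:.3e}, about 1 in {1 / p5:,.0f}")

# For comparison, the one-sided z threshold of about 1.645 corresponds to p = 0.05.
p_conventional = one_sided_tail(1.645)
print(f"1.645 sigma: p = {p_conventional:.3f}")
```

The two thresholds differ by more than five orders of magnitude, which is the contrast of standards between fields that the comments above are arguing about.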
I'm willing to bet most scientists aren't taught these things formally at all. I never was. You pick it up out of the cultural zeitgeist, and you develop a cultural jargon. And then sometimes people who HAVE formally studied philosophy of science try to map that jargon back to formal concepts, and I'm not sure the mapping is that accurate.
I think 'wrong' is too strong here. It's good for some things, bad for others. Look at particle-accelerator experiments: frequentist statistics are the obvious choice because the collider essentially runs the same experiment 600 million times every second, and p-values work well to separate signal from a null hypothesis of 'just background'.