Rationality Quotes May 2013
Here's another installment of rationality quotes. The usual rules apply:
- Please post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
- Do not quote yourself.
- Do not quote from Less Wrong itself, Overcoming Bias, or HPMoR.
- No more than 5 quotes per person per monthly thread, please.
Comments (387)
-- Doug McDuff, M.D., and John Little, Body by Science, pp. ix-x
Lee Kelly.
-- Matthew Leifer
I like the quote, but I have a nitpick:
When I've lost in the first round of single-elimination tournaments, I've found myself hoping that the person who beat me would prove skilled enough to win the entire tournament. That way, my loss wouldn't mean that I totally sucked, but only that I wasn't the best. So I think the quoted observation fails to account for nuances relating to how losses inform us about our skill level.
--Clue (1985)
Dupe: http://lesswrong.com/lw/8n9/rationality_quotes_december_2011/5erg
I've never seen the Icarus story as a lesson about the limitations of humans. I see it as a lesson about the limitations of wax as an adhesive. -- Randall Munroe
That's been posted at least twice before that I can remember.
From Richard Feynman's last letter to his first wife, written over a year after her death from TB (incidentally, antibiotics had been discovered and were being tested on humans a few months before her death; a year sooner, and she would have had a good chance of recovery):
Perfectly Reasonable Deviations from the Beaten Track: The Letters of Richard P. Feynman.
The whole letter and the rest of the book is well worth reading.
(I found this interesting, given that Feynman was likely the most instrumentally rational physicist ever, and definitely did not believe in any kind of afterlife -- he surely knew he was writing it for himself.)
Key-lock diaries aren't for no one but the writer.
Roger Cotes defending Newton from the charge that his theory treats gravity as an occult cause:
This is a wild guess, but (on the assumption that you endorse this quote) is the thought that MWI stands in relation to experimentally testable physics as something like a metaphysical thesis, and that instrumentalism doesn't lack metaphysical theses of this kind, but simply refuses to acknowledge and examine them?
Anyway, a related quote, and so far as I know the oldest of this kind:
Reminds me of Popper (World of Parmenides):
Actually, it was someone asking what the heck I meant by "reality fluid", to which the answer is that I don't know either which is why I always call it "magical reality fluid". I mean, I could add in something that sounded impressive and might to some degree be helpful along the lines of "It's the mind-projection-fallacy conjugate of 'probability' as it appears inside hypotheses about collections of real things in which some real things are more predicted to happen to me than others for purposes of executing post-observation Bayesian updates, like, if the squared modulus rule appearing in the Born statistics reflected the quantity present of an actual kind of stuff" but I think saying, "It's magic, which is the mind-projection-fallacy conjugate of 'I'm confused'" would be wiser in a conversation like that. I think it's very important not to create the illusion of knowing more than you do, when you try to operate at the frontiers of your own ability to be coherent. At the same time, refusing to digress into metaphysics even to demarcate the things that confuse you, even to form ideas which can be explicitly incoherent rather than implicitly incoherent, is indeed to become the slave of the unexamined thought.
I wonder if others find the notion of "magical reality fluid" a useful moniker for "I have no clear idea of what's going on here, but something does, so I cannot avoid thinking about it". I confess it does nothing for me.
Hypothesis: Whether or not a reader finds that useful correlates with whether or not they've read this.
I wouldn't if it was the first time I read that phrase, but since I read EY's explanation of what he means by it I have had no trouble in remembering that. Sure, a long phrase full of hyphens starting with “whatever-the-hell-it-is-that-” would be clearer, but it would also be more of a PITA to type, so I can see why EY wouldn't use it.
FWIW, it does a fine job for me of conveying "I don't quite know what I'm talking about here."
Some people do (I have already received multiple comments to this effect). Mileage possibly varies.
Signalling sophistication and confidence when there is no object level reason for such confidence is one of the more destructive of human social incentives. I heartily endorse measures to prevent this. Seeing that someone is willing to admit uncertainty at the expense of their dignity increases the confidence I can have that their other expressions of confidence are more than social bullshit.
I would of course encourage you to stop using "magical reality fluid" as soon as possible. That is, after someone figures the philosophy (or epistemology or physics) out with something remotely approaching rigour.
Much as I love the idea of this and would like it to work for me, unfortunately as far as I can tell my brain simply treats "magical reality fluid" the same way as it would something bland like "degree of reality".
Though come to think of it, I'm not actually sure whether or not I've really been saying the magical part to myself all this time. I'll try to make sure I don't leave it out in the future, and see whether it makes a difference.
Thanks for the explanation.
When it comes to understanding how our universe evolves, religion and theology have been at best irrelevant. They often muddy the waters, for example, by focusing on questions of nothingness without providing any definition of the term based on empirical evidence. While we do not yet fully understand the origin of our universe, there is no reason to expect things to change in this regard. Moreover, I expect that ultimately the same will be true for understanding of areas that religion now considers its own territory, such as human morality.
Science has been effective at furthering our understanding of nature because the scientific ethos is based on three key principles: (1) follow the evidence wherever it leads; (2) if one has a theory, one needs to be willing to try to prove it wrong as much as one tries to prove that it is right; (3) the ultimate arbiter of truth is experiment, not the comfort one derives from one's a priori beliefs, nor the beauty or elegance one ascribes to one's theoretical models.
Lawrence M. Krauss, A Universe from Nothing, xvi
Warren Buffett on proponents of the efficient market hypothesis
--from the ongoing animation xkcd: Time, dialogue transcript found here
--Scott Aaronson, D-Wave: Truth finally starts to emerge
Well, academia itself has been attempting to get away from doing the opposite. This is most noticeable in fields like medicine and especially psychology, where anyone disagreeing with whatever the consensus is at the moment is considered an anti-scientific flat-earther, whereas the fact that this consensus itself nearly reverses every couple of decades is rarely brought up. Furthermore, on the occasions when someone does bring it up, the standard response is to say that the strength of science is that it can change its consensus.
Which vicennial cycles of academic consensus have you found most noticeable?
Well, the standard example is nutrition advice. Other reasonably well-known examples include whether post-menopausal women should take estrogen supplements, and how dangerous marijuana is. An example with a longer period is the whole issue with eugenics.
How many times has the academic consensus on those reversed, and does that match your original claim that for these century-plus old fields like medicine,
?
EDIT: feel free to reply to my challenge any time, Eugine.
I thought you might be thinking of nutrition and something like eugenics, but wasn't sure because I didn't think they fitted the criteria that well. Anyway, thanks for indulging my curiosity.
One interesting thing about eugenics is that many of the people who supported the consensus on it while it was popular are still considered respectable, and their support for eugenics is downplayed. Conversely, the people who opposed it while it was popular are still considered anti-science loons through the popular telling of misleading versions of history.
That's not true at all. It's those who made up their minds to be good but aren't who do the most evil.
I'm not sure that's true in aggregate. I think most of the evil is done by people going along with things - like, if you talked to them about it for a while they'd concede that some aspects of what they were going along with were sort of questionable and maybe a bit bad, but they don't think about that spontaneously.
Right, sure, every evil overlord needs a group of willing henchmen and an army of reluctant-to-object enablers. So, the original quote is probably right "in aggregate", though not in the amount of evil per person. Even then, how do you attribute/distribute the amount of evil between, say, Pol Pot ordering the destruction of intelligentsia and the genocide of the Chinese minority and a peasant working his rice field in the countryside, occasionally affirming his allegiance to the regime, as required? Hmm, I recall HPMoR!Quirrell talking about it, but I'm not sure how much of it is author tract.
Does this imply that it's in the act of making up your mind?
What do you think? When did Nero, Queen Isabella, Robespierre, Lenin or Pol Pot become evil and why?
I think the problem is that it should take more than one explicitly evil person per country to cause that much damage.
I saw the quote as an allusion to friendly/unfriendly AI.
-- Clyde Coombs, A Theory of Data (1964), pp. 284, 488
That really asks for Samuel Johnson's refutation...
You mean, "if matter exists and I can sense it, then I will sense the collision of real objects"?
Johnson's refutation was of "the world doesn't exist," with "I can sense it." Coombs' statement is "interpretation of facts rest upon theories, which rest upon assumptions." This holds true for sense data- "if I am not hallucinating, there is a monitor in front of me."
No, I don't think so. Bishop Berkeley, after all, wasn't entirely clueless and was quite familiar with the sensory input. But what Samuel Johnson actually had in mind is beside the point.
It seems to me that the requirement to list assumptions for basic sensory data (absent a strong prior as in e.g. "I swallowed a strong psychoactive ten minutes ago") is rather pointless. Yes, solipsism may be correct, or the universe might be a simulation the code of which is about to be changed, etc. but once you put into doubt the basic sensory reality around you (for example, a big stone in front of your foot) you will quickly be forced to assume it back or the substrate for your mind might not survive.
It's kinda like the off-switch problem -- I think it comes from Iain Banks' Culture novels. The Minds, the super-intelligent AIs, love to go off into virtual worlds and play with, say, architecture in a six-dimensional space with varying gravity. They find much more utility by staying in the virtual reality compared to the actual one. But -- their "bodies", the computing substrate is still in reality. And if someone flips the off-switch in reality while the Mind is being happy in the virtual world, well...
It's not clear to me why you think this. Repeating it every time is tiresome, sure, and so that's why the assumptions should be implicitly stated rather than explicitly stated, unless explicitly stating them helps in that situation.
But the central claim is that "all data is theory-laden," which is an important point. It applies to what we perceive "directly" just as well as it applies to the chemical composition of photographs of distant galaxies (to use David Deutsch's example), and so I don't see how a Johnsonian objection would apply.
An important point, yes, but one which should not be reduced to an absurdity. If, while walking, I stub my toe on a rock, which assumptions and theory make this fact "theory-laden"?
You explicitly stated "If, while waking...".
I once had a dream where I was explaining to someone that I could not possibly be dreaming...
Daniel Dennett
Out of curiosity, would you happen to know which book this is from?
Daniel Dennett
Does Dennett offer supporting arguments for this assertion?
Daniel Dennett
William James
Daniel Dennett
Is this from the new book?
Yes.
And yes, it's great.
Marcel Kinsbourne, quoted in Dennett (2013)
Matthew 7:13-14
Bo Dahlbom
-Paul Graham
Is he basically proposing to do the opposite of Occam's razor?
No, he's correcting for a self-serving bias in the way humans generate explanations.
Even more from the same source:
More from Scott Aaronson:
Replying to a many-world-like question
While the linked blog post discussion is somewhat interesting, it seems misleading to call it a 'many-world-like question'. In trying to extract the 'rationalist moral' from the link, perhaps the best quote that I can extract is the preceding sentence:
At a stretch the 'rationalist moral' could be the general principle 'Don't make logical errors just because infinity confuses you'. (I'd certainly endorse that as an often neglected insight!)
-- Lou Scheffer
(Most recent example from my own life that springs to mind: "It seems incredibly improbable that any Turing machine of size 100 could encode a complete solution to the halting problem for all Turing machines of size up to almost 100... oh. Nevermind.")
Pretty sure that also happens in fields other than the hard sciences. For example, it is said that converts to a religion are usually much more fervent than people who grew up with it (though there's an obvious selection bias).
(The advanced, dark-artsy version of this is claiming with a straight face to never have believed A in the first place, and hope the listener trusts what you're saying now more than their memory of what you said earlier, and if it doesn't work, claim they had misunderstood you. My maternal grandpa always tries to use that on my father, and almost always fails, but if he does that I guess it's because it does work on other people.)
The operative glory is doing it in five seconds.
And, being right.
That's harder to distinguish from the outside.
I've also found this to be medium evidence that I'm not as informed about the subject as I thought that I was, so I back down my confidence somewhat. If I recently made an error that would have resulted in something very bad happening, I should be very careful about thinking that my next design is safe.
That does (did?) seem improbable to me. I'd have expected n needed to be far larger than 100 before the overhead became negligible enough for 'almost n' to fit (ie. size 10,000 gives almost 10,000 would have seemed a lot more likely than size 100 gives almost 100). Do I need to update in the direction of optimal Turing machine code requiring very few bits?
I mentally replaced “100” with “N” anyway (and interpreted “almost N” in the obvious-in-the-context way).
You mentally threw away relevant information. ie. You merely made yourself incapable of thinking about what is claimed about the size of c relative to 100. That's fine but ought to indicate to you that you have little useful information to add in response to a comment that amounts to an expression of curious surprise that (c << 100).
Where the context suggests it can be interpreted as an example of the Eliezer's-edits bug?
I hadn't read the before-the-edit version of the comment.
In general, probably yes. Have you checked out the known parts of the Busy Beaver sequence? Be sure to guess what you expect to see before you look.
In specific, I don't know the size of the constant c.
So what's the program? Is it the one that runs every Turing machine up to length 100 for BusyBeaver(100) steps, and gets the number BusyBeaver(100) by running the BusyBeaver_100 program whose source code is hardcoded into it? That would be of length 100+c for some constant c, but maybe you didn't think the constant was worth mentioning.
Well, it's still encoded. But I actually meant to say "almost 100" in the original. And yes, that's the answer.
David Eagleman, Incognito, p. 71
Minor nitpick:
The reason we can learn the local language is that languages are memetically selected for learnability by humans.
Wouldn't we see more regularity in the structure of languages then? English and classical Latin are almost opposites by every measure I can think to apply to a language (complexity of grammar, diversity of vocabulary and idiom, etc.). This doesn't seem like a good assumption.
English and Latin aren't even anywhere near as different as two natural languages can be. Take a look at this for a quick example, and take a look at the Language Construction Kit (it's about constructed languages, but AFAICT most of the things exemplified aren't completely unheard-of among natural languages) for a lot more.
(There are quite a few linguistic universals, but I'm not entirely convinced that all of them exist because a language flouting one of them would be unlearnable by humans, rather than (say) because they were inherited from a common ancestor.)
There doesn't have to be one solution to "memetically selected for learnability".
So is everything else except biology and physics.
The memetic evolution of baroque music in Europe is a development towards learnability? There are probably no more than 100 people alive that can make their way through Bach's 2nd Partita for violin.
I'm pretty sure you're underestimating that by...a lot. Fermi estimate time:
Bach's sonatas and partitas for solo violin are a cornerstone of the violin repertory. We may therefore assume that every professor of violin at a major university or conservatory has performed at least one of them at least once, just like we may assume that every professor of mathematics has studied the Lebesgue dominated convergence theorem. How many professors of violin are there? Let's just consider one country, the United States. Each state in the U.S. has at least two major public universities (typically "University of X" and "X State University", where X is the state); some have many more, and this doesn't even count private universities. Personal experience suggests that the average big state university has about one professor of violin. There are 50 states in the U.S., so that's 100 people already right there. And we have yet to count:
Thus, it wouldn't surprise me at all if there were at least 10,000 people alive who have performed one of the sonatas and partitas (to say nothing of those who would be capable of performing them). There are six of these works in total, so we can divide this already-conservative estimate by six to (under)estimate the number who have performed the Second Partita in particular. (This is likely an underestimate because many of them will have performed more than one -- indeed, all six, in a fair number of cases.)
A glance at the recordings available on Amazon, sorted by release date may help put things into perspective.
The estimate "no more than 100 alive who can make it through" would be much more appropriate for a difficult contemporary work (like, say, Melismata by Milton Babbitt) than a 300-year-old standard.
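The arithmetic in the Fermi estimate above can be written out as a few lines of Python. This is only a sketch: every figure is a guess taken from the comment itself rather than measured data, and the variable names are made up for illustration.

```python
# Fermi estimate from the comment above. All inputs are the commenter's
# guesses, not measured data; the variable names are illustrative only.
states = 50
universities_per_state = 2        # "University of X" and "X State University"
violin_profs_per_university = 1   # the commenter's personal-experience figure

# Violin professors at big US state universities alone.
us_state_school_profs = states * universities_per_state * violin_profs_per_university

# The commenter's overall guess for performers of at least one of the six works.
performers_of_any_work = 10_000
works_in_the_set = 6

# Dividing by six deliberately underestimates, since many players perform
# more than one of the works.
second_partita_floor = performers_of_any_work / works_in_the_set

print(us_state_school_profs)        # 100
print(round(second_partita_floor))  # 1667
```

Even the deliberately low final figure is more than ten times the "no more than 100" claim it responds to.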
It might be more accurate to say that pretty much everything, including what we call biology and physics -- humans are the ones codifying it -- is memetically selected to be learnable by humans. Not that it all develops towards being easier to learn.
Ever notice how you never hear humans playing music that humans aren't capable of learning to play? I think there may be some selection effects at play here...
Well, I never notice the satisfaction of that contradiction, quite, but I do notice that the history of baroque music includes the steady achievement of theretofore unreached technical difficulty.
Joseph Heller, Catch-22
explaining /= explaining away
This is technically true for inclusive definitions of 'want' but highly misleading. There is a world of difference between "I want X but the opportunity cost (Y) is too great" and "I actively prefer !X". X and Y may be the prevention of parasitic worm infections and combating malaria. Precisely which limited resource is being allocated (time or money) changes little.
If "I don't have time" is to be replaced with an expression which conveys more personal acceptance of responsibility then it would be reasonable to translate it to "I have other priorities" but verging on disingenuous to translate it into "I don't want to".
I think you're reading this too literally. To my mind this says "You have the power to allocate your time" which is a non-trivial realization to some people. You can also understand this as saying "You allocate time to tasks according to how much you want to do them", an observation which also does not always rise to the conscious level.
This also requires a strange definition of "want" in order to become correct. Actions chosen for instrumental reasons sometimes differ from both the emotional urge and the all-else-equal reasoned preference, and so it's not particularly natural to include them under the label of "wanting".
I see no problems with filing "actions chosen for instrumental reasons" under the category of "want" in this context. They could be consolidated with their goal, anyway -- for time allocation purposes there is not much sense in separating "walking to the fridge and opening it" out of the general "get a beer".
This becomes problematic when you try to distinguish an instrumental decision from its terminal valuation, for example "I don't want to be commuting to work, but I choose to do so in order to get there." (negative all-else-equal valuation, positive instrumental valuation).
Again: in this context. Sometimes you need to decompose instrumentality from its terminal goal, sometimes you don't need to.
Citation? I've read the Tao Teh Ching in a few translations and I don't recognize that at all; a Google and Google Books search makes it sound like the usual apocrypha.
Yeah, I could only find it in Google so I don't know the actual source. Lao Tzu is as good as any name, I suppose, if the name is translated literally.
Leibniz in Neal Stephenson's The Confusion
Huh, I thought the point of atoms is that they're not infinitely small.
Leibniz didn't like that.
Daniel Waterhouse says to Hooke in Neal Stephenson's Quicksilver
-- Megan McArdle, trying to explain Bayesian updates and the importance of making predictions in advance, without referring to any mathematics.
This annoys me because she doesn't talk at all about the power of the study. Usually, when you see statistically insignificant positive changes across the board in a study without much power, it's a suggestion you should hesitantly update a very tiny bit in the positive direction, AND you need another study, not a suggestion you should update downward.
When ethics prevent us from constructing high power statistical studies, we need to be a bit careful not to reify statistical significance.
If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn't even matter that it is positive. An analogy: Suppose I tell you that eating garlic daily increases your IQ, and point to a study with three million participants and P < 1e-7. Vastly significant, no? Now it turns out that the actual size of the effect is 0.01 points of IQ. Are you going to start eating garlic? What if it weren't garlic, but a several-billion-dollar government health program? Statistical significance is indeed not everything, but there's such a thing as considering the size of an effect, especially if there's a cost involved.
Moreover, please consider that "consistent with zero" means exactly that. If you throw a die ten times and it comes up heads six, do you "hesitantly update a very tiny bit" in the direction of the coin being biased? Would you do so, if you did not have a prior reason to hope that the coin was biased?
I respectfully suggest that you are letting your already-written bottom line interfere with your math.
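For what it's worth, the six-heads-in-ten example can be worked through numerically. The sketch below computes an exact two-sided binomial p-value against a fair coin, plus a likelihood ratio against one specific alternative. The 60%-heads alternative is an assumption chosen purely for illustration; nothing in the thread specifies it.

```python
from math import comb

def two_sided_binomial_p(heads, flips):
    """Exact two-sided p-value under a fair coin: total probability of
    every outcome that is no more likely than the observed one."""
    probs = [comb(flips, k) / 2**flips for k in range(flips + 1)]
    observed = probs[heads]
    return sum(p for p in probs if p <= observed)

def likelihood_ratio(heads, flips, p_alt, p_null=0.5):
    """How much more likely the data are under p_alt than under p_null."""
    tails = flips - heads
    return (p_alt**heads * (1 - p_alt)**tails) / (p_null**heads * (1 - p_null)**tails)

p_value = two_sided_binomial_p(6, 10)
# ASSUMPTION: a coin biased 60% towards heads is an arbitrary alternative.
ratio = likelihood_ratio(6, 10, p_alt=0.6)

print(f"two-sided p-value: {p_value:.3f}")      # 0.754
print(f"likelihood ratio vs 0.6: {ratio:.2f}")  # 1.22
```

So the data are entirely unsurprising for a fair coin, and they favor the assumed 60% coin only by a factor of about 1.2. Whether a factor that small counts as an update worth making is the point under dispute.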
I strongly disagree.
An old comment of mine gives us a counterexample. A couple of years ago, a meta-analysis of RCTs found that taking aspirin daily reduces the risk of dying from cancer by ~20% in middle-aged and older adults. This is very much a practically significant effect, and it's probably an underestimate for reasons I'll omit for brevity — look at the paper if you're curious.
If you do look at the paper, notice figure 1, which summarizes the results of the 8 individual RCTs the meta-analysis used. Even though all of the RCTs had sample sizes in the thousands, 7 of them failed to show a statistically significant effect, including the 4 largest (sample sizes 5139, 5085, 3711 & 3310). The effect is therefore "so small that a sample of several thousand is not sufficient to reliably observe it", but we would be absolutely wrong to infer that "it doesn't even matter that it is positive"!
The heuristic that a hard-to-detect effect is probably too small to care about is a fair rule of thumb, but it's only a heuristic. EHeller & Unnamed are quite right to point out that statistical significance and practical significance correlate only imperfectly.
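A rough power calculation shows how trials with thousands of participants can routinely miss a 20% reduction. This is a sketch under stated assumptions: the 3% baseline risk of cancer death over the trial is a made-up illustrative figure, not a number from the paper, and the arm size simply splits the largest trial (5139) roughly in half. The function name is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_control, relative_reduction, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test,
    using the normal approximation to the binomial."""
    p_treated = p_control * (1 - relative_reduction)
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    z_effect = (p_control - p_treated) / se
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    # Probability the test statistic lands in either rejection region.
    return std.cdf(z_effect - z_crit) + std.cdf(-z_effect - z_crit)

# ASSUMPTION: 3% baseline risk is hypothetical, chosen only for illustration.
HYPOTHETICAL_BASELINE_RISK = 0.03
power = approx_power(HYPOTHETICAL_BASELINE_RISK, 0.20, n_per_arm=2570)
print(f"approximate power: {power:.2f}")  # about 0.26
```

Under these assumptions a single trial of that size would detect a true 20% reduction only about a quarter of the time, so a run of individually non-significant trials is what one would expect even if the effect is real.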
That's a curious metric to choose. By that standard taking aspirin is about as healthy as playing a round of Russian Roulette.
I'd assume they mean something like the per-year risk of dying from cancer conditional on previous survival -- if they indeed mean the total lifetime risk of dying from cancer I agree it's ridiculous.
It's a fairly natural metric to choose if one wishes to gauge aspirin's effect on cancer risk, as the study's authors did.
Fortunately, the study's authors and I also interpreted the data by another standard. Daily aspirin reduced all-cause mortality, and didn't increase non-cancer deaths (except for "a transient increase in risk of vascular death in the aspirin groups during the first year after completion of the trials"). These are not results we would see if aspirin effected its anti-cancer magic by a similar mechanism to Russian Roulette.
Pardon me. Mentioning only curiosity was politeness. The more significant meanings I would supplement with are 'naive or suspicious'. By itself that metric really is worthless and reading this kind of health claim should set off warning bells. Lost purposes are a big problem when it comes to medicine. Partly because it is hard, mostly because there is more money in the area than nearly anywhere else.
And this is the reason low dose aspirin is part of my daily supplement regime (while statins are not).
"All cause mortality" is a magical phrase.
I recently stopped with the low dose aspirin; the bleeding when I accidentally cut myself has proven to be too much of an inconvenience. For the time being, at least.
Am I missing a subtlety here, or is it just that cancer is usually one of those things that you hope to live long enough to get?
Yeah, pretty much. There are other examples of this where something harmful appears to be helpful when you don't take into account possible selection biases (like being put into the 'non-cancer death' category); for example, this is an issue in smoking - you can find various correlations where smokers are healthier than non-smokers, but this is just because the unhealthier smokers got pushed over the edge by smoking and died earlier.
tl;dr: NHST and Bayesian-style subjective probability do not mix easily.
Another example of this problem: http://slatestarcodex.com/2014/01/25/beware-mass-produced-medical-recommendations/
Does vitamin D reduce all-cause mortality in the elderly? The point-estimates from pretty much all of the various studies are around a 5% reduction in risk of dying for any reason - pretty nontrivial, one would say, no? Yet the results are almost all not 'statistically significant'! So do we follow Rolf and say 'fans of vitamin D ought to update on vitamin D not helping overall'... or do we, applying power considerations about the likelihood of making the hard cutoffs at p<0.05 given the small sample sizes & plausible effect sizes, note that the point-estimates are in favor of the hypothesis? (And how does this interact with two-sided tests - vitamin D could've increased mortality, after all. Positive point-estimates are consistent with vitamin D helping, and less consistent with no effect, and even less consistent with it harming; so why are we supposed to update in favor of no help or harm when we see a positive point-estimate?)
If we accept Rolf's argument, then we'd be in the odd position of, as we read through one non-statistically-significant study after another, decreasing the probability of 'non-zero reduction in mortality'... right up until we get the Autier or Cochrane data summarizing the exact same studies & plug it into a Bayesian meta-analysis like Salvatier did & abruptly flip to '92% chance of non-zero reduction in mortality'.
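The flip described above can be illustrated with a toy pooling calculation. To be clear about what this is: a plain fixed-effect, inverse-variance pooling with a flat prior, not the Bayesian meta-analysis Salvatier ran, and the five (estimate, standard error) pairs are HYPOTHETICAL numbers invented for the example, not the vitamin D data.

```python
from math import sqrt
from statistics import NormalDist

# HYPOTHETICAL studies: (log relative risk, standard error). Negative means
# lower mortality in the treated group. These are not real study results.
studies = [(-0.08, 0.06), (-0.03, 0.05), (-0.06, 0.07), (-0.05, 0.04), (-0.04, 0.05)]

# z-score of each study on its own; none reaches |z| > 1.96.
individual_z = [est / se for est, se in studies]

# Fixed-effect pooling: weight each study by the inverse of its variance.
weights = [1 / se**2 for _, se in studies]
pooled_estimate = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
pooled_se = 1 / sqrt(sum(weights))
pooled_z = pooled_estimate / pooled_se

# With a flat prior, the posterior for the effect is Normal(pooled_estimate,
# pooled_se), so this is the posterior probability of a non-zero benefit.
prob_benefit = NormalDist(pooled_estimate, pooled_se).cdf(0.0)

print([round(z, 2) for z in individual_z])
print(round(pooled_z, 2), round(prob_benefit, 3))
```

Each invented study is "consistent with zero" on its own, yet the pooled estimate clears the conventional threshold. Reading each one as evidence against the effect would have pointed the wrong way five times in a row.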
Such a study might show that it doesn't matter on average. But you'd need those numbers to see if it's increasing the spread of values. That would mean that it really helps some and hurts others. If you can figure out which is which, then it'll end up being useful. Heck, this applies even if the average effect is negative.
I don't know how often bio-researchers treat the standard deviation as part of their signal. I suspect it's infrequent.
How large was your prior for "insurance helps some and harms others, and we should try to figure out which is which" before that was one possible way of rescuing insurance from this study? That sort of argument is, I respectfully suggest, a warning signal which should make you consider whether your bottom line is already written.
I wasn't even thinking of insurance here. You were talking about garlic. I was thinking about my physics experiments where the standard deviation is a very useful channel of information.
If I throw a die and it comes up heads, I'd update in the direction of it being a very unusual die. :-)
Have you read the study in question? The treatment sample is NOT several thousand; it's about 1500. Further, the incidence of the diseases being looked at is only a few percent or less, so the treatment sample sizes for the most prevalent diseases are around 50 (also, if you look at the specifics of the sample, the diseased groups are pretty well controlled).
I suggest the following exercise- ask yourself what WOULD be a big effect, and then work through if the study has the power to see it.
Yes, but in this case, the sample sizes are small and the error bars are so large that consistent with zero is ALSO consistent with a 25+% reduction in incidence (which is a large intervention). The study is incapable of distinguishing a hugely important effect from zero effect, so we shouldn't update much at all, which is why I wished McArdle had talked about statistical power. Before we ask "how should we update", we should ask "what information is actually here?"
Edit: If we treat this as an exploration, it says "we need another study"; after all, the effects could be as large as 40%! That's a potentially tremendous intervention. Unfortunately, it's unethical to randomly boot people off of insurance, so we'll likely never see that study done.
Health is extremely important - the statistical value of a human life is something like $8 million - so smallish looking effects can be practically relevant. An intervention that saves 1 life out of every 10,000 people treated has an average benefit of $800 per person. In this Oregon study, people who received Medicaid cost an extra $1,172 per year in total health spending, so the intervention would need to save 1.5 lives per 10,000 person-years (or provide an equivalent benefit in other health improvements) for the health benefits to balance out the health costs. The study looked at fewer than 10,000 people over 2 years, so the cost-benefit cutoff for whether it's worth it is less than 3 lives saved (or equivalent).
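The break-even arithmetic in the paragraph above, written out as a quick check. The inputs are the figures quoted there; the variable names are illustrative, and 10,000 people over 2 years is used as the upper bound on the study's size.

```python
# Inputs as quoted in the comment above.
VALUE_OF_STATISTICAL_LIFE = 8_000_000    # dollars, "something like $8 million"
EXTRA_SPENDING_PER_PERSON_YEAR = 1_172   # dollars of extra health spending

# Saving 1 life per 10,000 people treated, expressed per person.
benefit_per_person = VALUE_OF_STATISTICAL_LIFE * (1 / 10_000)

# Lives per 10,000 person-years needed for benefits to equal the extra cost.
break_even_per_10k = EXTRA_SPENDING_PER_PERSON_YEAR / VALUE_OF_STATISTICAL_LIFE * 10_000

# Upper bound on the study's exposure: fewer than 10,000 people over 2 years.
max_person_years = 10_000 * 2
break_even_lives_in_study = break_even_per_10k * max_person_years / 10_000

print(benefit_per_person)                    # 800.0
print(round(break_even_per_10k, 3))          # 1.465
print(round(break_even_lives_in_study, 2))   # 2.93
```

The figures match the comment: roughly 1.5 lives per 10,000 person-years, and fewer than 3 lives over the whole study.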
So "not statistically significant" does not imply unimportant, even with a sample size of several thousand. An effect at the cost-benefit threshold is unlikely to show up in significant changes to mortality rates. The intermediate health measures in this study are more sensitive to changes than mortality rate, but were they sensitive enough? Has anyone run the numbers on how sensitive they'd need to be in order to find an effect of this size? The point estimates that they did report are (relative to control group) an 8% reduction in number of people with elevated blood pressure, 17% reduction in number of people with high cholesterol, and 18% reduction in number of people with high glycated hemoglobin levels (a marker of diabetes), which intuitively seem big enough to be part of an across-the-board health improvement that passes cost-benefit muster.
This would be much more convincing if you reported the costs along with the benefits, so that one could form some kind of estimate of what you're willing to pay for this. But, again, I think your argument is motivated. "Consistent with zero" means just that; it means that the study cannot exclude the possibility that the intervention was actively harmful, but they had a random fluctuation in the data.
I get the impression that people here talk a good game about statistics, but haven't really internalised the concept of error bars. I suggest that you have another look at why physics requires five sigma. There are really good reasons for that, you know; all the more so in a mindkilling-charged field.
I was responding to the suggestion that, even if the effects that they found are real, they are too small to matter. To me, that line of reasoning is a cue to do a Fermi estimate to get a quantitative sense of how big the effect would need to be in order to matter, and how that compares to the empirical results.
I didn't get into a full-fledged Fermi estimate here (translating the measures that they used into the dollar value of the health benefits), which is hard to do when they only collected data on a few intermediate health measures. (If anyone else has given it a shot, I'd like to take a look.) I did find a couple effect-size-related numbers for which I feel like I have some intuitive sense of their size, and they suggest that that line of reasoning does not go through. Effects that are big enough to matter relative to the costs of additional health spending (like 3 lives saved in their sample, or some equivalent benefit) seem small enough to avoid statistical significance, and the point estimates that they found which are not statistically significant (8-18% reductions in various metrics) seem large enough to matter.
My overall conclusion about the study (based on what I know about it so far) is that it provides little information for updating in any direction, because of those wide error bars. The results are consistent with Medicaid having no effect, they're consistent with Medicaid having a modest health benefit (e.g., 10% reduction in a few bad things), they're consistent with Medicaid being actively harmful, and they're consistent with Medicaid having a large benefit (e.g. 40% reduction in many bad things). The likelihood ratios that the data provide for distinguishing between those alternatives are fairly close to one, with "modest health benefit" slightly favored over the more extreme alternatives.
Again, the original point McArdle is making is that "consistent with zero" is just completely not what the proponents expected beforehand, and they should update accordingly. See my discussion with TheOtherDave, below. A small effect may, indeed, be worth pursuing. But here we have a case where something fairly costly was done after much disagreement, and the proponents claimed that there would be a large effect. In that case, if you find a small effect, you ought not to say "Well, it's still worth doing"; that's not what you said before. It was claimed that there would be a large effect, and the program was passed on this basis. It is then dishonest to turn around and say "Ok, the effect is small but still worthwhile". This ignores the inertia of political programs.
Most Medicaid proponents did not have expectations about the statistical results of this particular study. They did not make predictions about confidence intervals and p values for these particular analyses. Rather, they had expectations about the actual benefit of Medicaid.
You cite Ezra Klein as someone who expected that Medicaid would drastically reduce mortality; Klein was drawing his numbers from a report which estimated that in the US "137,000 people died from 2000 through 2006 because they lacked health insurance, including 22,000 people in 2006." There were 47 million uninsured Americans in 2006, so those 22,000 excess deaths translate into 4.7 excess deaths per 10,000 uninsured people each year. So that's the size of the drastic reduction in mortality that you're referring to: 4.7 lives per 10,000 people each year. (For comparison, in my other comment I estimated that the Medicaid expansion would be worth its estimated cost if it saved at least 1.5 lives per 10,000 people each year or provided an equivalent benefit.)
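For reference, the conversion from the quoted totals to a per-10,000 rate, and its comparison with the break-even threshold, looks like this. All inputs are the figures quoted in this comment; the 1.5 threshold is the rounded value from the other comment.

```python
EXCESS_DEATHS_2006 = 22_000          # from the report Klein cited
UNINSURED_2006 = 47_000_000          # uninsured Americans in 2006
BREAK_EVEN_PER_10K = 1.5             # rounded cost-benefit threshold

excess_deaths_per_10k = EXCESS_DEATHS_2006 / UNINSURED_2006 * 10_000
ratio_to_break_even = excess_deaths_per_10k / BREAK_EVEN_PER_10K

print(f"excess deaths per 10,000 uninsured per year: {excess_deaths_per_10k:.1f}")
print(f"multiple of the break-even threshold: {ratio_to_break_even:.1f}x")
```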
Did the study rule out an effect as large as this drastic reduction of 4.7 per 10,000? As far as I can tell it did not (I'd like to see a more technical analysis of this). There were under 10,000 people in the study, so I wouldn't be surprised if they missed effects of that size. Their point estimates, of an 8-18% reduction in various bad things, intuitively seem like they could be consistent with an effect that size. And the upper bounds of their confidence intervals (a 40%+ reduction in each of the 3 bad things) intuitively seem consistent with a much larger effect. So if people like Klein and Drum had made predictions in advance about the effect size of the Oregon intervention, I suspect that their predictions would have fallen within the study's confidence interval.
There are presumably some people who did expect the results of the study to be statistically significant (otherwise, why run the study?), and they were wrong. But this isn't a competition between opponents and proponents where every slipup by one side cedes territory to the other side. The data and results are there for us to look at, so we can update based on what the study actually found instead of on which side of the conflict fought better in this battle. In this case, it looks like the correct update based on the study (for most people, to a first approximation) is to not update at all. The confidence interval for the effects that they examined covers the full range of results that seemed plausible beforehand (including the no-effect-whatsoever hypothesis and the tens-of-thousands-of-lives-each-year hypothesis), so the study provides little information for updating one's priors about the effectiveness of Medicaid.
For the people who did make the erroneous prediction that the study would find statistically significant results, why did they get it wrong? I'm not sure. A few possibilities: 1) they didn't do an analysis of the study's statistical power (or used some crude & mistaken heuristic to estimate power), 2) they overestimated how large a health benefit Medicaid would produce, 3) the control group in Oregon turned out to be healthier than they expected which left less room for Medicaid to show benefits, 4) fewer members of the experimental group than they expected ended up actually receiving Medicaid, which reduced the actual sample size and also added noise to the intent-to-treat analysis (reducing the effective sample size).
I do want to point out that, while I agree with your general points, I think that unless the proponents put numerical estimates up beforehand, it's not quite fair to assume they meant "it will be statistically significant in a sample size of N at least 95% of the time." Even if they said that, unless they explicitly calculated N, they probably underestimated it by at least one order of magnitude. (Professional researchers in social science make this mistake very frequently, and even when they avoid it, they can only very rarely find funding to actually collect N samples.)
I haven't looked into this study in depth, so semi-related anecdote time: there was recently a study of calorie restriction in monkeys which had ~70 monkeys. The confidence interval for the hazard ratio included 1 (no effect), and so they concluded no statistically significant benefit to CR on mortality, though they could declare statistically significant benefit on a few varieties of mortality and several health proxies.
I ran the numbers to determine the power; turns out that they couldn't have reliably noticed the effects of smoking (hazard ratio ~2) on longevity with a study of ~70 monkeys, and while I haven't seen many quoted estimates of the hazard ratio of eating normally compared to CR, I don't think there are many people that put them higher than 2.
When you don't have the power to reliably conclude that all-cause mortality decreased, you can eke out some extra information by looking at the signs of all the proxies you measured. If insurance does nothing, we should expect to see the effect estimates scattered around 0. If insurance has a positive effect, we should expect to see more effect estimates above 0 than below 0, even though most will include 0 in their CI. (Suppose they measure 30 mortality proxies, and all of them show a positive effect, though the univariate CI includes 0 for all of them. If the ground truth was no effect on mortality proxies, that's a very unlikely result to see; if the ground truth was a positive effect on mortality proxies, that's a likely result to see.)
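The parenthetical example can be made concrete with a sign test. The sketch below treats the 30 proxies as independent, which is a strong assumption (real health proxies are correlated, so the true evidence is weaker than this suggests), and the 0.9 chance of a positive sign under a real effect is purely illustrative.

```python
from math import comb

def prob_at_least_k_positive(n, k, p_positive=0.5):
    """P(at least k of n independent estimates have a positive sign)."""
    return sum(comb(n, i) * p_positive**i * (1 - p_positive)**(n - i)
               for i in range(k, n + 1))

# No true effect: each estimate is equally likely to land above or below 0.
p_null = prob_at_least_k_positive(30, 30, p_positive=0.5)

# Hypothetical true positive effect: each estimate lands above 0 with
# probability 0.9 (an assumed value for illustration only).
p_effect = prob_at_least_k_positive(30, 30, p_positive=0.9)

print(f"P(all 30 positive | no effect)       = {p_null:.2e}")
print(f"P(all 30 positive | positive effect) = {p_effect:.2e}")
print(f"likelihood ratio                     = {p_effect / p_null:.2e}")
```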
Incidentally, how did you do that?
If I remember correctly, I noticed that an effect that did give a p of slightly less than .05 was a hazard ratio of 3, which made me think of running that test. I think spower was the R function that I used to figure out what p they could get for a hazard ratio of 2 with 35 experimentals and 35 controls (or whatever the actual split was; I think it was slightly different).
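A rough cross-check is possible without simulation, using Schoenfeld's approximation for the power of a log-rank test. To be clear, this is a different method from the spower run described above, and the count of 25 observed deaths is a hypothetical placeholder, since the number of events in the monkey study is not given here.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

def logrank_power(hazard_ratio, n_events, alpha=0.05, allocation=0.5):
    """Schoenfeld approximation to the power of a two-sided log-rank test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Var(log HR estimate) is roughly 1 / (events * allocation * (1 - allocation)).
    signal = sqrt(n_events * allocation * (1 - allocation)) * abs(log(hazard_ratio))
    return NormalDist().cdf(signal - z_crit)

def events_needed(hazard_ratio, power=0.8, alpha=0.05, allocation=0.5):
    """Number of deaths required to reach the target power."""
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z_total**2 / (allocation * (1 - allocation) * log(hazard_ratio)**2))

HYPOTHETICAL_DEATHS = 25  # assumed; not a figure from the study
for hr in (2.0, 3.0):
    print(f"HR {hr}: power ~ {logrank_power(hr, HYPOTHETICAL_DEATHS):.2f}")
print("deaths needed for 80% power at HR 2:", events_needed(2.0))
```

Note that power here depends on the number of deaths rather than the number of animals, so a cohort of ~70 only approaches the required event count if nearly all of them die during follow-up.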
It is of course very difficult to extract any precise numbers from a political discussion. :) However, if you click through some of the links in the article, or have a look at the followup from today, you'll find McArdle quoting predictions of tens of thousands of preventable deaths yearly from non-insured status. That looks to me like a pretty big hazard rate, no?
No. The Oracle says there're about 50 million Americans without health insurance. The predictions you quoted refer to 18,000 or 27,000 deaths for want of insurance per year. The higher number implies only a 0.054% death rate per year, or a 3.5% death rate over 65 years (Americans over 65 automatically get insurance). This is non-negligible but hardly huge (and potentially important for all that).
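The two rates in this comment follow directly from the quoted figures. The sketch below compounds the annual rate over 65 years, assuming it stays constant, which is a simplification; simply multiplying by 65 gives nearly the same answer.

```python
DEATHS_PER_YEAR = 27_000        # the higher of the two quoted predictions
UNINSURED = 50_000_000          # approximate number of uninsured Americans
YEARS_UNINSURED = 65            # until automatic coverage at 65

annual_rate = DEATHS_PER_YEAR / UNINSURED
# Probability of dying from lack of insurance at some point in 65 years,
# assuming the same (constant) annual rate applies every year.
cumulative_rate = 1 - (1 - annual_rate) ** YEARS_UNINSURED

print(f"annual death rate:   {annual_rate:.3%}")
print(f"over {YEARS_UNINSURED} years:       {cumulative_rate:.1%}")
```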
Edit: and I see gwern has whupped me here.
Over a population of something like 50 million people? Dunno.
If I throw a die once and it comes up heads I'm going to be confused. Now, assuming you meant "toss a coin and it comes up heads six times out of ten":
What is your intended 'correct' answer to the question? I think I would indeed hesitantly update a very (very) tiny bit in the direction of the coin being biased but different priors regarding the possibility of the coin being biased in various ways and degrees could easily make the update be towards not-biased. I'd significantly lower p(the coin is biased by having two heads) but very slightly raise p(the coin is slightly heavier on the tails side), etc.
My intended correct answer is that, on this data, you technically can adjust your belief very slightly; but because the prior for a biased coin is so tiny, the update is not worth doing. The calculation cost way exceeds any benefit you can get from gruel this thin. I would say "Null hypothesis [i.e. unbiased coin] not disconfirmed; move along, nothing to see here". And if you had a political reason for wishing the coin to be biased towards heads, then you should definitely not make any such update; because you certainly wouldn't have done so, if tails had come up six times. In that case it would immediately have been "P-level is in the double digits" and "no statistical significance means exactly that" and "with those errors we're still consistent with a heads bias".
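To see how thin the gruel is, here is a minimal Bayesian update for six heads in ten tosses. The prior of 0.001 for a biased coin and the specific alternative "P(heads) = 0.6" are both assumptions chosen for illustration; neither number comes from the discussion.

```python
def likelihood(p_heads, heads, tosses):
    """Likelihood of the observed sequence (the binomial coefficient cancels)."""
    return p_heads**heads * (1 - p_heads)**(tosses - heads)

def posterior_biased(prior_biased, p_biased, heads, tosses):
    """Posterior probability of the biased-coin hypothesis versus a fair coin."""
    weight_biased = prior_biased * likelihood(p_biased, heads, tosses)
    weight_fair = (1 - prior_biased) * likelihood(0.5, heads, tosses)
    return weight_biased / (weight_biased + weight_fair)

PRIOR_BIASED = 0.001   # assumed prior that the coin is biased toward heads
P_BIASED = 0.6         # assumed size of the bias under that hypothesis

lr_six_heads = likelihood(P_BIASED, 6, 10) / likelihood(0.5, 6, 10)
lr_four_heads = likelihood(P_BIASED, 4, 10) / likelihood(0.5, 4, 10)

print(f"likelihood ratio, 6 heads of 10: {lr_six_heads:.3f}")
print(f"likelihood ratio, 4 heads of 10: {lr_four_heads:.3f}")
print(f"posterior after 6 heads: {posterior_biased(PRIOR_BIASED, P_BIASED, 6, 10):.5f}")
```

With these assumed numbers the posterior barely moves from the prior, and the mirror-image result (four heads) would have pushed it down by a larger factor, so anyone who updates on one outcome should be prepared to update on the other.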
I would think that our prior for "health care improves health" should be quite a bit larger than the prior for a coin to be biased.
That depends on how long "we" have been reading Overcoming Bias.
Hanson's point is that we often over-treat to show we care- not that 0 health care is optimal. Medicaid patients don't really have to worry about overtreatment.
I was interpreting "health care improves health" as "healthcare improves health on the margin." Is this not what was meant?
As someone who has a start-up in the healthcare industry, this runs counter to my personal experience. Also, currently "medicaid overtreatment" is showing about 676,000 results on Google (while "medicaid undertreatment" is showing about 1,240,000 results). Even if it isn't typical, it surely isn't an unheard-of phenomenon.
No, I meant going from 0 access to care to some access to care improves health, as we are discussing the medicaid study comparing people on medicaid to the uninsured.
I currently work as a statistician for a large HMO, and I can tell you for us, medicaid patients generally get the 'patch-you-up-and-out-the-door' treatment because odds are high we won't be getting reimbursed in any kind of timely fashion. I've worked in a few states, and it seems pretty common for medicaid to be fairly underfunded (hence the Oregon study we are discussing).
And generally, providing medicaid is moving someone from emergency-only to some-primary-care, which is where we should expect some impact - this isn't increasing treatment on the margin, it's providing minimal care to a largely untreated population.
So I randomly sampled ~5 in the first two pages, and 3 of those were articles about overtreatment that had a sidebar to a different article discussing some aspect of medicaid, so I'm not sure if the count is meaningful here. (The other 2 were about some loophole dentists were using to overtreat children on medicaid and bill extra; I have no knowledge of dental claims.)
That is Kevin Drum's take. Post 1:
Post 2:
If this were the only medical study in all of history, then yes, a non-significant result should cause you to update as your quote says. In a world with thousands of studies yearly, you cannot do any such thing, because you're sure to bias yourself by paying attention to the slightly-positive results you like, and ignore the slightly-negative ones you dislike. (That's aside from the well-known publication bias where positive results are reported and negative ones aren't.) If the study had come out with a non-significant negative effect, would comrade Drum have been updating slightly in the direction of "Medicaid is bad"? Hah. This is why we impose the 95% confidence cutoff, which actually is way too low, but that's another discussion. It prevents us from seeing, or worse, creating, patterns in the noise, which humans are really good at.
The significance cutoff is not a technique of rationality, it is a technique of science, like blinding your results while you're studying the systematics. It's something we do because we run on untrusted hardware. Please do not relax your safeguards if a noisy result happens to agree with your opinions! That's what the safeguards are for!
Then also, please note that Kevin Drum's prior was not actually "Medicaid will slightly improve these three markers", it was "Medicaid will drastically reduce mortality". (See links in discussion with TheOtherDave, below). If you switch your priors around as convenient for claiming support from studies, then of course no study can possibly cause you to update downwards. I would gently suggest that this is not a good epistemic state to occupy.
The value of health insurance isn't that it keeps you from getting sick. It's that it keeps you from getting in debt when you do get sick.
That's why McArdle recommended getting only catastrophic coverage.
This may be true, but McArdle's point is precisely that this was not said before the study came out. At that time, people confidently expected that health insurance would, in fact, improve health outcomes. Your argument is one that was only made after the result was known; this is a classic failure mode.
This is a perspective similar to DanielLC's point. Additionally, a commenter there makes the parallel point that we don't really know whether private insurance improves the outcome measures.
True, but we shouldn't overstate the argument. The p-values were not low enough to count as "statistically significant," but the direction of change was towards improved health outcomes. One is doing something wrong with this evidence if one updates against improved health outcomes for public health insurance for the poor (i.e. Medicaid).
Updates always move you towards what you just saw, and so if your estimate was above what you just saw, you update down. If you only consider the hypotheses that Medicaid "improves," "has no effect," or "harms," then this is weak evidence for "improves" (and "has no effect"). But a more sophisticated set of hypotheses is the quantitative effect of Medicaid; if one estimated beforehand that Medicaid doubled lifespans (to use an exaggerated example), they should revise their estimate downward after seeing this study.
Fair enough. I should have said "McArdle and her political allies are making a mistake by not updating towards 'Medicaid improves health outcomes,'" given my perception of their priors.
(nods) Yup. Of course, McArdle's claims about what people would have said before the study, if asked, are also only being made after the results are known, which as you say is a classic failure mode.
Of course, McArdle is neither passing laws nor doing research, just writing articles, so the cost of failure is low. And it's kind of nice to see someone in the mainstream (sorta) press making the point that surprising observations should change our confidence in our beliefs, which people surprisingly often overlook.
Anyway, the quality of McArdle's analysis notwithstanding, one place this sort of reasoning seems to lead us is to the idea that when passing a law, we ought to say something about what we anticipate the results of passing that law to be, and have a convention of repealing laws that don't actually accomplish the thing that we said we were passing the law in order to accomplish.
Which in principle I would be all in favor of, except for the obvious failure mode that if I personally don't want us to accomplish that, I am now given an incentive to manipulate the system in other ways to lower whatever metrics we said we were going to measure. (Note: I am not claiming here that any such thing happened in the Oregon study.)
That said, even taking that failure mode into account, it might still be preferable to passing laws with unarticulated expected benefits and keeping them on the books despite those benefits never materializing.
I love this idea!
There would have to be a two sided test. A tort of ineffectiveness by which the plaintiff seeks relief from a law that fails to achieve the goals laid out for it. A tort of under-ambition by which the plaintiff seeks relief from a law that is immune from the tort of ineffectiveness because the formally specified goals are feeble.
Think about the American experience with courts voiding laws that are unconstitutional. This often ends up with the courts applying balancing tests. It can end up with the court ruling that yes, the law infringes your rights, but only a little. And the law serves a valid purpose, which is very important. So the law is allowed to stand.
These kinds of cases are decided in prospect. The decision is reached on speculation about the actual effects of the law. It might help if constitutional challenges to legislation could be re-litigated, perhaps after the first ten years. The second hearing could then be decided retrospectively, looking back at ten years' experience, and balancing the actual burden on the plaintiff's rights against the actual public benefit of the law.
Where though is the goal post? In practice it moves. In the prospective hearing the government will make grand promises about the huge benefits the law will bring. In the retrospective hearing the government will sail on the opposite tack, arguing that only very modest benefits suffice to justify the law.
It would be good if the goal posts were fixed. Right from the start the law states the goals against which it will be assessed in ten years' time. Certainly there needs to be a tort of ineffectiveness, active against laws that do not meet their goals. But politicians would soon learn to game the system by writing very modest goals into law. That needs to be blocked with a tort of under-ambition, which ensures that the initial constitutionality of the law is judged by admitting in prospect only those benefits that can be litigated in retrospect.
The goal posts should definitely be fixed! And maybe some politician would want to pass a law that benefits him and his friends in some way, even though it only has a small effect, so there ought to be some kind of safeguard against that, too. But the main problem I can see is anti-synergy. Suppose a law is adopted that totally would have worked, were it not for some other law that was introduced a little later? Should the first one be repealed, or the second one? But maybe the second one does accomplish its goal, and repealing the first one would have negative effects, now that the second one is in place... And with so many laws interacting, how can you even tell which ones have which effects, unless the effects are very large indeed? (Of course, this is a problem in the current system too. I'm glad I'm not a politician; I'd be paralyzed with fear of unintended consequences.)
Good point! I've totally failed to think about multiple laws interacting.
I don't think that's true; if you read her original article on the subject, linked in the one I link, she quotes statistics like this:
And back in 2010, she said
I don't think her statement is entirely post-hoc.
Fair enough. I only read the article you linked, not the additional source material; I'm prepared to believe given additional evidence like what you cite here that her analysis is... er... can one say "pre-hoc"?
Well, if not, one ought to be able to. I hereby grant you permission! :)
Ante hoc.
It does help you to pay for (say) blood-pressure medication. This might be expected to result in more people with medical aid and blood-pressure problems taking their medication.
It also helps to pay for doctors. This leads to more people going to the doctor with minor complaints, and increased chances of catching something serious earlier.
Er, yes, fine, but... to the extent that the study shows anything, it shows that the positive results of these effects, if they exist, are consistent with zero. Can we please discuss the data, now that we have some, and not theory?
-- Eric Hoffer
Perhaps, but absolute power tends to be the more relevant one, as it definitionally also includes the means to pursue the goals derived from absolute corruption.
I wonder where one could apply "Absolute" and not come up with a scary-sounding conclusion. Absolute skepticism seems like it would turn one into a gibbering madman. Absolute logic - well, what is a dangerous AI but absolute logic plus power?
Absolute non-contradiction? Since anything else (that is, any contradictory statement) is absolutely horrible, if absolute non-contradiction is also horrible then nothing good exists.
edit: s/than nothing/then nothing/
Absolute goodness?
Anything else would be problematic. Making people smile is good. Tiling the universe with microscopic smiley faces is not.
Absolute goodness seems tautologically good. If you pick any one good trait or action and maximize it, it grows ominous again.
That's why I chose it.
Like the smiling example I gave.
Oh, actually I didn't see the connection between the smile-tile and the making-people-smile statements at first read. It isn't quite correct to say making people smile is good; rather, people smiling is typically a sign something good has happened. So a better (albeit common) example might be: making people happy is good, wireheading is scary. But any connection to the top level post is growing tenuous.
Absolute knowledge also seems like it'd leave you gibbering... Just think about it: knowledge of everything, that is to say every atom of every single object in the universe.
I can only say Ouch
"People don't pay much attention to anything unless you give them reason to"
--The Night Circus
People don't do anything unless they have a reason to - given a sufficiently broad definition of "reason".
Kevin Warwick
I don't suppose you've got a cite for the central claim here? It's a decent enough example of reasoning from the bottom line whether or not it turns out to be true, but I Googled a couple different sets of keywords, and the only thing that came up besides a whole mess of birth records and obstetricians' papers was Warwick's lecture notes.
This paper is my best lead so far, but it's behind a paywall at the moment. I think it's in "Bayerthal (1911)", whatever that turns out to be.
Bayerthal (1911) is unfortunately in German. Now I'm waiting for access to this paper.
Got it. But see here in any case.
Google turns up a source for the "women of genius" quote, a book "Sex Differences in Cognitive Abilities" by D. Halpern. The book's quote is from someone named Bayertahl, and it's an indirect quotation from a 1989 article, "Sexual Dimorphism in the Human Brain: Dispelling the Myths" supposedly by a J. Janowsky. I say supposedly because looking for a fulltext leads me to a version with a similar title ("Sexual Dimorphism of the Human Brain: Myths and Realities") but is by M. A. Hofman and D. F. Swaab; it contains the Bayertahl quote in the original German and says that the primary source is this 1932 article by a Louis Bolk, "Hersenen en Cultuur" (Brains and Culture). This is also a full text, in Dutch; Google's translation seems to roughly confirm the claim as reported by Warwick (though the "women of genius" quote does not seem to appear in Bolk's article, at a first cursory glance).
This cites "Bayerthall 1911".
-- aristosophy
However, equivalences are also the bread and butter of inference. Distinguishing more than you need to will slow you down.
Unfortunately I only have a finite amount of storage available, so I can only do that up to a certain point.
This is often a good idea in mathematics. Two concepts that are equivalent in some context may no longer be equivalent once you move to a more general context; for example, familiar equivalent definitions are often no longer equivalent if you start dropping axioms from set theory or logic (e.g. the axiom of choice or excluded middle).
Outside of mathematical logic, some familiar examples include:
Arguable example: probability and uncertainty. (More or less identical in my theorizing, but some call the idea of their identity the ludic fallacy.)
Oh, I read Taleb as using 'ludic fallacy' to mean using distributions with light tails.
There's still a couple related fallacies that Bayesians can commit.
Most related to the "ludic fallacy" as you've described it: if you treat both epistemic (lack of knowledge) and aleatory (lack of predetermination) uncertainty with the same general probability distribution function framework, it becomes tempting to try to collapse the two together. But a PDF-over-PDFs-over-outcomes still isn't the same thing as a PDF-over-outcomes, and if you try to compute with the latter you won't get the right results.
Most related to the "ludic fallacy" as I inferred it from Taleb: if you perform your calculations by assigning zero priors to various models, as everybody does to make the calculations tractable, then if evidence actually points towards one of those neglected priors and you don't recompute with it in mind, you'll find that your posterior estimates can be grossly mistaken.
Illustration of availability bias:
http://www.youtube.com/watch?v=LVM4jR3TZsU
Thus the availability bias defeats the Pascal mugging.
Hunter Felt
Witty to be sure, but obviously false. The causal connection between baseball and the content (as opposed to the name) of the law is probably fairly tenuous. The number three is ubiquitous in all areas of human culture.
We can still blame the propaganda for helping make the laws appealing and getting them to pass.
And given the popularity of things named after people like "Laura's Law" or "Megan's Law", it wouldn't surprise me if the popularity was due to the rhetorical effect on the average voter.
I think further investigation would reveal that it is at most a Western cultural thing, not a hardwired human universal. Elsewhere in time and place, 4 has been the important number -- e.g. recurrences of 4 and 40 in the Hebrew scriptures; the importance of 4 and (negatively) 8 in Chinese culture, etc. Possibly some other digits have performed similarly in other places as well.
-- Karen Pryor, Don't Shoot the Dog!: The New Art of Teaching and Training