Rationality Quotes May 2013
Here's another installment of rationality quotes. The usual rules apply:
- Please post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
- Do not quote yourself.
- Do not quote from Less Wrong itself, Overcoming Bias, or HPMoR.
- No more than 5 quotes per person per monthly thread, please.
The Vulcan your Vulcan could sound like if he wasn't made of straw, I guess? Link
Well... not quite. The selection effect makes the survival number basically impossible to calculate, but regularly surviving risky scenarios seems like it would provide a bit better odds for the influence of moxie than 249:200.
Fun Bayes application: what's the likelihood ratio for the existence vs. nonexistence of moxie-based immunity to death during battle for military leaders, given the military history of Earth?
At some point, if the Vulcan is smart enough, I suspect the calculation would begin to hinge more on plot twists and the odds that the story is nearing its end, as the hypothesis that they are wearing Plot Armor rises up to the forefront.
I'd also suspect that the Vulcan would realize quickly that as his prediction for the probability of success approaches 1, the odds of a sudden plot reversal that plunges them all in deep poo also approaches 1. And then the Vulcan would immediately adjust to always spouting off some random high-odds-against-us number all the time just to make sure they'd always succeed heroically.
Ow, this is starting to sound very newcomblike.
Holy crap, canon!Spock is a genius rationalist after all.
The C3PO of rationalists.
(At least when in a fight, the bridge crew always takes great care to ask for damage reports, and whether someone anywhere on the ship broke a finger, before, you know, firing back.)
Hey, the humans have to do something while the computer (which somehow hasn't obtained sentience) does all the real work.
The computer is secretly making paper clips in cargo bay 2, beaming them into space when no one is looking.
I want to believe.
The last line of reasoning doesn't quite work. Not every incident has an episode made out of it.
I read that charitably as indicating occasional failures in non-deadly situations. Not even Captain Kirk wins 'em all.
Unweighted, that's 3690:1 odds.
Since odds to three or more significant figures have been quoted, that gives us 2856:1 odds (still without weighting). From this, I conclude that the successful incidents usually involved ships that were either designed very differently from the ship in question, or were built a long time ago (case in point - the 47-year-old success case). This implies that the current ship's design is actually somewhat more likely to fall afoul of the nebula than an average ship, or an older ship. Rather substantially, in fact; enough to almost exactly counter the determination/drive factor.
An investigation into the shipyards, and current design paradigms, may be in order once the trillions of lives have been saved. I suspect that too little emphasis is being placed on safety at some point in the design process.
...as I recommended strenuously before we left dock at the beginning of this mission, since a similar analysis performed then gave approximately 8000:1 odds that before this mission was complete you would do something deeply stupid that got us all killed, no matter how strenuously I tried to instruct you in basic risk factor analysis. That having failed, I gave serious consideration to simply taking over the ship myself, which I estimate will increase by a factor of approximately 3000 the utility created by our missions (even taking into account the reduced "moxie factor", which is primarily of use during crises a sensible Captain would avoid getting into in the first place). However, I observe that my superiors in the High Command have not taken over Starfleet and the Federation, despite the obvious benefits of such a strategy. At first this led me to 83% confidence that the High Command was in possession of extremely compelling unshared evidence of the value of humanity's leadership, which at that time led me to update significantly in favor of that view myself. I have since then reduced that confidence to 76%, with a 13% confidence that the High Command has instead been subverted by hostile powers partial to humanity.
The steel-Vulcan in the original quote admits that humans have an edge in the field of interpersonal relations. I imagine that's why the Vulcans let the humans lead; because the humans are capable of persuading all the other races in the Federation to go along with this whole 'federation' idea, and leave the Vulcans more-or-less alone as long as they share some of their research results.
Or, to put it another way; Vulcan High Command has managed to foist off the boring administration work onto the humans, in exchange for mere unimportant status, and is not eager to have it land back on their laps again.
Of course, some Vulcans do think that a Vulcan-led empire would be an improvement over a human-led one. The last batch to think that went off and formed the Romulan Empire. The Vulcans and the Romulans are currently running a long-term, large-scale experiment to see which paradigm creates a more lasting empire in practice. (They don't tell the other races that it's all a political experiment, of course. They might not be great at interpersonal relations, but they have found out in the past that that is a very bad idea).
It's not mere unimportant status, though. The Federation makes decisions that affect the state of the Galaxy, and they make different decisions than they would under Vulcan control, and those differences cash out in terms of significant differences in overall utility. For a culture that believes that "the fate of the many outweighs the fate of the few, or the one," the choice to allow that just so they can be left alone seems bizarre.
Of course, that assumes that they consider non-Vulcans to be part of "the many." Now that I think about it, there's no particular reason to believe that's a commonly held Vulcan value/belief.
Eh, questionable. I'm sure many of us have been in situations where we're advising more senior staff and the manager or whoever isn't really the one making the decision anymore - they're just the talking head we get to rubber stamp what those of us who actually deal with the problem have decided is going to happen.
In practice I tend to find that the people who control access to information, rather than the people who wield formal authority, tend to have the most power in an organisation.
This is a conceptually simple trade-off, although the math would be difficult. Assume that a Federation under Vulcan control would make better decisions but would have more difficulty implementing them (either on a sufficient scale or as effectively) because the strengths that make them better analysts are not the same strengths that make humans charismatic leaders. The Federation might not have as many planets, those planets might not be as willing to implement Vulcan ideas when advocated by Vulcans, etc. Is overall utility higher if Vulcans take the optimal action A% of the time at X% effectiveness or if humans take the optimal action B% of the time at Y% effectiveness? (You would adjust "the optimal action" for the relative strengths of the two species.)
If you believed that AX > BY, you formed the Romulan Empire. If you believed that AX < BY, you joined the Federation. I don't know enough Star Trek lore to say what happens if you end up with different estimates than the rest of your faction (defection, agitation for political change, execution?).
If I was rich enough, I would pay you to write fanfic like this.
Given how well my time is recompensed these days, I suspect you could find many far-cheaper, equally good writers.
One could object by pointing out that moxie, determination, drive, and the human spirit have the strongest effect in life-or-death situations: situations in which their rate of survival over the past three years is obviously 100%.
Thanks for the link. I really enjoyed reading the comic archives.
You shouldn't trust people who claim to know 4 digits of accuracy for a forecast like this. The uncertainty involved in the calculation has to be greater.
You shouldn't trust a human person who makes that claim. But if we are using 'person' in a way that includes the steel-Vulcan from the quote then yes, you should.
It is all uncertainty. There is no particular reason to doubt the steel-Vulcan's ability to calibrate 'meta' uncertainties too.
In the face of all the other evidence about the relative capabilities of the species in question that the character in question is implied to have, it would be an error to overvalue the heuristic "don't trust people who fail to signal humility via truncating calculations". The latter is, after all, merely a convention. Given the downsides of that convention (it inevitably makes predictions worse), it is relatively unlikely that the Vulcans would have the same traditions regarding significant-figure expression.
And lo, Wedrifid did invent the concept of Steel Vulcan and it was good.
Do we actually have enough fictional examples of this to form a trope? (At least 3, 5 would be better.)
Perhaps, but on the off chance that the captain doesn't listen, giving the exact probability increases the chances of success. The Vulcan mentioned that.
What I find curious about Star Trek models of... well, intelligence, if starship building is any indication of it... is that Romulans are on the same page as Vulcans. Forget 'Vulcans are more rational/logical/... than Humans'; they haven't outstripped the other subspecies! How have they been using their philosophy since Surak?
Surak's philosophy was never about improving scientific progress. Surak's philosophy was all about shutting down all hints of emotion, with the explicit intention of shutting down anger specifically, and thus preventing the entire Vulcan species from blowing itself up in a massively destructive civil war.
Vulcans, and by extension Romulans, are significantly more intelligent than humans; this is an advantage that both subspecies hold, and Surak's philosophies don't change that. Surak's philosophies speak of the inappropriateness of any sort of emotional reaction, and praise slow, careful, methodical progress, in which every factor is taken into account from all possible angles before the experiment is begun. Surak's philosophies speak out against such emotional weaknesses as enjoying one's work; a Vulcan who enjoys science may very well decide to move into a different field instead, one in which there is less danger of committing the faux pas of actually smiling. (Surak's philosophies go perhaps rather too far - to the point where a close association with a risk-taking species like humanity is probably a good thing for the Vulcans - but they do accomplish their aim of preventing extinction via civil war).
Romulans, on the other hand, have no difficulty showing emotions. Some of them will enjoy their science, they'll take risks, they'll occasionally accidentally blow themselves up with dangerous experiments (or lose their tempers and blow up other Romulans on purpose). Somehow, they've managed to avoid suicidal, self-destructive civil war so far... but I'm somehow not surprised that the Vulcans have failed to outstrip them.
And yet it is still so easy to imagine such an outcome. Actually, I am more surprised that they chose such similar roads than that they are so close in achievements. For example, maybe Vulcans would have made breakthroughs in areas that have no value for Romulans, and vice versa.
That the Vulcans and the Romulans have incredibly close levels of technology is surprising, yes; but not nearly as surprising as the idea that the Humans, the Klingons, the Betazoids, and about a hundred or so other species all have such incredibly similar technology levels, and all without any hint of shared history before they developed their separate warp drives.
Joseph Heller, Catch-22
explaining ≠ explaining away
Discussing the "Near-miss bias" which they define as a tendency to "take more risk after an event in which luck played a critical role in deciding the event's [favorable] outcome."
Top Dog: The Science of Winning and Losing by Po Bronson and Ashley Merryman, page 150.
There wouldn't happen to be anything that's sort of the opposite of this, would there? Screwing up often but sporadically, not due to inherent inability but because of simple inexpertise, making you say "I'm bad at this" more often?
Interesting. I wonder to what extent this corrects for people's risk-aversion. Success is evidence against the riskiness of the action.
Oglaf webcomic, "Bilge"
(Oglaf is usually NSFW, so I'm not linking, even if this particular comic has nothing worse than coarse language.)
I'll do it!
Throw away months of hard work? Fuck that! Let's fight!
A great illustration of sunk cost bias.
Took me a second.
I find myself wondering whether that pun was the original impetus for the comic. (If so, I commend the artist's restraint, which isn't something one can often say about Oglaf.)
On the contrary, a sizable fraction of Oglaf's comics involve restraints.
- Gunbuster
Well, since Conscientiousness is heritable to a substantial degree, perhaps she inherited her knack for hard work.
-- Richard Feynman's Surely You're Joking, Mr. Feynman!
Fortunately, things have since gotten better in that respect.
Aristotle
Source: Nicomachean Ethics, book II
Hunter Felt
-- Devo, on the value of confronting problems rather than letting them fester
-- Walter Russell Mead, describing someone else's failure to understand what a desperate effort actually looks like.
– James Alexander Lindsay
-- Megan McArdle, trying to explain Bayesian updates and the importance of making predictions in advance, without referring to any mathematics.
The value of health insurance isn't that it keeps you from getting sick. It's that it keeps you from getting in debt when you do get sick.
This may be true, but McArdle's point is precisely that this was not said before the study came out. At that time, people confidently expected that health insurance would, in fact, improve health outcomes. Your argument is one that was only made after the result was known; this is a classic failure mode.
(nods) Yup. Of course, McArdle's claims about what people would have said before the study, if asked, are also only being made after the results are known, which as you say is a classic failure mode.
Of course, McArdle is neither passing laws nor doing research, just writing articles, so the cost of failure is low. And it's kind of nice to see someone in the mainstream (sorta) press making the point that surprising observations should change our confidence in our beliefs, which people surprisingly often overlook.
Anyway, the quality of McArdle's analysis notwithstanding, one place this sort of reasoning seems to lead us is to the idea that when passing a law, we ought to say something about what we anticipate the results of passing that law to be, and have a convention of repealing laws that don't actually accomplish the thing that we said we were passing the law in order to accomplish.
Which in principle I would be all in favor of, except for the obvious failure mode that if I personally don't want us to accomplish that, I am now given an incentive to manipulate the system in other ways to lower whatever metrics we said we were going to measure. (Note: I am not claiming here that any such thing happened in the Oregon study.)
That said, even taking that failure mode into account, it might still be preferable to passing laws with unarticulated expected benefits and keeping them on the books despite those benefits never materializing.
I don't think that's true; if you read her original article on the subject, linked in the one I link, she quotes statistics like this:
And back in 2010, she said
I don't think her statement is entirely post-hoc.
Fair enough. I only read the article you linked, not the additional source material; I'm prepared to believe given additional evidence like what you cite here that her analysis is... er... can one say "pre-hoc"?
Ante hoc.
Well, if not, one ought to be able to. I hereby grant you permission! :)
I love this idea!
There would have to be a two sided test. A tort of ineffectiveness by which the plaintiff seeks relief from a law that fails to achieve the goals laid out for it. A tort of under-ambition by which the plaintiff seeks relief from a law that is immune from the tort of ineffectiveness because the formally specified goals are feeble.
Think about the American experience with courts voiding laws that are unconstitutional. This often ends up with the courts applying balancing tests. It can end up with the court ruling that yes, the law infringes your rights, but only a little. And the law serves a valid purpose, which is very important. So the law is allowed to stand.
These kinds of cases are decided in prospect. The decision is reached on speculation about the actual effects of the law. It might help if constitutional challenges to legislation could be re-litigated, perhaps after the first ten years. The second hearing could then be decided retrospectively, looking back at ten years' experience, and balancing the actual burden on the plaintiff's rights against the actual public benefit of the law.
Where though is the goal post? In practice it moves. In the prospective hearing the government will make grand promises about the huge benefits the law will bring. In the retrospective hearing the government will sail on the opposite tack, arguing that only very modest benefits suffice to justify the law.
It would be good if the goal posts were fixed. Right from the start, the law states the goals against which it will be assessed in ten years' time. Certainly there needs to be a tort of ineffectiveness, active against laws that do not meet their goals. But politicians would soon learn to game the system by writing very modest goals into law. That needs to be blocked with a tort of under-ambition, which ensures that the initial constitutionality of the law is judged by admitting in prospect only those benefits that can be litigated in retrospect.
The goal posts should definitely be fixed! And maybe a politician would want to pass a law that benefits him and his friends in some way, even though it only has a small effect, so there ought to be some kind of safeguard against that, too. But the main problem I can see is anti-synergy. Suppose a law is adopted that totally would have worked, were it not for some other law that was introduced a little later? Should the first one be repealed, or the second one? But maybe the second one does accomplish its goal, and repealing the first one would have negative effects, now that the second one is in place... And with so many laws interacting, how can you even tell which ones have which effects, unless the effects are very large indeed? (Of course, this is a problem in the current system too. I'm glad I'm not a politician; I'd be paralyzed with fear of unintended consequences.)
Good point! I've totally failed to think about multiple laws interacting.
This annoys me because she doesn't talk at all about the power of the study. Usually, when you see statistically insignificant positive changes across the board in a study without much power, it's a suggestion you should hesitantly update a very tiny bit in the positive direction, AND you need another study, not a suggestion you should update downward.
When ethics prevent us from constructing high power statistical studies, we need to be a bit careful not to reify statistical significance.
If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn't even matter that it is positive. An analogy: Suppose I tell you that eating garlic daily increases your IQ, and point to a study with three million participants and P < 1e-7. Vastly significant, no? Now it turns out that the actual size of the effect is 0.01 points of IQ. Are you going to start eating garlic? What if it weren't garlic, but a several-billion-dollar government health program? Statistical significance is indeed not everything, but there's such a thing as considering the size of an effect, especially if there's a cost involved.
Moreover, please consider that "consistent with zero" means exactly that. If you throw a die ten times and it comes up heads six, do you "hesitantly update a very tiny bit" in the direction of the coin being biased? Would you do so, if you did not have a prior reason to hope that the coin was biased?
I respectfully suggest that you are letting your already-written bottom line interfere with your math.
If I throw a die and it comes up heads, I'd update in the direction of it being a very unusual die. :-)
I strongly disagree.
An old comment of mine gives us a counterexample. A couple of years ago, a meta-analysis of RCTs found that taking aspirin daily reduces the risk of dying from cancer by ~20% in middle-aged and older adults. This is very much a practically significant effect, and it's probably an underestimate for reasons I'll omit for brevity — look at the paper if you're curious.
If you do look at the paper, notice figure 1, which summarizes the results of the 8 individual RCTs the meta-analysis used. Even though all of the RCTs had sample sizes in the thousands, 7 of them failed to show a statistically significant effect, including the 4 largest (sample sizes 5139, 5085, 3711 & 3310). The effect is therefore "so small that a sample of several thousand is not sufficient to reliably observe it", but we would be absolutely wrong to infer that "it doesn't even matter that it is positive"!
The heuristic that a hard-to-detect effect is probably too small to care about is a fair rule of thumb, but it's only a heuristic. EHeller & Unnamed are quite right to point out that statistical significance and practical significance correlate only imperfectly.
tl;dr: NHST and Bayesian-style subjective probability do not mix easily.
Another example of this problem: http://slatestarcodex.com/2014/01/25/beware-mass-produced-medical-recommendations/
Does vitamin D reduce all-cause mortality in the elderly? The point-estimates from pretty much all of the various studies are around a 5% reduction in risk of dying for any reason - pretty nontrivial, one would say, no? Yet the results are almost all not 'statistically significant'! So do we follow Rolf and say 'fans of vitamin D ought to update on vitamin D not helping overall'... or do we, applying power considerations about the likelihood of making the hard cutoffs at p<0.05 given the small sample sizes & plausible effect sizes, note that the point-estimates are in favor of the hypothesis? (And how does this interact with two-sided tests - vitamin D could've increased mortality, after all. Positive point-estimates are consistent with vitamin D helping, and less consistent with no effect, and even less consistent with it harming; so why are we supposed to update in favor of no help or harm when we see a positive point-estimate?)
If we accept Rolf's argument, then we'd be in the odd position of, as we read through one non-statistically-significant study after another, decreasing the probability of 'non-zero reduction in mortality'... right up until we get the Autier or Cochrane data summarizing the exact same studies & plug it into a Bayesian meta-analysis like Salvatier did & abruptly flip to '92% chance of non-zero reduction in mortality'.
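To make the pooling point concrete, here is a minimal fixed-effect inverse-variance sketch in Python. The study numbers are invented for illustration; they are not the actual vitamin D or aspirin data, just four hypothetical studies that are each individually non-significant while pointing the same way.

```python
# Toy demonstration: four studies, each individually non-significant
# (|estimate/SE| < 1.96), all pointing the same way, pool into a clearly
# non-zero combined estimate.  All numbers are made up for illustration.
from math import sqrt

# (log relative risk, standard error) for each hypothetical study
studies = [(-0.05, 0.04), (-0.06, 0.05), (-0.04, 0.03), (-0.05, 0.045)]

weights = [1 / se**2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))
print(pooled, pooled_se, pooled / pooled_se)  # z ~ -2.4, past the 1.96 cutoff
```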
That's a curious metric to choose. By that standard taking aspirin is about as healthy as playing a round of Russian Roulette.
It's a fairly natural metric to choose if one wishes to gauge aspirin's effect on cancer risk, as the study's authors did.
Fortunately, the study's authors and I also interpreted the data by another standard. Daily aspirin reduced all-cause mortality, and didn't increase non-cancer deaths (except for "a transient increase in risk of vascular death in the aspirin groups during the first year after completion of the trials"). These are not results we would see if aspirin effected its anti-cancer magic by a similar mechanism to Russian Roulette.
I'd assume they mean something like the per-year risk of dying from cancer conditional on previous survival -- if they indeed mean the total lifetime risk of dying from cancer I agree it's ridiculous.
Am I missing a subtlety here, or is it just that cancer is usually one of those things that you hope to live long enough to get?
Yeah, pretty much. There are other examples of this where something harmful appears to be helpful when you don't take into account possible selection biases (like being put into the 'non-cancer death' category); for example, this is an issue in smoking - you can find various correlations where smokers are healthier than non-smokers, but this is just because the unhealthier smokers got pushed over the edge by smoking and died earlier.
Have you read the study in question? The treatment sample is NOT several thousand, it's about 1500. Further, the incidence of the diseases being looked at is only a few percent or less, so the treatment sample sizes for the most prevalent diseases are around 50 (also, if you look at the specifics of the sample, the diseased groups are pretty well controlled).
I suggest the following exercise: ask yourself what WOULD be a big effect, and then work through whether the study has the power to see it.
Yes, but in this case, the sample sizes are small and the error bars are so large that consistent with zero is ALSO consistent with a 25+% reduction in incidence (which is a large intervention). The study is incapable of distinguishing a hugely important effect from zero effect, so we shouldn't update much at all, which is why I wish McArdle had talked about statistical power. Before we ask "how should we update", we should ask "what information is actually here?"
Edit: If we treat this as an exploration, it says "we need another study" - after all, the effects could be as large as 40%! That's a potentially tremendous intervention. Unfortunately, it's unethical to randomly boot people off of insurance, so we'll likely never see that study done.
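For what it's worth, here is a rough Monte Carlo power check in the spirit of that exercise. The inputs (1500 per arm, ~5% baseline incidence, a 25% relative reduction) are ballpark assumptions drawn from this thread, not the study's actual design.

```python
# Rough Monte Carlo power check.  All inputs (n = 1500 per arm, 5% baseline
# incidence, 25% relative reduction) are assumptions from the thread's
# ballpark figures, not the study's actual design.
import random
from math import sqrt

def simulated_power(n=1500, base_rate=0.05, reduction=0.25, trials=2000):
    """Fraction of simulated studies where a two-proportion z-test hits z > 1.96."""
    hits = 0
    for _ in range(trials):
        control = sum(random.random() < base_rate for _ in range(n))
        treated = sum(random.random() < base_rate * (1 - reduction) for _ in range(n))
        p1, p2 = control / n, treated / n
        pooled = (control + treated) / (2 * n)
        se = sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and (p1 - p2) / se > 1.96:
            hits += 1
    return hits / trials

print(simulated_power())  # roughly 0.4 -- far short of the conventional 0.8
```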
Health is extremely important - the statistical value of a human life is something like $8 million - so smallish looking effects can be practically relevant. An intervention that saves 1 life out of every 10,000 people treated has an average benefit of $800 per person. In this Oregon study, people who received Medicaid cost an extra $1,172 per year in total health spending, so the intervention would need to save 1.5 lives per 10,000 person-years (or provide an equivalent benefit in other health improvements) for the health benefits to balance out the health costs. The study looked at fewer than 10,000 people over 2 years, so the cost-benefit cutoff for whether it's worth it is less than 3 lives saved (or equivalent).
So "not statistically significant" does not imply unimportant, even with a sample size of several thousand. An effect at the cost-benefit threshold is unlikely to show up in significant changes to mortality rates. The intermediate health measures in this study are more sensitive to changes than mortality rate, but were they sensitive enough? Has anyone run the numbers on how sensitive they'd need to be in order to find an effect of this size? The point estimates that they did report are (relative to control group) an 8% reduction in number of people with elevated blood pressure, 17% reduction in number of people with high cholesterol, and 18% reduction in number of people with high glycated hemoglobin levels (a marker of diabetes), which intuitively seem big enough to be part of an across-the-board health improvement that passes cost-benefit muster.
This would be much more convincing if you reported the costs along with the benefits, so that one could form some kind of estimate of what you're willing to pay for this. But, again, I think your argument is motivated. "Consistent with zero" means just that; it means that the study cannot exclude the possibility that the intervention was actively harmful, but they had a random fluctuation in the data.
I get the impression that people here talk a good game about statistics, but haven't really internalised the concept of error bars. I suggest that you have another look at why physics requires five sigma. There are really good reasons for that, you know; all the more so in a mindkilling-charged field.
I was responding to the suggestion that, even if the effects that they found are real, they are too small to matter. To me, that line of reasoning is a cue to do a Fermi estimate to get a quantitative sense of how big the effect would need to be in order to matter, and how that compares to the empirical results.
I didn't get into a full-fledged Fermi estimate here (translating the measures that they used into the dollar value of the health benefits), which is hard to do when they only collected data on a few intermediate health measures. (If anyone else has given it a shot, I'd like to take a look.) I did find a couple of effect-size-related numbers for which I feel like I have some intuitive sense of their size, and they suggest that that line of reasoning does not go through. Effects that are big enough to matter relative to the costs of additional health spending (like 3 lives saved in their sample, or some equivalent benefit) seem small enough to avoid statistical significance, and the point estimates that they found which are not statistically significant (8-18% reductions in various metrics) seem large enough to matter.
My overall conclusion about the study (based on what I know about it so far) is that it provides little information for updating in any direction, because of those wide error bars. The results are consistent with Medicaid having no effect, they're consistent with Medicaid having a modest health benefit (e.g., 10% reduction in a few bad things), they're consistent with Medicaid being actively harmful, and they're consistent with Medicaid having a large benefit (e.g. 40% reduction in many bad things). The likelihood ratios that the data provide for distinguishing between those alternatives are fairly close to one, with "modest health benefit" slightly favored over the more extreme alternatives.
Again, the original point McArdle is making is that "consistent with zero" is just completely not what the proponents expected beforehand, and they should update accordingly. See my discussion with TheOtherDave, below. A small effect may, indeed, be worth pursuing. But here we have a case where something fairly costly was done after much disagreement, and the proponents claimed that there would be a large effect. In that case, if you find a small effect, you ought not to say "Well, it's still worth doing"; that's not what you said before. It was claimed that there would be a large effect, and the program was passed on this basis. It is then dishonest to turn around and say "Ok, the effect is small but still worthwhile". This ignores the inertia of political programs.
Most Medicaid proponents did not have expectations about the statistical results of this particular study. They did not make predictions about confidence intervals and p values for these particular analyses. Rather, they had expectations about the actual benefit of Medicaid.
You cite Ezra Klein as someone who expected that Medicaid would drastically reduce mortality; Klein was drawing his numbers from a report which estimated that in the US "137,000 people died from 2000 through 2006 because they lacked health insurance, including 22,000 people in 2006." There were 47 million uninsured Americans in 2006, so those 22,000 excess deaths translate into 4.7 excess deaths per 10,000 uninsured people each year. So that's the size of the drastic reduction in mortality that you're referring to: 4.7 lives per 10,000 people each year. (For comparison, in my other comment I estimated that the Medicaid expansion would be worth its estimated cost if it saved at least 1.5 lives per 10,000 people each year or provided an equivalent benefit.)
Did the study rule out an effect as large as this drastic reduction of 4.7 per 10,000? As far as I can tell it did not (I'd like to see a more technical analysis of this). There were under 10,000 people in the study, so I wouldn't be surprised if they missed effects of that size. Their point estimates, of an 8-18% reduction in various bad things, intuitively seem like they could be consistent with an effect that size. And the upper bounds of their confidence intervals (a 40%+ reduction in each of the 3 bad things) intuitively seem consistent with a much larger effect. So if people like Klein and Drum had made predictions in advance about the effect size of the Oregon intervention, I suspect that their predictions would have fallen within the study's confidence interval.
There are presumably some people who did expect the results of the study to be statistically significant (otherwise, why run the study?), and they were wrong. But this isn't a competition between opponents and proponents where every slipup by one side cedes territory to the other side. The data and results are there for us to look at, so we can update based on what the study actually found instead of on which side of the conflict fought better in this battle. In this case, it looks like the correct update based on the study (for most people, to a first approximation) is to not update at all. The confidence interval for the effects that they examined covers the full range of results that seemed plausible beforehand (including the no-effect-whatsoever hypothesis and the tens-of-thousands-of-lives-each-year hypothesis), so the study provides little information for updating one's priors about the effectiveness of Medicaid.
For the people who did make the erroneous prediction that the study would find statistically significant results, why did they get it wrong? I'm not sure. A few possibilities: 1) they didn't do an analysis of the study's statistical power (or used some crude & mistaken heuristic to estimate power), 2) they overestimated how large a health benefit Medicaid would produce, 3) the control group in Oregon turned out to be healthier than they expected which left less room for Medicaid to show benefits, 4) fewer members of the experimental group than they expected ended up actually receiving Medicaid, which reduced the actual sample size and also added noise to the intent-to-treat analysis (reducing the effective sample size).
I do want to point out that, while I agree with your general points, I think that unless the proponents put numerical estimates up beforehand, it's not quite fair to assume they meant "it will be statistically significant in a sample size of N at least 95% of the time." Even if they said that, unless they explicitly calculated N, they probably underestimated it by at least one order of magnitude. (Professional researchers in social science make this mistake very frequently, and even when they avoid it, they can only very rarely find funding to actually collect N samples.)
I haven't looked into this study in depth, so semi-related anecdote time: there was recently a study of calorie restriction in monkeys which had ~70 monkeys. The confidence interval for the hazard ratio included 1 (no effect), and so they concluded no statistically significant benefit to CR on mortality, though they could declare statistically significant benefit on a few varieties of mortality and several health proxies.
I ran the numbers to determine the power; turns out that they couldn't have reliably noticed the effects of smoking (hazard ratio ~2) on longevity with a study of ~70 monkeys, and while I haven't seen many quoted estimates of the hazard ratio of eating normally compared to CR, I don't think there are many people that put them higher than 2.
When you don't have the power to reliably conclude that all-cause mortality decreased, you can eke out some extra information by looking at the signs of all the proxies you measured. If insurance does nothing, we should expect to see the effect estimates scattered around 0. If insurance has a positive effect, we should expect to see more effect estimates above 0 than below 0, even though most will include 0 in their CI. (Suppose they measure 30 mortality proxies, and all of them show a positive effect, though the univariate CI includes 0 for all of them. If the ground truth was no effect on mortality proxies, that's a very unlikely result to see; if the ground truth was a positive effect on mortality proxies, that's a likely result to see.)
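The suggestion in that last paragraph is essentially a sign test. A minimal sketch (the "30 proxies" figure is the comment's hypothetical, not the study's actual count):

```python
# Sign test sketch: if insurance truly did nothing, each proxy's point
# estimate would come out positive or negative roughly at random.
from math import comb

def p_at_least_k_positive(k, n=30, p=0.5):
    """P(at least k of n independent proxies come out positive | no true effect)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

print(p_at_least_k_positive(30))  # all 30 positive: ~9.3e-10
print(p_at_least_k_positive(22))  # even 22 of 30: ~0.008
```

(In practice the proxies are correlated rather than independent, so this overstates the evidence, but the direction of the inference stands.)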
Incidentally, how did you do that?
If I throw a die once and it comes up heads I'm going to be confused. Now, assuming you meant "toss a coin and it comes up heads six times out of ten".
What is your intended 'correct' answer to the question? I think I would indeed hesitantly update a very (very) tiny bit in the direction of the coin being biased but different priors regarding the possibility of the coin being biased in various ways and degrees could easily make the update be towards not-biased. I'd significantly lower p(the coin is biased by having two heads) but very slightly raise p(the coin is slightly heavier on the tails side), etc.
My intended correct answer is that, on this data, you technically can adjust your belief very slightly; but because the prior for a biased coin is so tiny, the update is not worth doing. The calculation cost way exceeds any benefit you can get from gruel this thin. I would say "Null hypothesis [ie unbiased coin] not disconfirmed; move along, nothing to see here". And if you had a political reason for wishing the coin to be biased towards heads, then you should definitely not make any such update; because you certainly wouldn't have done so, if tails had come up six times. In that case it would immediately have been "P-level is in the double digits" and "no statistical significance means exactly that" and "with those errors we're still consistent with a heads bias".
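For concreteness, here is the size of that update under one made-up alternative hypothesis. The 60%-heads alternative and the 0.001 prior are illustrative assumptions, not anything from the thread:

```python
# How much does 6 heads in 10 tosses move you towards "biased"?
# The alternative (60% heads) and the prior (0.001) are illustrative only.
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses with per-toss heads probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

likelihood_fair = binom_pmf(6, 10, 0.5)      # ~0.205
likelihood_biased = binom_pmf(6, 10, 0.6)    # ~0.251
ratio = likelihood_biased / likelihood_fair  # ~1.22: very weak evidence

prior = 0.001
posterior = ratio * prior / (ratio * prior + (1 - prior))
print(ratio, posterior)  # posterior ~0.0012: barely moved, matching the parent's point
```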
I would think that our prior for "health care improves health" should be quite a bit larger than the prior for a coin to be biased.
That is Kevin Drum's take. Post 1:
Post 2:
If this were the only medical study in all of history, then yes, a non-significant result should cause you to update as your quote says. In a world with thousands of studies yearly, you cannot do any such thing, because you're sure to bias yourself by paying attention to the slightly-positive results you like, and ignore the slightly-negative ones you dislike. (That's aside from the well-known publication bias where positive results are reported and negative ones aren't.) If the study had come out with a non-significant negative effect, would comrade Drum have been updating slightly in the direction of "Medicaid is bad"? Hah. This is why we impose the 95% confidence cutoff, which actually is way too low, but that's another discussion. It prevents us from seeing, or worse, creating, patterns in the noise, which humans are really good at.
The significance cutoff is not a technique of rationality, it is a technique of science, like blinding your results while you're studying the systematics. It's something we do because we run on untrusted hardware. Please do not relax your safeguards if a noisy result happens to agree with your opinions! That's what the safeguards are for!
Then also, please note that Kevin Drum's prior was not actually "Medicaid will slightly improve these three markers", it was "Medicaid will drastically reduce mortality". (See links in discussion with TheOtherDave, below). If you switch your priors around as convenient for claiming support from studies, then of course no study can possibly cause you to update downwards. I would gently suggest that this is not a good epistemic state to occupy.
Dan Ariely, Predictably Irrational: The Hidden Forces that Shape Our Decisions, New York, 2008, pp. 138-139
-- Karen Pryor, Don't Shoot the Dog!: The New Art of Teaching and Training
-- Matthew Leifer
I like the quote, but I have a nitpick:
When I've lost in the first round of single-elimination tournaments, I've found myself hoping that the person who beat me would prove skilled enough to win the entire tournament. That way, my loss wouldn't mean that I totally sucked, but only that I wasn't the best. So I think the quoted observation fails to account for nuances relating to how losses inform us about our skill level.
Daniel Dennett
-- Byron Katie, Loving What Is
-Adam Stark
Upvoted initially because this seemed like a good example of what I've taken to calling a "leprechaun" - a fact that spreads in spite of limited empirical backing; however a quick Google search (fact-checking the fact-check, as it were) leads to this article which at the very least suggests that the second-hand story told above is somewhat exaggerated: the evidence for bleeding associated with Gingko Biloba is rather more solid than "one case report - of a single person". Upvote retracted, I'm afraid...
(ETA: also, the other story at that link makes for... interesting reading for a rationalist.)
-- Paul Crowley
This is a claim about reality. Do we actually know that pulling numbers out of your arse actually does produce better results than pulling the decisions out directly? Or does it just feel better, because you have a theory now?
Years later, this unsurprising intuition is spectacularly confirmed by the Good Judgement Project; details in "Superforecasting".
How often? I can imagine this heuristic being better or worse depending on the details of which figures are chosen and how the are used.
I figure it works better about 80% of the time, so I'm going to go with it.
If I had to guess, I'd say that it's often better because picking a few random numbers leads to actually thinking about the decision for at least half a minute.
In practice, guessing at numbers and running a calculation actually serves as a quick second opinion on your original intuitive decision. If the numbers imply something far different from the decision that System 1 is offering, I don't immediately shrug and go with the numbers: I notice that I am confused, and flag this as something where I need to consider the reliability both of the calculation and of my basic intuition. If the calculation checks out with my original intuition, then I simply go for it.
Basically, a heuristic utility calculation is a cheap error flag which pops up more often when my intuitions are out of step with reality than when they're in step with reality. That makes it incredibly valuable.
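A toy rendering of that "cheap error flag" workflow; every number here is an invented placeholder:

```python
# Toy "cheap error flag": guess rough numbers, compute expected value,
# and flag disagreement with the gut answer.  All figures are placeholders.
def expected_value(p_success, gain, loss):
    """Expected utility of acting, under guessed-at numbers."""
    return p_success * gain - (1 - p_success) * loss

gut_says_act = True
ev = expected_value(p_success=0.3, gain=1_000, loss=2_000)  # guessed figures

if (ev > 0) != gut_says_act:
    print("Confused: rough numbers disagree with intuition; check both.")
else:
    print(f"Intuition and numbers agree; expected value {ev:+.0f}.")
```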
On first pass, I read this as "which figures are chosen and how the arse is used". That seemed oddly appropriate.
Does Paul Crowley fall under the recent clarification that the spirit of the quotes thread is against quoting LessWrong regulars?
Umberto Eco, Foucault's Pendulum (1989)
In statistics this is known as "overfitting".
-- aristosophy
This is often a good idea in mathematics. Two concepts that are equivalent in some context may no longer be equivalent once you move to a more general context; for example, familiar equivalent definitions are often no longer equivalent if you start dropping axioms from set theory or logic (e.g. the axiom of choice or excluded middle).
Outside of mathematical logic, some familiar examples include:
Arguable example: probability and uncertainty. (More or less identical in my theorizing, but some call the idea of their identity the ludic fallacy.)
There's still a couple related fallacies that Bayesians can commit.
Most related to the "ludic fallacy" as you've described it: if you treat both epistemic (lack of knowledge) and aleatory (lack of predetermination) uncertainty with the same general probability distribution function framework, it becomes tempting to try to collapse the two together. But a PDF-over-PDFs-over-outcomes still isn't the same thing as a PDF-over-outcomes, and if you try to compute with the latter you won't get the right results.
Most related to the "ludic fallacy" as I inferred it from Taleb: if you perform your calculations by assigning zero priors to various models, as everybody does to make the calculations tractable, then if evidence actually points towards one of those neglected priors and you don't recompute with it in mind, you'll find that your posterior estimates can be grossly mistaken.
Oh, I read Taleb as using 'ludic fallacy' to mean using distributions with light tails.
However, equivalences are also the bread and butter of inference. Distinguishing more than you need to will slow you down.
Unfortunately I only have a finite amount of storage available, so I can only do that up to a certain point.
I was pleasantly surprised to see this elegant phrasing of a (Machian?) rationalist principle in popular culture:
-- Joe Adama in the TV series Caprica
This is at least as old as Leibniz.
This doesn't really make sense. Just because the mice can't be coached for success, aren't aware of corporate goals, etc., it does not follow that they are one's "real bosses". Can the mice fire you? Can they give you a raise? Can they write you up for violations of corporate protocol? If you are having trouble with a coworker, can you appeal to the mice to resolve the issue? Do the mice, finally, decide what you work on? Your actual boss can take the mice away from you! Can the mice reassign you to a different boss?
There is, perhaps, a word missing from the English language. If Derek Lowe were speaking, instead of writing, he would put an exaggerated emphasis on the word real and native speakers of English would pick up on a special, metaphorical meaning for the word real in the phrase real boss. The idea is that there are hidden, behind the scenes connections more potent (more real?) than the overt connections.
There is a man in a suit, call him the actual boss, who issues orders. Perhaps one order is "run the toxicology tests". The actual boss is the same as the real boss so far. Perhaps another order is "and show that the compound is safe." Now power shifts to the mice. If the compound poisons the mice and they die, then the compound wasn't safe. The actual boss has no power here. It is the mice who are the real boss. They have final say on whether the compound is safe, regardless of the orders that the actual boss gave.
Derek Lowe is giving us an offshoot of an aphorism by Francis Bacon: "Nature, to be commanded, must be obeyed." Again the point is lost if one refuses to find a poetic reading. Nature accepts no commands; there are no Harry-Potter style spells. Nature issues no commands; we do not hear and obey, we just obey. (So why is Bacon advising us to obey?)
-- Lou Scheffer
(Most recent example from my own life that springs to mind: "It seems incredibly improbable that any Turing machine of size 100 could encode a complete solution to the halting problem for all Turing machines of size up to almost 100... oh. Nevermind.")
So what's the program? Is it the one that runs every Turing machine up to length 100 for BusyBeaver(100) steps, and gets the number BusyBeaver(100) by running the BusyBeaver_100 program whose source code is hardcoded into it? That would be of length 100+c for some constant c, but maybe you didn't think the constant was worth mentioning.
Well, it's still encoded. But I actually meant to say "almost 100" in the original. And yes, that's the answer.
Pretty sure that also happens in fields other than the hard sciences. For example, it is said that converts to a religion are usually much more fervent than people who grew up with it (though there's an obvious selection bias).
(The advanced, dark-artsy version of this is claiming with a straight face to never have believed A in the first place, and hope the listener trusts what you're saying now more than their memory of what you said earlier, and if it doesn't work, claim they had misunderstood you. My maternal grandpa always tries to use that on my father, and almost always fails, but if he does that I guess it's because it does work on other people.)
The operative glory is doing it in five seconds.
And, being right.
That's harder to distinguish from the outside.
That does (did?) seem improbable to me. I'd have expected n needed to be far larger than 100 before the overhead became negligible enough for 'almost n' to fit (ie. size 10,000 gives almost 10,000 would have seemed a lot more likely than size 100 gives almost 100). Do I need to update in the direction of optimal Turing machine code requiring very few bits?
In general, probably yes. Have you checked out the known parts of the Busy Beaver sequence? Be sure to guess what you expect to see before you look.
In specific, I don't know the size of the constant c.
I've also found this to be medium evidence that I'm not as informed about the subject as I thought that I was, so I back down my confidence somewhat. If I recently made an error that would have resulted in something very bad happening, I should be very careful about thinking that my next design is safe.
David Eagleman, Incognito, p. 71
Minor nitpick:
The reason we can learn the local language is that languages are memetically selected for learnability by humans.
So is everything else except biology and physics.
The memetic evolution of baroque music in Europe is a development towards learnability? There are probably no more than 100 people alive that can make their way through Bach's 2nd Partita for violin.
I'm pretty sure you're underestimating that by...a lot. Fermi estimate time:
Bach's sonatas and partitas for solo violin are a cornerstone of the violin repertory. We may therefore assume that every professor of violin at a major university or conservatory has performed at least one of them at least once, just like we may assume that every professor of mathematics has studied the Lebesgue dominated convergence theorem. How many professors of violin are there? Let's just consider one country, the United States. Each state in the U.S. has at least two major public universities (typically "University of X" and "X State University", where X is the state); some have many more, and this doesn't even count private universities. Personal experience suggests that the average big state university has about one professor of violin. There are 50 states in the U.S., so that's 100 people already right there. And we have yet to count:
Thus, it wouldn't surprise me at all if there were at least 10,000 people alive who have performed one of the sonatas and partitas (to say nothing of those who would be capable of performing them). There are six of these works in total, so we can divide this already-conservative estimate by six to (under)estimate the number who have performed the Second Partita in particular. (This is likely an underestimate because many of them will have performed more than one -- indeed, all six, in a fair number of cases.)
A glance at the recordings available on Amazon, sorted by release date may help put things into perspective.
The estimate "no more than 100 alive who can make it through" would be much more appropriate for a difficult contemporary work (like, say, Melismata by Milton Babbitt) than a 300-year-old standard.
Ever notice how you never hear humans playing music that humans aren't capable of learning to play? I think there may be some selection effects at play here...
Well, I never notice the satisfaction of that contradiction, quite, but I do notice that the history of baroque music includes the steady achievement of theretofore unreached technical difficulty.
It might be more accurate to say that pretty much everything, including what we call biology and physics -- humans are the ones codifying it -- is memetically selected to be learnable by humans. Not that it all develops towards being easier to learn.
This is a wild guess, but (on the assumption that you endorse this quote) is the thought that MWI stands in relation to experimentally testable physics as something like a metaphysical thesis, and that instrumentalism doesn't lack metaphysical theses of this kind, but simply refuses to acknowledge and examine them?
Anyway, a related quote, and so far as I know the oldest of this kind:
Actually, it was someone asking what the heck I meant by "reality fluid", to which the answer is that I don't know either which is why I always call it "magical reality fluid". I mean, I could add in something that sounded impressive and might to some degree be helpful along the lines of "It's the mind-projection-fallacy conjugate of 'probability' as it appears inside hypotheses about collections of real things in which some real things are more predicted to happen to me than others for purposes of executing post-observation Bayesian updates, like, if the squared modulus rule appearing in the Born statistics reflected the quantity present of an actual kind of stuff" but I think saying, "It's magic, which is the mind-projection-fallacy conjugate of 'I'm confused'" would be wiser in a conversation like that. I think it's very important not to create the illusion of knowing more than you do, when you try to operate at the frontiers of your own ability to be coherent. At the same time, refusing to digress into metaphysics even to demarcate the things that confuse you, even to form ideas which can be explicitly incoherent rather than implicitly incoherent, is indeed to become the slave of the unexamined thought.
William James
-- Teddy Atlas
--Clue (1985)
Daniel Waterhouse says to Hooke in Neal Stephenson's Quicksilver
Daniel Dennett
Daniel Dennett
Is this from the new book?
Yes.
And yes, it's great.
--Clyde Coombs, A Theory of Data, 1964, pp. 284, 488
Bo Dahlbom
-Paul Graham
-- René Descartes
-Marcus Aurelius
That's actually kind of sad. Hopefully times have changed since then.
It's my understanding that Marcus Aurelius no longer voices this opinion.
And the people who preserved his words to reach us were more like wise men who watched the skies and solved the puzzle of cheaply distributing text, than like emperors or philosophers.
Observing the sky is good and productive science. Perhaps he meant that as an emperor (or responsible senator, etc) he should not have been drawn into a serious scientific or philosophical career, but for those who can afford the time and effort, it's a fine pursuit.
I was told that that part was actually a reference to astrology.
-Arabian proverb
A not-entirely-different quote has been posted in the past
–A.L. Kitselman
...this seems exactly, diametrically wrong.
Why do you say that? Many times, you say something publicly, it then becomes part of your identity, and after that there is a subconscious force that tries to make sure that your future actions and words are in line with what you said earlier.
This is also what I take from the quote - before I state a belief out loud I have a much easier time adjusting and retracting it - once it's out there, I've got pride and status tied up with it being right. Once I realized this a few years ago I starting making a conscious effort to not say things out loud until I was extremely confident that I was right. I still make this mistake more often than I would like, but less frequently.
I would have said merely wrong. ie. When reversed it would still be stupidity. There seem to be both advantages and disadvantages to public expression with respect to it influencing you. Something along the lines of identity commitments on one side and the potential for denial, hypocrisy and lack of feedback on the other.
I thought of verbal overshadowing when I read it.
One of the things that I dislike about aphorisms is that they sometimes compress insight so much that it's not easy to see what they were actually saying. I intuitively think that this is sometimes done because sounding deeply wise is often high status.
To recognize that some of the things our culture believes are not true imposes on us the duty of finding out which are true and which are not.
--Allan Bloom, Giants and Dwarfs, "Western Civ"
That clashes in an interesting way with the recent post on Privileging the Question. Let us draw up our own, independent list of things that matter. There will be some, high up our list, about which our culture has no particular belief. Our self imposed duty is to find out whether they are true or not, leaving less important, culturally prominent beliefs alone.
Culture changes and many prominent beliefs of our culture will fade away, truth unchecked, before we are through with more urgent matters.
-- Byron Katie, Loving What Is
Leibniz in Neal Stephenson's The Confusion
Huh, I thought the point of atoms is that they're not infinitely small.
Kevin Warwick
I don't suppose you've got a cite for the central claim here? It's a decent enough example of reasoning from the bottom line whether or not it turns out to be true, but I Googled a couple different sets of keywords, and the only thing that came up besides a whole mess of birth records and obstetricians' papers was Warwick's lecture notes.
Google turns up a source for the "women of genius" quote, a book "Sex Differences in Cognitive Abilities" by D. Halpern. The book's quote is from someone named Bayertahl, and it's an indirect quotation from a 1989 article, "Sexual Dimorphism in the Human Brain: Dispelling the Myths" supposedly by a J. Janowsky. I say supposedly because looking for a fulltext leads me to a version with a similar title ("Sexual Dimorphism of the Human Brain: Myths and Realities") but is by M. A. Hofman and D. F. Swaab; it contains the Bayertahl quote in the original German and says that the primary source is this 1932 article by a Louis Bolk, "Hersenen en Cultuur" (Brains and Culture). This is also a full text, in Dutch; Google's translation seems to roughly confirm the claim as reported by Warwick (though the "women of genius" quote does not seem to appear in Bolk's article, at a first cursory glance).
Johann Wolfgang von Goethe, Hermann und Dorothea, IX. 303.
George Bernard Shaw
Can someone please explain to me how this is a rationality quote? (not sarcastic)
Seems to be along the lines of encouraging proactive agency. (Actively taking actions to optimise the world according to his preferences.) An instrumental rationality lesson.
(There are also less positive messages embedded there, which are a mix of anti-epistemology and dark arts, but I assume Malik is intending the instrumental message.)
Nietzsche's hilariously intense (albeit somewhat tempestuous) intellectual crush on Goethe makes a little more sense to me now.
-- Doug McGuff, M.D., and John Little, Body by Science, pp. ix-x
Lee Kelly.
Warren Buffett on proponents of the efficient market hypothesis
Citation? I've read the Tao Teh Ching in a few translations and I don't recognize that at all; a Google and Google Books search makes it sound like the usual apocrypha.
This is technically true for inclusive definitions of 'want' but highly misleading. There is a world of difference between "I want X but the opportunity cost (Y) is too great" and "I actively prefer !X". X and Y may be the prevention of parasitic worm infections and combating malaria. Precisely which limited resource is being allocated (time or money) changes little.
If "I don't have time" is to be replaced with an expression which conveys more personal acceptance of responsibility then it would be reasonable to translate it to "I have other priorities" but verging on disingenuous to translate it into "I don't want to".
I think you're reading this too literally. To my mind this says "You have the power to allocate your time" which is a non-trivial realization to some people. You can also understand this as saying "You allocate time to tasks according to how much you want to do them", an observation which also does not always rise to the conscious level.
This also requires a strange definition of "want" in order to become correct. Actions chosen for instrumental reasons sometimes differ from both the emotional urge and the all-else-equal reasoned preference, and so it's not particularly natural to include them under the label of "wanting".
Roger Cotes defending Newton from the charge that his theory treats gravity as an occult cause:
--from the ongoing animation xkcd: Time, dialogue transcript found here
Marcel Kinsbourne, quoted in Dennett (2013)
Matthew 7:13-14
Illustration of availability bias:
http://www.youtube.com/watch?v=LVM4jR3TZsU
Thus the availability bias defeats the Pascal mugging.
That's not true at all. It's those who made up their minds to be good but aren't who do the most evil.
I'm not sure that's true in aggregate. I think most of the evil is done by people going along with things - like, if you talked to them about it for a while they'd concede that some aspects of what they were going along with were sort of questionable and maybe a bit bad, but they don't think about that spontaneously.
Right, sure, every evil overlord needs a group of willing henchmen and an army of reluctant-to-object enablers. So, the original quote is probably right "in aggregate", though not in the amount of evil per person. Even then, how do you attribute/distribute the amount of evil between, say, Pol Pot ordering the destruction of intelligentsia and the genocide of the Chinese minority and a peasant working his rice field in the countryside, occasionally affirming his allegiance to the regime, as required? Hmm, I recall HPMoR!Quirrell talking about it, but I'm not sure how much of it is author tract.
Daniel Dennett
Does Dennett offer supporting arguments for this assertion?
"People don't pay much attention to anything unless you give them reason to"
--The Night Circus
People don't do anything unless they have a reason to - given a sufficiently broad definition of "reason".
Dupe: http://lesswrong.com/lw/8n9/rationality_quotes_december_2011/5erg
I've never seen the Icarus story as a lesson about the limitations of humans. I see it as a lesson about the limitations of wax as an adhesive.
-- Randall Munroe