Recent updates to gwern.net (2014-2015)
“Receive my instruction, and not silver; and knowledge rather than choice gold. / For wisdom is better than rubies; and all the things that may be desired are not to be compared to it.”
Sorted by topic:
Darknet market related:
- Darknet Market archives, 2011-2015: 1.5TB of mirrors of scores of Tor-Bitcoin black-markets & forums 2013-2015, and other material; this is the single largest public archive of all DNM materials, and creating it was a major focus of mine since December 2013. The release also marks the end of my career as DNM expert - I’ve lost interest in the topic due to the apparent stability of the DNMs & being trapped in a local equilibrium
- DNM arrests compilation: a census of all known arrests Jan 2011-June 2015
- “Silk Goxed: How DPR used MtGox for hedging & lost big”
- there was an ICE subpoena on my Reddit account
Statistics & decision theory:
- When Does The Mail Come? A subjective Bayesian decision-theoretic analysis of local mail delivery times
- resorter tool for statistically re-ranking a set of ratings
- analysis of Effective Altruists’ donations as reported in the LW survey
- anthology on how “everything is correlated”
- electric vs stove kettle boiling-time analysis: collected some simple data on my kettles & demonstrated some statistics tools on the dataset like a Bayesian measurement-error model
- dysgenics power analysis: how much genetic data would it take to falsify those claims?
- noisy polls: modeling potentially falsified poll data
- Value of Information for suicide (example cost-benefit analysis of weakly predicting suicide)
- Air conditioner upgrade cost-benefit analysis
- probability/gerontology problem: can one visit 566 centenarians before any die? No.
- do causal networks explain why correlation≠causation is so often true?
- a little example of estimating scores from censored data
QS related:
- 2015 modafinil community survey (not quite finished)
- Bitter Melon experimental & cost-benefit analysis
- Redshift self-experiment: screen-reddening software shifts bedtime forward by 20 minutes
- magnesium citrate experiment finished: initial benefits but apparent cumulative overdose led to net negative effect and mixed effects on sleep
- playing with inferring Bayesian networks for my Zeo & body weight data (powerful generalization of SEMs, but requires a lot of data before networks stabilize)
- Nootropics: initial results on LLLT correlated with large increases; but the followup randomized experiment showed zero effect
- LLLT re-analysis: no change in sleep as hypothesized by another LLLT user
- analysis of sceaduwe’s spirulina/allergies self-experiment (no reduction in allergies)
- Noopept experiment (no benefits)
- Treadmill spaced repetition experiment: expanded analysis to cover treadmill’s impact on successive reviews with SEM (no additional damage to recall beyond that implied by the original damage)
- lithium orotate experiment finished: no effects positive or negative
- sleep correlations:
  - alcohol: no harm
  - optimal bedtime: a little earlier than usual
  - optimal wakeup time: a little earlier than usual
Tech:
- “Effective Use of arbtt”: My window tracker/time-logger of choice is arbtt which records X window info for later classification and analysis; but one of the challenges is you don’t know how to set up arbtt or improve your environment or write classifications rules. So I wrote a tutorial.
- Time-lock crypto: wrote a Bash implementation of serial hashing time-lock crypto, linked to all known implementations of hash time-lock crypto, and discussed recent major theoretical breakthroughs involving Bitcoin
Debunking:
- Bicycle face
- “Rail travel at high speed is not possible because passengers, unable to breathe, would die of asphyxia.”
- did Fifty Shades of Grey have only 4k readers as the original Twilight fanfiction?
gwern.net-related:
- switched to Patreon for donations
- continued sending out my newsletter; up to 24 issues now
- rewrote gwern.net CSS to be mobile-friendly; should now be readable in an iPhone 6 browser
- optimized website loading (removed Custom Search Engine, A/B testing, non-validating XML, outbound link-tracking; simplified Disqus; minified JS, and fully async/deferred JS loading)
- A/B testing:
  - proposal towards recurrent neural network for reinforcement learning of CSS
  - metadata test: indicates moving it from the sidebar to the top of page works as well
  - indentation test: no real result, defaulted to 2em
  - floating footnotes test: verified no apparent harm (as hoped)
  - paragraph indentation test (responding to anonymous complaint; they were wrong)
Why the tails come apart
[I'm unsure how much this rehashes things 'everyone knows already' - if old hat, feel free to downvote into oblivion. My other motivation for the cross-post is the hope it might catch the interest of someone with a stronger mathematical background who could make this line of argument more robust]
[Edit 2014/11/14: mainly adjustments and rewording in light of the many helpful comments below (thanks!). I've also added a geometric explanation.]
Many outcomes of interest have pretty good predictors. It seems that height correlates to performance in basketball (the average height in the NBA is around 6'7"). Faster serves in tennis improve one's likelihood of winning. IQ scores are known to predict a slew of factors, from income, to chance of being imprisoned, to lifespan.
What's interesting is what happens to these relationships 'out on the tail': extreme outliers of a given predictor are seldom similarly extreme outliers on the outcome it predicts, and vice versa. Although 6'7" is very tall, it lies within a couple of standard deviations of the median US adult male height - there are many thousands of US men taller than the average NBA player who are not in the NBA. Although elite tennis players have very fast serves, if you look at the players serving the fastest serves ever recorded, they aren't the very best players of their time. It is harder to look at the IQ case due to test ceilings, but again there seems to be some divergence near the top: the very highest earners tend to be very smart, but their intelligence is not in step with their income (their cognitive ability is around +3 to +4 SD above the mean, yet their wealth is much higher than this) (1).
The trend seems to be that even when two factors are correlated, their tails diverge: the fastest servers are good tennis players, but not the very best (and the very best players serve fast, but not the very fastest); the very richest tend to be smart, but not the very smartest (and vice versa). Why?
Too much of a good thing?
One candidate explanation would be that more isn't always better, and the correlations one gets looking at the whole population don't capture a reversal at the right tail. Maybe being taller at basketball is good up to a point, but being really tall leads to greater costs in terms of things like agility. Maybe having a faster serve is better all things being equal, but focusing too heavily on one's serve counterproductively neglects other areas of one's game. Maybe a high IQ is good for earning money, but a stratospherically high IQ has an increased risk of productivity-reducing mental illness. Or something along those lines.
I would guess that these sorts of 'hidden trade-offs' are common. But, the 'divergence of tails' seems pretty ubiquitous (the tallest aren't the heaviest, the smartest parents don't have the smartest children, the fastest runners aren't the best footballers, etc. etc.), and it would be weird if there was always a 'too much of a good thing' story to be told for all of these associations. I think there is a more general explanation.
The simple graphical explanation
[Inspired by this essay from Grady Towers]
Suppose you make a scatter plot of two correlated variables. Here's one I grabbed off google, comparing the speed of a ball out of a baseball pitcher's hand with its speed crossing the plate:

It is unsurprising to see these are correlated (I'd guess the R-square is > 0.8). But if one looks at the extreme end of the graph, the very fastest balls out of the hand aren't the very fastest balls crossing the plate, and vice versa. This feature is general. Look at this data (again convenience-sampled from googling 'scatter plot'):

Or this:

Or this:

Given a correlation, the envelope of the distribution should form some sort of ellipse, narrower as the correlation grows stronger, and more circular as it gets weaker: (2)

The thing is, as one approaches the far corners of this ellipse, we see 'divergence of the tails': as the ellipse doesn't sharpen to a point, there are bulges where the maximum x and y values lie with sub-maximal y and x values respectively:

So this offers an explanation why divergence at the tails is ubiquitous. Providing the sample size is largeish, and the correlation not too tight (the tighter the correlation, the larger the sample size required), one will observe the ellipses with the bulging sides of the distribution. (3)
Hence the very best basketball players aren't the very tallest (and vice versa), the very wealthiest not the very smartest, and so on and so forth for any correlated X and Y. If X and Y are "Estimated effect size" and "Actual effect size", or "Performance at T", and "Performance at T+n", then you have a graphical display of winner's curse and regression to the mean.
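The graphical claim can be checked with a quick simulation. The following is only a sketch under assumptions not in the text above: the two variables are bivariate normal with a chosen correlation `r`, and the function names (`correlated_sample`, `coincidence_rate`) are invented for illustration. It estimates how often the single most extreme point on X is also the most extreme point on Y.

```python
import random


def correlated_sample(n, r, rng):
    """Draw n (x, y) pairs of standard normals with population correlation r."""
    noise_scale = (1.0 - r * r) ** 0.5
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y shares a fraction r of x; the rest is independent noise.
        y = r * x + noise_scale * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def top_x_is_top_y(pairs):
    """True when the point with the largest x also has the largest y."""
    best_by_x = max(pairs, key=lambda p: p[0])
    best_by_y = max(pairs, key=lambda p: p[1])
    return best_by_x is best_by_y


def coincidence_rate(n, r, trials, seed=0):
    """Fraction of simulated samples in which the top-x point is also the top-y point."""
    rng = random.Random(seed)
    hits = sum(top_x_is_top_y(correlated_sample(n, r, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for r in (0.5, 0.8, 0.95, 1.0):
        print(r, coincidence_rate(n=1000, r=r, trials=200))
```

Only at a perfect correlation of 1 do the two maxima always coincide; for weaker correlations the rate drops well below one, which is the 'bulge' in the ellipse seen from a different angle.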
An intuitive explanation of the graphical explanation
It would be nice to have an intuitive handle on why this happens, even if we can be convinced that it happens. Here's my offer towards an explanation:
The fact that a correlation is less than 1 implies that other things matter to an outcome of interest. Although being tall matters for being good at basketball, strength, agility, and hand-eye coordination matter as well (to name but a few). The same applies to other outcomes where multiple factors play a role: being smart helps in getting rich, but so does being hard working, being lucky, and so on.
For a toy model, pretend that wealth is wholly explained by two factors: intelligence and conscientiousness. Let's also say these are equally important to the outcome, independent of one another and are normally distributed. (4) So, ceteris paribus, being more intelligent will make one richer, and the toy model stipulates there aren't 'hidden trade-offs': there's no negative correlation between intelligence and conscientiousness, even at the extremes. Yet the graphical explanation suggests we should still see divergence of the tails: the very smartest shouldn't be the very richest.
The intuitive explanation would go like this: start at the extreme tail - +4SD above the mean for intelligence, say. Although this gives them a massive boost to their wealth, we'd expect them to be average with respect to conscientiousness (we've stipulated they're independent). Further, as this ultra-smart population is small, we'd expect them to fall close to the average in this other independent factor: with 10 people at +4SD, you wouldn't expect any of them to be +2SD in conscientiousness.
Move down the tail to less extremely smart people - +3SD say. These people don't get such a boost to their wealth from their intelligence, but there should be a lot more of them (if 10 at +4SD, around 500 at +3SD). This means one should expect more variation in conscientiousness - it is much less surprising to find someone +3SD in intelligence and also +2SD in conscientiousness, and in the world where these things were equally important, they would 'beat' someone +4SD in intelligence but average in conscientiousness. Although a +4SD intelligence person will likely be wealthier than a given +3SD intelligence person (the mean conscientiousness in both populations is 0SD, and so the average wealth of the +4SD intelligence population is 1SD higher than that of the +3SD intelligence people), the wealthiest of the +4SDs will not be as wealthy as the wealthiest of the much larger number of +3SDs. The same sort of story emerges when we look at larger numbers of factors, and in cases where the factors contribute unequally to the outcome of interest.
When looking at a factor known to be predictive of an outcome, the largest outcome values will occur with sub-maximal factor values, as the larger population increases the chances of 'getting lucky' with the other factors:

So that's why the tails diverge.
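The head-counts in the toy model can be roughed out from normal tail probabilities. This is a sketch under assumptions: both traits are exactly standard normal and independent, as the toy model stipulates, and the helper names `tail_prob` and `prob_any_exceeds` are made up for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def tail_prob(z):
    """Probability that a standard normal variable exceeds z."""
    return 1.0 - STD_NORMAL.cdf(z)


def prob_any_exceeds(group_size, z):
    """Chance that at least one of group_size independent people exceeds z on a trait."""
    return 1.0 - (1.0 - tail_prob(z)) ** group_size


if __name__ == "__main__":
    # How many people sit at +3SD or above for each person at +4SD or above.
    print(tail_prob(3.0) / tail_prob(4.0))
    # Chance that a group contains someone at +2SD or above in the second trait.
    print(prob_any_exceeds(10, 2.0))
    print(prob_any_exceeds(500, 2.0))
```

Under these assumptions the +3SD group is a few dozen times larger than the +4SD group, a group of 10 has only about a one-in-five chance of containing anyone at +2SD on the other trait, and a group of several hundred is all but certain to contain one.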
A parallel geometric explanation
There's also a geometric explanation. The correlation coefficient (r) between two sets of data is the same as the cosine of the angle between them when presented as mean-centered vectors in N-dimensional space (explanations, derivations, and elaborations here, here, and here). (5) So here's another intuitive handle for tail divergence:

Grant a factor correlated with an outcome, which we represent with two vectors at an angle theta, whose cosine equals the correlation coefficient r. 'Reading off' the expected outcome given a factor score is just moving along the factor vector and multiplying by cos theta to get the distance along the outcome vector. As cos theta is never greater than 1, we see regression to the mean. The geometrical analogue to the tails coming apart is that the absolute difference between length along the factor and length along outcome|factor scales with the length along the factor; the gap between extreme values of a factor and the less extreme values of the outcome grows linearly as the factor value gets more extreme. For concreteness (and granting normality), an r of 0.5 (corresponding to an angle of sixty degrees) means that +4SD (~1/30000) on a factor will be expected to be 'merely' +2SD (~1/40) in the outcome - and an r of 0.5 counts as a strong correlation in the social sciences, although it accounts for only a quarter of the variance (an R-square of 0.25).(6) The reverse - extreme outliers on outcome are not expected to be so extreme an outlier on a given contributing factor - follows by symmetry.
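The cosine relationship can be checked numerically. Note that it holds for the Pearson correlation coefficient r of mean-centered data, and the variance explained is r squared. The sketch below is illustrative only; the function names and the five-point dataset are made up.

```python
import math
from statistics import fmean


def centered(values):
    """Subtract the mean so the vector starts at the origin."""
    m = fmean(values)
    return [v - m for v in values]


def cosine_of_centered(xs, ys):
    """Cosine of the angle between the mean-centered data vectors."""
    cx, cy = centered(xs), centered(ys)
    dot = sum(a * b for a, b in zip(cx, cy))
    return dot / (math.hypot(*cx) * math.hypot(*cy))


def pearson_r(xs, ys):
    """Textbook Pearson correlation: covariance over the product of SDs."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


def expected_outcome_z(factor_z, r):
    """Expected outcome z-score given a factor z-score (bivariate normal case)."""
    return r * factor_z


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
    print(pearson_r(xs, ys), cosine_of_centered(xs, ys))
    print(math.degrees(math.acos(0.5)), expected_outcome_z(4.0, 0.5))
```

The two functions agree up to floating-point error, and a correlation of 0.5 (an angle of sixty degrees) maps a +4SD factor score to an expected +2SD outcome.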
Endnote: EA relevance
I think this is interesting in and of itself, but it has relevance to Effective Altruism, given it generally focuses on the right tail of various things (What are the most effective charities? What is the best career? etc.) It generally vindicates worries about regression to the mean or winner's curse, and suggests that these will be pretty insoluble in all cases where the populations are large: even if you have really good means of assessing the best charities or the best careers so that your assessments correlate really strongly with which ones actually are the best, the very best ones you identify are unlikely to be actually the very best, as the tails will diverge.
This probably has limited practical relevance. Although you might expect that one of the 'not estimated as the very best' charities is in fact better than your estimated-to-be-best charity, you don't know which one, and your best bet remains your estimate (in the same way - at least in the toy model above - you should bet a 6'11" person is better at basketball than someone who is 6'4".)
There may be spread betting or portfolio scenarios where this factor comes into play - perhaps instead of funding AMF to diminishing returns when its marginal effectiveness dips below charity #2, we should be willing to spread funds sooner.(7) Mainly, though, it should lead us to be less self-confident.
1. Given income isn't normally distributed, using SDs might be misleading. But non-parametric ranking gives a similar picture: if Bill Gates is ~+4SD in intelligence, despite being the richest man in America, he is 'merely' in the smartest tens of thousands. Looking the other way, one might look at the generally modest achievements of people in high-IQ societies, but there are worries about adverse selection.
2. As nshepperd notes below, this depends on something like multivariate CLT. I'm pretty sure this can be weakened: all that is needed, by the lights of my graphical intuition, is that the envelope be concave. It is also worth clarifying the 'envelope' is only meant to illustrate the shape of the distribution, rather than some boundary that contains the entire probability density: as suggested by homunq, it is a 'pdf isobar' where probability density is higher inside the line than outside it.
3. One needs a large enough sample to 'fill in' the elliptical population density envelope, and the tighter the correlation, the larger the sample needed to fill in the sub-maximal bulges. The old faithful case is an example where actually you do get a 'point', although it is likely an outlier.
4. It's clear that this model is fairly easy to extend to >2 factor cases, but it is worth noting that in cases where the factors are positively correlated, one would need to take whatever component of the factors which are independent of one another.
5. My intuition is that in cartesian coordinates the R-square between correlated X and Y is actually also the cosine of the angle between the regression lines of X on Y and Y on X. But I can't see an obvious derivation, and I'm too lazy to demonstrate it myself. Sorry!
6. Another intuitive dividend is that this makes it clear why you can multiply by the correlation coefficient to move between z-scores of correlated normal variables, which wasn't straightforwardly obvious to me.
7. I'd intuit, but again I can't demonstrate, the case for this becomes stronger with highly skewed interventions where almost all the impact is focused in relatively low probability channels, like averting a very specific existential risk.
Confound it! Correlation is (usually) not causation! But why not?
It is widely understood that statistical correlation between two variables ≠ causation. But despite this admonition, people are routinely overconfident in claiming correlations to support particular causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimating the prevalence of confounds/common-causation. I speculate that in realistic causal networks or DAGs, the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in DAGs, the imbalance also explains overconfidence.
Full article: http://www.gwern.net/Causality
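The counting intuition can be illustrated on small graphs. The sketch below is not taken from the full article; it rests on stated assumptions. The DAG is generated at random with a hypothetical `random_dag` helper, two variables are treated as 'correlated' whenever they share an ancestor or one is an ancestor of the other (which presumes the usual faithfulness condition), and 'causal' pairs are those joined by a directed path.

```python
import random
from itertools import combinations


def random_dag(n_nodes, max_parents, rng):
    """Map each node to a random set of parents drawn from lower-numbered nodes."""
    parents = {}
    for node in range(n_nodes):
        k = min(node, rng.randint(0, max_parents))
        parents[node] = set(rng.sample(range(node), k))
    return parents


def ancestors_inclusive(parents):
    """For each node, the set containing the node itself and all its ancestors."""
    anc = {}
    for node in sorted(parents):  # parents always have smaller numbers
        found = {node}
        for p in parents[node]:
            found |= anc[p]
        anc[node] = found
    return anc


def count_relationships(parents):
    """Return (direct edges, ancestor-descendant pairs, correlated pairs)."""
    anc = ancestors_inclusive(parents)
    edges = sum(len(p) for p in parents.values())
    causal = sum(len(a) - 1 for a in anc.values())
    correlated = sum(1 for i, j in combinations(parents, 2) if anc[i] & anc[j])
    return edges, causal, correlated


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (5, 10, 20, 40):
        print(n, count_relationships(random_dag(n, max_parents=2, rng=rng)))
```

The classic confound is the smallest case: with `parents = {0: set(), 1: {0}, 2: {0}}` there are two causal pairs but three correlated pairs, because nodes 1 and 2 are correlated through their common cause without either affecting the other.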
Recent updates to gwern.net (2013-2014)
“It cannot be gotten for gold, neither shall silver be weighed for the price thereof. / It cannot be valued with the gold of Ophir, with the precious onyx, nor the sapphire. / The gold and the crystal cannot equal it: and the exchange of it shall not be for vessels of fine gold. / No mention shall be made of coral, or of pearls: for the price of wisdom is above rubies.”
Another 477 days are past, so what have I been up to? In roughly topical & chronological order, here are some major additions to gwern.net:
Statistics:
- Google Alerts: analysis of all my emails from Google Alerts to see whether/when they started to be less useful.
- Google shutdowns: compiled dataset of past & present Google products for a survival analysis attempting to investigate common claims about why Google abandons things & predict which would be shut down in the next 5 years. So far the model’s predictions are doing well.
- applied survival analysis to modeling Methods of Rationality reviews on FF.net
- reproduced a paper analyzing Bitcoin exchange shutdown or theft risk.
- Public release of the Mnemosyne spaced repetition dataset (18GB of 121.2m flashcard reviews, collected ~2004-2014)
- nootropics survey analysis
- power simulation of the penalty from omitting important covariates in logistic regression
- did some spaced repetition research using the Mnemosyne logs: found weekly & time of day effects on memory performance - with a clear circadian rhythm; while my results aren’t conclusive, my analysis of 48m flashcard reviews from the public database finds that the best time for recalling your flashcards seems to be noon. (I haven’t looked at time correlates with next review, though.)
QS:
- DNB meta-analysis expanded with a dozen or so studies & a new covariate (whether payment reduces gains: it doesn’t)
- compiled a small meta-analysis of creatine’s effect on intelligence
- updated my analysis of SDr’s sleep data
- 2013 Lewis meditation quasi-experiment: A Quantified Selfer and a few other guys did some meditation while doing an arithmetic game; turned out to be a perfect application for multilevel modeling
- Modafinil: price table update
- Sleep and lunar phases: A recent paper claimed that there’s a phase-of-the-moon effect on circadian rhythms; since I have so much sleep data on myself, I thought I’d see if there’s any effect…
- analyzed a self-experiment about low level laser therapy improving reaction time
- Treadmill/spaced repetition experiment (likely interference)
- an LSD microdosing self-experiment (while there was a lot of criticism, I still regard it as worthwhile and as setting a new benchmark for any future research in that area.)
- finished caffeine-pill wakeup pilot trial, began full-scale blinded self-experiment
Black-markets:
- an analysis of whether a particular vendor on Silk Road is a federal mole (probably not, but some have claimed he was the source of the bad fake IDs Ross Ulbricht ordered)
- transcribed Drugs 2.0: “Your Crack’s in the Post” (book chapter)
- betting all and sundry that BlackMarket Reloaded & Sheep Marketplace will be busted or shut down within a year (no takers; my 1-year predictions were correct, but my 6-month predictions drastically underestimated the risk)
- preliminary black market survival analysis, done for the bet
- compiled a table of all known black-markets with lifetimes (intended for a larger survival analysis)
- estimating DPR’s net fortune based on the FBI numbers
- doxed the owner of Sheep Marketplace (see http://pastebin.com/raw.php?i=9spTATw6 & https://dl.dropboxusercontent.com/u/182368464/2013-11-03-sheepmarketplace-doxxing.maff )
- BBC Radio 5 & NHK interviews
- 2 Mike Power interviews
- I have begun systematically spidering all operational black-markets, and wrote a bit on how my complacency about free-market mechanisms led to no serious archiving early on
Bitcoin:
- Wei Dai/Satoshi Nakamoto emails
- McCaleb email interview on MtGox
- short essay on Zerocoin prospects
- bets: update on bet with qwertyoruiop btc<$50 - conceded defeat, learned a lesson about panicking, and paid up; altogether admirable
- wrote up an essay on:
- 3 attempts to blackmail/extort/scam
- a fanfiction about Satoshi Nakamoto sent to me by an anonymous user, which was too good to simply delete
- Evolution’s attempted blackmail of me to find out who was criticizing them on Reddit
Tech:
- Spatial locality for better file compression
- Epigrams on technology
- Haskell Summer of Code: 2013 review
Literature/fiction:
- Scholz’s Radiance: transcribed, annotated, commentary, copy of original novella & diff with corresponding material in the final novel, and Benford essay “Old Legends” on his physics career, SF & science, the “Star Wars” program, Edward Teller, etc; tracked down and scanned a copy of “The Astounding Investigation: The Manhattan Project’s Confrontation with Science Fiction”
- Book reviews: for the LW media threads, I began writing book reviews on GoodReads, but why let them keep my reviews? So I wrote a Haskell program to parse my GoodReads ratings & reviews into Pandoc Markdown and make my own backup.
- Sand, on:
  - forgotten cleaning methods in literature; and
  - the forgotten science behind early SF’s “great pain of space”
- compiled & expanded anthology of my poems
Misc:
- wrote a short essay defending Francis Fukuyama’s end of history thesis
- Cicadas for dinner: I caught some cicadas during the most recent Maryland emergence; I review the spaghetti dinner I made with them
- compiled my tea reviews
- I researched an old family friend in his 90s who has never been willing to talk about his government work during the Cold War & found some stuff using released Census records; he has since passed away.
Site:
- I began A/B testing my site design to try to improve readability:
  - no difference between 4 fonts
  - no difference between lineheights
  - no difference between the null hypothesis & the null hypothesis
  - a pure black/white foreground/background performed better than mixes of off-colors
  - font size 100-120%: default of 100% was best
  - blockquote formatting: Readability-style bad, zebra-stripes good
  - header capitalization: best result was to upcase title & all section headers
  - tested font size & number size & table of contents background: status quo of all was best
  - BeeLine Reader: no color variant performed better than no-highlighting
- anonymous feedback analysis (feedback turned out to be useful)
- deleted Flattr, trying out Gittip for donations; Gittip turns out to work much better
- I began a newsletter/mailing-list; the back-issues are online:
Recent updates to gwern.net (2012-2013)
Previous: Recent updates to gwern.net (2011)
“But where shall wisdom be found? / And where is the place of understanding? / Man knoweth not the price thereof; neither is it found in the land of the living…for the price of wisdom is above rubies.”
As before, here is material I’ve worked on in the 477 days since my last update which LWers may find interesting. In roughly chronological & topical order, here are the major additions to gwern.net:
- I interviewed translator Michael House about his work in Japan as a translator
- finished data collection for my hafu anime statistics page and began analysis. (I’ve achieved good coverage of characters, found an astonishingly consistent absence of Korean characters, and confirmed the blond-haired/blue-eyed stereotype; but my original thesis doesn’t seem to work and the data is too unevenly distributed to identify time trends.)
- judged the 2011 & 2012 results for the Haskell Summer of Codes and the accuracy of my predictions
- did a meta-analysis on whether dual n-back increases IQ, and examined possible biases and various claims about what makes the training work or not work
- did another meta-analysis on whether iodine increases IQ, etc
- modafinil:
  - checked for subjective effects of blinded modafinil
  - updated my modafinil price-chart twice, and expanded with brand data and a new armodafinil table
  - researched modafinil-related prosecutions & convictions in the USA
  - and any connection with schizophrenia
- tried kratom
- did a nicotine gum/n-back experiment
- did 2 potassium experiments; neither improved my mood/productivity, and one damaged my sleep
- my Silk Road page has been expanded with a BBC interview, putting SR in a historical cypherpunk context, an updated account of all arrests & law enforcement actions, and application of basic statistics to ordering
- ran 2 sleep experiments on the timing of taking a vitamin D supplement: I found that taking vitamin D before bed substantially damaged my sleep, while taking vitamin D after waking up did not hurt & somewhat helped
- checked whether a walking desk (treadmill) damaged typing speed or accuracy
- I have run 3 Wikipedia experiments establishing that: Talk page edits are ignored by editors; random link deletions (and their restoration) are also ignored by editors; and external link suggestions on Talk pages are also ignored by readers. (I take the first 2 as indicative of the decline in edit activity and rise of deletionist beliefs on Wikipedia.)
- tried some economic/historical analysis: “Reasons of State: Why Didn’t Denmark Sell Greenland to the USA?”
- Defending sunk costs essay (LW discussion)
- “Slowing Moore’s Law: Why You Might Want To and How You Would Do It”
- “The Hyperbolic Time Chamber as Brain Emulation Analogy”
- tried estimating the bandwidth of a Death Note
- compiled predictions for Harry Potter and the Methods of Rationality
- looked into Conscientiousness and online education; studies so far are useless from a meta-analytic standpoint
- tripled length of appendix dealing with the reliability of mainstream science (methodological flaws, replication rates, etc)
- finished meta-ethics essay, “The Narrowing Circle”
- explained the philosophy saying “one man’s modus ponens is another man’s modus tollens”
- speculation about a restoration of the British monarchy
- clean up & exploratory data analysis of SDr’s lucid dreaming data
- Who wrote the Death Note script? (LW discussion)
- 2012 US election predictions: statistical comparison
- Turing-completeness in surprising places (inventory of particularly “weird machines”; relevant to computer and AI security)
Transcribed or translated:
- Nash’s letters on cryptography
- Douglas Hofstadter’s superrationality columns (from Metamagical Themas, 1985)
- “The Iron Law Of Evaluation And Other Metallic Rules”, Rossi 1987 (lessons from the large RCTs evaluating social & welfare interventions)
- “The Ups and Downs of the Hope Function In a Fruitless Search”, Falk et al 1994
- Gene Wolfe on writing
- “Shiny balls of Mud: William Gibson Looks at Japanese Pursuits of Perfection” (2002)
- “Otaku Talk”, Okada et al 2004
- “Earth in My Window”, Murakami 2005
- “On The Battlefield of ‘Superflat’”
- “Ero-Anime: Manga Comes Alive”, Sarrazin 2010
- 1996 NewType interview with Hideaki Anno (translated by me, with the help of an EGFer)
- 1997 Animeland interview with Hideaki Anno (bought, transcribed, and translated by me with the help of other LWers)
- 1997 Utena interviews
More technical:
- added edit history statistics/visualization for gwern.net using GitStats
- site traffic updates: July-December 2011, January 2012-July 2012, July 2012-Jan 2013
- There’s also been a lot of backend changes: switching to Amazon S3+Cloudflare, adding error pages, metadata like tags, A/B testing, but no need to go into detail.
Personal:
- dumped my notes on my 2011 visit to San Francisco
- posted summaries of my personality & attitudes & my RSS feed collection
- enjoyed some mead; I still like tea better, though
- dumped notes on the 2012 SF convention ICON
Case Study: the Death Note Script and Bayes
"Who wrote the Death Note script?"
I give a history of the 2009 leaked script, discuss internal & external evidence for its authenticity including stylometrics; and then give a simple step-by-step Bayesian analysis of each point. We finish with high confidence in the script's authenticity, discussion of how this analysis was surprisingly enlightening, and what followup work the analysis suggests would be most valuable.
Using degrees of freedom to change the past for fun and profit
Follow-up to: Follow-up on ESP study: "We don't publish replications", Feed the Spinoff Heuristic!
Related to: Parapsychology: the control group for science, Dealing with the high quantity of scientific error in medicine
Using the same method as in Study 1, we asked 20 University of Pennsylvania undergraduates to listen to either “When I’m Sixty-Four” by The Beatles or “Kalimba.” Then, in an ostensibly unrelated task, they indicated their birth date (mm/dd/yyyy) and their father’s age. We used father’s age to control for variation in baseline age across participants. An ANCOVA revealed the predicted effect: According to their birth dates, people were nearly a year-and-a-half younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) rather than to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040
That's from "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant," which runs simulations of a version of Shalizi's "neutral model of inquiry," with random (null) experimental results, augmented with a handful of choices in the setup and analysis of an experiment. Even before accounting for publication bias, these few choices produced a desired result "significant at the 5% level" 60.7% of the time, and at the 1% level 21.5% of the time.
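The mechanism is easy to reproduce in miniature. What follows is not the paper's simulation, only a rough sketch under simplifying assumptions: two independent outcome measures with no true effect, a z-test with known variance in place of the paper's analyses, one round of optional stopping, and made-up function names. It compares a pre-committed analysis with one that reports whichever of several analyses reaches significance.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def draw_group(rng, n):
    """n participants, each with two independent null outcome measures (SD 1)."""
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def two_sided_p(mean_diff, standard_error):
    """Two-sided p-value of a z-test on a difference in means."""
    z = abs(mean_diff) / standard_error
    return 2.0 * (1.0 - STD_NORMAL.cdf(z))


def candidate_p_values(group_a, group_b):
    """p-values for measure 1, measure 2, and their average (equal group sizes)."""
    n = len(group_a)
    d1 = fmean(a[0] for a in group_a) - fmean(b[0] for b in group_b)
    d2 = fmean(a[1] for a in group_a) - fmean(b[1] for b in group_b)
    se = (2.0 / n) ** 0.5  # SE of a mean difference when SD = 1
    # Averaging two independent measures halves the variance.
    return [two_sided_p(d1, se), two_sided_p(d2, se),
            two_sided_p((d1 + d2) / 2.0, se / 2.0 ** 0.5)]


def simulate(n_sims, alpha=0.05, seed=0):
    """Return (honest, flexible) false-positive rates under the null."""
    rng = random.Random(seed)
    honest_hits = flexible_hits = 0
    for _ in range(n_sims):
        a, b = draw_group(rng, 20), draw_group(rng, 20)
        ps = candidate_p_values(a, b)
        honest_hits += ps[0] < alpha       # pre-committed: measure 1, n = 20
        significant = min(ps) < alpha      # flexible: any of the three analyses
        if not significant:                # flexible: collect 10 more per group
            a += draw_group(rng, 10)
            b += draw_group(rng, 10)
            significant = min(candidate_p_values(a, b)) < alpha
        flexible_hits += significant
    return honest_hits / n_sims, flexible_hits / n_sims


if __name__ == "__main__":
    print(simulate(2000, seed=1))
```

The honest rate stays near the nominal 5%, while the flexible rate is higher by construction, since the flexible analyst also gets to report the honest test whenever it succeeds. Adding more such choices pushes the rate up further.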
I found it because of another paper claiming time-defying effects, during a search through all of the papers on Google Scholar citing Daryl Bem's precognition paper, which I discussed in a past post about the problems of publication bias and selection over the course of a study. For Bem, Richard Wiseman established a registry for the methods, and tests of the registered studies could be set prior to seeing the data (in addition to avoiding the file drawer).
Now a number of purported replications have been completed, with several available as preprints online, including a large "straight replication" carefully following the methods in Bem's paper, with some interesting findings discussed below. The picture does not look good for psi, and is a good reminder of the sheer cumulative power of applying a biased filter to many small choices.
How to Fix Science
Like The Cognitive Science of Rationality, this is a post for beginners. Send the link to your friends!

Science is broken. We know why, and we know how to fix it. What we lack is the will to change things.
In 2005, several analyses suggested that most published results in medicine are false. A 2008 review showed that perhaps 80% of academic journal articles mistake "statistical significance" for "significance" in the colloquial meaning of the word, an elementary error every introductory statistics textbook warns against. This year, a detailed investigation showed that half of published neuroscience papers contain one particular simple statistical mistake.
Also this year, a respected senior psychologist published in a leading journal a study claiming to show evidence of precognition. The editors explained that the paper was accepted because it was written clearly and followed the usual standards for experimental design and statistical methods.
Science writer Jonah Lehrer asks: "Is there something wrong with the scientific method?"
Yes, there is.
This shouldn't be a surprise. What we currently call "science" isn't the best method for uncovering nature's secrets; it's just the first set of methods we've collected that wasn't totally useless like personal anecdote and authority generally are.
As time passes we learn new things about how to do science better. The Ancient Greeks practiced some science, but few scientists tested hypotheses against mathematical models before Ibn al-Haytham's 11th-century Book of Optics (which also contained hints of Occam's razor and positivism). Around the same time, Al-Biruni emphasized the importance of repeated trials for reducing the effect of accidents and errors. Galileo brought mathematics to greater prominence in scientific method, Bacon described eliminative induction, Newton demonstrated the power of consilience (unification), Peirce clarified the roles of deduction, induction, and abduction, and Popper emphasized the importance of falsification. We've also discovered the usefulness of peer review, control groups, blind and double-blind studies, plus a variety of statistical methods, and added these to "the" scientific method.
In many ways, the best science done today is better than ever — but it still has problems, and most science is done poorly. The good news is that we know what these problems are and we know multiple ways to fix them. What we lack is the will to change things.
This post won't list all the problems with science, nor will it list all the promising solutions for any of these problems. (Here's one I left out.) Below, I only describe a few of the basics.
Funnel plots: the study that didn't bark, or, visualizing regression to the null
Marginal Revolution linked a post at Genomes Unzipped, "Size matters, and other lessons from medical genetics", with the interesting centerpiece graph:

This is from pg 3 of an Ioannidis et al 2001 article (who else?) on what is called a funnel plot: each line represents a series of studies about some particularly hot gene-disease correlations, plotted where Y = the odds ratio (measure of effect size; all results are 'statistically significant', of course) and X = the sample size. The line at 1 is the null hypothesis here. You will notice something dramatic: as we move along the X-axis and sample sizes increase, everything begins to converge on 1:
Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for a wide range of traits ranging from obstetric cholestasis to meningococcal disease in children, virtually none of which have ever been convincingly replicated.
(See also "Why epidemiology will not correct itself" or the DNB FAQ.)
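One way to see why the funnel narrows is to ask how large an odds ratio has to be before a study of a given size can call it "statistically significant" at all. The sketch below is a rough illustration under stated assumptions, not a reconstruction of the article's data: it assumes equal numbers of cases and controls, an exposure frequency of 50% in both groups, and the usual large-sample standard error for a log odds ratio. The function name and the sample sizes are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

# Two-sided 5% critical value (about 1.96).
Z_CRIT = NormalDist().inv_cdf(0.975)


def min_significant_odds_ratio(n_per_group):
    """Smallest odds ratio above 1 that reaches p < .05 in a case-control study.

    Assumes n_per_group cases and n_per_group controls, each split evenly
    between exposed and unexposed, so every 2x2 cell holds n_per_group / 2.
    The variance of the log odds ratio is then 4 * (2 / n) = 8 / n.
    """
    standard_error = sqrt(8 / n_per_group)
    return exp(Z_CRIT * standard_error)


if __name__ == "__main__":
    for n in (50, 200, 1000, 5000):
        print(n, round(min_significant_odds_ratio(n), 2))
```

Under these assumptions a study with 50 cases and 50 controls cannot report a significant odds ratio below roughly 2.2, while one with 5,000 per group can detect ratios under 1.1. If only significant results get published and the true effect is near the null, the small early studies have to show large effects, and the later large ones are free to land close to 1.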
MSF Theory: Another Explanation of Subjectively Objective Probability
Before I read Probability is in the Mind and Probability is Subjectively Objective I was a realist about probabilities; I was a frequentist. After I read them, I was just confused. I couldn't understand how a mind could accurately say the probability of getting a heart in a standard deck of playing cards was not 25%. It wasn't until I tried to explain the contrast between my view and the subjective view in a comment on Probability is Subjectively Objective that I realized I was a subjective Bayesian all along. So, if you've read Probability is in the Mind and read Probability is Subjectively Objective but still feel a little confused, hopefully, this will help.
I should mention that I'm not sure that EY would agree with my view of probability, but the view to be presented agrees with EY's view on at least these propositions:
- Probability is always in a mind, not in the world.
- The probability that an agent should ascribe to a proposition is directly related to that agent's knowledge of the world.
- There is only one correct probability to assign to a proposition given your partial knowledge of the world.
- If there is no uncertainty, there is no probability.
And any position that holds these propositions is a non-realist-subjective view of probability.
Imagine a pre-shuffled deck of playing cards and two agents (they don't have to be humans), named "Johnny" and "Sally", who are betting 1 dollar each on the suit of the top card. As everyone knows, 1/4 of the cards in a playing card deck are hearts. We will name this belief F1; F1 stands for "1/4 of the cards in the deck are hearts." Johnny and Sally both believe F1. F1 is all that Johnny knows about the deck of cards, but Sally knows a little bit more about this deck. Sally also knows that 8 of the top 10 cards are hearts. Let F2 stand for "8 out of the 10 top cards are hearts." Sally believes F2. Johnny doesn't know whether or not F2 is true. F1 and F2 are beliefs about the deck of cards and they are either true or false.
So, Sally bets that the top card is a heart and Johnny bets against her, i.e., she puts her money on "Top card is a heart." being true; he puts his money on "~The top card is a heart." being true. After they make their bets, one could imagine Johnny making fun of Sally; he might say something like: "Are you nuts? You know, I have a 75% chance of winning. 1/4 of the cards are hearts; you can't argue with that!" Sally might reply: "Don't forget that the probability you assign to '~The top card is a heart.' depends on what you know about the deck. I think you would agree with me that there is an 80% chance that 'The top card is a heart' if you knew just a bit more about the state of the deck."
To be undecided about a proposition is to not know which possible world you are in; am I in the possible world where that proposition is true, or in the one where it is false? Both Johnny and Sally are undecided about "The top card is a heart."; their model of the world splits at that point of representation. Their knowledge is consistent with being in a possible world where the top card is a heart, or in a possible world where the top card is not a heart. The more statements they decide on, the smaller the configuration space of possible worlds they think they might find themselves in; deciding on a proposition takes a chunk off of that configuration space, and the content of that proposition determines the shape of the eliminated chunk; Sally's and Johnny's beliefs constrain their respective expected experiences, but not all the way to a point. The trick when constraining one's space of viable worlds, is to make sure that the real world is among the possible worlds that satisfy your beliefs. Sally still has the upper hand, because her space of viably possible worlds is smaller than Johnny's. There are many more ways you could arrange a standard deck of playing cards that satisfies F1 than there are ways to arrange a deck of cards that satisfies F1 and F2. To be clear, we don't need to believe that possible worlds actually exist to accept this view of belief; we just need to believe that any agent capable of being undecided about a proposition is also capable of imagining alternative ways the world could consistently turn out to be, i.e., capable of imagining possible worlds.
For convenience, we will say that a possible world W, is viable for an agent A, if and only if, W satisfies A's background knowledge of decided propositions, i.e., A thinks that W might be the world it finds itself in.
Of the possible worlds that satisfy F1, i.e., of the possible worlds where "1/4 of the cards are hearts" is true, 3/4 of them also satisfy "~The top card is a heart." Since Johnny holds that F1, and since he has no further information that might put stronger restrictions on his space of viable worlds, he ascribes a 75% probability to "~The top card is a heart." Sally, however, holds that F2 as well as F1. She knows that of the possible worlds that satisfy F1 only 1/4 of them satisfy "The top card is a heart." But she holds a proposition that constrains her space of viably possible worlds even further, namely F2. Most of the possible worlds that satisfy F1 are eliminated as viable worlds if we hold that F2 as well, because most of the possible worlds that satisfy F1 don't satisfy F2. Of the possible worlds that satisfy F2 exactly 80% of them satisfy "The top card is a heart." So, duh, Sally assigns an 80% probability to "The top card is a heart." They give that proposition different probabilities, and they are both right in assigning their respective probabilities; they don't disagree about how to assign probabilities, they just have different resources for doing so in this case. P(~The top card is a heart|F1) really is 75% and P(The top card is a heart|F2) really is 80%.
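The two fractions can be checked by counting worlds directly. The sketch below is one way to do it, under two assumptions that the text leaves implicit: F2 is read as "exactly 8 of the top 10 cards are hearts", and a "world" is identified with the pattern of which deck positions hold hearts. Every such pattern corresponds to the same number of full orderings (13! × 39!), so counting patterns gives the same fractions as counting orderings. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

DECK_SIZE = 52
NUM_HEARTS = 13      # F1: 1/4 of the cards are hearts
TOP_N = 10
HEARTS_IN_TOP = 8    # F2, read as "exactly 8 of the top 10 are hearts"

# Johnny: worlds consistent with F1 only.
worlds_f1 = comb(DECK_SIZE, NUM_HEARTS)
# Fix a heart on top, then place the other 12 hearts among the remaining 51 slots.
worlds_f1_top_heart = comb(DECK_SIZE - 1, NUM_HEARTS - 1)
p_johnny_not_heart = 1 - Fraction(worlds_f1_top_heart, worlds_f1)

# Sally: worlds consistent with F1 and F2.
# The 5 hearts outside the top 10 can sit anywhere in the lower 42 slots.
lower_patterns = comb(DECK_SIZE - TOP_N, NUM_HEARTS - HEARTS_IN_TOP)
worlds_f2 = comb(TOP_N, HEARTS_IN_TOP) * lower_patterns
# Fix a heart on top, then place 7 more hearts among the next 9 slots.
worlds_f2_top_heart = comb(TOP_N - 1, HEARTS_IN_TOP - 1) * lower_patterns
p_sally_heart = Fraction(worlds_f2_top_heart, worlds_f2)

print("Johnny, P(~top card is a heart | F1) =", p_johnny_not_heart)
print("Sally,  P(top card is a heart | F2)  =", p_sally_heart)
print("Share of F1 worlds that survive F2   =", float(Fraction(worlds_f2, worlds_f1)))
```

The counts give 3/4 for Johnny and 4/5 for Sally, matching the text. The last line shows how few F1 worlds remain viable once F2 is added.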
This setup makes it clear (to me at least) that the right probability to assign to a proposition depends on what you know. The more you know, i.e., the more you constrain the space of worlds you think you might be in, the more useful the probability you assign. The probability that an agent should ascribe to a proposition is directly related to that agent's knowledge of the world.
This setup also makes it easy to see how an agent can be wrong about the probability it assigns to a proposition given its background knowledge. Imagine a third agent, named "Billy", that has the same information as Sally, but says that there's a 99% chance of "The top card is a heart." Billy doesn't have any information that further constrains the possible worlds he thinks he might find himself in; he's just wrong about the fraction of possible worlds that satisfy F2 that also satisfy "The top card is a heart." Of all the possible worlds that satisfy F2 exactly 80% of them satisfy "The top card is a heart.", no more, no less. There is only one correct probability to assign to a proposition given your partial knowledge.
The last benefit of this way of talking I'll mention is that it makes probability's dependence on ignorance clear. We can imagine another agent that knows the truth value of every proposition; let's call him "FSM". There is only one possible world that satisfies all of FSM's background knowledge; the only viable world for FSM is the real world. Of the possible worlds that satisfy FSM's background knowledge, either all of them satisfy "The top card is a heart." or none of them do, since there is only one viable world for FSM. So the only probabilities FSM can assign to "The top card is a heart." are 1 or 0. In fact, those are the only probabilities FSM can assign to any proposition. If there is no uncertainty, there is no probability.
The world knows whether or not any given proposition is true (assuming determinism). The world itself is never uncertain; only the parts of the world that we call agents can be uncertain. Hence, probability is always in a mind, not in the world. The probabilities that the universe assigns to a proposition are always 1 or 0, for the same reasons FSM only assigns a 1 or 0, and 1 and 0 aren't really probabilities.
In conclusion, I'll risk the hypothesis that: Where 0≤x≤1, "P(a|b)=x" is true, if and only if, of the possible worlds that satisfy "b", x of them also satisfy "a". Probabilities are propositional attitudes, and the probability value (or range of values) you assign to a proposition is representative of the fraction of possible worlds you find viable that satisfy that proposition. You may be wrong about the value of that fraction, and as a result you may be wrong about the probability you assign.
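The hypothesis can be written down almost verbatim as a function over an explicit list of possible worlds. The sketch below does that for a toy four-card deck (one heart, one spade, one diamond, one club) small enough to enumerate. The toy deck, the function name `msf_probability`, and the example propositions are all assumptions made for illustration; each possible world is treated as equally weighted.

```python
from fractions import Fraction
from itertools import permutations


def msf_probability(worlds, a, b):
    """P(a|b): the fraction of worlds satisfying b that also satisfy a."""
    viable = [w for w in worlds if b(w)]
    if not viable:
        raise ValueError("no possible world satisfies b")
    return Fraction(sum(1 for w in viable if a(w)), len(viable))


# Toy deck: H = heart, S = spade, D = diamond, C = club. A world is an ordering.
TOY_DECK = ("H", "S", "D", "C")
WORLDS = list(permutations(TOY_DECK))  # 24 possible worlds


def top_is_heart(world):
    return world[0] == "H"


def knows_nothing_extra(world):
    # Only the deck's composition is known, so every ordering is viable.
    return True


def heart_in_top_two(world):
    return "H" in world[:2]


# An omniscient agent's knowledge picks out exactly one world.
ACTUAL_WORLD = ("S", "H", "D", "C")


def knows_everything(world):
    return world == ACTUAL_WORLD


print(msf_probability(WORLDS, top_is_heart, knows_nothing_extra))  # 1/4
print(msf_probability(WORLDS, top_is_heart, heart_in_top_two))     # 1/2
print(msf_probability(WORLDS, top_is_heart, knows_everything))     # 0
```

More background knowledge shrinks the set of viable worlds and changes the fraction. When only one world is left, the fraction can only be 0 or 1, which is the FSM case.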
We may call the position summarized by the hypothesis above "Modal Satisfaction Frequency theory", or "MSF theory".