## Questions from an imaginary statistical methods exam

Answers to these questions should be expressed numerically, where possible, but no number should be given without a justification for the specific value.

1. Suppose that you have mislaid your house keys, something most people have experienced at one time or another. You look in various places for them: where you remember having them last, places you've been recently, places they should be, places they shouldn't be, places they couldn't be, places you've looked already, and so on. Eventually, you find them and stop looking.

Every time you looked somewhere, you were testing a hypothesis about their location. You may have looked in a hundred places before finding them.

As a piece of scientific research to answer the question "where are my keys?", this procedure has massive methodological flaws. You tested a hundred hypotheses before finding one that the data supported, ignoring every failed hypothesis. You really wanted each of these hypotheses in turn to be true, and made no attempt to avoid bias. You stopped collecting data the moment a hypothesis was confirmed. When you were running out of ideas to test, you frantically thought up some more. You repeated some failed experiments in the hope of getting a different result. Multiple hypotheses, file drawer effect, motivated cognition, motivated stopping, researcher degrees of freedom, remining of old data: there is hardly a methodological sin you have not committed.

(a) Should these considerations modify your confidence or anyone else's that you have in fact found your keys? If not, why not, and if so, what correction is required?

(b) Should these considerations affect your subsequent decisions (e.g. to go out, locking the door behind you)?

2. You have a lottery ticket. (Of course, you are far too sensible to ever buy such a thing, but nevertheless suppose that you have one. Maybe it was an unexpected free gift with your groceries.) The lottery is to be drawn later that day, the results available from a web site whose brief URL is printed on the ticket. You calculate a chance of about 1 in 100 million of a prize worth getting excited about.

(a) Once the lottery results are out, do you check your ticket? Why, or why not?

(b) Suppose that you do, and it appears that you have won a very large sum of money. But you remember that the prior chance of this happening was 1 in 100 million. How confident are you at this point that you have won? What alternative hypotheses are also raised to your attention by the experience of observing the coincidence of the numbers on your ticket and the numbers on the lottery web site?

(c) Suppose that you go through the steps of contacting the lottery organisers to make a claim, having them verify the ticket, collecting the prize, seeing your own bank confirm the deposit, and using the money in whatever way you think best. At what point, if any, do you become confident that you really did win the lottery? If never, what alternative hypotheses are you still seriously entertaining, to the extent of acting differently on account of them?
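For 2(b), the arithmetic can be sketched with Bayes' rule. This is only an illustration of the shape of the calculation, not a model answer: the function name `posterior_win` and every error rate below are assumptions made up for the example, and the only number taken from the question is the prior of 1 in 100 million.

```python
def posterior_win(prior_win, p_match_given_win, p_match_given_no_win):
    """Posterior probability of a real win, given that the numbers appear to match.

    Plain Bayes' rule with two hypotheses: "I won" and "I did not win"
    (the latter lumping together misreading, wrong draw date, hoax site, etc.).
    """
    joint_win = prior_win * p_match_given_win
    joint_no_win = (1.0 - prior_win) * p_match_given_no_win
    return joint_win / (joint_win + joint_no_win)


PRIOR_WIN = 1e-8        # from the question: about 1 in 100 million
P_MATCH_IF_WIN = 1.0    # assumption: a real win always shows up as a match

# Hypothetical chances of seeing an apparent match without having won.
# These values are placeholders, not estimates of anything.
for p_false_match in (1e-3, 1e-6, 1e-8, 1e-10):
    p = posterior_win(PRIOR_WIN, P_MATCH_IF_WIN, p_false_match)
    print(f"P(apparent match | no win) = {p_false_match:.0e} -> P(win | match) = {p:.6f}")
```

The point of the sketch is that the answer turns on how the chance of a spurious match compares with the prior, which is exactly the number the question asks you to justify.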

Daniel Kahneman is being interviewed on Desert Island Discs on BBC Radio 4 right now (09:00-09:45 BST). The recording should be permanently available at that link from an hour after the programme ends.

A current article in Science reports on this study about how good people are at predicting what their future selves will be like. Not very good, apparently. Daniel Gilbert, a psychologist at Harvard, and colleagues conducted several experiments online, in which 19,000 people were asked about such things as personality traits and preferences in music, answering about the present, about themselves 10 years earlier, and about what they expected 10 years hence. More precisely, this not being a longitudinal study, people of any age X predicted less change between themselves and their X+10 selves than people of age X+10 recalled between themselves and their selves at age X. The effect did not go away with increasing age: 58-year-olds still expected less change in the next 10 years than 68-year-olds reported in the last 10.

Gilbert and colleagues call this effect "the end of history illusion," because it suggests that people believe, consciously or not, that the present marks the point at which they've finally stopped changing.

"What these data suggest, and what scads of other data from our lab and others suggest, is that people really aren't very good at knowing who they're going to be and hence what they're going to want a decade from now," Gilbert says.

Someone suggests an alternative explanation:

Another possibility is that people "might well anticipate substantial change, yet not know how they would change, and thus, just predict the status quo."

An actionable moral:

"The single best way to make predictions about what you're going to want in the future isn't to imagine yourself in the future, … it's to look at other people who are in the very future you're imagining," [Gilbert] says.

## Rational subjects and rational practitioners

Half-closing my eyes and looking at the recent topic of morality from a distance, I am struck by the following trend.

In mathematics, there are no substantial controversies. (I am speaking of the present era in mathematics, since around the early 20th century. There were some before then, before it had been clearly worked out what was a proof and what was not.) There are few in physics, chemistry, molecular biology, astronomy. There are some, but they are not the bulk of any of these subjects. Look at biology more generally, history, psychology, sociology, and controversy is a larger and larger part of the practice, in proportion to the distance of the subject from the possibility of reasonably conclusive experiments. Finally, politics and morality consist of nothing but controversy and always have done.

Curiously, participants in discussions of all of these subjects seem equally confident, regardless of the field's distance from experimental acquisition of reliable knowledge. What correlates with distance from objective knowledge is not uncertainty, but controversy. Across these fields (not necessarily within them), opinions are firmly held, independently of how well they can be supported. They are firmly defended and attacked in inverse proportion to that support. The less information there is about actual facts, the more scope there is for continuing the fight instead of changing one's mind. (So much for the Aumann agreement of Bayesian rationalists.)

Perhaps mathematicians and hard scientists are not more rational than others, but work in fields where it is easier to be rational. When they turn into crackpots outside their discipline, they were actually that irrational already, but have wandered into an area without safety rails.

## [Paper] Simulation of a complete cell

"A Whole-Cell Computational Model Predicts Phenotype from Genotype" by Jonathan Karr et al.

This paper appeared a few days ago in Cell, and describes a computational simulation of the bacterium Mycoplasma genitalium, conducted at this lab. The paper is behind a paywall, but is blogged about here. The simulation software is freely available from the project web site.

From the abstract: "Here, we present a 'whole-cell' model of the bacterium Mycoplasma genitalium, a human urogenital parasite whose genome contains 525 genes. Our model attempts to: (1) describe the life cycle of a single cell from the level of individual molecules and their interactions; (2) account for the specific function of every annotated gene product; and (3) accurately predict a wide range of observable cellular behaviors."

According to an editorial commentary in the same issue, this is the first simulation of a complete free-living microbe.

## Another reason why a lot of studies may be wrong

It appears that standard lab rats and mice are all morbidly obese. Using them as model organisms may give misleading results that fail to transfer to humans, or even to healthy rats and mice.

Does this reduce the whole calorie-restriction thing to nothing more than the advice to not be a fat slob? Well, maybe not, according to the author, but it has to make one wonder.

## Another cooperative rationality exercise

I don't know how well this is going to work, but I mention it here because it's actually going to be done in a few weeks' time at a day-long meeting of the research group that I work with. (Not my idea. I don't know which of us thought it up.)

Keyword game: explaining a scientific term. Everyone puts a keyword used in their project (for example, "Selective Sweep") into a hat. For each keyword in turn, get someone who does not understand the keyword to explain what they think it might mean. They can then be enlightened by the people who know (of whom there should be at least one!).

This is to be done in groups of four, and afterwards, the groups reassemble and each group presents its newly understood keyword meanings to the main group.

There are twenty people altogether.

Trying to guess what e.g. "Selective Sweep" is just from the words doesn't seem very sensible to me, but in practice I expect the result to be more of a conversation between the one on the spot and those who actually know. How do you know when you've grasped an idea that someone is explaining to you, and when you have not?

## Memory in the microtubules

A recent article in PLoS Computational Biology suggests that memory is encoded in the microtubules. "Signaling and encoding in MTs and other cytoskeletal structures offer rapid, robust solid-state information processing which may reflect a general code for MT-based memory and information processing within neurons and other eukaryotic cells."

They argue that synaptic connections are transient compared with the lifetime of memories, and that memories therefore cannot be stored in them but must reside in some more persistent structure. The structure they suggest is the phosphorylation state of sites on microtubule lattices within neurons. And that's about as much of the technical detail as I feel able to summarise. It's not all speculation: they report technical work on the structures of these cellular components. Total memory capacity would be somewhere upwards of 10^20 bits (or in more everyday units, roughly 10 million terabytes), depending on the encoding, of which they suggest several schemes.
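As a quick check on the unit conversion, here is the arithmetic spelled out. It assumes decimal terabytes (10^12 bytes) and 8 bits per byte; the variable names are mine, and only the 10^20 bits figure comes from the summary above.

```python
CAPACITY_BITS = 10**20          # lower bound quoted above
BITS_PER_BYTE = 8
BYTES_PER_TERABYTE = 10**12     # assumption: decimal (SI) terabytes

# Integer arithmetic throughout, so the result is exact.
capacity_bytes = CAPACITY_BITS // BITS_PER_BYTE
capacity_terabytes = capacity_bytes // BYTES_PER_TERABYTE

print(f"{capacity_terabytes:,} TB")
```

That comes to 12.5 million terabytes, so "10 million terabytes" is the right order of magnitude.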

Journalistic writeup here.

Note that Stuart Hameroff, one of the authors, is known for his proposals for microtubules as the mechanism of consciousness through quantum effects (and with Penrose, quantum gravitational effects). The present paper, however, is solely about memory and does not touch on quantum coherence or consciousness.

## How to prove anything with a review article

Thus the subtitle of this blog posting at PLoS, referencing this article on "Cigarette smoking: an underused tool in high-performance endurance training". The point being that you can write a review article to argue anything you want, with sufficient cherry-picking and chains of links.

If you are doing actual experiments and making observations or proving theorems, then to a large extent -- larger in some sciences than in others -- you are constrained by the brute facts. But when writing secondary literature, especially in areas where data is generally fuzzier, it is easy, whether deliberately or not, to write to a bottom line, including findings you like and excluding those you don't.

Something to bear in mind when reading or writing any review article.

## What visionary project would you fund?

I have just received a survey questionnaire regarding future directions in EU (European Union) research funding, and thought it would be interesting to see how LessWrong would answer the main question:

Imagine that EU funding is available for one ambitious, visionary project extending beyond 2020.

• What kind of research challenges should such a project address in your area?
• What would be the most urgent research tasks?

