The Logic of the Hypothesis Test: A Steel Man

Matt_Simpson

21st Feb 2013

Related to: Beyond Bayesians and Frequentists

Update: This comment by Cyan clearly explains the mistake I made - I forgot that the ordering of the hypothesis space is necessary for hypothesis testing to work. I'm not entirely convinced that NHST can't be recast in some "thin" theory of induction that may well change the details of the actual test, but I have no idea how to formalize this notion of a "thin" theory and most of the commenters either 1) misunderstood my aim (my fault, not theirs) or 2) don't think it can be formalized.

I'm teaching an econometrics course this semester and one of the things I'm trying to do is make sure that my students actually understand the logic of the hypothesis test. You can motivate it in terms of controlling false positives, but that sort of interpretation doesn't seem to be generally applicable. Another motivation is a simple deductive syllogism with a small but very important inductive component. I'm borrowing the idea from something we discussed in a course I had with Mark Kaiser - he called it the "nested syllogism of experimentation." I think it applies equally well to most or even all hypothesis tests. It goes something like this:

1. Either the null hypothesis or the alternative hypothesis is true.

2. If the null hypothesis is true, then the data has a certain probability distribution.

3. Under this distribution, our sample is extremely unlikely.

4. Therefore under the null hypothesis, our sample is extremely unlikely.

5. Therefore the null hypothesis is false.

6. Therefore the alternative hypothesis is true.

An example looks like this:

Suppose we have a random sample from a population with a normal distribution that has an unknown mean $\mu$ and unknown variance $\sigma^2$. Then:

1. Either $H_0:\mu=c$ or $H_a:\mu\neq c$ is true, where $c$ is some constant.

2. Construct the test statistic $t^*=\frac{\bar{X}-c}{s/\sqrt{n}}$ where $n$ is the sample size, $\bar{X}$ is the sample mean, and $s$ is the sample standard deviation.

3. Under the null hypothesis, $t^*$ has a $t$ distribution with $n-1$ degrees of freedom.

4. $P(|t|>|t^*|)$ is really small under the null hypothesis (e.g. less than 0.05).

5. Therefore the null hypothesis is false.

6. Therefore the alternative hypothesis is true.
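
The steps of this example can be sketched in code. What follows is a minimal Python sketch, not code from the course: it uses only the standard library, the function names (`t_pdf`, `two_sided_p`, `one_sample_t_test`) and the sample values are made up for illustration, and the p-value is approximated by numerically integrating the $t$ density with Simpson's rule rather than by calling a statistics package.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_star, df, steps=20000):
    """Approximate P(|t| > |t_star|) under a t distribution with df degrees
    of freedom, using Simpson's rule on [0, |t_star|] (steps must be even)."""
    upper = abs(t_star)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t_star| < t < |t_star|)
    return max(0.0, 1.0 - central_mass)


def one_sample_t_test(sample, c, alpha=0.05):
    """Return (t_star, p_value, reject) for H0: mu = c vs Ha: mu != c."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)                 # sample standard deviation
    t_star = (x_bar - c) / (s / math.sqrt(n))    # step 2
    p_value = two_sided_p(t_star, n - 1)         # steps 3 and 4
    return t_star, p_value, p_value < alpha      # step 5: the inductive move


# Hypothetical data, chosen only to exercise the function.
hypothetical_sample = [102.3, 98.7, 110.4, 95.1, 104.8, 99.6, 107.2, 101.5]
print(one_sample_t_test(hypothetical_sample, c=90))
```

Numerical integration is used here only to keep the sketch dependency-free; for very large values of $|t^*|$ the result loses precision, so a statistics library would be the better choice in practice.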

What's interesting to me about this process is that it almost tries to avoid induction altogether. Only the move from step 4 to 5 seems anything like an inductive argument. The rest is purely deductive - though admittedly it takes a couple premises in order to quantify just how likely our sample was and that surely has something to do with induction. But it's still a bit like solving the problem of induction by sweeping it under the rug then putting a big heavy deduction table on top so no one notices the lumps underneath.

This sounds like it's a criticism, but actually I think it might be a virtue to minimize the amount of induction in your argument. Suppose you're really uncertain about how to handle induction. Maybe you see a lot of plausible sounding approaches, but you can poke holes in all of them. So instead of trying to actually solve the problem of induction, you set out to come up with a process which is robust to alternative views of induction. Ideally, if one or another theory of induction turns out to be correct, you'd like it to do the least damage possible to any specific inductive inferences you've made. One way to do this is to avoid induction as much as possible so that you prevent "inductive contamination" spreading to everything you believe.

That's exactly what hypothesis testing seems to do. You start with a set of premises and keep deriving logical conclusions from them until you're forced to say "this seems really unlikely if a certain hypothesis is true, so we'll assume that the hypothesis is false" in order to get any further. Then you just keep on deriving logical conclusions with your new premise. Bayesians start yelling about the base rate fallacy in the inductive step, but they're presupposing their own theory of induction. If you're trying to be robust to inductive theories, why should you listen to a Bayesian instead of anyone else?
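
To make the Bayesian objection concrete, here is a hedged sketch of the base-rate calculation behind it. The prior share of true nulls, the significance level, and the power are all hypothetical inputs chosen for illustration, not figures from this post, and the function name is made up. It also treats power as a single number, which is a simplification for a composite alternative like $\mu\neq c$.

```python
def share_of_rejections_with_true_null(prior_null, alpha, power):
    """P(H0 is true | the test rejects H0), computed by Bayes' rule.

    prior_null: assumed prior probability that the null is true
    alpha:      probability of rejecting when the null is true
    power:      probability of rejecting when the alternative is true
    """
    false_rejections = prior_null * alpha
    true_rejections = (1 - prior_null) * power
    return false_rejections / (false_rejections + true_rejections)


# Hypothetical inputs: 90% of tested nulls are true, alpha = 0.05, power = 0.8.
print(share_of_rejections_with_true_null(0.9, 0.05, 0.8))
```

With these made-up inputs, roughly 36% of rejections happen when the null is actually true, even though the test's false positive rate is 5%. The calculation only goes through once you accept a prior over hypotheses, which is exactly the inductive commitment the hypothesis test is trying to avoid.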

Now does hypothesis testing actually accomplish induction that is robust to philosophical views of induction? Well, I don't know - I'm really just spitballing here. But it does seem to be a useful steel man.

37 Comments

gwern

I'm interested in the calculated confidence interval, not the p-value necessarily. Noodling around some more, I think I'm starting to understand it more: the confidence interval isn't calculated with respect to the H0 of 0 which the R code defaults to, it's calculated based purely on the mean (and then an H0 of 0 is assumed to spit out some p-value)

R> set.seed(12345); t.test(rnorm(20,100,15))

    One Sample t-test

data:  rnorm(20, 100, 15)
t = 36.16, df = 19, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  95.29 107.00
sample estimates:
mean of x
    101.1
R>
R> 107-95.29
[1] 11.71
R> 107 - (11.71/2)
[1] 101.1

Hm... I'm trying to fit this assumption into your framework....

Either h0, true mean = sample mean; or ha, true mean != sample mean
construct the test statistic: 't = (sample mean - sample mean) / (s/sqrt(n))'
't = 0 / (s/sqrt(n))'; t = 0
... a confidence interval

Matt_Simpson

A 95% confidence interval is sort of like testing H0: mu = c vs Ha: mu != c for all values of c at the same time. In fact, if you reject the null hypothesis for a given c when c is outside your calculated confidence interval and fail to reject otherwise, you're performing the exact same t-test with the exact same rejection criterion as the usual one (that is, rejecting when the p-value is less than 0.05).

The formula for the test statistic is (generally) t = (estimate - c)/(standard error of estimate) while the formula for a confidence interval is (generally) estimate +/- t^(s...
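
For the one-sample case, the equivalence between the test and the interval can be written out in a line. This is a sketch in the post's notation ($\bar{X}$, $s$, $n$, $c$, $t^*$); the symbol $t_{crit}$, standing for the 0.975 quantile of the $t$ distribution with $n-1$ degrees of freedom, is introduced here for illustration.

$$
|t^*| > t_{crit}
\iff \left|\bar{X}-c\right| > t_{crit}\,\frac{s}{\sqrt{n}}
\iff c \notin \left[\bar{X}-t_{crit}\,\frac{s}{\sqrt{n}},\ \bar{X}+t_{crit}\,\frac{s}{\sqrt{n}}\right]
$$

In the R output quoted above, the 95% interval runs from 95.29 to 107.00, so c = 0 falls outside it (consistent with the reported p-value < 2.2e-16), while a null value such as c = 100 falls inside it and would not be rejected at the 5% level.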