# binomial variance problem

5 06 April 2012 10:59PM

Found in an old Kahneman & Tversky paper:

There are two programs in a high school. Boys are a majority (65%) in program A, and a minority (45%) in program B. There is an equal number of classes in each of the two programs.

You enter a class at random, and observe that 55% of the students are boys. What is your best guess -- does the class belong to program A or to program B?

Comment author: 07 April 2012 05:42:59PM 2 points
Comment author: 07 April 2012 01:11:02AM 1 point

Um, B, but only by a hair. 55 is equidistant between 45 and 65, but the variance is smaller for A because 65 is farther from 50 than 45 is, so measured in the relevant standard deviations, 55 is closer to 45 than to 65. (Making the obviously obvious assumption that children are assigned to classes independently of gender.)

I had to google up the source to find out why the "obvious" answer is supposed to be A.
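The standard-deviation argument can be checked numerically. This is only a sketch: the class size `n = 20` is an assumption (the problem doesn't give one), and `proportion_sd` is a name made up for illustration. It assumes each student is independently a boy with the program's probability.

```python
import math

def proportion_sd(p: float, n: int) -> float:
    """Standard deviation of the observed fraction of boys in a class of n
    students, if each student is independently a boy with probability p."""
    return math.sqrt(p * (1 - p) / n)

n = 20           # assumed class size; not stated in the problem
observed = 0.55  # observed fraction of boys

z_scores = {}
for program, p in {"A": 0.65, "B": 0.45}.items():
    sd = proportion_sd(p, n)
    # Distance from the program's mean, measured in that program's sd.
    z_scores[program] = abs(observed - p) / sd
    print(f"program {program}: sd = {sd:.4f}, distance = {z_scores[program]:.3f} sd")
```

Under these assumptions the observation sits slightly fewer standard deviations from B's 45% than from A's 65%, which is the "by a hair" part.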

Comment author: 08 April 2012 03:11:59AM 1 point

What's the name of the principle that variance increases further from 50%?

Comment author: 08 April 2012 02:07:51PM *  2 points

Not having memorized the formula for variance in binomial distributions, but intuiting that said principle was true, was my weaker reason for concluding B.

More saliently, the problem statement contains the gratuitous information that boys are a majority in program A. It's Kahneman and Tversky, for FSM's sake; therefore this information is used to mislead. Therefore, B.

Comment author: 08 April 2012 02:22:09PM 1 point

Decreases! Note that there's zero variance when p = 0 versus non-zero variance when p = 0.5.

Comment author: 08 April 2012 09:41:29AM 0 points

No principle, just the fact that the variance of the binomial distribution is np(1-p), i.e. p(1-p) per trial, which for fixed n peaks at p=0.5.
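A quick way to see the shape, as a sketch: the per-student variance p(1-p) is evaluated on a grid of probabilities. The count variance np(1-p) and the proportion variance p(1-p)/n are both this quantity times a factor that depends only on n, so they peak in the same place. The function name is made up for illustration.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a single boy/girl draw with P(boy) = p."""
    return p * (1 - p)

# Evaluate on a grid of probabilities 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
peak = max(grid, key=bernoulli_variance)

print("variance is largest at p =", peak)
print("p = 0.65:", bernoulli_variance(0.65))
print("p = 0.45:", bernoulli_variance(0.45))
print("p = 0.00:", bernoulli_variance(0.0))
```

The 45% program is closer to the peak than the 65% program, so it has the larger variance.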

Comment author: 08 April 2012 12:55:40AM 0 points

It looks like I approached the problem in exactly the same way you did. I'm very curious as to how common it is for people to think A is more likely; it really doesn't seem obvious to me either.

Comment author: 08 April 2012 03:10:48AM 0 points

75% choose program A

Comment author: 06 April 2012 11:14:19PM 1 point

Cute problem. And you can probably go a bit further in assessing how good your best guess is by inferring that the class size is at least 20 and lower bounding your variances.
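The class-size inference can be sketched as follows, assuming the 55% is exact rather than rounded: 55/100 reduces to 11/20, so the class size must be a multiple of 20. The variance bound shown is for the number of boys, np(1-p), under the independence assumption.

```python
from fractions import Fraction

# Assumes the observed 55% is exact, not a rounded figure.
observed = Fraction(55, 100)

min_class_size = observed.denominator  # smallest class that can be exactly 55% boys
min_boys = observed.numerator          # number of boys in that smallest class

print(f"smallest possible class: {min_boys} boys out of {min_class_size}")

# Lower bound on the variance of the number of boys, n * p * (1 - p),
# using n >= min_class_size.
for program, p in {"A": Fraction(65, 100), "B": Fraction(45, 100)}.items():
    bound = min_class_size * p * (1 - p)
    print(f"program {program}: variance of boy count >= {float(bound):.2f}")
```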

[Or you can be dickish/clever and claim that the problem is underspecified because you're only given the overall boy/girl percentages for the two programs, and not their distribution. E.g., if each class has either exactly 65% or exactly 45% boys, then your observation is consistent with neither of the classes.]

Comment author: 07 April 2012 02:28:08AM 2 points

[Actually you can't be dickish/clever that way: The problem isn't underspecified, as the goal is to do the best you can with the information you've got. You've got no information/evidence regarding the distribution between classes, so your best bet is to treat it as random. From there you can use Bayes' theorem, blah blah, etc. etc....]

Comment author: 07 April 2012 02:55:32AM *  8 points

Or just change the 45% and 65% to 11% and 99%. That makes the correct answer pretty obvious without changing anything important.
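To make the 11% / 99% variant concrete, here is a rough sketch that compares the two likelihoods for a class with 11 boys and 9 girls. The class size of 20 is an assumption, students are treated as independent draws, and `log_likelihood` is a name made up for illustration.

```python
import math

def log_likelihood(boys: int, girls: int, p_boy: float) -> float:
    """Log-probability of a specific sequence with the given counts,
    if each student is independently a boy with probability p_boy."""
    return boys * math.log(p_boy) + girls * math.log(1 - p_boy)

boys, girls = 11, 9  # assumed class of 20 students at 55% boys

ll_low = log_likelihood(boys, girls, 0.11)   # program with 11% boys
ll_high = log_likelihood(boys, girls, 0.99)  # program with 99% boys

# Base-10 log of the likelihood ratio in favour of the 11% program.
log10_ratio = (ll_low - ll_high) / math.log(10)
print(f"the 11% program is favoured by about 10^{log10_ratio:.1f}")
```

Nine girls in a class of 20 would be wildly improbable in a 99%-boys program, even though 55% is the same distance from 11% as from 99%.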

Comment author: 08 April 2012 01:53:33PM *  0 points

Oops, you're right. The variant of the problem I mentioned above got rid of the assumption of binomially distributed boys (equivalently, girls).

The following setup should work, though:

$$\begin{aligned} z_i &\sim \text{Bernoulli}(0.5) \\ p_i \mid z_i = 0 &\sim \text{Beta}(a_0, b_0) \\ p_i \mid z_i = 1 &\sim \text{Beta}(a_1, b_1) \\ x_i &\sim \text{Binomial}(n, p_i) \end{aligned}$$

In words: to generate the i-th class, you flip a coin to decide whether it's in program A or program B; conditioned on the program, the proportion of boys is drawn from a program-specific beta distribution; and then the number of boys is drawn from the corresponding binomial distribution. Under the constraints that $a_0 / (a_0 + b_0) = 0.65$ and $a_1 / (a_1 + b_1) = 0.45$, the average proportion of boys matches up with the problem.

However, by making $a_0$ and $b_0$ small or large (and likewise $a_1$ and $b_1$), with each pair scaled together to maintain its constraint, you can play with the variance so that the observed 55%-boys class is more likely under either of the programs. If you had repeated trials available, you might be able to learn these parameters. In a single trial, you can't be sure that your strategy will do better than chance.
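A minimal sketch of this setup, using the closed-form beta-binomial likelihood. The concentration parameters `conc_a` $= a_0 + b_0$ and `conc_b` $= a_1 + b_1$, the class size of 20, and all function names are assumptions introduced for illustration. The means are pinned at 0.65 and 0.45 as in the constraints.

```python
import math

def log_beta(a: float, b: float) -> float:
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(x: int, n: int, a: float, b: float) -> float:
    """P(x boys out of n) when p ~ Beta(a, b) and x | p ~ Binomial(n, p)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_choose + log_beta(x + a, n - x + b) - log_beta(a, b))

def posterior_program_a(x: int, n: int, conc_a: float, conc_b: float) -> float:
    """Posterior probability of program A given x boys out of n, with a 50/50 prior.
    conc_a = a_0 + b_0 and conc_b = a_1 + b_1; the beta means stay at 0.65 and 0.45."""
    like_a = beta_binomial_pmf(x, n, 0.65 * conc_a, 0.35 * conc_a)
    like_b = beta_binomial_pmf(x, n, 0.45 * conc_b, 0.55 * conc_b)
    return like_a / (like_a + like_b)

# A tightly concentrated, B very dispersed (its classes are mostly near 0% or 100% boys).
tight_a = posterior_program_a(11, 20, conc_a=1000.0, conc_b=0.5)
# The reverse: A very dispersed, B tightly concentrated.
tight_b = posterior_program_a(11, 20, conc_a=0.5, conc_b=1000.0)

print(f"P(A | 11 of 20), A tight / B dispersed: {tight_a:.3f}")
print(f"P(A | 11 of 20), A dispersed / B tight: {tight_b:.3f}")
```

With the same program means, the answer flips depending on which program's class proportions are more spread out, which is the sense in which the single observation can't settle it.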

Comment author: 07 April 2012 07:53:39AM *  0 points

SPOILER ALERT: solution presented here. Rot-13 would be immensely painful, so I'll just present some facts from which an LW reader can piece together a solution. The probability of drawing 11 blue balls followed by 9 green balls from an urn that's 45% blue is 7.0567033E-7. If the urn is 65% blue it's 6.8969856E-7. The log-likelihood-ratio is 0.099425348 decibans. So for practical purposes the answer is "my best guess is still 50/50" but the posterior probability is really 0.50572313. If you draw 100 rather than 20 it's 0.52858571.
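For anyone who wants to reproduce these figures, a short sketch follows. It assumes a 50/50 prior over the two programs and independent draws; the function names are made up for illustration.

```python
import math

def sequence_likelihood(boys: int, girls: int, p_boy: float) -> float:
    """Probability of one specific sequence with the given counts."""
    return p_boy ** boys * (1 - p_boy) ** girls

def posterior_b(boys: int, girls: int) -> float:
    """Posterior probability of program B (45% boys) under a 50/50 prior."""
    like_a = sequence_likelihood(boys, girls, 0.65)
    like_b = sequence_likelihood(boys, girls, 0.45)
    return like_b / (like_a + like_b)

like_b_20 = sequence_likelihood(11, 9, 0.45)
like_a_20 = sequence_likelihood(11, 9, 0.65)

# Evidence for B over A in decibans: 10 * log10 of the likelihood ratio.
decibans = 10 * math.log10(like_b_20 / like_a_20)

print(f"likelihood at 45%: {like_b_20:.7e}")
print(f"likelihood at 65%: {like_a_20:.7e}")
print(f"evidence for B: {decibans:.6f} decibans")
print(f"posterior for B, 20 students:  {posterior_b(11, 9):.6f}")
print(f"posterior for B, 100 students: {posterior_b(55, 45):.6f}")
```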

I think even if it were a real-life problem I would have correctly guessed, without doing arithmetic, which class had more evidence; but I was sort of spoilered by knowing that the answer has to be the "counterintuitive" one.

ETA: another way to do the sums is that each boy provides 1.5970084 decibans of evidence for program A, and each girl 1.9629465 for program B.
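The per-student version of the sum can be sketched like this, under the same independence assumption. The variable names are made up for illustration.

```python
import math

# Evidence contributed by a single observation, in decibans.
db_per_boy = 10 * math.log10(0.65 / 0.45)   # each boy favours program A
db_per_girl = 10 * math.log10(0.55 / 0.35)  # each girl favours program B

boys, girls = 11, 9
net_for_b = girls * db_per_girl - boys * db_per_boy

print(f"per boy (for A):  {db_per_boy:.7f} decibans")
print(f"per girl (for B): {db_per_girl:.7f} decibans")
print(f"net for B with {boys} boys and {girls} girls: {net_for_b:.6f} decibans")
```

Each girl is slightly stronger evidence than each boy, so nine girls just barely outweigh eleven boys.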

Comment author: 07 April 2012 08:04:03AM 0 points

If you look at girls in addition to boys, it's no longer quite so counterintuitive.