After reading this post, I had some questions, so I asked the author directly to discuss them.
Here are the questions and the replies I got from Bucky! I hope they're useful to some of you.
***
In the "Working through the odds" section, how does the contestant's prior judgement (0% for C and D) affect the posterior given the audience result (28%/48%/24%/0%)?
And how did it change from 6:1 (66%/11%/11%/11%, the p = 2/3 given by the Kelly criterion) to only a 10% edge?
6:1 is the amount of evidence I required in order to be justified in guessing. This is calculated from my prior odds for each answer (p=0.25) needing to move to p=0.67 for an individual answer.
The p=0.67 is calculated from Kelly - it creates enough of an edge to make the bet worthwhile.
The 10% doesn't refer to an edge in the Kelly criterion sense. Because the contestant had said that she was confident that both C and D were incorrect, it seems likely that any audience member who didn't know the answer would say either A or B. If C or D got a high proportion of the vote share then that is strong evidence that those people are really confident in their answer. Of course, some people who don't know still might say C or D; 10% was my limit on what fraction of the audience I thought might do this.
As the actual fraction was 24%, this gave me some evidence in favor of C. I don't need to calculate exactly how much evidence I think this gives me, only whether it is better evidence than 6:1. This is entirely subjective and depends on how well my amateur psychology works but I felt it was better than 6:1.
It's important to say that the audience's answer percentages aren't directly involved in the Bayesian update. The evidence in favor of C is the likelihood ratio

$$\frac{p(\text{voting result} \mid \text{C true})}{p(\text{voting result} \mid \text{C false})}$$

This depends on the assumptions that you make about the audience's voting patterns.
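To make that ratio concrete, here's a minimal sketch of the calculation (my own illustration, not Bucky's numbers): I assume a 200-person audience, that a member votes C with probability ~23% if C is true (knowers plus a few contrarian guessers) and at most 10% if C is false (the limit Bucky describes above), and treat votes as independent.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 200              # assumed audience size (illustrative)
k = 48               # observed: 24% of the audience voted C

p_if_C_true = 0.23   # assumed: ~18% who know it + ~5% contrarian guessers
p_if_C_false = 0.10  # assumed: contrarian guessers only, at the 10% limit

bayes_factor = binom_pmf(k, n, p_if_C_true) / binom_pmf(k, n, p_if_C_false)
print(f"Evidence for C from the 24% result: about {bayes_factor:.1e} : 1")
```

Under these toy assumptions the evidence comes out overwhelmingly stronger than 6:1, which is why the real bottleneck is not the vote arithmetic but how much you trust the model of audience behaviour (a point the main post returns to under "How confident am I in my model?").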
Considering purely theoretical assumptions, i.e. excluding real-world variables such as:
-whether the audience/contestant knows the answers or not
-the difficulty of the question
-whether you should wager or quit immediately
Under such assumptions:
A: 50:50, then Ask the Audience
B: Ask the Audience, then 50:50
Which order, A or B, is the better strategy?
Or is there no single "best" strategy based solely on Bayesian inference? (By always using your lifelines in order A or in order B, you could be cutting out more "branches" of the total possible outcomes than with the other order.)
A2.
My analysis depended on the assumptions that I make:
a) There's a fairly binary split between people who know and people who are just guessing
b) A small fraction of the audience actually know the answer for sure
c) The effect of salience is comparable to the fraction of people who know the correct answer (the contestant doesn't know which one is bigger)
d) You are confident that you are definitely going to use both lifelines on that question
If those things are all true then I'd be confident that B is the better option. I think "b" and "c" are fairly likely for end-game questions. If "a" isn't true (e.g. some people know that one answer is definitely wrong but aren't sure about the others) then I suspect that for most cases B is still the better option.
"d" is more complicated. If it isn't true then order A or B would partially depend on how likely you expect to give an answer after just one lifeline and how valuable you expect each lifeline to be later.
So I would consider there to be a best strategy for a given scenario and that generally order B should be favored more than it is intuitively. In Bayesian inference you always have to start with a prior and if that prior expects that all 4 assumptions are true then B is best. If assumption "d" is not true then this requires recalculation. If you were wanting to make this more general I guess you could change assumption "a" to different distributions of knowledge and see how it works out.
Can this model be seen as a variant of the Monty Hall problem?
I can't see a way to make a Monty Hall analogy work.
In Monty Hall the point is that the host knows the correct answer and by giving constrained information about one which is incorrect he gives some extra evidence about which is correct.
If before 50:50 there was one randomly selected answer which the host declared would stay (whether it was right or wrong) then we'd be closer to a Monty Hall situation.
tl;dr: Using toy models, the Kelly criterion, prospect theory, Bayes and amateur psychology in an unnecessarily detailed analysis of Who Wants to Be a Millionaire? I strongly suspect there will be errors but hopefully not too many.
Motivating example
I was watching Who wants to be a Millionaire? a couple of nights ago for the first time in about 20 years after its recent recommissioning in the UK.
One contestant’s general knowledge got her to £250,000, 2 correct answers away from £1,000,000. She had £125,000 guaranteed and still had 2 lifelines (ask the audience and phone a friend).
Question: The Norwegian explorer Roald Amundsen reached the South Pole on 14th December of which year?
A: 1891
B: 1901
C: 1911
D: 1921
She thought about/discussed this at length (there was no time limit and the total time spent on the question was about 20 minutes!). She knew that Amundsen beat Scott to the South Pole and was confident that Scott was Victorian, which ruled out C & D. She pointed out that if it was 1911 then the 100-year anniversary would have been 2011 and she felt she would have remembered there being something about it in the news.
This didn’t help her choose between A & B so she asked the audience (she was confident none of her friends would know). The results were (from memory, maybe slightly off):
A: 28%
B: 48%
C: 24%
D: 0%
What would you do?
The result that stood out to me was the 24% who said C. After everything that she said about how confident she was that it isn’t 1911, who are these people voting C? It turns out they’re the people who knew the right answer.
Unfortunately, the contestant went with the majority, said B, and left with only £125k. Not too bad really, even if the name Roald Amundsen is haunting her and popping up everywhere she goes.
I’ll admit that even though I suspected that the answer was C based purely on the ask the audience result, I don’t think I would have been confident enough to go for it based only on that.
If I knew that B was the right answer, would I have been surprised that it got 48% of the vote? No, that would make sense.
If I knew that B was the wrong answer, would I have been surprised that it got 48% of the vote? Maybe a little, it is quite high, but not out of the question, especially this late in the game.
If I knew that C was the right answer, would I have been surprised that it got 24% of the vote? No, that would make sense; it’s a tough question so if most of the people who pressed C actually knew the answer then 24% sounds about right.
If I knew that C was the wrong answer, would I have been surprised that it got 24% of the vote? Yes, very; you have to be really committed to an answer to stick with it after someone who is clearly good at general knowledge has ruled it out and given a couple of reasons why she thinks it’s wrong. Anything more than 10% would be a surprise, 24% would be really weird. This is especially true as D got 0% of the vote.
Let’s dig into some more detail.
Modelling ask the audience
Simple model
A simple model for Ask the Audience would be to expect that those who know the right answer would press the correct button and those who didn’t would guess equally spread between the 4 answers.
If we estimate that 20% of the audience actually know the answer, this gives 20:20:20:20 from the guessers, with an additional 20 for the correct answer. We get 40:20:20:20 and the correct answer is obvious. Even with a bit of random noise in the results the correct answer should be clear, provided enough people actually know the answer.
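As a quick illustration, here is a simulation of this simple model (the audience size, knowledge fraction and fixed seed are my own assumptions):

```python
import random

def ask_the_audience(correct, know_frac=0.20, n=200):
    """Simple model: knowers press the correct button, everyone else guesses uniformly."""
    votes = {a: 0 for a in "ABCD"}
    for _ in range(n):
        if random.random() < know_frac:
            votes[correct] += 1
        else:
            votes[random.choice("ABCD")] += 1
    return {a: round(100 * v / n) for a, v in votes.items()}

random.seed(0)
print(ask_the_audience("C"))  # roughly {'A': 20, 'B': 20, 'C': 40, 'D': 20} plus noise
```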
On late game questions, one would expect that fewer people will know the answer, as few people win the jackpot. However, winning the jackpot requires answering a long run of consecutive questions, so each individual question doesn’t need to be especially hard for jackpot wins to stay rare (provided there are no indiscreet coughs).
Consider only people who get to the last 5 questions. Even if they, on average, know (or correctly guess) the answer 50% of the time, only 1 in 32 will actually get to the jackpot (provided the questions are on sufficiently different subjects). 33% knowledge gives 1 in 243 wins. In the original UK series there were 5 winners in 1200 contestants (1 in 240), but as some contestants weren't good enough to reach the final 5 questions, 33% is a lower bound.
The average audience member probably isn’t as good as the best contestants but they have applied to be an audience member of WWTBAM so probably have a good interest in general knowledge (or are there with someone who does!). I think that 15-20% of the audience on average knowing the correct answer is probably not a bad starting point.
Salience
Imagine you’re an audience member faced with a question you don’t know the answer to. Possibly one of the answers stands out to you for some reason or other. Maybe it’s a name you recognise or a date which seems like a good enough guess.
Unfortunately, the girl next to you is thinking of the same answer for a similar reason, as is the guy a couple of seats further down. Say 20% of the audience guess the same thing as you for roughly the same reason. Instead of 40:20:20:20, the results are now 35:35:15:15.
This is bad news for the contestant: two answers are now indistinguishable, because as many people arrived at one particular wrong answer for a consistent reason as actually knew the right answer. In reality an exact tie is rare; one effect will usually be larger than the other.
In this circumstance, most people will either take the money and run or go with the highest scoring answer. 20 years ago when I used to watch the program I had the heuristic that in the final 5 questions, it’s a better option to take the second highest scoring answer from ask the audience.
A better solution than to go consistently with the highest or second highest answer would be to consider which effect sizes you would expect – people actually knowing the answer and salience. Then, when you see the results, consider how surprising they would be based on the hypothesis that A, B, C or D is the correct answer.
Estimating salience is tricky but as an upper bound a brief look on the millionaire wiki gives an example of the audience voting 81:19 in favour of the wrong answer! This page gives even more extreme examples.
Using 50:50 and ask the audience for the same question
On that first page, a number of other examples are given of people using 50:50 and ask the audience lifelines on the same question. It is notable that all 4 used their 50:50 lifeline first, followed by ask the audience.
Superficially this makes sense – you want to give the audience as much information as possible to help them make the best choice.
However in reality there is probably a fairly binary split in the audience members – those who know and those who are guessing. You don’t care what the guessers think, as it’s very hard for them to provide you with enough evidence to justify answering the question.
The only thing you actually care about is identifying the people who do know. If you refrain from using your 50:50 until after using ask the audience, the 50:50 serves the additional purpose of letting you remove from the statistics a section of the audience who don’t know, increasing your signal-to-noise ratio.
In addition, if there is a highly salient wrong answer then you have a 2/3 chance of removing it. Say we have 35:35:15:15 ratio due to knowledge and salience effects as described above. Using the 50:50 has a 2/3 chance of leaving 35:15 and 1/3 chance of leaving 35:35.
35:15 gives an obvious best response under the model. Using the same model and removing the same answers, but using the 50:50 lifeline first, you would have 60:40 in favour of the correct answer. This is much weaker evidence and only a relatively small bit of salience in favour of the wrong answer would tip the scales in the wrong direction.
If 35:35 is left you’re still stuck but you wouldn’t be in any better situation if you’d selected 50:50 first.
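Since the model is simple, the expected vote shares can be computed directly rather than simulated. Here's a short sketch reproducing the numbers above (the function and parameters are mine: 20% knowers, a salient answer drawing an extra 20% of the audience):

```python
def shares(options, correct, salient, know=0.20, pull=0.20):
    """Expected vote shares: a fraction `know` of the audience vote correctly;
    a salient answer (if still available) draws an extra `pull`; the rest of
    the guessers split uniformly across the available answers."""
    drawn = pull if salient in options else 0.0
    base = (1 - know - drawn) / len(options)
    s = {a: base for a in options}
    s[correct] += know
    if salient in options:
        s[salient] += drawn
    return {a: round(100 * v) for a, v in s.items()}

# A is correct, B is a salient wrong answer.
# Ask the audience first, all four answers still in play:
print(shares("ABCD", "A", "B"))  # {'A': 35, 'B': 35, 'C': 15, 'D': 15}
# ...then 50:50 leaves either A vs C/D (read off as 35:15) or A vs B (35:35).

# 50:50 first and it removes the salient answer (2/3 chance), then ask the audience:
print(shares("AC", "A", "B"))    # {'A': 60, 'C': 40} - much weaker evidence
# 50:50 first and the salient answer survives (1/3 chance):
print(shares("AB", "A", "B"))    # {'A': 50, 'B': 50} - no evidence at all
```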
Salience influence by the contestant
One of the classic things to shout at the T.V. during Millionaire is “Don’t discuss the answers before using your Ask the Audience lifeline!”
It is often said that if you talk about which answers you think are most likely then you will influence the audience towards those answers. This means that you don’t get the audience’s true opinion on the subject. This seems to have happened in the Amundsen question.
The obvious thought comes to mind that one might use that influence deliberately but I’ll come back to that later.
(I’m working on the assumption that the rules prevent you from telling the audience what to do if they’re not sure!)
How confident do I need to be?
3 alternative models
Millionaire includes 2 safety nets, such that once a contestant reaches one of them they are guaranteed to win at least that amount. Once a safety net has been passed, contestants no longer have to bet their entire bankroll on each answer.
I’m going to invoke the Kelly criterion here even though I know that the assumptions of the derivation are not met. Adjust up or down according to taste.
(Interestingly one of the Kelly assumptions is that you will get unlimited future opportunities to wager with an edge. In Millionaire you get another opportunity to wager iff you take the wager offered and win. Additionally, Kelly relates to the optimal amount to bet, rather than whether to accept a set sized wager.)
If we rearrange Kelly and apply it to Millionaire (where the prize doubles for each successful question answered (or very close)) then we arrive at a formula for how confident one should be in order to guess:

$$p > \frac{2\left(2^q - 1\right)}{2^{q+1} - 1}$$

where q is the number of questions since your last safety net.
This implies that for each question past the last safety net you need about an extra bit of evidence before you are justified in answering the question.
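To spell out the "extra bit" claim: in odds form the requirement is $\frac{p}{1-p} > 2(2^q - 1)$, which comes to 2:1, 6:1, 14:1 and 30:1 for q = 1, 2, 3, 4 - each extra question roughly doubles the odds you need.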
Kelly might not be the best option for choosing a required probability. Let's say I value money logarithmically and calculate expected utility. In order to justify answering I require:

$$p > \frac{q}{q+1}$$

This is considerably less stringent, particularly as q increases.
We can compare this to prospect theory. I'll consider a loss to be twice as painful as a similar gain is pleasurable and anchor on how much money I'll get if I don't answer. I will let the decision weights equal the probabilities as they are not extreme. I then require:

$$p > \frac{2\left(1 - 2^{-q}\right)}{2\left(1 - 2^{-q}\right) + 1}$$

This is even less stringent for high q, tending towards 2/3.
As prospect theory is descriptive rather than prescriptive this isn't necessarily how you should behave, but may represent how people do behave.
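For concreteness, here is a small script tabulating the three thresholds side by side (using the formulas as reconstructed above - treat the exact Kelly expression in particular as my reading of the derivation):

```python
def kelly(q):
    """Forced bet fraction must not exceed the Kelly fraction."""
    return 2 * (2**q - 1) / (2**(q + 1) - 1)

def log_utility(q):
    """Expected log-wealth of answering must beat quitting."""
    return q / (q + 1)

def prospect(q):
    """Losses weighted 2x, anchored on the current amount."""
    k = 2 * (1 - 2**-q)
    return k / (k + 1)

for q in range(1, 6):
    print(f"q={q}:  Kelly {kelly(q):.2f}  log-utility {log_utility(q):.2f}  prospect {prospect(q):.2f}")
```

At q = 1 this gives the p > 2/3 used below; by q = 5 Kelly demands p > 0.98 while prospect theory has levelled off near 2/3.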
In the original UK version of WWTBAM, the host used to present the contestant with a cheque for the amount that they had just reached (for question ~11 onwards), before taking it away and saying "but we don't want to give you that" and proceeding to the next question. I used to assume that it was just showmanship, now I wonder whether it was a cunning plan to encourage anchoring to the current amount and making the contestant more likely to guess.
Working through the odds
In the example given, the contestant was 1 question past a safety net, so she should require p>2/3 minimum in order to answer (I'll stick with Kelly for the moment).
Let’s say she had assigned probabilities of roughly 50:50:0:0 before she asked the audience. So in order to answer she only needed a single bit of evidence for one answer over the other.
$$\frac{p(E \mid a)}{p(E \mid b)} = \frac{p(a \mid E)}{p(b \mid E)} \cdot \frac{p(b)}{p(a)} = \frac{2/3}{1/3} \cdot \frac{0.5}{0.5} = \frac{2}{1}$$
The audience voting 28:48 in favour of option B maybe did represent a bit of evidence for B over A if you don't expect a high salience effect (she hadn’t distinguished between the two in her deliberations).
Coming at the same question and evidence, I started with 25:25:25:25 odds on which answer was correct because I had no real clue. In order for one of the probabilities to rise to p>2/3, I would need 6:1 evidence in favour of that answer over all of the other answers combined.
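Spelled out: moving one answer from prior odds of 1:3 to posterior odds of 2:1 (p = 2/3) requires a Bayes factor of $\frac{2/1}{1/3} = 6$, i.e. 6:1 evidence.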
I didn’t really assign any particular salience to any answer before the contestant started talking – none of the answers particularly drew me in and I couldn’t think why they would draw other people in. There might be a small effect with people being more likely to guess the middle 2 answers but not enough for me to adjust confidently.
However, after she ruled out C & D fairly confidently, I was looking at those straight away when the results came in, expecting that if one was more than 10% points higher than the other then that would be fairly good evidence in favour of that answer.
The actual result was about as clear as you could get: C=24% & D=0%. Assuming my model is correct and useful, I think it comfortably assigns better than 6:1 odds in favour of C over the other answers.
How confident am I in my model?
However, I didn’t have any specific evidence that the model was correct, just my amateur psychology at work. If I let m be the model being true (and me having interpreted the results correctly), say that my model assigns 90% of its probability mass to option C, and note that if my model is wrong then I still have 25:25:25:25 odds, I can calculate how confident I need to be in my model to achieve p(c)>2/3:

$$p(c) = 0.9\,p(m) + 0.25\,(1 - p(m)) > \frac{2}{3} \implies p(m) > \frac{2/3 - 0.25}{0.9 - 0.25} \approx 0.64$$

So should I have been willing to take the gamble? That depends on whether I thought that the model was more than 64% likely to be accurate.
I would have assigned prior probability of p(m)≈0.4 based purely on it seeming sensible vs my lack of expert knowledge and the fact that any effect might be smaller than I anticipated. The fact that answer D received 0% of the vote increased my confidence in the model and effect size but only up to maybe p(m)=0.5 at best. I think that this counts as my gut instinct roughly matching what my maths has come up with – I shouldn’t take the bet but it’s a close thing.
Having seen that the model worked on this occasion, I should update:

$$p(m \mid \text{C correct}) = \frac{0.9 \times 0.5}{0.9 \times 0.5 + 0.25 \times 0.5} \approx 0.78$$

So now that I’ve seen the model work on this occasion, I should be willing to bet if a similar situation arises in future. However, if it were 2 or more questions since my last safety net (or if the salience model provided less decisive evidence) I should wait for more evidence for the model before being willing to bet. Again, this roughly matches my intuition; looking back, my prior was possibly a bit high.
Deliberately influencing the audience
I mentioned beforehand that an interesting strategy would be for the contestant to influence the audience deliberately to encourage it to vote in a particular way.
Imagine you were able to get all of the audience who didn’t know to vote for a single answer which you knew (or were fairly confident) was wrong. The most likely way to influence people in a particular way would be to pretend that you thought that this was the correct answer.
If it works, this should produce a very high vote for that answer, plus a 15-20% vote for the correct answer and a small vote for the remaining 2 options. A low vote on the 2 options which you influenced away from would be good evidence that your influence attempt was successful.
If you get almost everyone in the audience voting for the influenced answer and no spike on any of the other 3 that either means that it’s the correct answer after all or that very few people know the actual answer.
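A rough forward model of what a successful influence attempt might look like (entirely my sketch: ~18% of the audience knowing the answer, ~90% of guessers complying with the push):

```python
def influenced_shares(pushed, correct, know=0.18, compliance=0.90):
    """Expected shares if most guessers follow the push; knowers ignore it."""
    shares = {a: 0.0 for a in "ABCD"}
    shares[correct] += know                       # those who know vote correctly
    shares[pushed] += (1 - know) * compliance     # guessers who follow the push
    for a in "ABCD":                              # non-complying guessers split evenly
        shares[a] += (1 - know) * (1 - compliance) / 4
    return {a: round(100 * v) for a, v in shares.items()}

print(influenced_shares(pushed="D", correct="C"))
# {'A': 2, 'B': 2, 'C': 20, 'D': 76}: a big spike on the pushed answer plus a
# clear secondary spike on the real answer, as described above.
```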
I’m not sure how easy it would be to apply this level of influence towards a single answer. I suspect that audience members like to feel as though they are making some form of choice so it would probably be wise to leave open the possibility of at least one other option in your deliberations so that people can feel like they’re deciding on your favoured choice.
This might be made easier if one of the answers is particularly salient anyway. You shouldn’t need to do as much pushing to get more people to choose this answer.
If you have your 50:50 lifeline left then this will help you if your results are inconclusive. I suspect that if one maximised the use of ask the audience and 50:50 combined then it would be a rare occasion that you wouldn’t be able to get to at least p>2/3 in favour of one answer.
If I plan to influence the audience when I use my lifeline, I need to maximise my effect. Looking at Cialdini’s 6 principles, I think authority and consensus (a.k.a. social proof) are most likely to be helpful here.
If people are going to be influenced by my statements then they need to believe that I am an authority on the question at hand. This can be done in at least 2 ways prior to asking the audience:
1. Establishing that you have good overall general knowledge
2. Establishing your ability to work through tricky questions to get to the correct answer
In order to persuade people away from some answers and towards others I need to give them a reason for changing. The reason doesn’t have to be true, just believable and I have to be able to come up with this quickly.
As for consensus, I think that when I choose to use my ask the audience I should say “I’m pretty sure the answer is D and that when I see results I’m going to feel like I wasted a lifeline but I just want to be sure as there’s a small chance it might be C”. Even though the audience members don’t know what the consensus is, an expectation of the consensus is created.
My main worry is that people might realise that everyone is going to vote the same way and then try to be helpful by selecting their original thought. However, I would expect the number of people who did this to be relatively low so I hope I'd be safe.
Summary
Superficial readings of ask the audience results are dangerous.
If you're going to use ask the audience and 50:50 on the same question, ask the audience first.
For each additional question past your latest safety net, approximately 1 more bit of evidence is required to justify answering (Kelly). Alternatively, p > q/(q+1) for expected utility. Don't trust your gut.
Watch lots of episodes beforehand to test your ability to predict what people who don't know the answer will guess.
If possible, influence the audience so that you are better able to perform this prediction in your game.
Even if you do all this, it will, at best, get you 1 question further in the quiz - your performance is still dominated by your general knowledge and the luck of the draw as to whether you get questions which match your areas of knowledge.
Bonus material
Watch out if you play in Russia (from tv tropes):
Similarly, either this French audience were really stupid or complete bastards.