So, not quite an explanation, more of an exercise:
Oswald brings his laptop to a bar, loads up Matlab, and types:
p=rand(); c=0;
p is now a double between 0 and 1, which we can treat as continuously and uniformly distributed across that range. c is the number of times you've gotten a hint.
Now, Oswald types in another line:
c=c+1; [c, rand()<p]
This will both increase the number of hints you've received and give you a 1 if a new, uniformly selected random number is smaller than the first random number, or a 0 otherwise. (Basically, this is flipping a biased coin which gives 'heads' with probability p and 'tails' with probability 1-p.) You can repeat this line as many times as you like.
Now, this bar is called The Improper Prior, and as such is filled with Bayesians. It's readily obvious to the patrons that their posterior on p should be a beta distribution, with α equal to one plus the number of 1s and β equal to one plus the number of 0s.
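For anyone without Matlab handy, here is a rough Python sketch of the same game and the posterior update. The function names (draw_hints, posterior_params) and the fixed seed are my own choices for illustration, not anything Oswald typed.

```python
import random

def draw_hints(p, n, rng):
    """Flip the biased coin n times: 1 with probability p, else 0."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def posterior_params(hints):
    """Beta posterior under a uniform prior: (1 + number of 1s, 1 + number of 0s)."""
    ones = sum(hints)
    return 1 + ones, 1 + len(hints) - ones

rng = random.Random(0)            # fixed seed, only so the run is repeatable
p = rng.random()                  # Oswald's hidden p
hints = draw_hints(p, 20, rng)    # twenty hints
alpha, beta = posterior_params(hints)
print(alpha, beta, alpha / (alpha + beta))   # parameters and posterior mean
```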
But now is when things get interesting: your chance of guessing p exactly is basically zero. So Oswald might instead reward you for guessing within .05 of the actual p. More guesses should be penalized, either by decreasing the acceptable range or by decreasing the reward for guessing correctly. Alternatively, Oswald might reward you based on the precision of your posterior, or some other function.
Unfortunately, the beta distribution's cdf is not pleasant to play with. Matlab can deal with it easily; just type:
betainc(x,a,b)
We could determine the chance that your guess is within .05 of the correct value by typing:
betainc(x+.05,a,b)-betainc(x-.05,a,b)
Unfortunately (again!), this isn't maximized by centering your estimate at the mean, unless a=b. You can test this with a=3, b=2; we have:
betainc(.65,3,2)-betainc(.55,3,2)=.17200
betainc(.66,3,2)-betainc(.56,3,2)=.17331
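If you want to check those two numbers without Matlab, a small Python sketch follows. It only handles positive integer a and b (which is all this game ever produces), using the identity that the regularized incomplete beta function equals a binomial tail probability. The names beta_cdf and window_prob are mine.

```python
from math import comb

def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    x = min(max(x, 0.0), 1.0)     # clamp so windows overhanging [0, 1] still work
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def window_prob(x, a, b, half_width=0.05):
    """Posterior probability that p lies within half_width of the guess x."""
    return beta_cdf(x + half_width, a, b) - beta_cdf(x - half_width, a, b)

print(round(window_prob(0.60, 3, 2), 5))   # guess at the mean: 0.172
print(round(window_prob(0.61, 3, 2), 5))   # guess nudged upward: 0.17331
```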
And so if Oswald uses this reward system, we'll have to solve an optimization problem to determine what our guess is at each stage, which isn't going to be fun. (The dumb way to do it throws
betainc(x+.05,a,b)-betainc(x-.05,a,b);
into some nonlinear optimization algorithm which shifts around x until it finds a local maximum, starting with a/(a+b) as the guess. What's the smart way to do it?)
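For concreteness, the dumb way might look something like the Python sketch below. A plain ternary search stands in for "some nonlinear optimization algorithm", and it brackets the answer between .05 and .95 rather than starting from a/(a+b). It assumes the window probability has a single peak in x, which I believe holds when a and b are both at least 1 but have not proved here. All function names are mine.

```python
from math import comb

def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b."""
    x = min(max(x, 0.0), 1.0)
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def window_prob(x, a, b, half_width=0.05):
    """Posterior probability that p lies within half_width of the guess x."""
    return beta_cdf(x + half_width, a, b) - beta_cdf(x - half_width, a, b)

def best_guess(a, b, half_width=0.05, iters=60):
    """Ternary search for the guess x that maximizes window_prob.

    Assumption: window_prob is unimodal in x on the bracket.
    """
    lo, hi = half_width, 1 - half_width   # a window hanging off [0, 1] wastes mass
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if window_prob(m1, a, b, half_width) < window_prob(m2, a, b, half_width):
            lo = m1                       # the peak lies to the right of m1
        else:
            hi = m2                       # the peak lies to the left of m2
    return (lo + hi) / 2

x = best_guess(3, 2)
print(x, window_prob(x, 3, 2))            # compare with the mean, 0.6
```

This still leaves the smart way as an exercise.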
Oswald might also be reluctant to reward us based on precision, because that can grow enormously high as α and β increase. So instead let's suppose he offers a flat reward, minus some constant times the variance minus some constant times the number of guesses we made, and he wants to know how to price entry into the game, so he can set the expected profit where he wants it to be.
Now we're in an interesting situation, because the variance can increase or decrease based on what we've seen. If you get two heads in a row (α=3, β=1), the variance is .0375; a tails will increase it to .04, and a third heads will decrease it to about .027. On average, you expect the variance after you see another coin to be .03. On average, the variance should always decrease after we get another hint. We also know that the amount each hint is expected to lower our variance will be a decreasing function of α and β for large enough values. (Really? Why would you believe those two statements?)
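The variance bookkeeping is easy to sketch in Python with exact fractions, assuming the Beta(1 + heads, 1 + tails) posterior from earlier and the standard beta variance formula αβ/((α+β)²(α+β+1)). The function names are mine.

```python
from fractions import Fraction

def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    a, b = Fraction(a), Fraction(b)
    return a * b / ((a + b) ** 2 * (a + b + 1))

def expected_var_after_hint(a, b):
    """Average posterior variance after one more hint, weighting heads and
    tails by the current posterior predictive probabilities."""
    p_heads = Fraction(a, a + b)
    return p_heads * beta_var(a + 1, b) + (1 - p_heads) * beta_var(a, b + 1)

a, b = 3, 1                                   # two heads in a row, uniform prior
print(float(beta_var(a, b)))                  # variance now
print(float(beta_var(a, b + 1)))              # after a tails
print(float(beta_var(a + 1, b)))              # after a third heads
print(float(expected_var_after_hint(a, b)))   # average over the next hint
```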
We can now easily calculate the actual variance and the expected variance after another hint for any (α,β) pair. If the costs are fixed we can determine when it wouldn't be worthwhile to buy one more. If α and β are large enough, that'll be enough for us to stop because we know future hints will be less valuable than the current hint and the current hint is a bad idea.
We can then propagate backwards from the terminal states to determine the total value of playing the game optimally. We also can be certain this game valuation procedure will terminate in reasonable time for reasonable choices of the penalty parameters. (Again, why?)
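A minimal Python sketch of the backward propagation, under assumptions of my own: the payoff on stopping is reward minus var_penalty times the posterior variance, each hint costs hint_cost, and hints already bought are treated as sunk. For terminal states I use one sufficient condition rather than the myopic rule above: further hints can win back at most var_penalty times the current variance, so once that is no more than the price of one hint, stopping is certainly optimal.

```python
from functools import lru_cache

def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def game_value(reward, var_penalty, hint_cost, a=1, b=1):
    """Expected payoff from state Beta(a, b) onward when playing optimally.

    All three parameter names are hypothetical labels for Oswald's constants.
    """
    if hint_cost <= 0:
        raise ValueError("hint_cost must be positive for the recursion to end")

    @lru_cache(maxsize=None)
    def value(a, b):
        penalty = var_penalty * beta_var(a, b)
        stop = reward - penalty
        if penalty <= hint_cost:          # terminal: no hint can pay for itself
            return stop
        p_heads = a / (a + b)             # posterior predictive chance of a 1
        cont = (-hint_cost
                + p_heads * value(a + 1, b)
                + (1 - p_heads) * value(a, b + 1))
        return max(stop, cont)

    return value(a, b)

print(game_value(reward=10, var_penalty=60, hint_cost=0.5))
```

The recursion depth grows roughly like var_penalty/(4*hint_cost), so very cheap hints would want an iterative version instead.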
I posted this problem to my own blog the other day. When I posted it, I thought it looked very easy, more fiddly than difficult:
I reasoned thus:
There's no reason that you should have any opinion on which piece of paper he's brought. So you start off thinking 50:50, and that leads you to believe that he's effectively just given you £500.
If he tells you a number, then your belief will change. Say he tells you 1, then you know that he's brought the 1D12 results, and so you're now able to tell him that, and collect your £1000.
If he tells you 7, then that's twice as likely to be the 2D6 talking as the D12, and you should shift your odds to 1:2.
If you've got odds of 1:2, then your guess (that it's the 2D6) is now worth about £667, on average.
So when you get a new number, your prior shifts, the bet changes value. Average over all the cases and that's what you'll pay to know the first number.
Using this reckoning, I thought the answer to the puzzle was £125.
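That averaging is quick to check in Python. The sketch assumes my reading of the puzzle: fair dice, a £1000 prize for naming the right piece of paper, one paper holding rolls of a single twelve-sided die and the other holding sums of two six-sided dice. The names are mine.

```python
from fractions import Fraction

# Likelihood of each reported number under the two hypotheses.
D12 = {n: Fraction(1, 12) for n in range(1, 13)}
TWO_D6 = {n: Fraction(6 - abs(n - 7), 36) for n in range(2, 13)}

def guess_value(p_d12, prize=1000):
    """Expected winnings if you guess the likelier paper right now."""
    return prize * max(p_d12, 1 - p_d12)

def value_of_one_number(p_d12, prize=1000):
    """Expected winnings after hearing one number, minus winnings without it."""
    after = sum(
        max(p_d12 * D12.get(n, 0), (1 - p_d12) * TWO_D6.get(n, 0))
        for n in range(1, 13)
    )
    return prize * after - guess_value(p_d12, prize)

print(value_of_one_number(Fraction(1, 2)))   # 125
```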
But now I'm not so sure, because the same reasoning tells you that if, for whatever reason, you start out 9:1 in favour of the 1D12, then the value of the new information is zero. (Because whatever the new information is, it won't be enough to change your mind).
But can that really be true? Because that implies that if Omega keeps making you the same offer for £1, then you should keep turning it down.
But if he told you a hundred numbers, you'd be damned sure which piece of paper he'd brought. So surely they have some value over £1?
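One way to poke at this is to price a block of n numbers bought all at once, starting from 9:1. The Python sketch below does that by brute-force enumeration, under the same assumed setup (fair dice, £1000 prize). It is not the one-at-a-time game, just a bound on what several numbers together can be worth. The names are mine.

```python
from fractions import Fraction
from itertools import product

D12 = {n: Fraction(1, 12) for n in range(1, 13)}
TWO_D6 = {n: Fraction(6 - abs(n - 7), 36) for n in range(1, 13)}   # a 1 is impossible

def value_of_n_numbers(p_d12, n, prize=1000):
    """Expected gain from hearing n numbers before guessing, versus guessing now."""
    total = Fraction(0)
    for seq in product(range(1, 13), repeat=n):
        like_d12 = Fraction(1)
        like_2d6 = Fraction(1)
        for x in seq:
            like_d12 *= D12[x]
            like_2d6 *= TWO_D6[x]
        # After hearing seq you guess whichever paper is likelier.
        total += max(p_d12 * like_d12, (1 - p_d12) * like_2d6)
    return prize * (total - max(p_d12, 1 - p_d12))

start = Fraction(9, 10)                    # 9:1 in favour of the 1D12
for n in range(1, 5):
    print(n, float(value_of_n_numbers(start, n)))
```

With these assumptions the first three numbers are worth nothing as a block, and the value only becomes positive at four, which is the first point where some sequence (four 7s, say) can flip the guess.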
But maybe you say: "Well, you can't put a value on the information unless you know how many extra opportunities you'll get."
Really? I'm sure that I'd pay £1 for the number in the original problem, and sure that I wouldn't pay £1000.
Where am I mis-thinking, and how should I calculate the answer to my puzzle?
Edit:
Just to clarify, if you buy the first number and it's a 2, and then you buy the second number and it's a 12, then I think you're now back in the same situation with a prior of 9:1 and an expected gain of £900.
I think you'd be mad to stop buying numbers at this point, since there's £100 you're not certain of yet. But if I don't believe that the price is £0, why do I believe that the price for the first one is £125?
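The 2-then-12 case can be checked with a short sequential update in Python, again assuming fair dice and a £1000 prize. The names are mine.

```python
from fractions import Fraction

D12 = {n: Fraction(1, 12) for n in range(1, 13)}
TWO_D6 = {n: Fraction(6 - abs(n - 7), 36) for n in range(1, 13)}

def update(p_d12, number):
    """Posterior probability of the 1D12 paper after hearing one number."""
    d12_part = p_d12 * D12[number]
    two_d6_part = (1 - p_d12) * TWO_D6[number]
    return d12_part / (d12_part + two_d6_part)

p = Fraction(1, 2)                  # start at 50:50
for number in (2, 12):
    p = update(p, number)
print(p, 1000 * max(p, 1 - p))      # 9/10 900
```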
Edit II:
It seems that the opinion of most people is that the problem is under-determined, in the sense that you don't know what options are coming. Fair enough.
In which case, what's wrong with the intuition that your beliefs alone determine the worth of your option to guess?
And in the more specific version where Oswald charges a price of one penny for every result, and you can keep buying them one-by-one until you decide you're certain enough and guess, what criterion do you use to stop buying?