There are many things to say about this result by N. Taleb. To start with, a minor detail: I would have written $\hat{p} = I^{-1}_{1/2}(m+1, n - m)$, which is more consistent with the fact that he is inverting the CDF.
He is inverting the CDF of a Beta distribution with parameters (m+1, n-m), which is the posterior in a Beta-Binomial model under a Beta(1, 0) prior (!!!), with no explanation at all! It would have made slightly more sense to use a Beta(1, 1) instead.
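For reference, the standard conjugate update in the Beta-Binomial model (after observing m successes in n trials) is

$$\operatorname{Beta}(a, b) \;\longrightarrow\; \operatorname{Beta}(a + m,\; b + n - m),$$

so a Beta(m+1, n-m) posterior indeed forces a = 1 and b = 0, which is in fact an improper prior.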
Note that all he does by selecting q = 1/2 is choose as this "optimal estimate" the median of the Beta(m+1, n-m) distribution, i.e., the median of the posterior distribution.
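This equivalence is easy to check numerically; here is a minimal SciPy sketch (my own, not from the paper), using the surgeon example discussed below (n = 60 procedures, m = 0 fatalities):

```python
from scipy.special import betaincinv
from scipy.stats import beta

n, m = 60, 0  # 60 procedures, 0 fatalities (the example from the paper)

# Taleb's estimator: p_hat = 1 - I^{-1}_{0.5}(n - m, m + 1)
p_taleb = 1 - betaincinv(n - m, m + 1, 0.5)

# Median of the Beta(m + 1, n - m) distribution, i.e., the posterior median
p_median = beta.ppf(0.5, m + 1, n - m)

print(p_taleb, p_median)  # both ≈ 0.0115
```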
Note that he completely ignores the base rate of 5%. Can he really not make use of it at all? So, even better than a Beta(1, 1), I would have chosen the maximum entropy distribution among those betas with mean 0.05, i.e., one with a large variance; in fact, Taleb complains that the Bayesian approach provides funny results with highly informative beta priors.
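That prior can be found numerically; a minimal sketch (my own), parametrizing the family as Beta(0.05κ, 0.95κ) so that the mean stays at 0.05 while only the concentration κ varies:

```python
from scipy.stats import beta
from scipy.optimize import minimize_scalar

MEAN = 0.05  # the 5% base rate, used as a constraint on the prior mean

def neg_entropy(kappa):
    # Beta(a, b) with mean MEAN and concentration a + b = kappa
    return -beta(MEAN * kappa, (1 - MEAN) * kappa).entropy()

# Maximize the differential entropy over the concentration kappa
result = minimize_scalar(neg_entropy, bounds=(1e-3, 1e3), method="bounded")
kappa = result.x
a, b = MEAN * kappa, (1 - MEAN) * kappa
print(f"max-entropy prior with mean {MEAN}: Beta({a:.3f}, {b:.3f})")
```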
If I had been facing the problem, I would have inquired about the distribution of those historical records whose aggregation is the 5% average, and used it as a prior to model this new doctor.
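As a sketch of what that could look like (the per-surgeon historical rates below are made-up placeholders; only their rough 5% average comes from the problem statement), one could fit a Beta prior to the historical rates by the method of moments and update it with the new surgeon's record:

```python
import numpy as np
from scipy.stats import beta

# Hypothetical per-surgeon historical mortality rates (placeholders;
# their average is roughly the 5% base rate from the problem).
rates = np.array([0.02, 0.03, 0.04, 0.05, 0.05, 0.06, 0.07, 0.08])

# Method-of-moments fit of a Beta(a, b) prior to the historical rates
mu, var = rates.mean(), rates.var(ddof=1)
concentration = mu * (1 - mu) / var - 1
a, b = mu * concentration, (1 - mu) * concentration

# Posterior for the new surgeon: 60 procedures, 0 fatalities
n, m = 60, 0
posterior = beta(a + m, b + n - m)
print(f"prior Beta({a:.2f}, {b:.2f}), posterior median {posterior.ppf(0.5):.4f}")
```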
All in all, I do not think Taleb wrote his best page on that day. But he has many other great ones to learn from!
I recently read Maximum ignorance probability, with applications to surgery's error rates by N.N. Taleb, where he proposes a new estimator for the parameter p of a Bernoulli random variable. In this article, I review its main points and share my own thoughts about it.
The estimator in question (which I will call the maximum ignorance estimator) takes the following form:
$$\hat{p} = 1 - I^{-1}_{0.5}(n - m,\; m + 1)$$
where $I$ is the regularized incomplete beta function ($I^{-1}_{q}(a, b)$ being the value $x$ for which $I_x(a, b) = q$), n is the number of independent trials and m is the number of successes.
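In code, the estimator is essentially a one-liner on top of SciPy's inverse of the regularized incomplete beta function (a minimal sketch, not the author's implementation):

```python
from scipy.special import betaincinv

def max_ignorance_estimate(n: int, m: int, q: float = 0.5) -> float:
    """Value of p such that P(Binomial(n, p) <= m) = q."""
    # p_hat = 1 - I^{-1}_q(n - m, m + 1), with I the regularized
    # incomplete beta function (scipy.special.betainc)
    return 1.0 - betaincinv(n - m, m + 1, q)

print(max_ignorance_estimate(60, 0))  # ≈ 0.0115 for the surgeon example below
```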
This estimator is derived by solving the following equation:
$$F_p(m) = q$$
where $F_p$ is the cumulative distribution function of a binomial distribution with n trials and probability p of success. In words, this estimator sets p to the value such that the probability of observing m successes or fewer is exactly q. How do we pick q? The author sets q to 0.5, as it maximizes the entropy (more on this later).
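The closed form above is just this equation solved analytically; the same value can be obtained by root-finding directly on the binomial CDF (a quick cross-check, assuming SciPy):

```python
from scipy.optimize import brentq
from scipy.special import betaincinv
from scipy.stats import binom

def solve_defining_equation(n: int, m: int, q: float = 0.5) -> float:
    """Solve F_p(m) = q for p, where F_p is the Binomial(n, p) CDF."""
    return brentq(lambda p: binom.cdf(m, n, p) - q, 1e-12, 1 - 1e-12)

n, m = 60, 0
p_root = solve_defining_equation(n, m)        # numerical root of F_p(m) = 0.5
p_closed = 1 - betaincinv(n - m, m + 1, 0.5)  # closed-form expression
print(p_root, p_closed)  # both ≈ 0.0115
```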
Finally, the estimator is applied to a real-world problem. A surgeon works in an area with a mortality rate of 5% and has performed 60 procedures with no fatalities. What can we say about his error probability? Applying the estimator described earlier gives $\hat{p} = 0.01148$.
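In this particular case the computation is especially transparent: with m = 0 the binomial CDF is just $F_p(0) = (1-p)^{60}$, so the defining equation reduces to

$$(1 - p)^{60} = \tfrac{1}{2} \quad\Longrightarrow\quad \hat{p} = 1 - 2^{-1/60} \approx 0.01148.$$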
Taleb argues that the empirical approach ($\hat{p} = m/n$) does not provide a lot of information because the sample is small: the estimate is $\hat{p} = 0$, yet we "know" that this value is not 0; it is just that we have not observed enough samples to see a failure.
On the other hand, the Bayesian would pick a Beta prior for p. A Beta distribution has two parameters and we have only one constraint (the 5% mortality rate in the area), which leaves one degree of freedom. The choice of this remaining degree of freedom is arbitrary, and it is shown that it has a significant impact on the final estimate.
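To see how much that remaining degree of freedom matters, here is a small sketch (my own, not from the paper): fix the prior mean at 0.05 by taking Beta(0.05κ, 0.95κ), vary the concentration κ, and look at the posterior median after 60 procedures with no fatalities.

```python
from scipy.stats import beta

n, m = 60, 0        # 60 procedures, no fatalities
prior_mean = 0.05   # the base rate, fixed as the prior mean

# Sweep the remaining degree of freedom: the prior concentration kappa
for kappa in [0.5, 2, 10, 50, 200]:
    a, b = prior_mean * kappa, (1 - prior_mean) * kappa
    posterior = beta(a + m, b + n - m)
    print(f"kappa={kappa:6.1f}  posterior median={posterior.ppf(0.5):.4g}")
```

The posterior median changes dramatically across these priors, which is exactly the sensitivity described above.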
Having gone through the main points of the article, here follow my own thoughts: