In the 4th-from-last paragraph, the page says the sequence HHHHHT "is assigned 1/30 probability by the Rule of Succession". Where does this number come from? It isn't explained. I do understand the part about the same sequence being assigned 1/64 by the fair-coin hypothesis, but the Rule of Succession part isn't so clear to me.
The second example, in the 2nd-from-last paragraph, is also confusing to me: the part saying that the sequence HHHHH HTHHH HHTHH gives the Bayesian 19.5 : 1 odds of the coin being biased versus fair.
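For reference, here is the standard way sequence probabilities come out of the Rule of Succession (Laplace: after h heads and t tails, the next flip lands heads with probability (h+1)/(h+t+2)). This is my own sketch, not taken from the page; note that it yields 1/42 for HHHHHT, whereas the five-flip sequence HHHHT would give 1/30. It does reproduce the 19.5 : 1 odds for the 15-flip sequence:

```python
from fractions import Fraction

def rule_of_succession_prob(seq):
    """Probability of a head/tail sequence under Laplace's Rule of
    Succession: a uniform prior on the coin's bias, updated after
    each flip, so P(next H) = (heads so far + 1) / (flips so far + 2)."""
    p = Fraction(1)
    h = t = 0
    for flip in seq:
        if flip == 'H':
            p *= Fraction(h + 1, h + t + 2)
            h += 1
        else:
            p *= Fraction(t + 1, h + t + 2)
            t += 1
    return p

# First example: the six-flip sequence from the page.
print(rule_of_succession_prob('HHHHHT'))   # 1/42
print(rule_of_succession_prob('HHHHT'))    # 1/30 (five flips)
print(Fraction(1, 2) ** 6)                 # fair coin: 1/64

# Second example: the 15-flip sequence (13 heads, 2 tails).
seq = 'HHHHHHTHHHHHTHH'
laplace = rule_of_succession_prob(seq)     # 1/1680
fair = Fraction(1, 2) ** 15                # 1/32768
print(laplace / fair)                      # 2048/105, i.e. ~19.5 : 1
```

So the 19.5 : 1 figure checks out exactly (32768/1680 ≈ 19.5), which makes me suspect the 1/30 in the first example refers to a five-flip sequence.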
I propose that this concept be called "unexpected surprise" rather than "strictly confused":
"Strictly confused" suggests logical incoherence.
"Unexpected surprise" can be motivated the following way: let s(d) = surprise(d∣H) = −log Pr(d∣H) be how surprising data d is on hypothesis H. Then one is "strictly confused" if the observed s is larger than one would expect assuming H holds.
This terminology is nice because the average of s under H is the entropy, i.e. the expected surprise, of d given H. It also connects with Bayes, since log-likelihood = −surprise is the evidential support that d gives H.
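To make this concrete, a toy numerical check (my own illustration, with an assumed hypothesis of a 0.2-heads coin and the 15-flip sequence above, not an example from the post): the observed surprise far exceeds the entropy, so this H is "unexpectedly surprised" by the data.

```python
import math

def surprise(seq, p_heads):
    """s(d | H) = -log Pr(d | H) in nats, for an i.i.d. coin-flip hypothesis."""
    return -sum(math.log(p_heads if f == 'H' else 1 - p_heads) for f in seq)

def expected_surprise(n_flips, p_heads):
    """Entropy of n i.i.d. flips under H: the average of s over data
    drawn from H itself."""
    per_flip = -(p_heads * math.log(p_heads)
                 + (1 - p_heads) * math.log(1 - p_heads))
    return n_flips * per_flip

d = 'HHHHHHTHHHHHTHH'          # 13 heads, 2 tails
s = surprise(d, 0.2)           # ~21.4 nats observed
h = expected_surprise(15, 0.2) # ~7.5 nats expected under H
print(s > h)                   # True: H is "unexpectedly surprised" by d
```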
The section on "Distinction from frequentist p-values" is, I think, both technically incorrect and a bit uncharitable.
It's technically incorrect because the following isn't true:
The classical frequentist test for rejecting the null hypothesis involves considering the probability assigned to particular 'obvious'-seeming partitions of the data, and asking if we ended up inside a low-probability partition.
Actually, the classical frequentist test involves specifying an obvious-seeming measure of surprise t(d), and seeing whether t is higher than expected on H. This is even more arbitrary than the above.
On the other hand, it's uncharitable because it's widely acknowledged that one should try to choose t to be sufficient, which is exactly the condition that the partition induced by t is "compatible" with Pr(d∣H) across the considered hypotheses, in the sense that Pr(H∣d)=Pr(H∣t(d)) for all the considered H.
Clearly s is sufficient in this sense. But there might be simpler functions of d that do the job too ("minimal sufficient statistics").
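As a toy check of this sufficiency condition (my own sketch, assuming a two-hypothesis coin model): for i.i.d. flips, the heads count is such a simpler sufficient statistic, and conditioning on it gives the same posterior as conditioning on the full sequence, since the binomial coefficient cancels in the normalization.

```python
from math import comb

def posterior(priors, likelihoods):
    """Normalize prior * likelihood over the hypothesis set."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)
    return [j / z for j in joint]

d = 'HHHHHHTHHHHHTHH'
n, h = len(d), d.count('H')      # t(d): the heads count (13 of 15)
biases = [0.5, 0.8]              # two candidate hypotheses H
priors = [0.5, 0.5]              # with equal prior weight

# Posterior from the full sequence d ...
like_d = [p**h * (1 - p)**(n - h) for p in biases]
# ... and from the sufficient statistic t(d) alone (binomial likelihood).
like_t = [comb(n, h) * p**h * (1 - p)**(n - h) for p in biases]

print(posterior(priors, like_d))  # Pr(H | d)
print(posterior(priors, like_t))  # Pr(H | t(d)): same, up to float rounding
```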
Note that t being sufficient doesn't make it non-arbitrary, as it may not be a monotone function of s.
Finally, I think that this concept is clearly "extra-Bayesian", in the sense that it's about non-probabilistic ("Knightian") uncertainty over H, and one is considering probabilities attached to unobserved d (i.e., not conditioning on the observed d).
I don't think being "extra-Bayesian" in this sense is problematic, but I think it should be owned up to.
Actually, "unexpected surprise" reveals a nice connection between Bayesian and sampling-based uncertainty intervals:
To get a (HPD) credible interval, exclude those H that are relatively surprised by the observed d (or which are a priori surprising).
To get a (nice) confidence interval, exclude those H that are "unexpectedly surprised" by d.
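A rough sketch of that contrast for a single binomial observation (my own toy construction; the 95%/5% thresholds, the grid, and the choice of s as test statistic are all assumptions): the credible set keeps the highest-posterior biases until 95% of the mass is covered, while the confidence-style set keeps a bias unless the observed surprise falls in its 5% upper tail.

```python
from math import comb, log

N, K_OBS = 15, 13   # observed: 13 heads in 15 flips

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def surprise(k, n, p):
    """s(d | p) for a sequence with k heads (depends on d only via k)."""
    return -(k * log(p) + (n - k) * log(1 - p))

grid = [i / 1000 for i in range(1, 1000)]   # candidate biases p

# HPD-style credible set: uniform prior, keep the highest-posterior
# biases until 95% of the posterior mass is covered.
post = [binom_pmf(K_OBS, N, p) for p in grid]   # proportional to posterior
z = sum(post)
order = sorted(range(len(grid)), key=lambda i: -post[i])
mass, credible = 0.0, set()
for i in order:
    if mass >= 0.95:
        break
    credible.add(grid[i])
    mass += post[i] / z

# Confidence-style set: keep p unless it is "unexpectedly surprised",
# i.e. unless data at least as surprising as the observed data had
# probability < 5% under p (a tail test with s as the statistic).
confidence = set()
for p in grid:
    s_obs = surprise(K_OBS, N, p)
    tail = sum(binom_pmf(k, N, p) for k in range(N + 1)
               if surprise(k, N, p) >= s_obs)
    if tail >= 0.05:
        confidence.add(p)

print(min(credible), max(credible))       # the credible band
print(min(confidence), max(confidence))   # a similar, not identical, band
```

Both sets cluster around the maximum-likelihood value 13/15 and exclude badly surprised biases like p = 0.2; they differ in exactly the way described above, one cutting on relative posterior surprise, the other on unexpected surprise.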