Do you accept that, by Cox's theorem, probability theory is the normative theory of epistemology?
Not by Cox's theorem. I find the complete class theorem (and Dutch book arguments) more convincing.
Do you accept that a "Bayesian" method based on explicitly approximating ideal probability theory will always give a more accurate answer?
Only in the very weak sense that, by the complete class theorem, there exists a Bayesian method (or a limit of Bayesian methods) that does at least as well as whatever you're doing. So sure, if you really had infinite computational resources then you could find such a method and use it, but I think that has almost no bearing on practice. Certainly I think there are many situations where a prior is unavailable.
Do you accept that each of the examples above works because, and to the extent that, it (nonexplicitly) approximates the correct probability-theory answer (the Bayes-structure argument)?
Almost certainly not, although maybe we should taboo "because". First of all, the "correct" probability-theory answer is not well-defined, because the choices of prior and likelihood are both completely unconstrained. Secondly, I think the choice of whether to be Bayesian or frequentist is not nearly as important as e.g. the choice of likelihood function.
We can immediately see that building in the prior disallows aggregation of different information sources.
I don't think the prior is what allows aggregation of different information sources; you can do transfer learning with vanilla logistic regression if you choose the right set of features.
Only reporting the mode hides the confidence interval and goes way off in the presence of skew.
I agree with this, although "being Bayesian" is neither necessary nor sufficient to deal with it (but would probably help on average).
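To make the skew point concrete, here is a minimal sketch in plain Python. The setup is an assumption chosen purely for illustration and is not taken from the discussion above: a Beta(2, 20) distribution, which is what a uniform prior gives after 1 success in 19 trials. The function name `beta_summary` and the grid size are hypothetical. It compares the mode with the mean and a central 95% interval computed on a grid.

```python
def beta_summary(a, b, n=100_000):
    """Mode, mean and central 95% interval of Beta(a, b), computed on a grid."""
    # Midpoint grid over (0, 1); weights are the unnormalized Beta density.
    xs = [(i + 0.5) / n for i in range(n)]
    ws = [x ** (a - 1) * (1 - x) ** (b - 1) for x in xs]
    total = sum(ws)

    mean = sum(x * w for x, w in zip(xs, ws)) / total
    mode = xs[max(range(n), key=ws.__getitem__)]

    # Walk the cumulative mass to find the 2.5% and 97.5% quantiles.
    cum, lo, hi = 0.0, None, None
    for x, w in zip(xs, ws):
        cum += w / total
        if lo is None and cum >= 0.025:
            lo = x
        if cum >= 0.975:
            hi = x
            break
    return mode, mean, (lo, hi)


# Illustrative assumption: uniform prior, 1 success in 19 trials -> Beta(2, 20).
mode, mean, (lo, hi) = beta_summary(2, 20)
print(f"mode={mode:.3f} mean={mean:.3f} 95% interval=({lo:.3f}, {hi:.3f})")
```

Analytically the mode is (a-1)/(a+b-2) = 0.05 while the mean is a/(a+b), about 0.091, and the upper end of the interval sits well above both. Note that nothing here requires a Bayesian reading: the same gap between a point estimate and the spread appears for any skewed sampling distribution.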
Bayesian logistic regression is easy and superior to max likelihood for most things.
What do you mean by "Bayesian logistic regression"?
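To spell out why the term is ambiguous, here are two common readings, offered only as a guess at what might be meant. The notation is an assumption introduced for illustration: weights $w$, data $D = \{(x_i, y_i)\}$, and a Gaussian prior $w \sim \mathcal{N}(0, \sigma^2 I)$.

```latex
% Reading 1: MAP estimation, which coincides with L2-regularized maximum likelihood.
\hat{w}_{\mathrm{MAP}}
  = \arg\max_{w} \Big[ \sum_{i} \log p(y_i \mid x_i, w) - \frac{1}{2\sigma^2} \lVert w \rVert^2 \Big]

% Reading 2: full posterior averaging for prediction at a new input x^*.
p(y^* \mid x^*, D) = \int p(y^* \mid x^*, w) \, p(w \mid D) \, dw
```

Under the first reading, the method is a point estimate that a non-Bayesian would also recognize as ridge-penalized logistic regression. Under the second, the integral has no closed form for the logistic likelihood and must be approximated, so "easy" depends heavily on which one is intended.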
Can you recommend an explanation of the complete class theorem(s)? Preferably online. I've been googling pretty hard and I've turned up almost nothing. I'd like to understand what conditions they start from (suspecting that maybe the result is not quite as strong as "Bayes Rules!"). I've found only one paper, which basically said "what Wald proved is extremely difficult to understand, and probably not what you wanted."
Thank you very much!
Question in title.
This is obviously subjective, but I figure there ought to be some "go-to" paper. Maybe I've even seen it once, but can't find it now and I don't know if there's anything better.
Links to multiple papers with different focuses would be welcome. For my current purpose I have a preference for one that aims low and isn't too long.