Just to confirm: Writing $p_t$, the probability of $A$ at time $t$, as $p_t = \mathbb{E}[\mathbf{1}_A \mid \mathcal{F}_t]$ (here $\mathcal{F}_t$ is the sigma-algebra at time $t$), we see that $(p_t)_t$ must be a martingale via the tower rule.
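Spelled out: for $s \le t$,

$$\mathbb{E}[p_t \mid \mathcal{F}_s] = \mathbb{E}\big[\mathbb{E}[\mathbf{1}_A \mid \mathcal{F}_t] \,\big|\, \mathcal{F}_s\big] = \mathbb{E}[\mathbf{1}_A \mid \mathcal{F}_s] = p_s.$$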
The log-odds $L_t := \operatorname{logit}(p_t) = \log\frac{p_t}{1-p_t}$ are not martingales unless the drift term below vanishes, because Itô gives us

$$dL_t = \frac{dp_t}{p_t(1-p_t)} + \frac{2p_t - 1}{2\,p_t^2(1-p_t)^2}\, d\langle p \rangle_t.$$
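For reference, here is where the drift term comes from: with $f(p) = \log\frac{p}{1-p}$,

$$f'(p) = \frac{1}{p(1-p)}, \qquad f''(p) = \frac{2p-1}{p^2(1-p)^2},$$

and Itô's formula $dL_t = f'(p_t)\,dp_t + \tfrac{1}{2} f''(p_t)\, d\langle p \rangle_t$ gives exactly the expression above.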
So unless $p$ is continuous and of bounded variation (⇒ $\langle p \rangle_t = 0$, but this also implies that $p_t$ is a.s. constant, being a continuous martingale of bounded variation; the integrand of the drift part only vanishes if $p_t = 1/2$ for all $t$), the log-odds are not a martingale.
Interesting analysis on log-odds might still be possible (just use the discrete-time/jump-process analogues, as we naturally get when working with real data), but it's not obvious to me if this comes with any advantages over just working with $p_t$ directly.
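To see the drift concretely, here is a minimal simulation sketch; the particular martingale $p_t = P(W_1 > a \mid \mathcal{F}_t) = \Phi\big((W_t - a)/\sqrt{1-t}\big)$ and all names/parameters are my own choices for illustration, not from the discussion:

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch: p_t = P(W_1 > a | F_t) = Phi((W_t - a)/sqrt(1 - t)) is a
# [0,1]-valued martingale; its log-odds should pick up a drift.
rng = np.random.default_rng(0)
a, n_paths, n_steps, T = 1.0, 50_000, 90, 0.9   # stop well before t = 1
dt = T / n_steps
t = np.arange(1, n_steps + 1) * dt
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

z = (W - a) / np.sqrt(1.0 - t)
p = norm.cdf(z)                            # the martingale p_t
log_odds = norm.logcdf(z) - norm.logsf(z)  # logit(p_t), computed stably

for ti in (0.1, 0.5, 0.9):
    i = round(ti / dt) - 1
    print(f"t={ti:.1f}  mean p_t = {p[:, i].mean():.4f}  "
          f"mean logit(p_t) = {log_odds[:, i].mean():+.3f}")
# The sample mean of p_t stays near Phi(-1) ~ 0.159 (martingale property),
# while the mean of logit(p_t) drifts downward from logit(0.159) ~ -1.67.
```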
(Why) are you not happy with Velenik's answer, or with "a probabilistic theory tells us that if we look at an event $A$ and perform the same experiment $n$ times, then the fraction of experiments where $A$ happened approaches $P(A)$ in an LLN-like manner"? Is there something special about physical phenomena as opposed to observables?
> $\mathbb{R}$ can be written as the union of a meager set and a set of null measure. This result forces us to make a choice as to which class of sets we will neglect, or otherwise we will end up neglecting the whole space $\mathbb{R}$!
Either neither of these sets is measurable, or the meagre set has full measure. Either way, it seems obvious what to neglect.
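For concreteness, the standard construction behind the quoted decomposition: let $(q_n)_{n\ge 1}$ enumerate $\mathbb{Q}$ and set

$$U_k = \bigcup_{n\ge 1}\left(q_n - 2^{-(n+k)},\, q_n + 2^{-(n+k)}\right), \qquad N = \bigcap_{k\ge 1} U_k.$$

Each $U_k$ is open and dense with total length at most $2^{1-k}$, so $N$ is a null dense $G_\delta$, and $M = \mathbb{R}\setminus N$ is meagre (a countable union of closed nowhere dense sets) with $\mathbb{R} = M \cup N$. In this construction both sets are Borel and the meagre set has full measure, as claimed above.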
I think this depends a lot on what you're interested in, i.e. what scoring rules you use. Someone who runs the same analysis with Brier instead of log-scores might disagree.
More generally, I'm not convinced it makes sense to think of "precision" as a constant, let alone a universal one, since it depends on (at least) the scoring rule you use.
I don't think it's very counterintuitive/undesirable for (what, in practice, is essentially) noise to make worse-than-random forecasts better. As a matter of fact, this also happens if you rerun your analysis with Brier instead of log-scores and with random noise instead of rounding.
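A toy computation of my own to illustrate (the numbers are made up): take a worse-than-random forecast of 20% on an event that occurs, add centered Gaussian noise in log-odds space, and the expected Brier score improves toward the coin-flip limit of 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy example: a worse-than-random forecast of 20% on an event that occurs.
# Noiseless Brier score: (1 - 0.2)**2 = 0.64 (a flat 50% would score 0.25).
p, outcome = 0.2, 1
log_odds = np.log(p / (1 - p))

for sd in (0.0, 0.5, 1.0, 2.0, 5.0):
    noisy_p = sigmoid(log_odds + rng.normal(0.0, sd, size=1_000_000))
    print(f"noise sd {sd:3.1f}: expected Brier = {np.mean((outcome - noisy_p) ** 2):.4f}")
# The expected Brier score decreases from 0.64 toward 0.5 as the noise grows:
# random noise makes this bad forecast strictly better.
```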
Also, regarding oscillations: I don't think properties of "precision" obtained from small datasets are too important, for similar reasons to why I usually don't pay much attention to calibration plots obtained from a handful of forecasts.
This conjecture is true and should easily generalise to more general 1-parameter families of centered, symmetric distributions admitting suitable couplings (e.g. additive $\mathcal{N}(0,\sigma^2)$ noise in log-odds space), using the fact that $\log(\operatorname{sigmoid}(x+y)) + \log(\operatorname{sigmoid}(x-y))$ is decreasing in $y$ for all log-odds $x$ and all positive $y$ (QED).
(NB: This fails when replacing log-scores with Brier.)
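For completeness, that monotonicity fact is a one-line computation: writing $\sigma$ for the sigmoid, we have $\frac{d}{dx}\log\sigma(x) = 1 - \sigma(x)$, so

$$\frac{\partial}{\partial y}\Big[\log\sigma(x+y) + \log\sigma(x-y)\Big] = \sigma(x-y) - \sigma(x+y) < 0 \quad \text{for } y > 0,$$

since $\sigma$ is strictly increasing.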
I could make a similar argument for the noise-based version, if I chose to use Brier (or any other scoring rule $S$ that depends only on $|p - \text{outcome}|$ and converges to finite values as $p$ tends towards 0 and 1): with sufficiently strong noise, every forecast becomes ≈0% or ≈100% with equal probability, so the expected score in the "large noise limit" converges to $\big(S(0, \text{outcome}) + S(1, \text{outcome})\big)/2$.
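Concretely for Brier, $S(p, \text{outcome}) = (p - \text{outcome})^2$, so $S(0, \text{outcome}) + S(1, \text{outcome}) = 1$ for either outcome and the large-noise limit is $1/2$; any forecast with $|p - \text{outcome}| > 1/\sqrt{2}$ therefore gets improved by sufficiently strong noise, consistent with the toy computation above. (Log-scores escape this argument precisely because $S$ diverges as $p$ tends to the wrong endpoint.)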