It must have a subsequence S1 whose probabilities for the first sentence converge (because the interval [0,1] is compact). This subsequence must itself have a subsequence S2 whose probabilities for the second sentence converge, which in turn must have a subsequence S3 whose probabilities for the third sentence converge, and so on.
The subsequence we want takes the first entry of S1, then the second entry of S2, then the third entry of S3, and so on. For every n, past its nth entry this diagonal sequence is a subsequence of S_n, so its probabilities for the nth sentence must converge.
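To make the diagonal trick concrete, here is a toy sketch (my own illustration, not part of the proof) in which the nested subsequences can be written down explicitly rather than extracted by compactness:

```python
# Toy diagonal-subsequence demo: the k-th assignment gives sentence n the
# value bit n of k. Here the nested subsequences are explicit: taking
# S_n = multiples of 2**n makes every coordinate below n eventually constant.

def P(k, n):
    """Value the k-th assignment gives the n-th sentence (0 or 1)."""
    return (k >> n) & 1

def S(n, j):
    """j-th element of the n-th nested subsequence: multiples of 2**n."""
    return j * 2 ** n

def diagonal(n):
    """n-th entry of the diagonal sequence: the n-th entry of S_n."""
    return S(n, n)  # = n * 2**n

# For every sentence m, P_{diagonal(n)}(m) converges (here it is eventually
# 0), because past index m the diagonal is a subsequence of S_{m+1}.
for m in range(5):
    tail = [P(diagonal(n), m) for n in range(m + 1, m + 10)]
    assert all(v == 0 for v in tail)
```

Here each S_n happens to be explicit; the argument is identical when compactness only promises that some convergent S_n exists.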
Note that all the probabilities converge, but they do not converge uniformly: at any given time, some of the probabilities will still be way off. This is a common analysis trick. Converging simultaneously on countably many axes is no harder than converging simultaneously on finitely many axes. Let me know if I should clarify further.
Thanks! That's a good trick; I didn't know it.
Reading further, it seems like your definition of P3 in terms of P1 and P2 is indeterminate when P1(phi)=0 and P2(phi)=1. I assume this hole can be patched. (ETA: I'm being stupid; this can't happen if P1 and P2 both maximize WCB, because we can assign a truth value to phi that gives either P1 or P2 a Bayes score of negative infinity.)
Otherwise the proof seems fine at first glance. Great work! This is exactly the kind of stuff I want to see on LW.
In this post, I propose an answer to the following question:
Given a consistent but incomplete theory, how should one choose a random model of that theory?
My proposal is rather simple: assign probabilities to sentences in such a way that if an adversary were to choose a model, your Worst Case Bayes Score would be maximized. This assignment of probabilities represents a probability distribution on models, and we choose randomly from this distribution. However, it will take some work to show that what I just described even makes sense. We need to show that the Worst Case Bayes Score can be maximized, that such a maximum is unique, and that this assignment of probabilities to sentences represents an actual probability distribution. This post gives the necessary definitions, and proves these three facts.
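As a minimal numerical sketch of the idea (a toy example of my own, with a single sentence of weight 1 rather than the full language), maximizing the worst case over the two possible models lands at probability 1/2:

```python
import math

# One sentence phi with weight 1, and two possible models: one where phi
# is true (score log p) and one where it is false (score log(1 - p)).
# The adversary picks whichever model gives us the lower score.

def worst_case_score(p):
    return min(math.log(p), math.log(1 - p))

# Maximize the worst case over a grid of probabilities.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=worst_case_score)

# The maximizer is p = 1/2: the adversary cannot punish either answer.
assert abs(best - 0.5) < 1e-9
```

With more sentences the maximization is over an infinite-dimensional space, which is why the existence and uniqueness claims below take real work.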
Finally, I will show that any given probability assignment is coherent if and only if it is impossible to change the probability assignment in a way that simultaneously improves the Bayes Score by an amount bounded away from 0 in all models. This is nice because it gives us a measure of how far a probability assignment is from being coherent. Namely, we can define the "incoherence" of a probability assignment to be the supremum amount by which you can simultaneously improve the Bayes Score in all models. This could be a useful notion since we usually cannot compute a coherent probability assignment, so in practice we need to work with incoherent probability assignments which approach a coherent one.
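Here is a toy example of my own showing this criterion in action: an incoherent assignment over phi and its negation can be improved in both models at once, by an amount bounded away from 0.

```python
import math

# Two sentences, phi and not-phi, each scored with weight 1.  The two
# models are M1 (phi true, not-phi false) and M2 (phi false, not-phi true).

def bayes_score(p_phi, p_not_phi, phi_true):
    if phi_true:   # M1
        return math.log(p_phi) + math.log(1 - p_not_phi)
    else:          # M2
        return math.log(1 - p_phi) + math.log(p_not_phi)

incoherent = (0.7, 0.7)   # P(phi) + P(not phi) != 1, so not coherent
coherent = (0.5, 0.5)

# Moving to the coherent assignment improves the score in BOTH models by
# the same positive amount, witnessing the incoherence in the above sense.
for model in (True, False):
    assert bayes_score(*coherent, model) > bayes_score(*incoherent, model)
```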
I wrote up all the definitions and proofs on my blog, and I do not want to go through the work of translating all of the LaTeX code over here, so you will have to read the rest of the post there. Sorry. In case you do not care enough about this to read the formal definitions, let me just say that my definition of the "Bayes Score" of a probability assignment P with respect to a model M is the sum over all true sentences s of m(s)log(P(s)), plus the sum over all false sentences s of m(s)log(1-P(s)), where m is some fixed nowhere-zero probability measure on all sentences (e.g., m(s) = (1/2)^k, where k is the number of bits needed to encode s).
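The formula can be sketched directly in code; the finite list of sentences and the measure m(s) = (1/2)^len(s) below are stand-ins of my own (the real sum ranges over all sentences of the language):

```python
import math

def bayes_score(P, truth, m):
    """Sum of m(s)*log(P(s)) over true sentences s,
    plus m(s)*log(1 - P(s)) over false sentences s."""
    total = 0.0
    for s, is_true in truth.items():
        total += m(s) * math.log(P[s] if is_true else 1 - P[s])
    return total

# Stand-in measure: (1/2)**(length of s), in place of "bits to encode s".
m = lambda s: 0.5 ** len(s)

truth = {"a": True, "ab": False, "abc": True}   # a made-up model M
P = {"a": 0.9, "ab": 0.2, "abc": 0.6}           # a made-up assignment

score = bayes_score(P, truth, m)
assert score < 0  # each term is the log of a number below 1
```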
I would be very grateful if anyone could come up with a proof that the probability distribution which maximizes the Worst Case Bayes Score has the property that its Bayes Score is independent of the choice of model we use to judge it. I believe this is true, but have not yet found a proof.