(Rewritten entirely after seeing pragmatist's answer.)
In this post, helpful people including DanielLC gave me the multiply-odds-ratios method for combining probability estimates from independent experts who share a constant prior, along with many comments about what to do when the experts aren't independent. (DanielLC's method turns out to be identical to summing up the bits of information for and against the hypothesis, which is what I'd expected to be correct.)
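For concreteness, here's that method as I understand it, in a quick Python sketch (the function name and structure are mine): each expert contributes a likelihood ratio equal to their posterior odds divided by the shared prior odds, and the combined posterior odds are the prior odds times the product of those ratios.

```python
def combine_experts(prior, expert_probs):
    """Combine independent experts' probabilities that share a single prior.

    Each expert's likelihood ratio is their posterior odds divided by the
    prior odds; multiplying these onto the prior odds gives the combined
    posterior odds (equivalent to summing the experts' log-odds in bits).
    """
    prior_odds = prior / (1 - prior)
    odds = prior_odds
    for p in expert_probs:
        odds *= (p / (1 - p)) / prior_odds
    return odds / (1 + odds)  # convert odds back to a probability

# Two independent experts each saying 0.8 against a 0.5 prior give
# odds of 1 * 4 * 4 = 16, i.e. a combined probability of 16/17 ~ 0.94.
print(combine_experts(0.5, [0.8, 0.8]))
```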
I ran into problems applying this, because sometimes the prior isn't constant across samples. Right now I'm combining different sources of information to choose the correct transcription start site for a gene. These bacterial genes typically have from 1 to 20 possible start sites. The prior is 1 / (number of possible sites).
Suppose I want to figure out the correct likelihood multiplier for the information that a start site overlaps the stop of the previous gene, which I will call property O. Assume this multiplier, lm, is constant regardless of the prior. This is reasonable, since we always factor out the prior. Some function of the prior then gives me the posterior probability that a site s is the correct start (Q(s) is true), given O(s). That's P(Q(s) | prior=1/numStarts, O(s)).
Suppose I look just at those cases where numStarts = 4; I find that P(Q(s) | numStarts=4, O(s)) = .9. Dividing the posterior odds by the prior odds gives the likelihood multiplier:
9:1 / 1:3 = 27:1
Or I can look at the cases where numStarts=2, and find that in these cases, P(Q(s) | numStarts=2, O(s)) = .95:
19:1 / 1:1 = 19:1
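In code, that arithmetic looks like this (a quick check; the function name is mine):

```python
def likelihood_multiplier(posterior, num_starts):
    """Posterior odds divided by the 1/numStarts prior odds."""
    posterior_odds = posterior / (1 - posterior)
    prior_odds = 1 / (num_starts - 1)  # a 1/k prior is odds of 1:(k-1)
    return posterior_odds / prior_odds

print(likelihood_multiplier(0.9, 4))   # ~27, the numStarts=4 bin
print(likelihood_multiplier(0.95, 2))  # ~19, the numStarts=2 bin
```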
I want to take one pass through the data and come up with a single likelihood multiplier, rather than binning all the data into different groups by numStarts. I think I can just compute it as
(sum of numerators : sum of denominators) over all cases s_i where O(s_i) is true, where

numerator_i = (numStarts_i - 1) * Q(s_i)
denominator_i = 1 - Q(s_i)
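In code, that's (my sketch, assuming Q(s_i) is recorded as 1 for the correct start and 0 otherwise):

```python
def pooled_multiplier(cases):
    """One-pass pooled likelihood multiplier for property O.

    `cases` holds one (num_starts, q) pair per site s_i with O(s_i) true,
    where q is 1 if s_i is the correct start and 0 otherwise.
    """
    numerator = sum((num_starts - 1) * q for num_starts, q in cases)
    denominator = sum(1 - q for _, q in cases)
    return numerator / denominator
```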
Is this correct?
If the sample data that you're using to estimate P(Q(s) | numStarts=4, O(s)) = 0.9 has the same P(numStarts) distribution as the real data you're going to run this over, then you don't need to do anything special; just estimate P(Q(s) | O(s)) directly, caring not about numStarts, and go from there. Since you're not doing that, I assume your sample data and real data have different numStarts distributions.
Here is the information I assume you have to work with. Call numStarts = S.

- P(Q | S=k) = 1/k
- P(S) in real data
- Pd(Q | O,S) from an expert classifier over sample data d, with Pd(S) instead of P(S); in particular, your data d is basically a list of judgments (Q,O,S) that I can aggregate however I choose
- P(Q | N) from another bunch of expert classifiers, independent of the first
What you'd like is to be able to compute P(Q | N,O) on real data. And to make it nice, do that by

P(Q | N,O) = 1 - 1 / (1 + Odds(Q | N,O)), with Odds(Q | N,O) = Odds(Q) L(N|Q) L(O|Q)
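In code, that last step is just (a sketch; the three inputs are assumed to be computed already):

```python
def p_q_given_n_o(odds_q, l_n_given_q, l_o_given_q):
    """P(Q | N,O) from the prior odds and the two likelihood ratios."""
    odds = odds_q * l_n_given_q * l_o_given_q
    return 1 - 1 / (1 + odds)  # equivalently odds / (1 + odds)
```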
You already know how to find Odds(Q) and L(N|Q). The question is how to find L(O|Q) on real data given that you have Pd(Q | O,S) rather than P(Q | O,S), the expert's judgment on sample data d rather than real data. The answer as far as I can tell, unless I've missed part of your question or assumptions, is as follows:
L(O|Q) = sum_k(P(O|Q,S=k) P(S=k)) / sum_k(P(O|~Q,S=k) P(S=k))
[note that P(O|Q,S) remains the same across samples]
P(O|Q,S) = P(Q,O|S) / P(Q|S), so (with C = Count)

P(O|Q,S=k) = (C(Q,O,S=k)/C(S=k)) / (1/k) = k C(Q,O,S=k)/C(S=k)
P(O|~Q,S=k) = (k/(k-1)) C(~Q,O,S=k)/C(S=k)
thus
L(O|Q) = sum(k C(Q,O,S=k)/C(S=k) P(S=k)) / sum((k/(k-1)) C(~Q,O,S=k)/C(S=k) P(S=k))
so to calculate L(O|Q) on your real data, first note P(S=k) on your real data, then take one pass through your sample data d, accumulating the numerator and denominator sums above over each judgment (Q,O,S=k) with O true, and finally take

L(O|Q) = L[numerator] / L[denominator]
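Here is a minimal sketch of that pass (my code, not tested on your data; it assumes d is a list of (q, o, k) judgments with k = numStarts, and that p_s_real maps k to P(S=k) measured on the real data):

```python
from collections import Counter

def l_o_given_q(judgments, p_s_real):
    """One-pass estimate of L(O|Q) = P(O|Q) / P(O|~Q) per the sums above.

    judgments: (q, o, k) tuples from the sample data d, where q is 1 for
        the correct start, o is 1 if the site has property O, and k is
        numStarts for that gene.
    p_s_real: dict mapping k to P(S=k) on the real data.
    """
    c_s = Counter(k for _, _, k in judgments)  # C(S=k) on the sample data
    num = den = 0.0
    for q, o, k in judgments:
        if not o:
            continue
        w = p_s_real[k] / c_s[k]
        if q:
            num += k * w             # accumulates k * C(Q,O,S=k)/C(S=k) * P(S=k)
        else:
            # k == 1 can't land here: with one possible start, Q is always true.
            den += k / (k - 1) * w   # accumulates (k/(k-1)) * C(~Q,O,S=k)/C(S=k) * P(S=k)
    return num / den
```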
You have to bin your training data, but you don't have to bin your test data.
Edit: I found and fixed a couple of errors, so there are probably more. Think, debug, and test for yourself as usual. :D