(Rewritten entirely after seeing pragmatist's answer.)
In this post, helpful people including DanielLC gave me the multiply-odds-ratios method for combining probability estimates given by independent experts with a constant prior, with many comments about what to do when they aren't independent. (DanielLC's method turns out to be identical to summing up the bits of information for and against the hypothesis, which is what I'd expected to be correct.)
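As a minimal sketch of that equivalence (the function names are mine, not DanielLC's, and it assumes the experts are independent and share one prior), multiplying each expert's odds relative to the prior gives the same answer as summing log-odds in bits:

```python
import math

def to_odds(p):
    """Convert a probability to odds in favor (p : 1-p)."""
    return p / (1 - p)

def to_prob(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)

def combine_by_odds(prior, estimates):
    """Multiply the prior odds by each expert's likelihood ratio.

    Assumes independent experts who all started from the same prior.
    """
    prior_odds = to_odds(prior)
    odds = prior_odds
    for p in estimates:
        odds *= to_odds(p) / prior_odds  # this expert's likelihood ratio
    return to_prob(odds)

def combine_by_bits(prior, estimates):
    """Same calculation, done by summing bits of evidence (log2 odds)."""
    prior_bits = math.log2(to_odds(prior))
    bits = prior_bits
    for p in estimates:
        bits += math.log2(to_odds(p)) - prior_bits
    return to_prob(2 ** bits)

print(combine_by_odds(0.5, [0.8, 0.9]))
print(combine_by_bits(0.5, [0.8, 0.9]))
```

With a prior of 0.5, experts at 0.8 and 0.9 contribute odds of 4:1 and 9:1, so both functions come out at 36:1, or about 0.973.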
I ran into problems applying this, because sometimes the prior isn't constant across samples. Right now I'm combining different sources of information to choose the correct transcription start site for a gene. These bacterial genes typically have from 1 to 20 possible start sites. The prior is 1 / (number of possible sites).
Suppose I want to figure out the correct likelihood multiplier for the information that a start site overlaps the stop of the previous gene, which I will call property O. Assume this multiplier, lm, is constant, regardless of the prior. This is reasonable, since we always factor out the prior. Some function of the prior gives me the posterior probability that a site s is the correct start (Q(s) is true), given that O(s) is true. That's P(Q(s) | prior=1/numStarts, O(s)).
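Suppose I look just at those cases where numStarts = 4, and find that P(Q(s) | numStarts=4, O(s)) = .9. The posterior odds are 9:1 and the prior odds are 1:3, so dividing the first by the second gives the multiplier:
9:1 / 1:3 = 27:1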
Or I can look at the cases where numStarts=2, and find that in these cases, P(Q(s) | numStarts=2, O(s)) = .95:
19:1 / 1:1 = 19:1
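The two calculations above can be sketched as one small function. This is only an illustration: likelihood_multiplier is a name I made up, and it assumes the prior is exactly 1/numStarts, so the prior odds are 1:(numStarts-1).

```python
from fractions import Fraction

def likelihood_multiplier(posterior, num_starts):
    """Posterior odds divided by prior odds, for a prior of 1/num_starts.

    Raises ZeroDivisionError when num_starts == 1 (the prior is already
    certain) or when posterior == 1.
    """
    posterior_odds = posterior / (1 - posterior)
    prior_odds = Fraction(1, num_starts - 1)  # prior 1/n means odds 1:(n-1)
    return posterior_odds / prior_odds

# Exact fractions avoid floating-point noise in the ratios.
print(likelihood_multiplier(Fraction(9, 10), 4))   # 27
print(likelihood_multiplier(Fraction(19, 20), 2))  # 19
```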
I want to take one pass through the data and come up with a single likelihood multiplier, rather than binning all the data into different groups by numStarts. I think I can just compute it as
(sum of numerator : sum of denominator) over all cases s_i where O(s_i) is true, where
numerator = (numStarts_i-1) * Q(s_i)
denominator = (1-Q(s_i))
Is this correct?
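Here is a rough sketch of the one-pass computation, to make the proposal concrete. The tuple layout (numStarts, Q, O) and the counts in the toy data are invented for illustration; they are not my real data.

```python
def pooled_multiplier(sites):
    """One pass over (num_starts, q, o) tuples, one tuple per candidate start.

    q is 1 if the site is the true start, else 0.
    o is 1 if the site overlaps the previous gene's stop, else 0.
    Returns sum((num_starts - 1) * q) / sum(1 - q) over sites with o == 1.
    """
    numerator = 0
    denominator = 0
    for num_starts, q, o in sites:
        if not o:
            continue  # only sites with property O contribute
        numerator += (num_starts - 1) * q
        denominator += 1 - q
    if denominator == 0:
        raise ValueError("no site with O=1 and Q=0, so the ratio is undefined")
    return numerator / denominator

# Toy data (made up): counts chosen to reproduce the two binned examples.
four_starts = [(4, 1, 1)] * 9 + [(4, 0, 1)] * 1    # P(Q | O) = .9
two_starts = [(2, 1, 1)] * 19 + [(2, 0, 1)] * 1    # P(Q | O) = .95
ignored = [(4, 0, 0), (2, 1, 0)]                   # O is false, skipped

print(pooled_multiplier(four_starts))                         # 27.0
print(pooled_multiplier(two_starts))                          # 19.0
print(pooled_multiplier(four_starts + two_starts + ignored))  # 23.0
```

Within a single numStarts bin the formula reduces to the binned calculation (27 and 19 here). Across bins it gives one pooled value, which is only meaningful to the extent that the constant-lm assumption actually holds.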
Here s means one start. It's an atom in these equations; it doesn't have a domain. Q(s) is 1 if that start is the true start, 0 if it is not. O(s) is 1 if that start overlaps the stop of the previous gene, and 0 if it does not.
So, if there are two start sites, you'll have Q(1) and Q(2) and those will add up to 1? Or are Q(1) and Q(2) decoupled (they could both be start sites, or neither)? O(1) would then be the likelihood ratio that the start is at 1 rather than 2?
Part of my confusion is that you state the priors are 1/numStarts, but then you talk about aggregating experiments where numStarts has different values. This could cause major problems: if you thought that it was either 1 or 2 (that is, putting a prior of 0 on 3 and 4), but measured 1, 2, 3, and 4, then you won't be a...