(Rewritten entirely after seeing pragmatist's answer.)
In this post, helpful people including DanielLC gave me the multiply-odds-ratios method for combining probability estimates given by independent experts with a constant prior, with many comments about what to do when they aren't independent. (DanielLC's method turns out to be identical to summing up the bits of information for and against the hypothesis, which is what I'd expected to be correct.)
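As a minimal sketch of the multiply-odds-ratios method described above, assuming every expert starts from the same prior and the estimates are independent: each expert's odds are divided by the prior odds to get a likelihood ratio, and the ratios are multiplied onto the prior odds. The function names `odds`, `combine`, and `combine_bits` are hypothetical, chosen only for illustration. The second function does the same thing by summing bits (log base 2 of each ratio), to show the two views agree.

```python
import math

def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1.0 - p)

def combine(prior, estimates):
    """Multiply each expert's likelihood ratio onto the shared prior odds."""
    prior_odds = odds(prior)
    posterior_odds = prior_odds
    for p in estimates:
        posterior_odds *= odds(p) / prior_odds
    return posterior_odds / (1.0 + posterior_odds)

def combine_bits(prior, estimates):
    """Same calculation, summing bits of evidence instead of multiplying."""
    prior_bits = math.log2(odds(prior))
    total_bits = prior_bits + sum(math.log2(odds(p)) - prior_bits
                                  for p in estimates)
    posterior_odds = 2.0 ** total_bits
    return posterior_odds / (1.0 + posterior_odds)

# Two experts who each say 0.8, starting from a prior of 0.5:
# odds 4:1 times 4:1 gives 16:1.
print(combine(0.5, [0.8, 0.8]))
print(combine_bits(0.5, [0.8, 0.8]))
```

Both calls should agree up to floating-point rounding. Neither version handles estimates of exactly 0 or 1, and neither addresses dependence between experts.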
I ran into problems applying this, because sometimes the prior isn't constant across samples. Right now I'm combining different sources of information to choose the correct transcription start site for a gene. These bacterial genes typically have from 1 to 20 possible start sites. The prior is 1 / (number of possible sites).
Suppose I want to figure out the correct likelihood multiplier for the information that a start site overlaps the stop of the previous gene, which I will call property O. Assume this multiplier, lm, is constant, regardless of the prior. This is reasonable, since we always factor out the prior. Some function of the prior gives me the posterior probability that a site s is the correct start (Q(s) is true), given that O(s). That's P(Q(s) | prior=1/numStarts, O(s)).
Suppose I look just at those cases where numStarts = 4, and find that P(Q(s) | numStarts=4, O(s)) = .9. Dividing the posterior odds by the prior odds gives the likelihood multiplier:
9:1 / 1:3 = 27:1
Or I can look at the cases where numStarts=2, and find that in these cases, P(Q(s) | numStarts=2, O(s)) = .95:
19:1 / 1:1 = 19:1
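The two calculations above can be reproduced with exact fractions. This is only a sketch: `likelihood_multiplier` is a hypothetical helper name, and it assumes the prior for each site is 1/numStarts, so the prior odds are 1 : (numStarts - 1).

```python
from fractions import Fraction

def likelihood_multiplier(posterior, num_starts):
    """Posterior odds divided by prior odds, for a prior of 1/num_starts.

    Requires num_starts >= 2 and a posterior strictly between 0 and 1.
    """
    posterior = Fraction(posterior)
    posterior_odds = posterior / (1 - posterior)
    prior_odds = Fraction(1, num_starts - 1)
    return posterior_odds / prior_odds

print(likelihood_multiplier("0.9", 4))   # 27
print(likelihood_multiplier("0.95", 2))  # 19
```

Passing the posterior as a string keeps the arithmetic exact, so the results come out as 27 and 19 rather than as nearby floats.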
I want to take one pass through the data and come up with a single likelihood multiplier, rather than binning all the data into different groups by numStarts. I think I can just compute it as
(sum of numerator : sum of denominator) over all cases s_i where O(s_i) is true, where
numerator = (numStarts_i-1) * Q(s_i)
denominator = (1-Q(s_i))
Is this correct?
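One way to sanity-check the proposed one-pass estimate is to run it on toy counts where the multiplier is constant by construction. The sketch below does that; `pooled_multiplier` is a hypothetical name, and the counts are made up for illustration, not taken from any real genome data. In a bin with n candidate starts, the correct-to-incorrect ratio among overlapping sites is lm : (n - 1), so weighting each correct site by (n - 1) should recover lm, provided lm really is the same in every bin.

```python
from fractions import Fraction

def pooled_multiplier(cases):
    """One-pass estimate of lm over sites where the overlap property holds.

    cases: iterable of (num_starts, is_correct) pairs, one per such site.
    """
    numerator = 0
    denominator = 0
    for num_starts, is_correct in cases:
        q = 1 if is_correct else 0
        numerator += (num_starts - 1) * q   # (numStarts_i - 1) * Q(s_i)
        denominator += 1 - q                # 1 - Q(s_i)
    return Fraction(numerator, denominator)

# Synthetic counts built so that lm = 27 in both bins:
#   numStarts = 4: odds 27:3 = 9:1   ->  9 correct, 1 incorrect
#   numStarts = 2: odds 27:1         -> 27 correct, 1 incorrect
cases = ([(4, True)] * 9 + [(4, False)] +
         [(2, True)] * 27 + [(2, False)])
print(pooled_multiplier(cases))  # 27
```

On these counts the pooled ratio is 54 : 2 = 27, matching the per-bin value. If the per-bin multipliers differ, as with the 27:1 and 19:1 figures earlier, the pooled number is a blend of them weighted by how many incorrect sites each bin contributes.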
That's the source of the difficulty. The priors are different for each sample.
It will confuse you to try to assign a number to s. Each 's' is one start site. It's an atom that you make propositions about. It doesn't have content. Biologically, it has a position on the genome, which I use to compute the values of propositions about it.
This is my current model of your problem:
You have a set S of start sites, each of which we can make propositions about. Each one of them has some position on the genome.
You're interested in looking at each of the start sites and assessing some property: "does this start site overlap the previous gene's stop site?" If that's true for the particular start site s, we say Q(s)=1; otherwise, Q(s)=0 (using 0 and 1 as synonymous with true and false). This is unknown, so we refer to our uncertainty as P(Q(s)), which might start off as 1/S for all s, or ...