Suppose you have a property Q which certain objects may or may not have. You've seen many of these objects; you know the prior probability P(Q) that an object has this property.
You have two independent measurements of object O, each of which assigns a probability that Q(O) (that O has property Q). Call these two independent probabilities A and B.
What is P(Q(O) | A, B, P(Q))?
To put it another way: expert A holds opinion O(A) = A, asserting P(Q(O)) = A = .7; expert B says P(Q(O)) = B = .8; and the prior is P(Q) = .4. What is P(Q(O))? The correlation between the experts' opinions is unknown, but probably small. (They aren't human experts.) I face this problem all the time at work.
You can see that the problem isn't solvable without the prior P(Q), because if the prior P(Q) = .9, then two experts assigning P(Q(O)) < .9 should result in a probability lower than the lowest opinion of those experts. But if P(Q) = .1, then the same estimates by the two experts should result in a probability higher than either of their estimates. But is it solvable or at least well-defined even with the prior?
The experts both know the prior, so if you had only expert A saying P(Q(O)) = .7, the answer must be .7. Expert B's opinion must then revise the probability upwards if B > P(Q), and downwards if B < P(Q).
When expert A says O(A) = A, she probably means, "If I consider all the n objects I've seen that looked like this one, n·A of them had property Q."
One approach is to add up the bits of information each expert gives, with positive bits for indications that Q(O) and negative bits that not(Q(O)).
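Under the conditional-independence reading discussed further down the thread (the experts' evidence is independent given Q), "adding up the bits" becomes literal: each expert contributes their log-odds minus the prior's log-odds. A minimal sketch; the `logit` and `combine` names and the code itself are mine, not from the post:

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def combine(prior, *estimates):
    """Combine expert probability estimates, assuming each expert's
    evidence is independent conditional on Q. Each expert contributes
    their log-odds minus the prior's log-odds ("bits of evidence")."""
    total = logit(prior) + sum(logit(e) - logit(prior) for e in estimates)
    return 1 / (1 + math.exp(-total))

# The numbers from the post: A = .7, B = .8, prior P(Q) = .4
print(round(combine(0.4, 0.7, 0.8), 3))  # → 0.933
```

Note that combine(0.4, 0.7) returns exactly .7, matching the observation above that a single expert's opinion should be taken at face value.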
I think the first order of business is to straighten out the notation, and what is known.
Use these to assign P(Q | A,B,O,I).
I think that's a very bad word to use here. A, B are not independent; they're different. The trick is coming up with their joint distribution, so that you can evaluate P(Q | A,B,O,I).
If the correlation is small, your detectors suck. I doubt that's really what's happening. The usual situation is that both detectors actually have some correlation to Q, and thereby have some correlation to each other.
We need to identify some assumptions about the accuracy of A and B, and their joint distribution. A and B aren't just numbers; they're probability estimates. They were constructed so that they would be correlated with Q. How do we express P(Q,A,B|O)? What information do we start with in this regard?
For a normal problem, you have some data {O_i} on which you can evaluate A, your detector, against Q and get the expectation of Q given A. Same for B.
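A sketch of that evaluation, on synthetic data (the data set and the detector model here are illustrative assumptions, not anything from the thread): bin the detector's output and take the empirical frequency of Q in each bin, which estimates E[Q | A].

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the labeled data {O_i}: Q with prior .4,
# and a detector A whose score is noisily correlated with Q.
n = 50_000
q = rng.random(n) < 0.4
a = np.clip(0.25 + 0.35 * q + 0.15 * rng.standard_normal(n), 0.001, 0.999)

# Empirical E[Q | A]: bin the detector output, then take the mean of Q per bin.
edges = np.linspace(0, 1, 11)
idx = np.clip(np.digitize(a, edges) - 1, 0, 9)
calibration = np.array([q[idx == k].mean() for k in range(10)])
```

The resulting `calibration` array is the detector's calibration curve; repeating the procedure for B gives E[Q | B].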
The maximum entropy solution would proceed assuming that these statistics were the only information you had - or that you no longer had the data, but only had some subset of expectations evaluated in this fashion. I think Jaynes found the maximum entropy solution for two measurements which correlate to the same signal. I don't think he did it in a mixture of experts context, although the solution should be about the same.
If instead you have all the data, the problem is equally straightforward. Evaluate the expectation of Q given A, B across your data set, and apply it to new data. Done. Yes, there's a regularization issue, but it's a 2-d -> 1-d supervised classification problem. If you're training A and B as well, do that in combination with this 2-d -> 1-d problem as a stacked generalization problem, to avoid overfitting.
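A direct sketch of "evaluate the expectation of Q given A, B across your data set" (the data is synthetic and the detector models are my assumptions; a real version would need the regularization mentioned above): bin both detector outputs and build an empirical 2-d lookup table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labeled data: Q with prior .4, and two noisy detectors A, B.
n = 50_000
q = rng.random(n) < 0.4
a = np.clip(0.25 + 0.35 * q + 0.15 * rng.standard_normal(n), 0.001, 0.999)
b = np.clip(0.25 + 0.35 * q + 0.15 * rng.standard_normal(n), 0.001, 0.999)

# Empirical E[Q | A-bin, B-bin]: a 2-d lookup table over a 5x5 grid.
edges = np.linspace(0, 1, 6)
ia = np.clip(np.digitize(a, edges) - 1, 0, 4)
ib = np.clip(np.digitize(b, edges) - 1, 0, 4)
table = np.full((5, 5), np.nan)
for i in range(5):
    for j in range(5):
        mask = (ia == i) & (ib == j)
        if mask.any():
            table[i, j] = q[mask].mean()

# To score a new object, look up its (A, B) cell:
# p_new = table[np.digitize(a_new, edges) - 1, np.digitize(b_new, edges) - 1]
```

With enough data per cell, this table is exactly the expectation being described; with sparse cells you'd smooth it, which is where the regularization issue bites.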
The issue is exactly what data are you working from. Can you evaluate A and B across all data, or do you just have statistics (or assumptions expressed as statistics) on A and B across the data?
The way I interpreted the claim of independence is that the verdicts of the experts are not correlated once you conditionalize on Q. If that is the case, then DanielLC's procedure gives the correct answer.
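A quick sketch of why, in odds form (the algebra below is my paraphrase of the standard Bayes argument, not a quote from that procedure). If the experts' evidence E_a, E_b is independent given Q, then

```latex
\frac{P(Q \mid E_a, E_b)}{P(\lnot Q \mid E_a, E_b)}
  = \frac{P(E_a \mid Q)\, P(E_b \mid Q)\, P(Q)}
         {P(E_a \mid \lnot Q)\, P(E_b \mid \lnot Q)\, P(\lnot Q)},
```

and since A = P(Q | E_a) implies P(E_a | Q)/P(E_a | ¬Q) = [A/(1−A)] · [(1−P(Q))/P(Q)] (and likewise for B), this reduces to

```latex
\frac{P(Q \mid E_a, E_b)}{P(\lnot Q \mid E_a, E_b)}
  = \frac{A}{1-A} \cdot \frac{B}{1-B} \cdot \frac{1 - P(Q)}{P(Q)}.
```

With the numbers from the question: (.7/.3) · (.8/.2) · (.6/.4) = (7/3) · 4 · (3/2) = 14, giving P(Q(O) | A, B) = 14/15 ≈ .93.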
To see this more explicitly, suppose that expert A's verdict is based on evidence Ea and expert B's verdict is based on evidence Eb. The...