Suppose you have a property Q which certain objects may or may not have. You've seen many of these objects; you know the prior probability P(Q) that an object has this property.
You have two independent measurements of object O, each of which assigns a probability to Q(O) (that O has property Q). Call these two independent probabilities A and B.
What is P(Q(O) | A, B, P(Q))?
To put it another way: expert A asserts P(Q(O)) = A = .7, expert B asserts P(Q(O)) = B = .8, and the prior is P(Q) = .4; so what is P(Q(O))? The correlation between the experts' opinions is unknown, but probably small. (They aren't human experts.) I face this problem all the time at work.
You can see that the problem isn't solvable without the prior P(Q), because if the prior P(Q) = .9, then two experts assigning P(Q(O)) < .9 should result in a probability lower than the lowest opinion of those experts. But if P(Q) = .1, then the same estimates by the two experts should result in a probability higher than either of their estimates. But is it solvable or at least well-defined even with the prior?
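(To check with the independence assumption made explicit below, under which the posterior odds come out to odds(A) · odds(B) / odds(prior): with prior .9, that is (.7/.3)(.8/.2)/(.9/.1) ≈ 1.04, i.e. P ≈ .51, below both experts' opinions; with prior .1, it is (.7/.3)(.8/.2)/(.1/.9) = 84, i.e. P ≈ .99, above both.)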
The experts both know the prior, so if you just had expert A saying P(Q(O)) = .7, the answer must be .7. Expert B's opinion must revise the probability upwards if B > P(Q), and downwards if B < P(Q).
When expert A reports the estimate A, she probably means, "If I consider all the n objects I've seen that looked like this one, n·A of them had property Q."
One approach is to add up the bits of information each expert gives, with positive bits for evidence that Q(O) and negative bits for evidence that not(Q(O)).
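Concretely (a sketch, assuming each expert's deviation from the prior counts as independent evidence): measure everything in bits of log-odds, logit(p) = log2(p/(1-p)). The prior contributes logit(.4) ≈ -0.58 bits; expert A adds logit(.7) - logit(.4) ≈ 1.81 bits; expert B adds logit(.8) - logit(.4) ≈ 2.58 bits. The total is ≈ 3.81 bits, i.e. odds of 2^3.81 ≈ 14, so P(Q(O)) ≈ 14/15 ≈ .93.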
I have no additional information; this is the general case that I need to solve. This is all the information I have, and I need to make a decision.
(The real-world problem is that I have a zillion classifiers that give probability estimates for dozens of different things, and I have to combine their outputs for each of those things. I don't have time to look inside any of them or ask for more details. I need a function that takes one prior and N estimates, assumes the estimates are independent, and produces an output. I usually can't find their correlations, because the training data isn't available or for other reasons; anyway, I don't have time to write the code to do that, and the correlations are probably small.)
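A minimal sketch of such a function (my own naming; it implements the add-up-the-bits rule above, assuming the estimates are conditionally independent given Q):

```python
import math

def combine_estimates(prior, estimates):
    """Combine N probability estimates that share a known prior.

    Assumes each estimate is conditionally independent evidence given Q,
    so each estimate's log-odds deviation from the prior adds to the
    prior's log-odds.
    """
    def logit(p):
        return math.log(p / (1.0 - p))

    total = logit(prior) + sum(logit(e) - logit(prior) for e in estimates)
    return 1.0 / (1.0 + math.exp(-total))  # back from log-odds to a probability

print(combine_estimates(0.4, [0.7, 0.8]))  # ~0.93, the example above
print(combine_estimates(0.9, [0.7, 0.8]))  # ~0.51, below both experts' opinions
```

With a single estimate the prior terms cancel and the function returns that estimate, matching the single-expert argument above.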
There are unsupervised methods if you have unlabeled data, which I suspect you do. I don't know of standard methods, but here are a few simple ideas off the top of my head:
First, you can test whether A is consistent with the prior by checking that the average probability it predicts over your data matches your prior for Q. If not, there are a lot of possible failure modes, such as your new data being different from the data used to set your prior, or A being wrong or miscalibrated. If I trusted the prior a lot and wanted to fix the problem, I would scale the evidence...
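For instance (a sketch under those assumptions; predict_a and the scaling factor s are my own names, and the scaling step is only a guess at where the cut-off sentence was going):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def calibration_gap(predict_a, unlabeled, prior):
    """Average A's predictions over unlabeled data and compare to the prior.

    If A is calibrated and the data matches the population the prior was
    set on, this gap should be near zero.
    """
    avg = sum(predict_a(x) for x in unlabeled) / len(unlabeled)
    return avg - prior

def scale_evidence(p, prior, s):
    """Hypothetical fix: shrink A's log-odds evidence by a factor s in [0, 1],
    tuning s until calibration_gap on the scaled predictions is near zero."""
    return 1.0 / (1.0 + math.exp(-(logit(prior) + s * (logit(p) - logit(prior)))))
```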