Suppose you have a property Q which certain objects may or may not have. You've seen many of these objects; you know the prior probability P(Q) that an object has this property.
You have two independent measurements of object O, each of which assigns a probability that Q(O) (that O has property Q). Call these two independent probabilities A and B.
What is P(Q(O) | A, B, P(Q))?
To put it another way, expert A has opinion O(A) = A, which asserts P(Q(O)) = A = .7, and expert B says P(Q(O)) = B = .8, and the prior P(Q) = .4, so what is P(Q(O))? The correlation between the opinions of the experts is unknown, but probably small. (They aren't human experts.) I face this problem all the time at work.
You can see that the problem isn't solvable without the prior P(Q): if the prior P(Q) = .9, then two experts assigning P(Q(O)) < .9 should result in a probability lower than the lower of their two estimates. But if P(Q) = .1, the same estimates from the two experts should result in a probability higher than either of them. But is it solvable, or at least well-defined, even with the prior?
The experts both know the prior, so if you had only expert A saying P(Q(O)) = .7, the answer must be .7. Expert B's opinion must then revise the probability upwards if B > P(Q), and downwards if B < P(Q).
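Under the (strong) assumption that the two experts' errors are independent conditional on the truth, Bayes' rule gives a closed form: each expert multiplies the prior odds by the ratio of her odds to the prior odds. A minimal sketch (the function name `combine` is mine, not from the post):

```python
def combine(a, b, prior):
    """Posterior P(Q(O)) given expert estimates a and b and a shared prior,
    assuming the experts are independent conditional on the truth."""
    # Numerator and denominator of the posterior odds, each expert
    # contributing a likelihood ratio odds(expert) / odds(prior).
    num = a * b / prior
    den = (1 - a) * (1 - b) / (1 - prior)
    return num / (num + den)

print(combine(0.7, 0.8, 0.4))  # ~0.933: above both experts, since both exceed the prior
print(combine(0.7, 0.8, 0.9))  # ~0.509: below both experts, since both fall below the prior
```

This reproduces the qualitative behavior described above: with a prior of .4, the combined estimate exceeds both .7 and .8; with a prior of .9, it falls below both.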
When expert A says O(A) = A, she probably means, "If I consider all the n objects I've seen that looked like this one, nA of them had property Q."
One approach is to add up the bits of information each expert gives, with positive bits for indications that Q(O), and negative bits for indications that not(Q(O)).
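The bit-summing idea can be made concrete in log-odds: each expert contributes logit(estimate) − logit(prior) (bits, up to a choice of log base), and the contributions are summed on top of the prior's log-odds. Under conditional independence this is exactly equivalent to multiplying likelihood ratios. A sketch, with hypothetical function names:

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def combine_bits(estimates, prior):
    """Sum each expert's evidence relative to the shared prior in log-odds,
    then convert the total back to a probability."""
    total = logit(prior) + sum(logit(e) - logit(prior) for e in estimates)
    return 1 / (1 + math.exp(-total))

print(combine_bits([0.7, 0.8], prior=0.4))  # ~0.933, matching the odds-ratio form
```

Note that a single expert contributes logit(A) − logit(prior), so `combine_bits([0.7], 0.4)` returns .7, consistent with the one-expert case above.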
If you have a lot of experts and a lot of objects, I might try a generative model in which each object has unseen values in an n-dimensional feature space, and each expert decides what features to notice using weightings from a dual n-dimensional space, with the weight covectors generated in clusters to represent the experts' structured non-independence. Each expert's probability estimate would be something like a logistic function of the product of the object's features with the expert's weights, plus noise. Your output summary probability would be the posterior mean of an estimate based on a special "best" expert weighting, derived using the assumption that the experts' estimates are well-calibrated.
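A minimal sketch of the data-generating side of such a model, using numpy. All sizes, the number of clusters, and the noise scales are arbitrary assumptions, and the inference step (the posterior mean under a "best" weighting) is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_experts, n_clusters, n_objects = 5, 6, 2, 100

# Cluster centers in the dual (weight) space; each expert's weight covector
# is drawn near one center, giving the experts structured non-independence.
centers = rng.normal(size=(n_clusters, n_features))
assignments = rng.integers(n_clusters, size=n_experts)
weights = centers[assignments] + 0.3 * rng.normal(size=(n_experts, n_features))

# Objects have unseen feature vectors; each expert's probability estimate
# is a logistic function of (features . weights) plus noise.
features = rng.normal(size=(n_objects, n_features))
noise = 0.2 * rng.normal(size=(n_objects, n_experts))
estimates = 1 / (1 + np.exp(-(features @ weights.T + noise)))

print(estimates.shape)  # one probability per (object, expert) pair: (100, 6)
```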
I'm not sure what an appropriate generative model of clustered expert feature weightings would be.
Actually, I guess the output of this procedure would just end up being a log-linear model of the truth given the experts' confidences. (Some of the coefficients might be negative, to cancel confounding factors.) So maybe a lot easier way to fix this is to sample from the space of such log-linear models directly, using sampled hypothetical imputed truths, while enforcing some constraint that the experts' opinions be reasonably well-calibrated.
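A sketch of fitting such a log-linear model directly, on synthetic data where the truth is observed (in practice it would be imputed, and the calibration constraint is omitted here). The setup and all parameters are my assumptions; the model is logit P(Q) = b0 + b1·logit(A) + b2·logit(B), fit by gradient ascent on the log-likelihood in plain numpy:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: hidden log-odds of the truth, two noisy experts whose
# reported log-odds scatter around it.
n = 2000
truth_logit = rng.normal(size=n)
truth = (rng.random(n) < 1 / (1 + np.exp(-truth_logit))).astype(float)
expert_logits = truth_logit[:, None] + 0.8 * rng.normal(size=(n, 2))

# Log-linear model of the truth given the experts' confidences:
# design matrix = [intercept, logit(A), logit(B)].
X = np.hstack([np.ones((n, 1)), expert_logits])
beta = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ beta))        # current predicted probabilities
    beta += 0.1 * X.T @ (truth - p) / n    # gradient ascent step

print(beta)  # coefficients on the two experts' log-odds
```

With informative experts the fitted coefficients on both logits come out positive; with confounded or redundant experts, some could be driven negative, as noted above.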
I ignored this because I wasn't sure what you could have meant by "independent". If you meant that the experts' outputs are fully independent, conditional on the truth, then the problem is straightforward. But this seems unlikely in practice. You probably just meant the informal English connotation "not completely dependent".