wnoise comments on A probability question - Less Wrong
I think the first order of business is to straighten out the notation, and what is known.
Use these to assign P(Q | A,B,O,I).
I think that's a very bad word to use here. A,B are not independent, they're different. The trick is coming up with their joint distribution, so that you can evaluate P(Q | A,B,O,I).
If the correlation is small, your detectors suck. I doubt that's really what's happening. The usual situation is that both detectors actually have some correlation to Q, and thereby have some correlation to each other.
We need to identify some assumptions about the accuracy of A and B, and their joint distribution. A and B aren't just numbers, they're probability estimates. They were constructed so that they would be correlated with Q. How do we express P(QAB|O)? What information do we start with in this regard?
For a normal problem, you have some data {O_i} on which you can evaluate your detector A against Q and get the expectation of Q given A. Same for B.
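Concretely, that per-detector step can be sketched like this (a toy numpy example; the detector model, noise level, and bin count are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: Q is a binary event; detector A outputs a noisy probability
# estimate correlated with Q. (Noise model is made up for illustration.)
n = 100_000
q = rng.integers(0, 2, size=n)                                     # true outcomes Q
a = np.clip(0.5 * q + 0.25 + 0.15 * rng.standard_normal(n), 0, 1)  # detector A's output

# Estimate E[Q | A] by binning A's output and averaging Q within each bin.
bins = np.linspace(0, 1, 11)
idx = np.clip(np.digitize(a, bins) - 1, 0, 9)
e_q_given_a = np.array([q[idx == k].mean() if (idx == k).any() else np.nan
                        for k in range(10)])
```

The array `e_q_given_a` is the calibration curve: bin k holds the empirical expectation of Q when A's output fell in [k/10, (k+1)/10).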
The maximum entropy solution would proceed assuming that these statistics were the only information you had - or that you no longer had the data, but only had some subset of expectations evaluated in this fashion. I think Jaynes found the maximum entropy solution for two measurements that each correlate with the same signal. I don't think he did it in a mixture-of-experts context, although the solution should be about the same.
If instead you have all the data, the problem is equally straightforward. Evaluate the expectation of Q given A,B across your data set, and apply the result to new data. Done. Yes, there's a regularization issue, but it's a 2-d -> 1-d supervised classification problem. If you're training A and B as well, do that in combination with this 2-d -> 1-d problem as a stacked generalization problem, to avoid overfitting.
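A minimal sketch of that 2-d -> 1-d supervised step, using plain gradient-descent logistic regression as the combiner (the synthetic detectors, their noise model, and the training settings are all assumptions for illustration; no regularization is shown):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: two detectors A and B, each a noisy estimate correlated with Q.
n = 50_000
q = rng.integers(0, 2, size=n).astype(float)
a = np.clip(q * 0.6 + 0.2 + 0.1 * rng.standard_normal(n), 0, 1)
b = np.clip(q * 0.4 + 0.3 + 0.1 * rng.standard_normal(n), 0, 1)

# Fit P(Q | A, B) by logistic regression, trained with plain gradient descent.
X = np.column_stack([np.ones(n), a, b])   # intercept + both detector outputs
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - q) / n          # gradient of the log loss

def combine(a_new, b_new):
    """Combined estimate of P(Q | A=a_new, B=b_new) on new data."""
    z = w[0] + w[1] * a_new + w[2] * b_new
    return 1.0 / (1.0 + np.exp(-z))
```

Anything that fits E[Q | A,B] from data would do here; the logistic form is just one convenient choice for a probability-valued output.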
The issue is exactly what data are you working from. Can you evaluate A and B across all data, or do you just have statistics (or assumptions expressed as statistics) on A and B across the data?
This need not be the case. Consider a random variable Z that is the sum of two random independent variables X and Y. Expert A knows X, and is thus correlated with Z. Expert B knows Y and is thus correlated with Z. Expert A and B can still be uncorrelated. In fact, you can make X and Y slightly anticorrelated, and still have them both be positively correlated with Z.
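This construction is easy to check numerically (a quick numpy sketch; the standard-normal choice for X and Y is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Z = X + Y with X, Y independent. Expert A observes X; expert B observes Y.
n = 200_000
x = rng.standard_normal(n)
y = rng.standard_normal(n)
z = x + y

def corr(u, v):
    return np.corrcoef(u, v)[0, 1]

# Each expert correlates with Z (about 1/sqrt(2) here), yet the two
# experts are uncorrelated with each other.
```

Making X and Y slightly anticorrelated instead (e.g. drawing them jointly with a small negative covariance) leaves both still positively correlated with Z, as the parent comment says.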
Just consider the limiting case - both are perfect predictors of Q, with value 1 for Q, and value 0 for not Q. And therefore, perfectly correlated.
Consider small deviations from those perfect predictors. The correlation would still be large. Sometimes more, sometimes less, depending on the details of both predictors. Sometimes they will be more correlated with each other than with Q, sometimes more correlated with Q than with each other. The degree of correlation of A and B with Q will impose limits on the degree of correlation between A and B.
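For instance, perturbing two perfect predictors with small independent noise (a toy sketch; the 0.2 noise scale is arbitrary) leaves both highly correlated with Q, and slightly less correlated with each other than with Q:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 200_000
q = rng.integers(0, 2, size=n).astype(float)   # Q encoded as 0/1
a = q + 0.2 * rng.standard_normal(n)           # small deviation from a perfect predictor
b = q + 0.2 * rng.standard_normal(n)           # independent small deviation

def corr(u, v):
    return np.corrcoef(u, v)[0, 1]

# With independent perturbations, corr(a, b) is roughly corr(a, q) * corr(b, q),
# i.e. the A-B correlation is bounded by the correlations with Q.
```

With shared (rather than independent) noise in the two predictors, the opposite case appears: A and B can become more correlated with each other than either is with Q.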
And of course, correlation isn't really the issue here anyway; mutual information is, with the same sort of triangle-inequality limits on the mutual information.
If someone is feeling energetic and really wants to work this out, I'd recommend looking into triangle inequalities for mutual information measures, and the previously mentioned work by Jaynes on the maximum entropy estimate of a variable from its known correlation with two other variables, and how that constrains the maximum entropy estimate of the correlation between those two.
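For anyone who does want to poke at this numerically, here is a crude plug-in estimator of mutual information from samples (histogram binning; the bin count and the test signals are arbitrary choices, and the estimator has a small positive bias for independent variables):

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in estimate of I(X;Y) in nats, via a 2-d histogram of the samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                 # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

# Toy check: an informative "expert" shares much more information with the
# signal than an unrelated noise stream does.
rng = np.random.default_rng(4)
signal = rng.standard_normal(100_000)
expert = signal + 0.2 * rng.standard_normal(100_000)   # informative
noise = rng.standard_normal(100_000)                   # uninformative
```

This is only a sanity-check tool, not the maximum entropy calculation itself, but it makes the triangle-inequality-style limits easy to explore empirically.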