
buybuydandavis comments on A probability question - Less Wrong Discussion

6 Post author: PhilGoetz 19 October 2012 10:34PM


Comment author: buybuydandavis 20 October 2012 02:01:34AM 4 points [-]

I think the first order of business is to straighten out the notation, and what is known.

  • A - measurement from algorithm A on object O
  • B - measurement from algorithm B on object O
  • P(Q|I) - The probability you assign to Q based on some unspecified information I.

Use these to assign P(Q | A,B,O,I).

You have 2 independent measurements of object O,

I think that's a very bad word to use here. A and B are not independent; they're different. The trick is coming up with their joint distribution, so that you can evaluate P(Q | A,B,O,I).

The correlation between the opinions of the experts is unknown, but probably small.

If the correlation is small, your detectors suck. I doubt that's really what's happening. The usual situation is that both detectors actually have some correlation to Q, and thereby have some correlation to each other.

We need to identify some assumptions about the accuracy of A and B, and their joint distribution. A and B aren't just numbers; they're probability estimates, constructed so that they would be correlated with Q. How do we express P(Q,A,B|O)? What information do we start with in this regard?

For a normal problem, you have some data {O_i} on which you can evaluate A, your detector, against Q and get the expectation of Q given A. Same for B.

The maximum entropy solution would proceed assuming that these statistics were the only information you had - or that you no longer had the data, but only had some subset of expectations evaluated in this fashion. I think Jaynes found the maximum entropy solution for two measurements which correlate to the same signal. I don't think he did it in a mixture of experts context, although the solution should be about the same.

If instead you have all the data, the problem is equally straightforward. Evaluate the expectation of Q given A,B across your data set, and apply it to new data. Done. Yes, there's a regularization issue, but it's a 2-d -> 1-d supervised classification problem. If you're training A and B as well, do that in combination with this 2-d -> 1-d problem as a stacked generalization problem, to avoid overfitting.
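As a sketch of the all-the-data case: here is the crudest version of that 2-d -> 1-d estimate, estimating E[Q | A, B] by binning. Everything in it is hypothetical — the synthetic detectors, the bin count, and the Laplace smoothing all stand in for whatever data and regularization you would actually use.

```python
import random

random.seed(0)

# Hypothetical synthetic data: Q is the binary label; a and b are two noisy
# probability estimates of Q produced by detectors A and B.
def sample():
    q = random.random() < 0.5
    a = min(max(random.gauss(0.7 if q else 0.3, 0.2), 0.0), 1.0)
    b = min(max(random.gauss(0.7 if q else 0.3, 0.2), 0.0), 1.0)
    return q, a, b

train = [sample() for _ in range(20000)]

# Estimate E[Q | A, B] by binning the 2-d (a, b) square -- the crudest
# possible regularization for the 2-d -> 1-d supervised problem.
BINS = 5
counts = {}
for q, a, b in train:
    key = (min(int(a * BINS), BINS - 1), min(int(b * BINS), BINS - 1))
    n, k = counts.get(key, (0, 0))
    counts[key] = (n + 1, k + q)

def p_q_given_ab(a, b):
    key = (min(int(a * BINS), BINS - 1), min(int(b * BINS), BINS - 1))
    n, k = counts.get(key, (0, 0))
    return (k + 1) / (n + 2)  # Laplace smoothing covers empty bins

print(p_q_given_ab(0.9, 0.9))  # both detectors confident -> estimate near 1
print(p_q_given_ab(0.1, 0.1))  # both doubtful -> estimate near 0
```

Note that nothing here assumes A and B are independent in any sense; whatever dependence they have is captured empirically by the joint binning.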

The issue is exactly what data you are working from. Can you evaluate A and B across all the data, or do you just have statistics (or assumptions expressed as statistics) on A and B across the data?

Comment author: pragmatist 20 October 2012 07:53:05AM *  3 points [-]

If the correlation is small, your detectors suck. I doubt that's really what's happening. The usual situation is that both detectors actually have some correlation to Q, and thereby have some correlation to each other.

The way I interpreted the claim of independence is that the verdicts of the experts are not correlated once you conditionalize on Q. If that is the case, then DanielLC's procedure gives the correct answer.

To see this more explicitly, suppose that expert A's verdict is based on evidence Ea and expert B's verdict is based on evidence Eb. The independence assumption is that P(Ea & Eb|Q) = P(Ea|Q) * P(Eb|Q).

Since we know the posteriors P(Q|Ea) and P(Q|Eb), and we know the prior of Q, we can calculate the likelihood ratios for Ea and Eb. The independence assumption allows us to multiply these likelihood ratios together to obtain a likelihood ratio for the combined evidence Ea & Eb. We then multiply this likelihood ratio with the prior odds to obtain the correct posterior odds.
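Under that conditional-independence assumption, the combination is just a product of likelihood ratios. A minimal sketch with made-up numbers (the prior and the two expert posteriors are arbitrary):

```python
def odds(p):
    return p / (1.0 - p)

def prob(o):
    return o / (1.0 + o)

def combine(prior, p_a, p_b):
    """Posterior P(Q | Ea, Eb), assuming Ea and Eb are independent
    conditional on Q."""
    prior_odds = odds(prior)
    # Likelihood ratio contributed by each expert: posterior odds / prior odds.
    lr_a = odds(p_a) / prior_odds
    lr_b = odds(p_b) / prior_odds
    return prob(prior_odds * lr_a * lr_b)

# Two 80% verdicts on a 50% prior combine to 16:1 odds, i.e. ~0.941.
print(combine(0.5, 0.8, 0.8))
```

A sanity check on the design: an expert whose posterior equals the prior contributes a likelihood ratio of 1 and leaves the answer unchanged, as it should.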

Comment author: buybuydandavis 20 October 2012 08:58:00AM *  0 points [-]

To see this more explicitly, suppose that expert A's verdict is based on evidence Ea and expert B's verdict is based on evidence Eb. The independence assumption is that P(Ea & Eb|Q) = P(Ea|Q) * P(Eb|Q).

You can write that, and it's likely possible in some cases, but step back and ask: does this really make sense in the general case?

I just don't think so. The whole problem with mixture of experts, or combining multiple data sources, is that the marginals are not in general independent.

Comment author: pragmatist 20 October 2012 09:40:43AM 2 points [-]

Sure, it's not generically true, but PhilGoetz is thinking about a specific application in which he claims that it is justified to regard the expert estimates as independent (conditional on Q, of course). I don't know enough about the relevant domain to assess his claim, but I'm willing to take him at his word.

I was just responding to your claim that the detectors must suck if the correlation is small. That would be true if the unconditional correlation were small, but it's not true if only the correlation conditional on Q is small.

Comment author: wnoise 20 October 2012 05:44:45AM 2 points [-]

The usual situation is that both detectors actually have some correlation to Q, and thereby have some correlation to each other.

This need not be the case. Consider a random variable Z that is the sum of two independent random variables X and Y. Expert A knows X, and is thus correlated with Z. Expert B knows Y, and is thus correlated with Z. Experts A and B can still be uncorrelated. In fact, you can make X and Y slightly anticorrelated, and still have them both be positively correlated with Z.
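The construction above is easy to check numerically. A quick simulation (the normal distributions and sample size are arbitrary choices for illustration):

```python
import random

random.seed(1)

def corr(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    su = (sum((a - mu) ** 2 for a in u) / n) ** 0.5
    sv = (sum((b - mv) ** 2 for b in v) / n) ** 0.5
    return cov / (su * sv)

# X and Y independent; Z = X + Y. Expert A observes X, expert B observes Y.
xs = [random.gauss(0, 1) for _ in range(100000)]
ys = [random.gauss(0, 1) for _ in range(100000)]
zs = [x + y for x, y in zip(xs, ys)]

print(corr(xs, zs))  # ~0.71: A is strongly correlated with Z
print(corr(ys, zs))  # ~0.71: B is strongly correlated with Z
print(corr(xs, ys))  # ~0.00: yet A and B are uncorrelated
```

With standard normals the theoretical value of corr(X, X+Y) is 1/sqrt(2) ≈ 0.707, while corr(X, Y) = 0 exactly.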

Comment author: buybuydandavis 20 October 2012 08:47:27AM 0 points [-]

Just consider the limiting case: both are perfect predictors of Q, taking value 1 when Q and value 0 when not Q, and therefore perfectly correlated with each other.

Consider small deviations from those perfect predictors. The correlation would still be large. Sometimes more, sometimes less, depending on the details of both predictors. Sometimes they will be more correlated with each other than with Q, sometimes more correlated with Q than with each other. The degree of correlation of A and B with Q will impose limits on the degree of correlation between A and B.
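Those limits can be made exact for correlation: given corr(A,Q) and corr(B,Q), the feasible range of corr(A,B) follows from requiring the 3x3 correlation matrix of (A, B, Q) to be positive semidefinite. A sketch (the example correlation values are made up):

```python
import math

def ab_correlation_bounds(r_aq, r_bq):
    """Range of corr(A,B) consistent with the given corr(A,Q) and corr(B,Q),
    from positive semidefiniteness of the 3x3 correlation matrix:
    corr(A,B) must lie within sqrt((1-r_aq^2)(1-r_bq^2)) of r_aq*r_bq."""
    slack = math.sqrt((1 - r_aq ** 2) * (1 - r_bq ** 2))
    return r_aq * r_bq - slack, r_aq * r_bq + slack

# Near-perfect detectors: corr(A,B) is forced to be large.
print(ab_correlation_bounds(0.99, 0.99))  # ~ (0.96, 1.00)
# Weak detectors: corr(A,B) is almost unconstrained.
print(ab_correlation_bounds(0.2, 0.2))    # ~ (-0.92, 1.00)
```

This matches the limiting case: as both detectors approach perfect correlation with Q, the lower bound on corr(A,B) approaches 1.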

And of course, correlation isn't really the issue here anyway; mutual information is closer to the mark, with the same sort of triangle-inequality limits on the mutual information.

If someone is feeling energetic and really wants to work this out, I'd recommend looking into triangle inequalities for mutual information measures, and the previously mentioned work by Jaynes on the maximum entropy estimate of a variable from its known correlation with two other variables, and how that constrains the maximum entropy estimate of the correlation between the other two.