Miller comments on Statistical Prediction Rules Out-Perform Expert Human Judgments - Less Wrong
You just presumed away my argument. I claim specifically that the relationship between various classes of errors is not well-defined. This can lead to abuse of the term 'better'.
Please tell me why I should take that as a presumption.
Because that is the class of problems this post discusses.
From the top of the post:
I think this is the kind of question that Miller is talking about. Just because a system is correct more often doesn't necessarily mean it's better.
For example, if the human experts released more people who went on to commit relatively minor violent offences, while the SPRs did this less often but were more likely to release prisoners who went on to commit murder, then there would be a legitimate debate over whether the SPR is actually better.
I think this is exactly what he is talking about when he says
I don't know whether there is evidence that this is a real effect, but to address it, what you really need to measure is the total utility of outcomes rather than accuracy.
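To make the accuracy-versus-utility point concrete, here is a minimal sketch with made-up numbers (nothing here comes from the post or from any study). It assumes three outcome types and arbitrary per-outcome costs, and it ignores harmless prisoners who are wrongly kept in, purely to keep it short.

```python
# Hypothetical outcome counts for 1,000 release decisions
# (illustrative only, not data from the post or any study).
outcomes = {
    "expert": {"correct": 700, "released_minor_offender": 250, "released_murderer": 50},
    "spr":    {"correct": 800, "released_minor_offender": 100, "released_murderer": 100},
}

# Hypothetical cost per outcome; correct decisions cost nothing.
cost = {"correct": 0.0, "released_minor_offender": 1.0, "released_murderer": 20.0}

def accuracy(counts):
    """Fraction of decisions that were correct."""
    return counts["correct"] / sum(counts.values())

def total_cost(counts):
    """Sum of per-outcome costs; lower is better."""
    return sum(cost[kind] * n for kind, n in counts.items())

for name, counts in outcomes.items():
    print(f"{name}: accuracy={accuracy(counts):.2f}, total cost={total_cost(counts):.0f}")
```

With these invented numbers the SPR is more accurate (0.80 vs. 0.70) but has the higher total cost (2100 vs. 1250), because it releases more prisoners in the expensive category. Change the values in `cost` and the ranking can flip: the ranking comes from the cost assignment, not from accuracy alone.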
Yes. You got it, exactly.
No. I'm talking about classes of errors.
As in, which is better?
The relative cost of false positives (fp) vs. false negatives (fn) is not defined automatically. If humans are closer to #1 than #2, and I develop a system like #2, I might define #2 to be 'better'. Then, later on down the line, I stop talking about how I defined 'better' and just use the word 'better', and no one questions it because hey... better is better, right?
Which is more costly, false positives or false negatives? This is an easy question to answer.
If false positives, #1 is better. If false negatives, #2. I really do not see what your point is. These problems you bring up are easily solved.
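A small sketch of the fp/fn trade-off between two systems, assuming (as the reply implies) that #1 makes fewer false positives and #2 makes fewer false negatives. The error counts are hypothetical, since the original list of #1 and #2 isn't shown here.

```python
from fractions import Fraction

# Hypothetical error counts (illustrative only; the original #1/#2 list isn't shown).
# System 1 makes fewer false positives, system 2 makes fewer false negatives.
errors = {1: {"fp": 10, "fn": 40}, 2: {"fp": 30, "fn": 15}}

def total_cost(system, cost_fp, cost_fn):
    """Total cost of a system's errors under the given per-error costs."""
    e = errors[system]
    return e["fp"] * cost_fp + e["fn"] * cost_fn

def better_system(cost_fp, cost_fn):
    """Return the system with the lower total cost (ties go to system 1)."""
    return 1 if total_cost(1, cost_fp, cost_fn) <= total_cost(2, cost_fp, cost_fn) else 2

# Break-even ratio r = cost_fp / cost_fn at which both systems cost the same:
#   fp1 * r + fn1 == fp2 * r + fn2  =>  r = (fn1 - fn2) / (fp2 - fp1)
break_even = Fraction(errors[1]["fn"] - errors[2]["fn"],
                      errors[2]["fp"] - errors[1]["fp"])

print("break-even cost_fp / cost_fn:", break_even)
print("fp twice as costly ->", better_system(cost_fp=2, cost_fn=1))
print("fn twice as costly ->", better_system(cost_fp=1, cost_fn=2))
print("equal costs        ->", better_system(cost_fp=1, cost_fn=1))
```

In this made-up example the ranking flips at a cost ratio of 5/4 rather than at 1: at equal costs system 2 wins, and system 1 only wins once a false positive costs more than 1.25 false negatives. So the choice depends on the actual ratio, not only on which error is costlier.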
Which is better: releasing a violent prisoner or keeping a harmless one incarcerated? If you can find an answer that 90% of the population agrees on, then I think you've done better than every politician in history.
That people do NOT agree suggests to me that it's hardly a trivial question...
Yes. Thank you. Since at least one person understood me, I'm gonna jump off the merry-go-round at this point.
(For reference, I realize an expert runs into the same issue; I just think it's unfair to say that the issue is "easily solved".)
How violent, how preventably violent, how harmless, how incarcerated, how long incarcerated? For any specific case with these agreed-upon, I am confident a supermajority would agree.
That people don't agree suggests that one side is comparing releasing a serial killer to briefly jailing a drifter, while the other side is comparing releasing a middle-aged man who, in a fit of passion, struck his adulterous wife to incarcerating Gandhi for the rest of his natural life. More generally, each side is deciding based on one specific example that is strongly available to them.
As you phrased it, that question is about as answerable as "how long is a piece of string?".
Many tests have a continuous, adjustable sensitivity parameter, letting you set the trade-off however you want. In that case, we can refrain from judging the relative badness of false positives and false negatives and instead use ROCA (the area under the ROC curve), which is basically the integral over all such trade-offs: the true-positive rate integrated over the false-positive rate as the threshold sweeps its full range. Tests that are going to be combined into a larger predictor are usually measured this way.
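Assuming ROCA here means the area under the ROC curve, a minimal pure-Python sketch: sweep the decision threshold over every distinct score, record the (false-positive rate, true-positive rate) point at each step, and integrate with the trapezoidal rule. The scores and labels are made-up toy data, and the per-threshold counting is quadratic, which is fine for illustration but not for large datasets.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs as the decision threshold sweeps from high to low.

    A case is predicted positive when its score >= threshold.
    labels: 1 for an actual positive, 0 for an actual negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def roc_auc(scores, labels):
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Toy data (hypothetical): higher score = "more likely positive".
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
print(roc_auc(scores, labels))
```

The result equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as half); for the toy data, 13 of the 16 positive/negative pairs are ordered correctly, giving 13/16 = 0.8125. In practice, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.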
Many machine learning packages let you specify a "cost matrix", which gives the cost of each possible confusion. For a 2-valued test, it would be a 2x2 matrix with zeroes on the diagonal and the costs of A->B and B->A errors in the other two spots. For a test with N possible results, the matrix is NxN, with zeroes on the diagonal, and each (row, col) entry is the cost of a mistake that confuses the result corresponding to that row with the result corresponding to that column.
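A sketch of how such a cost matrix gets used, assuming the convention rows = actual result, columns = predicted result (packages differ on the orientation, so check yours). The class names, costs and counts are all hypothetical. `total_cost` weights a confusion matrix by the costs; `min_cost_prediction` picks the prediction with the lowest expected cost given class probabilities, which need not be the most probable class.

```python
# Assumed convention: rows = actual class, columns = predicted class.
classes = ["low_risk", "medium_risk", "high_risk"]  # hypothetical labels

# Hypothetical costs; zeroes on the diagonal (correct predictions cost nothing).
cost = [
    [0, 1, 2],    # actual low_risk predicted as low / medium / high
    [3, 0, 1],    # actual medium_risk
    [10, 4, 0],   # actual high_risk
]

def total_cost(confusion, cost):
    """Sum of count * cost over every (actual, predicted) cell."""
    return sum(n * c
               for row_n, row_c in zip(confusion, cost)
               for n, c in zip(row_n, row_c))

def min_cost_prediction(probs, cost):
    """Index of the prediction with the lowest expected cost.

    probs[i] is the (assumed) probability that the actual class is i.
    """
    n = len(cost)
    expected = [sum(probs[actual] * cost[actual][pred] for actual in range(n))
                for pred in range(n)]
    return min(range(n), key=lambda pred: expected[pred])

# Hypothetical confusion counts: confusion[actual][predicted].
confusion = [
    [50, 5, 1],
    [4, 30, 6],
    [2, 3, 20],
]
print(total_cost(confusion, cost))

probs = [0.6, 0.25, 0.15]  # hypothetical class probabilities for one case
print(classes[min_cost_prediction(probs, cost)])
```

Here the confusion counts cost 57 in total. For the single case, the most probable class is `low_risk`, but because under-predicting risk is costed heavily, the minimum-expected-cost choice is `medium_risk` (expected cost 1.2 vs. 2.25 for `low_risk`).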