Stuart_Armstrong comments on Fairness in machine learning decisions - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Redlining seems to go beyond what's economically efficient, as far as I can tell (see wikipedia).
Er, that's precisely my point here. My idea is to have certain types of data explicitly permitted; in this case I set T to be income. The definition of "fairness" I was aiming for is that once that permitted data is taken into account, there should remain no further discrimination on the part of the algorithm.
This seems a much better idea than the paper's suggestion of just balancing total fairness (e.g. willingness to throw away all data that correlates) against accuracy in some undefined way.
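The "no discrimination once T is taken into account" idea amounts to conditional independence: within each stratum of the permitted variable T, decision rates should not differ by group. A minimal sketch of such a check, on hypothetical data where T is a binned income variable and the decisions genuinely depend on T alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data (not from the paper): A = protected attribute,
# T = permitted variable (income bin), D = the algorithm's decision.
# Here decisions depend on T only, so the check should pass.
n = 10_000
A = rng.integers(0, 2, n)                # protected attribute (0/1)
T = rng.integers(0, 3, n)                # income bin (permitted data)
D = rng.random(n) < 0.2 + 0.3 * T        # approval probability set by T alone

# Fairness as conditional non-discrimination: within each permitted-data
# stratum T = t, compare approval rates across the two groups.
gaps = []
for t in range(3):
    rates = [D[(T == t) & (A == a)].mean() for a in (0, 1)]
    gaps.append(abs(rates[0] - rates[1]))
    print(f"T={t}: approval {rates[0]:.3f} vs {rates[1]:.3f}, gap {gaps[-1]:.3f}")
```

With decisions driven by T alone, the per-stratum gaps are just sampling noise; a biased algorithm would show a systematic gap in at least one stratum.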
I may have been unclear: if you disallow some data but allow a bunch of things that correlate with it, your results are the same as if you'd had the disallowed data in the first place. You can (and a good algorithm effectively does) back out the disallowed data.
In other words, if the disallowed data has no predictive power when added to the allowed data, it's either truly irrelevant (unlikely in real-world scenarios) or already included in the allowed data, indirectly.
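The proxy point can be made concrete with a small sketch (hypothetical data, not from the paper): a model trained only on "allowed" features that merely correlate with a disallowed attribute Z can reconstruct Z well above chance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: Z is the disallowed attribute; the "allowed" features
# X (think postcode-like signals) are each just Z plus independent noise.
n = 5_000
Z = rng.integers(0, 2, n).astype(float)
X = Z[:, None] + rng.normal(0.0, 1.0, (n, 3))

# Linear regression on the allowed features only, then threshold at 0.5
# to predict the disallowed attribute.
design = np.c_[X, np.ones(n)]
w, *_ = np.linalg.lstsq(design, Z, rcond=None)
pred = (design @ w) > 0.5
acc = (pred == Z).mean()
print(f"accuracy reconstructing disallowed attribute from proxies: {acc:.2f}")
```

The disallowed attribute was never an input, yet the correlated features carry enough of it that removing the column alone buys nothing, which is exactly why the conditional-on-T definition above is needed.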
The main point of these ideas is to be able to demonstrate that a classifying algorithm - which is often nothing more than a messy black box - is not biased. This is often something companies want to demonstrate, and it may become a legal requirement in some jurisdictions. The above seems a reasonable definition of non-bias that could be checked quite easily.