Lumifer comments on Fairness in machine learning decisions - Less Wrong

-2 Post author: Stuart_Armstrong 05 August 2016 09:56AM


Comment author: Lumifer 05 August 2016 02:29:55PM 7 points

I'm not sure of the point of all this. You're taking a well-defined statistical concept of independence and renaming it 'fairness' which is a very flexible and politically-charged word.

If there is no actual relationship between S and Y, you have no problem and a properly fit classifier will ignore S since it does not provide any useful information. If the relationship between S and Y actually exists, are you going to define fairness as closing your eyes to this information?

Comment author: Stuart_Armstrong 09 August 2016 01:50:39PM -2 points

I'm reusing the term from the paper, and trying to improve on it (as fairness in machine learning is relatively hot at the moment).

If the relationship between S and Y actually exists, are you going to define fairness as closing your eyes to this information?

That's what the paper essentially does, and that's what I think is wrong. Race and income are correlated; being ignorant of race means being at least partially ignorant of income. I'm aiming for something like "once you know income (and other allowable variables) then race should not affect the decision beyond that".
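In code, the criterion I'm aiming for might be sketched like this (a minimal illustration; the classifier, the income threshold, and all names below are made up): hold the allowable variable fixed, flip the sensitive one, and require that the decision never change.

```python
# Sketch of the proposed criterion: with income (the allowable variable)
# held fixed, changing race (the sensitive variable) should never flip
# the decision. The classifier and threshold are purely illustrative.

def decision(income, race):
    # A classifier satisfying the criterion: race is simply unused.
    return income >= 50_000

def fair_given_income(incomes, races):
    """True if changing race, with income held fixed, never flips the decision."""
    return all(
        decision(income, r1) == decision(income, r2)
        for income in incomes
        for r1 in races
        for r2 in races
    )

print(fair_given_income([30_000, 50_000, 80_000], ["A", "B"]))  # True
```

Note that this only tests decisions, not the full conditional distribution; a classifier that used race internally but happened to produce identical decisions on these inputs would also pass.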

Comment author: Lumifer 09 August 2016 04:50:17PM 4 points

What are "allowable" variables and what makes one "allowable"?

I'm aiming for something like "once you know income (and other allowable variables) then race should not affect the decision beyond that".

That's the same thing: if S (say, race) does not provide any useful information after controlling for X (say, income) then your classifier is going to "naturally" ignore it. If it doesn't, there is still useful information in S even after you took X into account.
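A toy simulation can illustrate this point (everything here is invented for illustration): when Y depends on X but not on S once X is known, the observed rate of Y within any X-bucket is the same for both values of S, so a fitted classifier has nothing to gain from looking at S.

```python
# Toy simulation: Y depends on X only, while X is correlated with S.
# Within each X-bucket, P(Y=1 | X, S) is (noisily) identical for both
# values of S, i.e. S carries no extra information once X is known.
import random
from collections import defaultdict

random.seed(0)
counts = defaultdict(lambda: [0, 0])  # (x_bucket, s) -> [trials, successes]

for _ in range(100_000):
    s = random.randint(0, 1)              # sensitive attribute
    x = random.random() + 0.3 * s         # X correlated with S
    y = int(random.random() < min(x, 1))  # Y depends on X only
    bucket = (round(x, 1), s)
    counts[bucket][0] += 1
    counts[bucket][1] += y

# Success rates within each X-bucket, for S=0 vs S=1: nearly equal.
for xb in [0.4, 0.6, 0.8]:
    n0, k0 = counts[(xb, 0)]
    n1, k1 = counts[(xb, 1)]
    print(xb, round(k0 / n0, 2), round(k1 / n1, 2))
```

A classifier fitted on (X, S) pairs drawn this way would assign S essentially no weight, which is the sense in which it "naturally" ignores it.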

This is all basic statistics, I still don't understand why there's a need to make certain variables (like race) special.

Comment author: Stuart_Armstrong 10 August 2016 07:27:12PM -2 points

As I mentioned in another comment, the main point of these ideas is to be able to demonstrate that a certain algorithm - which may be just a complicated messy black box - is not biased.

I still don't understand why there's a need to make certain variables (like race) special.

a) Because many people treat variables like race as special, and there is social pressure and legislation to that effect. b) Because historically, people have treated variables like race as more relevant than was economically efficient for them. c) Because there are arguments (whose validity I don't know) that one should ignore variables like race even when it is individually economically efficient not to: e.g. cycles of poverty, conformity to social expectations, etc.

A perfect classifier would solve b), potentially a), and not c). But demonstrating that a classifier is perfect is hard; demonstrating that a classifier is fair or unbiased in the sense I defined above is much easier.

What are "allowable" variables and what makes one "allowable"?

This is mainly a social, PR, or legal decision. "Bank assesses borrower's income" is not likely to cause any scandal; "Bank uses eye colour to vet candidates" is more likely to cause problems.

From the legal perspective, it's probably quite simple. "This bank discriminated against me!" Bank: "After controlling for income, capital, past defaults, X, Y, and Z, our classifiers are free of any discrimination." Whether those variables count as allowable then depends on whether juries or (mainly) judges believe that income, ..., X, Y, and Z are valid criteria for reaching a non-discriminatory decision.

Now, for statisticians, if there are a lot of allowable criteria and the classifier uses them in non-linear ways, this makes the fairness criterion pretty vacuous (since a non-linear classifier should have little trouble deducing S from many such criteria). However, the perception of fairness is probably what will matter here.
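A toy sketch of why deducing S gets easy (all variables and numbers below are invented): even if each allowable variable is only weakly correlated with S, many of them together can pin S down well above chance.

```python
# Illustrative sketch: many "allowable" variables, each only weakly tied
# to S, jointly let a trivial classifier reconstruct S even though S was
# never given to it. All names and parameters are made up.
import random

random.seed(1)

def make_person(s):
    # 20 features, each shifted only slightly (0.2 sd) by S.
    return [random.gauss(0.2 * s, 1.0) for _ in range(20)]

people = [(make_person(s), s) for s in [0, 1] * 500]

# A threshold on the feature sum recovers S far better than chance.
correct = sum(
    (sum(features) > 2.0) == bool(s)   # threshold halfway between means
    for features, s in people
)
print(correct / len(people))  # well above the 0.5 chance rate
```

This is the sense in which the criterion can become vacuous: a non-linear classifier given enough allowable variables effectively has access to S anyway.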

Comment author: Lumifer 11 August 2016 03:00:09PM 3 points

the main point of these ideas is to be able to demonstrate that a certain algorithm - which may be just a complicated messy black box - is not biased

If you're looking to satisfy a legal criterion, you need to talk to a lawyer who'll tell you how that works. Notably, the way the law works doesn't have to look reasonable or commonsensical. For example, the EEOC likes to observe outcomes and cares little about the process that leads to what it considers biased outcomes.

Because many people treat variables like race as special ... social pressure ... more relevant than it is economically efficient for them to do so ...

Sure, but then you are leaving the realm of science (aka epistemic rationality). You can certainly build models to cater to fads and prejudices of today, but all you're doing is building deliberately inaccurate maps.

I am also not sure what's the deal with "economically efficient". No one said this is the pinnacle of all values and everything must be subservient to economic efficiency.

From the legal perspective, it's probably quite simple.

I am pretty sure you're mistaken about this.

the perception of fairness is probably going to be what's important here

LOL.

I think this is a fundamentally misguided exercise and, moreover, one which you cannot win -- in part because shitstorms don't care about details of classifiers.

Comment author: Stuart_Armstrong 11 August 2016 08:46:35PM -2 points

Do you not feel my definition of fairness is a better one than the one proposed in the original paper?

Comment author: Lumifer 11 August 2016 08:51:42PM 2 points

I feel this all is a category error. You're trying to introduce terms from morality ('fairness') into statistics. That, I'm pretty sure, is a bad idea. And the word 'bias' already has a well-defined meaning in stats.

If you want to introduce moral judgement into your results, first construct a good map, and then adjust it according to taste. At least then you have a better chance of seeing the trade-offs you're making.