
Causal graphs and counterfactuals

1 Stuart_Armstrong 30 August 2016 04:12PM

A problem that's come up with my definitions of stratification.

Consider a very simple causal graph: a single arrow from a node A to a node B.

In this setting, A and B are both booleans, and A=B with 75% probability (independently of whether A=0 or A=1).

I now want to compute the counterfactual: suppose I assume that B=0 when A=0. What would happen if A=1 instead?

The problem is that P(B|A) seems insufficient to solve this. Let's model the process that outputs B as a probabilistic mix of functions, each of which takes the value of A and outputs a value for B. There are four natural functions here:

  • f0(x) = 0
  • f1(x) = 1
  • f2(x) = x
  • f3(x) = 1-x

Then one way of modelling the causal graph is as a mix 0.75f2 + 0.25f3. In that case, knowing that B=0 when A=0 implies that P(f2)=1, so if A=1, we know that B=1.

But we could instead model the causal graph as 0.5f2+0.25f1+0.25f0. In that case, knowing that B=0 when A=0 implies that P(f2)=2/3 and P(f0)=1/3. So if A=1, B=1 with probability 2/3 and B=0 with probability 1/3.

And we can physically design the node B to realise either of these two distributions over functions, or anything in between (the general formula is (0.5+x)f2 + x·f3 + (0.25−x)f1 + (0.25−x)f0 for 0 ≤ x ≤ 0.25). But it seems that the causal graph does not capture this distinction.
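
To make the ambiguity concrete, here is a minimal Python sketch (my own illustration, not code from the post): it conditions each mixture of the functions f0–f3 on the observation "B=0 when A=0" and reads off the probability that B=1 when A=1.

```python
# A minimal sketch (not from the post): condition each mixture of f0..f3 on
# the observation "B=0 when A=0", then read off P(B=1) under A=1.

FUNCS = {
    "f0": lambda a: 0,      # constant 0
    "f1": lambda a: 1,      # constant 1
    "f2": lambda a: a,      # copy A
    "f3": lambda a: 1 - a,  # negate A
}

def counterfactual(prior):
    """P(B=1 | A=1) after conditioning the mixture on f(0) == 0."""
    consistent = {name: p for name, p in prior.items() if FUNCS[name](0) == 0}
    total = sum(consistent.values())
    posterior = {name: p / total for name, p in consistent.items()}
    return sum(p for name, p in posterior.items() if FUNCS[name](1) == 1)

print(counterfactual({"f2": 0.75, "f3": 0.25}))             # 1.0
print(counterfactual({"f2": 0.5, "f1": 0.25, "f0": 0.25}))  # 0.666...
```

Both mixtures produce exactly the same conditional distribution P(B|A), yet they answer the counterfactual differently; that is the information the bare causal graph fails to pin down.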

Owain Evans has said that Pearl has papers covering these kinds of situations, but I haven't been able to find them. Does anyone know any publications on the subject?

Corrigibility through stratified indifference

4 Stuart_Armstrong 19 August 2016 04:11PM

A putative new idea for AI control; index here.

Corrigibility through indifference has a few problems. One of them is that the AI is indifferent between the world in which humans change its utility to v, and the world in which humans try to change its utility but fail.

Now the try-but-fail world is going to be somewhat odd - humans will be reacting by trying to change the utility again, trying to shut the AI down, panicking that a tiny probability event has happened, and so on.

continue reading »
Comment author: Petter 15 August 2016 07:23:01AM 1 point [-]

Looks like a solid improvement over what’s being used in the paper. Does it introduce any new optimization difficulties?

Comment author: Stuart_Armstrong 15 August 2016 09:53:40AM 0 points [-]

I suspect it makes optimisation easier, because we don't need to compute a tradeoff. But that's just an informal impression.

Comment author: Lumifer 11 August 2016 03:00:09PM 2 points [-]

the main point of these ideas is to be able to demonstrate that a certain algorithm - which may be just a complicated messy black box - is not biased

If you're looking to satisfy a legal criterion you need to talk to a lawyer who'll tell you how that works. Notably, the way the law works doesn't have to look reasonable or commonsensical. For example, EEOC likes to observe outcomes and cares little about the process which leads to what they think are biased outcomes.

Because many people treat variables like race as special ... social pressure ... more relevant than it is economically efficient for them to do so ...

Sure, but then you are leaving the realm of science (aka epistemic rationality). You can certainly build models to cater to fads and prejudices of today, but all you're doing is building deliberately inaccurate maps.

I am also not sure what's the deal with "economically efficient". No one said this is the pinnacle of all values and everything must be subservient to economic efficiency.

From the legal perspective, it's probably quite simple.

I am pretty sure you're mistaken about this.

the perception of fairness is probably going to be what's important here

LOL.

I think this is a fundamentally misguided exercise and, moreover, one which you cannot win -- in part because shitstorms don't care about details of classifiers.

Comment author: Stuart_Armstrong 11 August 2016 08:46:35PM 0 points [-]

Do you not feel my definition of fairness is a better one than the one proposed in the original paper?

Comment author: Lumifer 09 August 2016 04:50:17PM 2 points [-]

What are "allowable" variables and what makes one "allowable"?

I'm aiming for something like "once you know income (and other allowable variables) then race should not affect the decision beyond that".

That's the same thing: if S (say, race) does not provide any useful information after controlling for X (say, income) then your classifier is going to "naturally" ignore it. If it doesn't, there is still useful information in S even after you took X into account.

This is all basic statistics, I still don't understand why there's a need to make certain variables (like race) special.

Comment author: Stuart_Armstrong 10 August 2016 07:27:12PM 0 points [-]

As I mentioned in another comment, the main point of these ideas is to be able to demonstrate that a certain algorithm - which may be just a complicated messy black box - is not biased.

I still don't understand why there's a need to make certain variables (like race) special.

a) Because many people treat variables like race as special, and there is social pressure and legislation about that. b) Because historically, people have treated variables like race as more relevant than is economically efficient. c) Because there are arguments (whose validity I don't know) that one should ignore variables like race even when it is individually economically efficient not to: eg cycles of poverty, following of social expectations, etc...

A perfect classifier would solve b), potentially a), and not c). But demonstrating that a classifier is perfect is hard; demonstrating that a classifier is fair or unbiased in the way I define above is much easier.

What are "allowable" variables and what makes one "allowable"?

This is mainly a social, PR, or legal decision. "Bank assesses borrower's income" is not likely to cause any scandal; "Bank uses eye colour to vet candidates" is more likely to cause problems.

From the legal perspective, it's probably quite simple. "This bank discriminated against me!" Bank: "After controlling for income, capital, past defaults, X, Y, and Z, then our classifiers are free of any discrimination." Then whether they're allowable depends on whether juries or (mainly) judges believe that income, .... X, Y, and Z are valid criteria for reaching a non-discriminatory decision.

Now, for statisticians, if there are a lot of allowable criteria and if the classifier uses them in non-linear ways, this makes the fairness criterion pretty vacuous (since deducing S from many criteria should be pretty easy for non-linear classifiers). However, the perception of fairness is probably going to be what's important here.

Comment author: Dagon 10 August 2016 06:44:02AM 1 point [-]

I may have been unclear - if you disallow some data, but allow a bunch of things that correlate with that disallowed data, your results are the same as if you'd had the data in the first place. You can (and, in a good algorithm, do) back into the disallowed data.

In other words, if the disallowed data has no predictive power when added to the allowed data, it's either truly irrelevant (unlikely in real-world scenarios) or already included in the allowed data, indirectly.

Comment author: Stuart_Armstrong 10 August 2016 07:09:40PM 0 points [-]

The main point of these ideas is to be able to demonstrate that a classifying algorithm - which is often nothing more than a messy black box - is not biased. This is often something companies want to demonstrate, and may become a legal requirement in some places. The above seems a reasonable definition of non-bias that could be used quite easily.

Comment author: bogus 05 August 2016 09:20:17PM 3 points [-]

It's not clear to me how this "fairness" criterion is supposed to work. If you simply don't include S among the predictors, then for any given x in X, the classification of x will be 'independent' of S in that a counterfactual x' with the exact same features but different S would be classified the exact same way. OTOH if you're aiming to have Y be uncorrelated with S even without controlling for X, this essentially requires adding S as a 'predictor' too; e.g. consider Simpson's paradox. But this is a weird operationalization of 'fairness'.

Comment author: Stuart_Armstrong 09 August 2016 01:54:21PM -1 points [-]

in that a counterfactual x' with the exact same features but different S would be classified the exact same way.

Except that from the x, you can often deduce S. Suppose S is race (which seems to be what people care about in this situation) while X doesn't include race but does include, eg, race of parents.

And I'm not aiming for S uncorrelated with Y (that's what the paper's authors seem to want). I'm aiming for S uncorrelated with Y, once we take into account a small number of allowable variables T (eg income).

Comment author: Lumifer 05 August 2016 02:29:55PM 6 points [-]

I'm not sure of the point of all this. You're taking a well-defined statistical concept of independence and renaming it 'fairness' which is a very flexible and politically-charged word.

If there is no actual relationship between S and Y, you have no problem and a properly fit classifier will ignore S since it does not provide any useful information. If the relationship between S and Y actually exists, are you going to define fairness as closing your eyes to this information?

Comment author: Stuart_Armstrong 09 August 2016 01:50:39PM 0 points [-]

I'm reusing the term from the paper, and trying to improve on it (as fairness in machine learning is relatively hot at the moment).

If the relationship between S and Y actually exists, are you going to define fairness as closing your eyes to this information?

That's what the paper essentially does, and that's what I think is wrong. Race and income are correlated; being ignorant of race means being at least partially ignorant of income. I'm aiming for something like "once you know income (and other allowable variables) then race should not affect the decision beyond that".

Comment author: Dagon 05 August 2016 06:02:26PM 2 points [-]

I think there's a fundamental goal conflict between "fairness" and precision. If the socially-unpopular feature is in fact predictive, then you either explicitly want a less-predictive algorithm, or you end up using other features that correlate with S strongly enough that you might as well just use S.

If you want to ensure a given distribution of S independent of classification, then include that in your prediction goals: have your cost function include a homogeneity penalty. Note that you're now pretty seriously tipping the scales against what you previously thought your classifier was predicting. Better and simpler to design and test the classifier in a straightforward way, but don't use it as the sole decision criterion.
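
For concreteness, a rough sketch of what such a penalised cost function might look like (my own construction with assumed names, not code from this thread):

```python
import numpy as np

def loss_with_homogeneity_penalty(p_hat, y, s, lam=1.0):
    """Cross-entropy plus a penalty on the gap in mean predicted rate across S.

    p_hat: predicted probabilities, y: true 0/1 labels, s: protected attribute.
    lam is an assumed tradeoff weight.
    """
    p_hat, y, s = map(np.asarray, (p_hat, y, s))
    eps = 1e-9
    xent = -np.mean(y * np.log(p_hat + eps) + (1 - y) * np.log(1 - p_hat + eps))
    gap = abs(p_hat[s == 1].mean() - p_hat[s == 0].mean())  # homogeneity penalty
    return xent + lam * gap
```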

Redlining (or more generally, deciding who gets credit) is a great example for this. If you want accurate risk assessment, you must take into account data (income, savings, industry/job stability, other kinds of debt, etc.) that correlates with ethnic averages. The problem is not that the risk classifiers are wrong, the problem is that correct risk assessments lead to unpleasant loan distributions. And the sane solution is to explicitly subsidize the risks you want to encourage for social reasons, not to lie about the risk by throwing away data.

Comment author: Stuart_Armstrong 09 August 2016 01:32:46PM -1 points [-]

Redlining seems to go beyond what's economically efficient, as far as I can tell (see wikipedia).

Redlining (or more generally, deciding who gets credit) is a great example for this. If you want accurate risk assessment, you must take into account data (income, savings, industry/job stability, other kinds of debt, etc.) that correlates with ethnic averages.

Er, that's precisely my point here. My idea is to have certain types of data explicitly permitted; in this case I set T to be income. The definition of "fairness" I was aiming for is that once that permitted data is taken into account, there should remain no further discrimination on the part of the algorithm.

This seems a much better idea than the paper's suggestion of just balancing total fairness (eg willingness to throw away all data that correlates) against accuracy in some undefined way.

Fairness in machine learning decisions

2 Stuart_Armstrong 05 August 2016 09:56AM

There's been some recent work on ensuring fairness in automated decision making, especially around sensitive areas such as racial groups. The paper "Censoring Representations with an Adversary" looks at one way of doing this.

It looks at a binary classification task where X ⊂ R^n and Y = {0, 1} is the (output) label set. There is also S = {0, 1}, which is the protected variable's label set. The definition of fairness is that, if η : X → Y is your classifier, then η(X) should be independent of S. Specifically:

  • P(η(X)=1|S=1) = P(η(X)=1|S=0)

There is a measure of discrimination, which is the extent to which the classifier violates that fairness condition. The paper then suggests optimising a tradeoff between discrimination and classification accuracy.
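
As a rough illustration (my own sketch, not the paper's code; the paper actually enforces low discrimination adversarially, via a network trained to recover S from the learned representation), the two quantities being traded off can be estimated from samples like this, with lam an assumed tradeoff weight:

```python
import numpy as np

def discrimination(y_hat, s):
    """|P(eta(X)=1 | S=1) - P(eta(X)=1 | S=0)|, estimated from samples."""
    y_hat, s = np.asarray(y_hat), np.asarray(s)
    return abs(y_hat[s == 1].mean() - y_hat[s == 0].mean())

def accuracy(y_hat, y):
    return (np.asarray(y_hat) == np.asarray(y)).mean()

def tradeoff_objective(y_hat, y, s, lam=1.0):
    # lam is an assumed weight on the discrimination term
    return accuracy(y_hat, y) - lam * discrimination(y_hat, s)
```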

But this is problematic, because it risks throwing away highly relevant information. Consider redlining, the practice of denying services to residents of certain areas based on the racial or ethnic makeups of those areas. This is the kind of practice we want to avoid. However, generally the residents of these areas will be poorer than the average population. So if Y is approval for mortgages or certain financial services, a fair algorithm would essentially be required to reach a decision that ignores this income gap.

And it doesn't seem the tradeoff with accuracy is a good way of compensating for this. Instead, a better idea would be to specifically allow certain variables to be considered. For example, let T be another variable (say, income) that we want to allow. Then fairness would be defined as:

  • ∀t, P(η(X)=1|S=1, T=t) = P(η(X)=1|S=0, T=t)

What this means is that T is allowed to differ between S=0 and S=1, but, once we know the value of T, the classifier's output η(X) tells us nothing further about S. For instance, once the bank knows your income, it should be blind to other factors.
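
A minimal sketch (my own, with assumed variable names) of how this conditional criterion could be checked empirically: bucket the data by the allowable variable T (eg income bands) and measure the largest within-bucket gap in positive-classification rates between S=1 and S=0; fairness in the above sense means every such gap is (approximately) zero.

```python
import numpy as np

def conditional_discrimination(y_hat, s, t):
    """max over t of |P(eta(X)=1 | S=1, T=t) - P(eta(X)=1 | S=0, T=t)|."""
    y_hat, s, t = map(np.asarray, (y_hat, s, t))
    gaps = []
    for value in np.unique(t):
        in_bucket = (t == value)
        pos, neg = in_bucket & (s == 1), in_bucket & (s == 0)
        if pos.any() and neg.any():  # skip buckets missing one of the groups
            gaps.append(abs(y_hat[pos].mean() - y_hat[neg].mean()))
    return max(gaps) if gaps else 0.0
```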

Of course, with enough T variables, S can be determined with precision. So each T variable should be fully justified, and in general, it must not be easy to establish the value of S via T.
