Comment author: TheAncientGeek 16 September 2016 03:25:22PM *  1 point [-]

Are you saying the AI will rewrite its goals to make them easier, or will just not be motivated to fill in missing info?

In the first case, why won't it go the whole hog and wirehead? Which is to say, that any AI which does anything except wireheading will be resistant to that behaviour -- it is something that needs to be solved, and which we can assume has been solved in a sensible AI design.

When we programmed it to "create chocolate bars, here's an incomplete definition D", what we really did was program it to find the easiest thing to create that is compatible with D, and designate them "chocolate bars".

If you programme it with incomplete info, and without any goal to fill in the gaps, then it will have the behaviour you mention...but I'm not seeing the generality. There are many other ways to programme it.

"if the AI is so smart, why would it do stuff we didn't mean?" and "why don't we just make it understand natural language and give it instructions in English?"

An AI that was programmed to attempt to fill in gaps in knowledge it detected, halt if it found conflicts, etc would not behave the way you describe. Consider the objection as actually saying:

"Why has the AI been programmed so as to have selective areas of ignorance and stupidity, which are immune from the learning abilities it displays elsewhere?"

PS This has been discussed before, see

http://lesswrong.com/lw/m5c/debunking_fallacies_in_the_theory_of_ai_motivation/

and

http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/

see particularly

http://lesswrong.com/lw/m5c/debunking_fallacies_in_the_theory_of_ai_motivation/ccpn

Comment author: Stuart_Armstrong 19 September 2016 10:59:28AM 1 point [-]

An AI that was programmed to attempt to fill in gaps in knowledge it detected, halt if it found conflicts, etc would not behave the way you describe.

We don't know how to program a foolproof method of "filling in the gaps" (and a lot of "filling in the gaps" would be a creative process rather than a mere learning one, such as figuring out how to extend natural language concepts to new areas).

And it helps if people speak about this problem in terms of coding, rather than high-level concepts, because all the specific examples people have ever come up with for coding learning have had these kinds of flaws. Learning natural language is not some sort of natural category.

Coding learning with some imperfections might be ok if the AI is motivated to merely learn, but is positively pernicious if the AI has other motivations as to what to do with that learning (see my post here for a way of getting around it: https://agentfoundations.org/item?id=947 )

Comment author: jazzkingrt 16 September 2016 05:15:00PM *  0 points [-]

I don't think this problem is very hard to resolve. If an AI is programmed to make sense of natural-language concepts like "chocolate bar", there should be a mechanism to acquire a best-effort understanding. So you could rewrite the motivation as:

"create things which the maximum number of people understand to be a chocolate bar"

or alternatively:

"create things which the programmer is most likely to have understood to be a chocolate bar".

Comment author: Stuart_Armstrong 19 September 2016 10:52:50AM 1 point [-]

That's just rephrasing one natural language requirement in terms of another. Unless these concepts can be phrased other than in natural language (but then those other phrasings may be susceptible to manipulation).

Comment author: Petter 15 August 2016 07:23:01AM 0 points [-]

Looks like a solid improvement over what’s being used in the paper. Does it introduce any new optimization difficulties?

Comment author: Stuart_Armstrong 15 August 2016 09:53:40AM -1 points [-]

I suspect it makes optimisation easier, because we don't need to compute a tradeoff. But that's just an informal impression.

Comment author: Lumifer 11 August 2016 03:00:09PM 3 points [-]

the main point of these ideas is to be able to demonstrate that a certain algorithm - which may be just a complicated messy black box - is not biased

If you're looking to satisfy a legal criterion you need to talk to a lawyer who'll tell you how that works. Notably, the way the law works doesn't have to look reasonable or commonsensical. For example, EEOC likes to observe outcomes and cares little about the process which leads to what they think are biased outcomes.

Because many people treat variables like race as special ... social pressure ... more relevant than it is economically efficient for them to do so ...

Sure, but then you are leaving the realm of science (aka epistemic rationality). You can certainly build models to cater to fads and prejudices of today, but all you're doing is building deliberately inaccurate maps.

I am also not sure what's the deal with "economically efficient". No one said this is the pinnacle of all values and everything must be subservient to economic efficiency.

From the legal perspective, it's probably quite simple.

I am pretty sure you're mistaken about this.

the perception of fairness is probably going to be what's important here

LOL.

I think this is a fundamentally misguided exercise and, moreover, one which you cannot win -- in part because shitstorms don't care about details of classifiers.

Comment author: Stuart_Armstrong 11 August 2016 08:46:35PM -2 points [-]

Do you not feel my definition of fairness is a better one than the one proposed in the original paper?

Comment author: Lumifer 09 August 2016 04:50:17PM 4 points [-]

What are "allowable" variables and what makes one "allowable"?

I'm aiming for something like "once you know income (and other allowable variables) then race should not affect the decision beyond that".

That's the same thing: if S (say, race) does not provide any useful information after controlling for X (say, income) then your classifier is going to "naturally" ignore it. If it doesn't, there is still useful information in S even after you took X into account.

This is all basic statistics, I still don't understand why there's a need to make certain variables (like race) special.

Comment author: Stuart_Armstrong 10 August 2016 07:27:12PM -2 points [-]

As I mentioned in another comment, the main point of these ideas is to be able to demonstrate that a certain algorithm - which may be just a complicated messy black box - is not biased.

I still don't understand why there's a need to make certain variables (like race) special.

a) Because many people treat variables like race as special, and there is social pressure and legislation about that. b) Because historically, people have treated variables like race as more relevant than it is economically efficient for them to do so. c) Because there are arguments (whose validity I don't know) that one should ignore variables like race even when it is individually economically efficient not to. eg cycles of poverty, following of social expectations, etc...

A perfect classifier would solve b), potentially a), and not c). But demonstrating that a classifier is perfect is hard; demonstrating that a classifier is fair or unbiased in the way I define above is much easier.

What are "allowable" variables and what makes one "allowable"?

This is mainly a social, PR, or legal decision. "Bank assesses borrower's income" is not likely to cause any scandal; "Bank uses eye colour to vet candidates" is more likely to cause problems.

From the legal perspective, it's probably quite simple. "This bank discriminated against me!" Bank: "After controlling for income, capital, past defaults, X, Y, and Z, then our classifiers are free of any discrimination." Then whether they're allowable depends on whether juries or (mainly) judges believe that income, .... X, Y, and Z are valid criteria for reaching a non-discriminatory decision.

Now, for statisticians, if there are a lot of allowable criteria and if the classifier uses them in non-linear ways, this makes the fairness criterion pretty vacuous (since deducing S from many criteria should be pretty easy for non-linear classifiers). However, the perception of fairness is probably going to be what's important here.

Comment author: Dagon 10 August 2016 06:44:02AM 2 points [-]

I may have been unclear - if you disallow some data, but allow a bunch of things that correlate with that disallowed data, your results are the same as if you'd had the data in the first place. You can (and, in a good algorithm, do) back into the disallowed data.

In other words, if the disallowed data has no predictive power when added to the allowed data, it's either truly irrelevant (unlikely in real-world scenarios) or already included in the allowed data, indirectly.
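A minimal sketch of this point (synthetic data, hypothetical variable names): a classifier that never sees the disallowed variable S can still reconstruct it from an allowed feature that merely correlates with it.

```python
import random

random.seed(0)
n = 10_000
S = [random.randint(0, 1) for _ in range(n)]     # disallowed attribute
proxy = [s + random.gauss(0, 0.3) for s in S]    # allowed feature, correlated with S

# A "classifier" trained only on the allowed feature: a simple threshold.
pred = [1 if p > 0.5 else 0 for p in proxy]

# The predictions recover S almost perfectly, despite never seeing it.
agreement = sum(p == s for p, s in zip(pred, S)) / n
print(f"prediction agrees with S on {agreement:.0%} of cases")
```

The stronger the proxy correlation, the more completely the model "backs into" the disallowed data, which is why simply dropping S from the feature set proves nothing.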

Comment author: Stuart_Armstrong 10 August 2016 07:09:40PM -1 points [-]

The main point of these ideas is to be able to demonstrate that a classifying algorithm - which is often nothing more than a messy black box - is not biased. This is often something companies want to demonstrate, and may become a legal requirement in some places. The above seems a reasonable definition of non-bias that could be used quite easily.

Comment author: bogus 05 August 2016 09:20:17PM 4 points [-]

It's not clear to me how this "fairness" criterion is supposed to work. If you simply don't include S among the predictors, then for any given x in X, the classification of x will be 'independent' of S in that a counterfactual x' with the exact same features but different S would be classified the exact same way. OTOH if you're aiming to have Y be uncorrelated with S even without controlling for X, this essentially requires adding S as a 'predictor' too; e.g. consider Simpson's paradox. But this is a weird operationalization of 'fairness'.

Comment author: Stuart_Armstrong 09 August 2016 01:54:21PM -2 points [-]

in that a counterfactual x' with the exact same features but different S would be classified the exact same way.

Except that from the x, you can often deduce S. Suppose S is race (which seems to be what people care about in this situation) while X doesn't include race but does include, eg, race of parents.

And I'm not aiming for S uncorrelated with Y unconditionally (that's what the paper's authors seem to want). I'm aiming for S uncorrelated with Y conditional on a small number of allowable variables T (eg income).
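The criterion above can be sketched in a few lines (synthetic data, hypothetical names; income stands in for the allowable variable T): a classifier that uses only T will correlate with S unconditionally, but within narrow bins of T the S/Y correlation (nearly) vanishes, which is the proposed notion of fairness.

```python
import random

def corr(xs, ys):
    """Pearson correlation, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)
n = 50_000
S = [random.randint(0, 1) for _ in range(n)]        # protected attribute
T = [random.gauss(0, 1) + 0.8 * s for s in S]       # income, correlated with S

# A classifier that looks only at the allowable variable T.
Y = [1 if t + random.gauss(0, 0.5) > 0.5 else 0 for t in T]

# Unconditionally, Y correlates with S (through income).
uncond = corr(S, Y)

# Within narrow income bins, the S/Y correlation should (nearly) vanish.
order = sorted(range(n), key=lambda i: T[i])
bin_size = n // 20
within = []
for b in range(20):
    idx = order[b * bin_size:(b + 1) * bin_size]
    s = [S[i] for i in idx]
    y = [Y[i] for i in idx]
    if 0 < sum(s) < len(s) and 0 < sum(y) < len(y):
        within.append(abs(corr(s, y)))
max_within = max(within)
print(f"unconditional corr(S, Y) = {uncond:.2f}; "
      f"max within-bin |corr(S, Y)| = {max_within:.2f}")
```

A classifier that peeked at S (or at a proxy not subsumed by T) would fail the within-bin check, so the criterion can be audited from the black box's inputs and outputs alone.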

Comment author: Lumifer 05 August 2016 02:29:55PM 7 points [-]

I'm not sure of the point of all this. You're taking a well-defined statistical concept of independence and renaming it 'fairness', which is a very flexible and politically-charged word.

If there is no actual relationship between S and Y, you have no problem and a properly fit classifier will ignore S since it does not provide any useful information. If the relationship between S and Y actually exists, are you going to define fairness as closing your eyes to this information?

Comment author: Stuart_Armstrong 09 August 2016 01:50:39PM -2 points [-]

I'm reusing the term from the paper, and trying to improve on it (as fairness in machine learning is relatively hot at the moment).

If the relationship between S and Y actually exists, are you going to define fairness as closing your eyes to this information?

That's what the paper essentially does, and that's what I think is wrong. Race and income are correlated; being ignorant of race means being at least partially ignorant of income. I'm aiming for something like "once you know income (and other allowable variables) then race should not affect the decision beyond that".

Comment author: Dagon 05 August 2016 06:02:26PM 2 points [-]

I think there's a fundamental goal conflict between "fairness" and precision. If the socially-unpopular feature is in fact predictive, then you either explicitly want a less-predictive algorithm, or you end up using other features that correlate with S strongly enough that you might as well just use S.

If you want to ensure a given distribution of S independent of classification, then include that in your prediction goals: have your cost function include a homogeneity penalty. Note that you're now pretty seriously tipping the scales against what you previously thought your classifier was predicting. Better and simpler to design and test the classifier in a straightforward way, but don't use it as the sole decision criterion.

Redlining (or more generally, deciding who gets credit) is a great example for this. If you want accurate risk assessment, you must take into account data (income, savings, industry/job stability, other kinds of debt, etc.) that correlates with ethnic averages. The problem is not that the risk classifiers are wrong, the problem is that correct risk assessments lead to unpleasant loan distributions. And the sane solution is to explicitly subsidize the risks you want to encourage for social reasons, not to lie about the risk by throwing away data.

Comment author: Stuart_Armstrong 09 August 2016 01:32:46PM -2 points [-]

Redlining seems to go beyond what's economically efficient, as far as I can tell (see wikipedia).

Redlining (or more generally, deciding who gets credit) is a great example for this. If you want accurate risk assessment, you must take into account data (income, savings, industry/job stability, other kinds of debt, etc.) that correlates with ethnic averages.

Er, that's precisely my point here. My idea is to have certain types of data explicitly permitted; in this case I set T to be income. The definition of "fairness" I was aiming for is that once that permitted data is taken into account, there should remain no further discrimination on the part of the algorithm.

This seems a much better idea than the paper's suggestion of just balancing total fairness (eg willingness to throw away all data that correlates) with accuracy in some undefined way.

Comment author: capybaralet 30 July 2016 01:52:00PM 0 points [-]

"So conservation of expected moral evidence is something that would be automatically true if morality were something real and objective, and is also a desiderata when constructing general moral systems in practice."

This seems to go against your pulsar example... I guess you mean something like: "if [values were] real, objective, and immutable"?

Comment author: Stuart_Armstrong 01 August 2016 09:10:26AM 0 points [-]

Sorry, I don't get your point. Could you develop it?
