Vaniver comments on Stranger Than History - Less Wrong

52 Post author: Eliezer_Yudkowsky 01 September 2007 06:57PM

Comments (329)

Comment author: Vaniver 31 March 2014 07:15:37PM 2 points [-]

No, quite wrong:

Which part of my statements specifically are you claiming is wrong?

But if they believed that a candidate with a better score was (ceteris paribus) a better candidate, they would presumably have no problem with this.

I think you have the causation backwards here. Because they have a problem with this, they decide that the candidate with the better score is not a better candidate. If you would like to take a look at the tests yourself, they're here.

Comment author: hairyfigment 31 March 2014 07:49:12PM 1 point [-]

Which part of my statements specifically are you claiming is wrong?

First,

Take recent firefighting anti-discrimination court cases as an example. The legally approved way to conduct promotion testing is to pass over 90% of the people, and then randomly select from everyone who passed.

I don't know what you're talking about here, but I just quoted such a decision explicitly calling it illegal to use a particular test pass/fail. Because the court explicitly didn't trust the test.

It looks to me like you assume everyone does trust the test to do something other than hurt minorities. Otherwise you wouldn't need to speculate about motives. In general, if someone wants you to improve minority representation, you can assume they don't trust your personal judgment - and if you're using tests, they don't trust you to judge the value of the tests. Should they? Should we believe these written tests produce better firefighters, based on the available evidence?

Comment author: Lumifer 31 March 2014 07:56:08PM -1 points [-]

In general, if someone wants you to improve minority representation, you can assume they don't trust your personal judgment

I don't think this is true. The doctrine of disparate impact says that your personal judgement is irrelevant -- you MUST achieve something resembling proportionate representation regardless of anything (other than a demonstrable business need). It tests for outcomes, not intentions.

Comment author: hairyfigment 31 March 2014 08:08:13PM -1 points [-]

I mean they don't trust your personal judgment of what constitutes "demonstrable business need". Either that or they suspect you have conscious motives beyond business need.

Comment author: Lumifer 31 March 2014 08:22:05PM *  -2 points [-]

You are assuming there are no significant race- or sex-based differences.

For example, let's say I run a business and I like to hire smart people. Basically, I prefer high-IQ people to low-IQ people. Given that the average black IQ is about one standard deviation below the average white IQ which is lower than average East Asian IQ, I would end up employing relatively more Asian and white people and relatively fewer black people.

This is a very straightforward case of disparate impact. What is it about my personal judgement that "they" should not trust?

Comment author: hairyfigment 31 March 2014 08:25:28PM 1 point [-]

Are you being serious? Did you notice how you went from "business need" to "like to hire smart people" to "prefer high-IQ"?

Comment author: Lumifer 31 March 2014 08:50:14PM *  -2 points [-]

Are you being serious? Did you notice how you went from "business need" to "like to hire smart people" to "prefer high-IQ"?

Yes, I am. I do not have a legally demonstrable business need (that's why I said it's a straightforward case). It just happens that business runs better with smart people than with stupid people. Therefore I prefer to hire smart people and in this context "high-IQ" is a synonym of "smart".

The outcome is clearly illegal under the disparate impact doctrine.

I am not sure what your position is here. That my desire to hire smart people is mistaken? That my ability to identify smart people is not to be trusted?

Comment author: hairyfigment 31 March 2014 10:34:20PM 0 points [-]

I don't have a clue who 'you' are. For the firefighting department we started with, I challenge both inferences. And I'm baffled at having to spell this out.

Comment author: Lumifer 01 April 2014 01:48:13AM 0 points [-]

I don't have a clue who 'you' are.

In this subthread "I" means a fictional business manager in a hypothetical situation. Specifically, that manager wants to hire smart people and runs head-first into a disparate impact case.

And I'm baffled at having to spell this out.

Perhaps you should consider that other people think differently than you and often start from different assumptions, too.

Comment author: Eugine_Nier 01 April 2014 01:32:39AM 2 points [-]

It might help to taboo what we mean by "business need". Does it mean "probably won't go out of business next year if I don't do this"? In that case it is likely that I don't have a "business need" not to hire completely unqualified people, as long as the rest can pick up the slack.

On the other hand, if "business need" means "this will make my business run better", then as Lumifer pointed out, it just happens that business runs better with smart people than with stupid people.

Comment author: TheAncientGeek 01 April 2014 09:04:35AM *  -2 points [-]

Are "you" using race as a proxy for IQ, using actual IQ, or using evidence of domain relevant knowledge?

I notice that real-world employers tend to emphasise the last. Rightly, because it avoids the Spolskyan problem of "smart, but doesn't get things done".

Comment author: Lumifer 01 April 2014 02:37:05PM *  0 points [-]

Are "you" using race as a proxy for IQ, using actual IQ, or using evidence of domain relevant knowledge?

(a) No; (b) Mostly; (c) Somewhat.

Domain knowledge functions as a hard cutoff at the lower end (if you need an accountant, you need someone who can do accounting) but the higher it is, the less important it becomes unless you're filling a position at the bleeding edge of a particular field.

Domain knowledge is also not the same thing as work habits, effectiveness, etc.

Comment author: TheAncientGeek 02 April 2014 09:46:19AM -1 points [-]

If you are not filling a position at the bleeding edge, you wouldn't need high domain knowledge. I don't see why you would need high IQ either.

Work habits, etc, can be judged by someone's ability to get things done, which can be judged from their resume as per standard recruitment procedures.

You seem to think IQ is a better indicator. Why?

Comment author: Lumifer 02 April 2014 04:46:55PM 1 point [-]

I don't see why you would need high IQ either.

Not necessarily high, but higher.

Basically, each job has an appropriate IQ range. It's better to pick people from the higher end of that range than from the lower end.

Work habits, etc, can be judged by someone's ability to get things done, which can be judged from their resume as per standard recruitment procedures.

No, I don't think you can effectively evaluate things like work habits on the basis of a "normal" resume. There is a reason people are hired after interviews and, sometimes, test periods and not just on the basis of their resumes.

You seem to think IQ is a better indicator. Why?

IQ is not a better indicator of work habits. However, it is a good indicator of the contribution that a person can make to your organization. To make obvious observations, people with higher IQ work faster, make fewer mistakes, need fewer things explained to them, can handle the unexpected better, etc.

Comment author: Eugine_Nier 03 April 2014 05:52:31AM 1 point [-]

Work habits, etc, can be judged by someone's ability to get things done,

These haven't been as extensively studied, but anecdotal evidence suggests these are also correlated with race. Furthermore, since judging these things is obviously going to be more subjective than looking at the results of a test, an employer relying on these is going to be even more open to accusations of racism.

Comment author: hairyfigment 02 April 2014 05:52:21AM 1 point [-]

OK, we disagree about motive. Did you notice you were objectively wrong about the reason you gave for your speculation? Or that I got downvoted after pointing this out?

Comment author: Vaniver 02 April 2014 04:57:30PM *  3 points [-]

Did you notice you were objectively wrong about the reason you gave for your speculation?

I'm still confused by this part. By 'legally approved', I'm referring to the state of things in, say, Chicago, and doing decisions by lottery is an easy way to satisfy both disparate impact and disparate treatment requirements.

By 'legally disapproved,' it sounds to me like you think the part you quoted makes it obvious that this is disapproved. But let's take a closer look at the actual decision (copied from a pdf, so there may be errors caused by my reformatting):

Before proceeding to the legal analysis, I offer a brief word about the Supreme Court’s recent decision in Ricci v. DeStefano, 129 S. Ct. 2658 (June 29, 2009). I reference Ricci not because the Supreme Court’s ruling controls the outcome in this case; to the contrary, I mention Ricci precisely to point out that it does not. In Ricci, the City of New Haven had set aside the results of a promotional examination, and the Supreme Court confronted the narrow issue of whether New Haven could defend a violation of Title VII’s disparate treatment provision by asserting that its challenged employment action was an attempt to comply with Title VII’s disparate impact provision. The Court held that such a defense is only available when “the employer can demonstrate a strong basis in evidence that, had it not taken the action, it would have been liable under the disparate-impact statute.” Id. at 2664. In contrast, this case presents the entirely separate question of whether Plaintiffs have shown that the City’s use of Exams 7029 and 2043 has actually had a disparate impact upon black and Hispanic applicants for positions as entry-level firefighters. Ricci did not confront that issue.

The Ricci Court concluded that New Haven would not likely have been liable under a disparate impact theory. See id. at 2681. In doing so, the Court relied on the various steps that New Haven took to validate its civil service examination. Id. at 2678-79. It is noteworthy, however, that in this case New York City has taken significantly fewer steps than New Haven took in validating its examination. The relevant teaching of Ricci, in this regard, is that the process of designing employment examinations is complex, requiring consultation with experts and careful consideration of accepted testing standards. As discussed below, these requirements are reflected in federal regulations and existing Second Circuit precedent. This legal authority sets forth a simple principle: municipalities must take adequate measures to ensure that their civil service examinations reliably test the relevant knowledge, skills and abilities that will determine which applicants will best perform their specific public duties.

In rendering this decision, I am aware that the use of multiple-choice examinations is typically intended to apply objective standards to employment decisions. Similarly, I recognize that it is natural to assume that the best performers on an employment test must be the best people for the job. But, the significance of these principles is undermined when an examination is not fair. As Congress recognized in enacting Title VII, when an employment test is not adequately related to the job for which it tests—and when the test adversely affects minority groups—we may not fall back on the notion that better test takers make better employees. The City asks the court to do just that. Regrettably, though, the City did not take sufficient measures to ensure that better performers on its examinations would actually be better firefighters. Accordingly, the court grants the Motions for Summary Judgment and finds that Plaintiffs have established disparate impact liability.

What does this say? In effect, that any test which has different score distributions for different races is guilty until proven innocent. They go on, in sections II and III, to discuss the numbers and conclusions of the calculations.

However, the general cognitive factor exists and differs by race, and will show up on almost any cognitive test. As a result, every test is guilty.* This is the reverse of good sense- the military has done copious research to show that, for every job, g is beneficial (see here for discussion, references to other research, and so on), and the only question is how beneficial.

*They imply that if the Ricci history had been different- that is, the city had promoted the white firefighters on the basis of a rank-ordered written test, and then the minority firefighters had sued on disparate impact grounds, the minority firefighters would have lost because the city had put in sufficient effort to validate the test- but that doesn't seem like the sort of thing that should be taken on faith. Indeed, one of the arguments in the decision,

In essence, the City asks the court to reject Plaintiffs’ statistical significance analysis because it improperly assumes “perfect parity” among groups of people (see Def. PF Mem. 1-3, 5-7)

is responded to by:

First of all, the court rejects the premise that comparison to a standard of equality among groups provides an improper foundation for statistical testing under Title VII. In order to determine whether a particular employment practice has had a disparate impact on a minority group, statistical tests “ask what the results would be for the salient variable . . . if there [had been] no discrimination.” Adams v. Ameritech Servs., Inc., 231 F.3d 414, 424 (7th Cir. 2000) (emphasis added). To determine what results “would be,” statistical tests properly assume that racial or ethnic groups will perform equally well absent discrimination.

The only two possibilities the court considers are that either the minorities all got really unlucky on test day (stupendously unlikely, as they correctly calculate) or the city is discriminating against them; the possibility that they might not be as good at doing the job (and thus not as good at taking the test) is assumed to not be the case.

Comment author: hairyfigment 04 April 2014 08:48:59AM 1 point [-]

If the US Census Bureau has changed its hiring practices then I may be wrong. But after the initial ruling for Chicago and two rulings for NY, they were still ranking potential new-hires in every area by scores on a basic skills test. The Bureau tailored this test to the set of entry-level Census positions.

Now the last quote in the parent certainly looks disturbing. But that decision emphatically did not give a blanket endorsement of a cut-off followed by a lottery, because it found them liable for exactly that procedure. More specifically, it found them guilty of stupidity or deception for setting a "passing score" of 65 and then failing anyone who made less than 89.

Like every other source, the parent has the court say:

the City did not take sufficient measures to ensure that better performers on its examinations would actually be better firefighters.

It would appear that the court and the people who wrote the law do not share your view of this particular test's effectiveness. Perhaps you should try to convince them.

Comment author: Vaniver 04 April 2014 03:42:44PM 3 points [-]

If the US Census Bureau has changed its hiring practices then I may be wrong.

I am unfamiliar with how the Census Bureau hires; I was talking about the Chicago fire department, which I am fairly confident does use lotteries in its hiring and promotion decisions.

It would appear that the court and the people who wrote the law do not share your view of this particular test's effectiveness. Perhaps you should try to convince them.

If they won't listen to the psychometricians about g, why would I expect them to listen to me?

To clarify, the difference between my view and the court's view is that I assume that the universally replicated finding of intelligence differences between races will show up on basically any test, because that's what universally replicated means. Thus, unless the disparate impact is more than would be predicted by the relevant intelligence cutoff, then the burden to show disparate treatment should fall on those claiming discrimination.

The court's view is that if there is any statistically significant difference between races (which is more strict than the previous 4/5ths rule), the burden of demonstrating differences in racial intelligence and the relevance of intelligence to the job (combined, thankfully, into one 'validate the test for the particular job you're hiring for') falls on the maker of the test. But this falls on the maker of every test, making testing much more costly (and thus much less used) than it has to be, with the resulting efficiency losses. If you would like to use an extensively researched and validated IQ test for your narrow position (perhaps only one person will have this job at your company), that's not possible- you have to pay for experts to design a test for every position you would like to use a test for and validate that it works for that position, despite copious research demonstrating that a test that targets g specifically will be comparably effective to a specifically-designed test that targets performance on that job.
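
To make the comparison between the two standards concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the pass counts are made up, the function names are hypothetical, and the significance check is an ordinary pooled two-proportion z-test, which is not necessarily the calculation the court used. The 4/5ths rule is taken in its usual form: a group is flagged when its selection rate is below 80% of the rate of the group with the highest rate.

```python
import math


def selection_rate(passed: int, applicants: int) -> float:
    """Fraction of applicants in a group who passed."""
    return passed / applicants


def four_fifths_flag(rate_group: float, rate_reference: float) -> bool:
    """True when the group's rate is below 4/5 of the reference group's rate."""
    return rate_group < 0.8 * rate_reference


def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for the difference in pass rates."""
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_err


# Hypothetical numbers, chosen only for illustration.
ref_pass, ref_n = 900, 1000   # reference group: 90% pass
grp_pass, grp_n = 800, 1000   # comparison group: 80% pass

flagged_by_four_fifths = four_fifths_flag(
    selection_rate(grp_pass, grp_n), selection_rate(ref_pass, ref_n)
)
z = two_proportion_z(ref_pass, ref_n, grp_pass, grp_n)
significant_at_5_percent = abs(z) > 1.96  # two-sided test, 5% level

print("flagged by 4/5ths rule:", flagged_by_four_fifths)
print("z statistic:", round(z, 2))
print("statistically significant:", significant_at_5_percent)
```

With these made-up numbers the comparison group's rate is about 89% of the reference group's rate, so the 4/5ths rule does not flag it, while the difference is still statistically significant because the samples are large. That is the sense in which a significance standard can be stricter.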

Comment author: EHeller 04 April 2014 04:23:03PM -1 points [-]

But this falls on the maker of every test, making testing much more costly (and thus much less used) than it has to be

So every job I've ever applied for required tests, and all of them looked more like general intelligence tests than specific (the standard brain teasers about buckets of water, geometry questions, etc., all for statistical programming jobs). With the exception of one insurance company (who disguised their geometry questions as programming questions), none of these companies tried to pretend these were directly applicable to job performance. To my knowledge, none of these companies have been sued.

If anything, my experience is that testing is overused. A recent hire I wanted (who I've worked with before, and who is very competent at exactly what we need) was refused on the basis of poor performance on two tests. I've consulted for several companies that have expressed that they hired me as a consultant because their HR's testing procedures have made staffing too inflexible.

Comment author: Nornagest 04 April 2014 04:36:20PM 1 point [-]

I'm fairly confident that you'd have an easier time in court of proving the relevance of g (or proxies for it) to statistical programming than to, say, firefighting.

Comment author: Vaniver 04 April 2014 05:43:35PM *  2 points [-]

So every job I've ever applied for required tests, and all of them looked more like general intelligence tests than specific (the standard brain teasers about buckets of water, geometry questions, etc., all for statistical programming jobs).

So, a handful of brain teasers issued and interpreted by non-experts is surely inferior to an IQ test. So why don't we have nationally recognized agencies that administer IQ tests, that they then report to potential employers at your request, like the SAT and colleges?

(And it is unfortunate about that hire- organizations should make the most of local knowledge like that, but often fail to. Hiring people as consultants might be more efficient, though, especially if you know the person has the skills for the job you need done now but might not have the skills for the next job you need.)

Comment author: hairyfigment 05 April 2014 12:01:29AM *  0 points [-]

So you claim these courts (and lawmakers) all know this research on g, and you can't imagine any better way to present it?

Anyway, you said:

Take recent firefighting anti-discrimination court cases as an example. The legally approved way to conduct promotion testing is to pass over 90% of the people, and then randomly select from everyone who passed. The legally disapproved way is to test everyone, keep the scores as numbers, sort them, and promote from the top of the list going down.

This is false. The first is almost exactly what the Chicago fire department got slapped for doing, and the courts likewise said it would be illegal for the NY department. The second is what the US Census Bureau did, and appears perfectly legal due to their test intuitively matching the jobs. This makes no mention of it, instead attacking the Bureau's use of a binary cut-off.

The court's explicit motive explains all this quite well. For pointing this out I lost around 50 karma.

Comment author: Vaniver 05 April 2014 12:47:06AM 4 points [-]

So you claim these courts (and lawmakers) all know this research on g, and you can't imagine any better way to present it?

I don't know what they know or don't know, and it's not clear to me that the presentation rather than the content of the research is the issue.

The first is almost exactly what the Chicago fire department got slapped for doing

All of the discrimination lawsuits I've seen for the Chicago fire department, the courts have decided in favor of the city, but I doubt I've seen all of them. Which case are you thinking of?

For pointing this out I lost around 50 karma.

I can't comment as to why others downvoted you; I did not. The primary thing I've noticed in discussing this issue with you is that you have several times declared a collection of claims false, which I would replace with putting forth specific contrasting claims. If you want to argue that promoting by lottery, after getting rid of some portion of the applicant pool by using a test, is legally disapproved, then make just that argument, and then we would discuss just that issue instead of having to figure out which issue we're discussing. If you want to argue that the burden of proof should be on the employer to validate any test which has different score or pass distributions for different groups, then say that clearly, and so on.

Comment author: hairyfigment 05 April 2014 12:54:00AM 0 points [-]

Comment author: Vaniver 05 April 2014 01:15:41AM *  3 points [-]

What?

In previous research I found this one, brought by white firefighters protesting the affirmative action policies in Chicago, and while I recall a second I'm on a different computer and so can't easily check my history.

But I don't think that case makes the point you want it to make. It does not disapprove of hiring by lottery- indeed, the remedy involves selecting which African Americans (but not white or other races!) who scored between 65 and 88 (who are still interested) will get the available jobs by lottery- they just think that the city did not put the passing score bar low enough, and the standard they used to determine what was "low enough" was the disparate impact standard, not any sort of job performance criterion.

[Edit]: I should clarify that, again, the court's decision is made with the presumption that tests are guilty until proven innocent, and so when the decision says "the test was biased" or "there was no evidence that the test was necessary," they do not mean that "there is evidence that the test was biased" or "there was evidence that the test was not necessary," they just mean "there was not sufficient presented evidence that the test was necessary."

Comment author: hairyfigment 08 April 2014 01:27:10AM 1 point [-]

I think I've disproven the factual basis you gave for your speculation: the real standard is orthogonal to your cutoff-with-lottery versus rank-by-test-scores.

And it's now 80 karma paperclips. Do you know how ridiculous this looks, how badly Less Wrong is breaking its own rules of conversation?