Decius comments on Problems in Education - Less Wrong Discussion

65 Post author: ThinkOfTheChildren 08 April 2013 09:29PM

Comments (318)

Comment author: Decius 11 April 2013 04:05:14AM *  1 point [-]

See also: Pygmalion in the Classroom.

Simply put, when teachers expect students to do well and show intellectual growth, they do; when teachers do not have such expectations, performance and growth are not so encouraged and may in fact be discouraged in a variety of ways

It's entirely reasonable that teachers' ratings of children's academic abilities, etc., cause future achievement.

Comment author: gwern 11 April 2013 04:50:37AM 3 points [-]

It's entirely reasonable that teachers' ratings of children's academic abilities, etc., cause future achievement.

No, it's not. Did you read Carl's comment in this same thread?

Comment author: Decius 11 April 2013 04:58:46AM *  2 points [-]

Controlled research has demonstrated that students whose teachers expect them to outperform their peers do so, even when the teachers' expectations are not founded on fact.

In the Oak School experiment discussed in this book teachers were led to believe that certain students, selected at random, were likely to be showing signs of a spurt in intellectual growth and development. The results were startling. At the end of the year, the students of whom the teachers had these expectations showed significantly greater gains in intellectual growth than did those in the control group.

Comment author: CarlShulman 11 April 2013 05:46:19AM *  6 points [-]

The Jussim et al review of that literature is worth reading. Expectations do seem to have causal impact, but the effect is usually small relative to measures of past performance and ability, and teacher expectations tend to reflect past performance more.

The review covers some serious challenges to the effect sizes claimed by Rosenthal and coauthors, such as effect sizes declining with sample size and publication bias. Or, regarding the original Pygmalion/Oak School experiment:

Snow (1995) also pointed out that the intelligence test used in Pygmalion was only normed for scores between 60 and 160. If one excluded all scores outside this range, the expectancy effect disappeared. Moreover, there were five "bloomers" with wild IQ score gains: 17-110, 18-122, 133-202, 111-208, and 113-211. If one simply excluded these five bizarre gains, the difference between the bloomers and the controls evaporated.
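The size of those five gains is easier to appreciate as arithmetic. A minimal sketch, using only the five before/after pairs quoted above; the 15-point standard deviation is an assumption (the conventional IQ scaling), not something the quoted passage states.

```python
# The five "bloomer" before/after IQ pairs quoted from Snow (1995) above.
pairs = [(17, 110), (18, 122), (133, 202), (111, 208), (113, 211)]

# Assumption: the conventional IQ scale with a standard deviation of 15.
SD = 15

gains = [after - before for before, after in pairs]
mean_gain = sum(gains) / len(gains)

print(gains)                     # [93, 104, 69, 97, 98]
print(round(mean_gain, 1))       # 92.2
print(round(mean_gain / SD, 2))  # 6.15
```

On that assumed scale the five children gained about six standard deviations on average, which is why dropping them can plausibly erase a group difference of a few points.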

As an aside, Rosenthal pioneered meta-analysis in psychology because the effect only replicated a third of the time in the published literature (despite the presence of publication bias and QRPs). In doing so he promulgated a test for publication bias which implicitly assumed the absence of any publication bias, and so almost always output the conclusion that no publication bias was present. These methods were eagerly adopted by the parapsychology community, as the same methodology that appeared to show strong expectancy effects also appeared to show ESP in the ganzfeld psychic experiment, as Rosenthal (1986) agreed.

Since I think that the ESP literature reflects the scale of apparent effect that can be shown in the absence of a real effect, purely through publication bias, experimenter bias, optional stopping, and other questionable research practices, this makes me suspicious of the stronger claims about expectation effects.

Comment author: Decius 11 April 2013 06:06:03AM 4 points [-]

I don't think the sample of experiments reviewed is large enough to evaluate sample size versus effect size; throw out the outliers and there's nothing left.

I'm now heavily concerned about the validity of the IQ test used; however, that's more due to the 8 point increase in the control group, when no increase is expected. I'll have to dig further, exclude any of the controls with out-of-band scores and redo the math.

One result of the meta-analysis, however, is that experimentally-induced changes to teacher expectation have a small causal effect on student performance; another result is that non-induced teacher expectations correlate well with performance in the same year, and less well with long term performance. I would rephrase that as 'Teacher expectations of student performance in their class tend to be accurate, but correlate poorly with student performance in other classes.'

In any case, thanks for the link. I'm going to have to spend some time determining how much I should change my mind with this new evidence, but my gut feeling is that the objectively worst possible data (my own experience with performing well when expected to perform well, and performing poorly when expected to perform poorly), will continue to dominate my personal opinion on the matter.

Comment author: CarlShulman 11 April 2013 06:17:52AM *  6 points [-]

I'm going to have to spend some time determining how much I should change my mind with this new evidence, but my gut feeling is that the objectively worst possible data (my own experience with performing well when expected to perform well, and performing poorly when expected to perform poorly), will continue to dominate my personal opinion on the matter.

Upvoted for candor.

Comment author: gwern 11 April 2013 09:13:54PM *  2 points [-]

I don't think the sample of experiments reviewed is large enough to evaluate sample size versus effect size; throw out the outliers and there's nothing left.

The first Rosenthal meta-analysis used 345 studies. That is pretty big. And the individual studies listed in table 17.1 have large n, ranging from 79 to 5000+.

I'm now heavily concerned about the validity of the IQ test used; however, that's more due to the 8 point increase in the control group, when no increase is expected.

No, that's not a problem that should concern you. Children's IQ scores are less stable than older people's scores, test-retest effects will give you a number of IQ points (that's why one uses controls), and children are constantly growing.

What should concern you is that the researchers involved were willing to pass on and champion a result driven solely by obviously impossible nonsensical meaningless data. A kid going from 18 IQ to 122? or 113 to 211? This can't even be explained by incompetence in failing to exclude scores from kids refusing to cooperate, because tests in general (much less the specific test they used!) are never normed from 18 to 211. (How do you get a sample big enough to norm as high as 7.4 standard deviations?)
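The 7.4 figure follows from standardizing a score of 211. A rough sketch, assuming the conventional scale (mean 100, SD 15) and normally distributed scores; neither assumption is stated in the comment itself, and `z_score` and `upper_tail` are names made up for illustration.

```python
import math

MEAN, SD = 100, 15  # assumption: conventional IQ scaling


def z_score(iq: float) -> float:
    """Distance from the mean in standard deviations."""
    return (iq - MEAN) / SD


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = z_score(211)
print(round(z, 1))            # 7.4
print(upper_tail(z) < 1e-12)  # True
```

Under those assumptions fewer than one person in a trillion would score that high, so no realistic norming sample could cover that range.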

Worrying about the control's gains and not the actual data is like reading a physics paper reporting that they measured the speed of several neutrinos at 50 hogsheads per millifortnight, and saying "Hm, yes, but are they sure they properly corrected for GPS clock skew and did accurately record the flight time of their control photons?"

Comment author: Decius 12 April 2013 04:06:20AM 0 points [-]

Unstable IQ scores should provide a net zero; an average increase of half a standard deviation across the entire population already means that the norms are fucked.

Therefore, the IQ test used simply wasn't properly normed; if we assume that it was equally improperly normed for all students in the study, we still see an increase of 4 points based on teachers being told to expect more. Whether an increase of 4 points is statistically significant on that (improperly normed) test is a new question.
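The arithmetic behind this argument can be sketched as a difference in gains. The 8-point control gain and the 4-point difference come from the comments above; the 12-point gain for the expectancy group is inferred from those two numbers, and the 15-point standard deviation is an assumed conventional scaling.

```python
SD = 15               # assumption: conventional IQ standard deviation
control_gain = 8      # control-group increase cited in this thread
treated_gain = 8 + 4  # inferred: control gain plus the 4-point difference

# Subtracting the control gain removes anything that affected both groups
# equally (retest practice, growth, norming problems).
net_effect = treated_gain - control_gain

print(net_effect)                   # 4
print(round(control_gain / SD, 2))  # 0.53
print(round(net_effect / SD, 2))    # 0.27
```

Whether 4 points is distinguishable from noise depends on group sizes and score variance, which this sketch does not model.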

Comment author: gwern 12 April 2013 03:42:49PM 1 point [-]

Unstable IQ scores should provide a net zero; an average increase of half a standard deviation across the entire population already means that the norms are fucked.

Only if you make the very strong assumptions that there is no systematic bias or selection effect or regression to the mean or anything which might cause the instability to favor an increase.
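Regression to the mean, one of the mechanisms named here, can be illustrated with a small simulation. This is a sketch under invented assumptions: the ability spread, the noise level, and the cutoff of 90 are all hypothetical, and nothing in the thread says the study's sample was selected this way.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 20_000
TRUE_SD = 12   # assumed spread of stable ability
NOISE_SD = 9   # assumed test-retest measurement noise


def noisy_score(ability: float) -> float:
    """One test administration: stable ability plus independent noise."""
    return ability + random.gauss(0, NOISE_SD)


abilities = [random.gauss(100, TRUE_SD) for _ in range(N)]
first = [noisy_score(a) for a in abilities]
second = [noisy_score(a) for a in abilities]

# Whole sample: noise cancels out, so the average change is close to zero.
overall_gain = sum(s - f for f, s in zip(first, second)) / N

# Children who scored below 90 on the first test (hypothetical cutoff).
low = [(f, s) for f, s in zip(first, second) if f < 90]
mean_gain = sum(s - f for f, s in low) / len(low)

print(abs(overall_gain) < 1)  # True
print(mean_gain > 0)          # True
```

With no real change in anyone's ability, the low-scoring subgroup still gains on retest, while the unselected sample does not. Instability alone nets to zero only when the sample was not selected on its first score.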

Plus you ignored my other points.

Plus we already know from the pairs of before-afters that these researchers are either incredibly incompetent or actively dishonest.

Plus we already know biases in analysis or design or data collection can be introduced much more subtly. Gould's brainpacking problems are only the latest example.

Therefore, the IQ test used simply wasn't properly normed; if we assume that it was equally improperly normed for all students in the study,

Which claim and assumption we will make because we are terminally optimistic, and to borrow from the '90s, "I want to believe!"

we still see an increase of 4 points based on teachers being told to expect more. Whether an increase of 4 points is statistically significant on that (improperly normed) test is a new question.

Wow, you still aren't giving up on the Pygmalion study? Just let it go already. You don't even have to give up on your wish for self-fulfilling expectations - there are plenty of followup studies which turned in your desired significant effects.

Comment author: Decius 12 April 2013 04:16:27PM -1 points [-]

Only if you make the very strong assumptions that there is no systematic bias or selection effect or regression to the mean or anything which might cause the instability to favor an increase.

What effects could cause an increase of 8 points on a properly normed test across the board? Why would there be a significant benefit to being in the control group of this study?

Plus we already know from the pairs of before-afters that these researchers are either incredibly incompetent or actively dishonest.

You can rule out that they were using a test which produced the scores that they recorded, perhaps by using raw score rather than normed output. You can rule out every other explanation for why the recorded results aren't valid scores. You can even rule out that they were competently dishonest, since competent dishonesty would be nontrivial to detect; your only possible conclusion is incompetence, which isn't evidence which should change your priors.

Incompetence is the social equivalent of the null hypothesis, and there is very rarely any significant evidence against it.

Therefore, the IQ test used simply wasn't properly normed; if we assume that it was equally improperly normed for all students in the study,

Which claim and assumption we will make because we are terminally optimistic, and to borrow from the '90s, "I want to believe!"

Assuming only incompetence as you have, the expected result would be equally erratic for all students. You can assign any likelihood to the assumption that the incompetence was the primary factor and that dishonesty doesn't modify it significantly, but you have already concluded systemic incompetent dishonesty across a large number of studies.

Wow, you still aren't giving up on the Pygmalion study? Just let it go already. You don't even have to give up on your wish for self-fulfilling expectations - there are plenty of followup studies which turned in your desired significant effects.

As you say, it's been confirmed by other studies. I'm not insisting that a particular study was done correctly, I'm explaining why their conclusions being true is consistent with the errors in their study. (Which means that a study with those flaws would be expected to reach the same conclusions, if those conclusions were true)

Comment author: gwern 12 April 2013 05:37:19PM *  1 point [-]

What effects could cause an increase of 8 points on a properly normed test across the board? Why would there be a significant benefit to being in the control group of this study?

I already gave you three separate explanations for why an increase is possible, even in controls.

your only possible conclusion is incompetence, which isn't evidence which should change your priors. Incompetence is the social equivalent of the null hypothesis, and there is very rarely any significant evidence against it.

I have no idea what you mean by this, and I think that if one accepts their incompetence, the best thing to do is to ignore their data as having been poisoned in unknown ways - maliciousness, ideology, and stupidity often being difficult to tell apart.

Assuming only incompetence as you have, the expected result would be equally erratic for all students.

Why is that? IQ interventions almost universally fail: our prior for any result like 'we increased IQ by 8 points' ought to be very low, as in well below 1%, because hundreds of interventions have failed to pan out, and 8 points is astounding, practically on the level of iodization. The followups confirm that there is only a much, much smaller effect. So the competent result is that there is no effect or a small one. Any incompetence is going to lead to an extreme result. Like what they found.

As you say, it's been confirmed by other studies.

'Confirmed'? Well, this is an active debate as to what counts as a replication. Near the same magnitude or just having the same sign? If someone publishes a study claiming to find a weight loss drug that will drop 100 pounds, and exhaustive replications find that the true estimate is actually 1 pound, has the original claim been "confirmed"? After all, both estimates are non-zero and both estimates have the same sign...
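The two readings of "replication" can be written down as two different checks. A minimal sketch: the function names are invented for illustration, and the 50% tolerance is an arbitrary assumption rather than any accepted standard.

```python
def same_sign(original: float, replication: float) -> bool:
    """Lenient criterion: both estimates point in the same direction."""
    return original * replication > 0


def similar_magnitude(original: float, replication: float,
                      tolerance: float = 0.5) -> bool:
    """Stricter criterion: the replication lies within `tolerance`
    (a fraction of the original estimate) of the original.
    The 50% default is an arbitrary assumption."""
    return abs(replication - original) <= tolerance * abs(original)


original, replication = 100.0, 1.0  # pounds lost, from the example above

print(same_sign(original, replication))          # True
print(similar_magnitude(original, replication))  # False
```

The weight-loss example passes the lenient check and fails the strict one, which is the gap the debate is about.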

Comment author: CarlShulman 11 April 2013 10:10:20PM *  0 points [-]

That's going further than I did. It's a reasonable prior, and the evidence is at least consistent with weak effects.

Comment author: gwern 11 April 2013 10:54:15PM 1 point [-]

Eh. Decius was clearly thinking of, and still is thinking of, substantial and long-lasting effects rather than the almost trivially small disappearing effects confirmed by the followups and meta-analyses. That is a completely unreasonable view to hold after reading that review, and I would suggest that even that small nonzero effect is dubious since it seems that few to none of the studies fully accounted for the accuracy issue and there is obviously publication bias at play.