Today's post, 3 Levels of Rationality Verification, was originally published on 15 March 2009. A summary (taken from the LW wiki):

 

How far the craft of rationality can be taken depends largely on what methods can be invented for verifying it. Tests seem usefully stratifiable into reputational, experimental, and organizational. A "reputational" test is some real-world problem that tests the ability of a teacher or a school (like running a hedge fund, say) - "keeping it real", but without being able to break down exactly what was responsible for success. An "experimental" test is one that can be run on each of a hundred students (such as a well-validated survey). An "organizational" test is one that can be used to preserve the integrity of organizations by validating individuals or small groups, even in the face of strong incentives to game the test. The strength of solution invented at each level will determine how far the craft of rationality can go in the real world.


Discuss the post here (rather than in the comments to the original post).

This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Schools Proliferating Without Evidence, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.

Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.


(I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.)

Thanks for saying this. I always felt it completely wrong that as a teacher I was supposed to indirectly measure my own success. In theory, I am grading my students, not myself. But in real life, if I get a random half of the class, and my colleague gets the other half, and at the end of the year his students get good grades and my students get bad grades, people start to wonder why. (You bet that the students know what grades they would get in the other group.) One interpretation is that my colleague's grades are cheaper than my grades. Another interpretation is that my colleague is good at teaching, and I am bad. And usually you don't have more data than this, so you just have to make your guess. (Unless there is some external measure, for example students participating in a competition outside of the school.) If possible, parents will put a lot of pressure on the school to provide the best teachers, which almost always means the teachers who give the best grades. Because parents' prior probabilities usually suppose that their child is brilliant and only an incompetent person could claim otherwise.

In my opinion, schools would be much improved if teachers only did the teaching and an independent institution gave the grades. There would be no pressure on teachers to give better grades, and although some students would prefer more learning and some would prefer less, each student would bear the consequences of their choice. There is a technical problem: external examination is expensive, and feedback needs to be frequent. But that could be fixed by having two kinds of grades: teachers would give informal grades for instant feedback, but at the end of the year the students would be formally tested by the external institution, and the informal grades would be completely ignored.

Unfortunately, it seems to me that today the mere idea of grading students is considered wrong by many people. In part, there are good reasons for this, like Goodhart's law, manifested as "teaching and learning for tests, not for knowledge". But many people don't want to fix this by improving the tests; they want to get rid of them completely, as if feedback were completely unnecessary. I would prefer an imperfect measure (which is known to be imperfect) to no measure.