I strongly suspect that there is a possible art of rationality (attaining the map that reflects the territory, choosing so as to direct reality into regions high in your preference ordering) which goes beyond the skills that are standard, and beyond what any single practitioner singly knows. I have a sense that more is possible.
The degree to which a group of people can do anything useful about this, will depend overwhelmingly on what methods we can devise to verify our many amazing good ideas.
I suggest stratifying verification methods into 3 levels of usefulness:
- Reputational
- Experimental
- Organizational
If your martial arts master occasionally fights realistic duels (ideally, real duels) against the masters of other schools, and wins or at least doesn't lose too often, then you know that the master's reputation is grounded in reality; you know that your master is not a complete poseur. The same would go if your school regularly competed against other schools. You'd be keepin' it real.
Some martial arts fail to compete realistically enough, and their students go down in seconds against real streetfighters. Other martial arts schools fail to compete at all—except based on charisma and good stories—and their masters decide they have chi powers. In this latter class we can also place the splintered schools of psychoanalysis.
So even just the basic step of trying to ground reputations in some realistic trial other than charisma and good stories, has tremendous positive effects on a whole field of endeavor.
But that doesn't yet get you a science. A science requires that you be able to test 100 applications of method A against 100 applications of method B and run statistics on the results. Experiments have to be replicable and replicated. This requires standard measurements that can be run on students who've been taught using randomly-assigned alternative methods, not just realistic duels fought between masters using all of their accumulated techniques and strength.
The field of happiness studies was created, more or less, by realizing that asking people "On a scale of 1 to 10, how good do you feel right now?" was a measure that statistically validated well against other ideas for measuring happiness. And this, despite all skepticism, looks like it's actually a pretty useful measure of some things, if you ask 100 people and average the results.
But suppose you wanted to put happier people in positions of power—pay happy people to train other people to be happier, or employ the happiest at a hedge fund? Then you're going to need some test that's harder to game than just asking someone "How happy are you?"
This question of verification methods good enough to build organizations, is a huge problem at all levels of modern human society. If you're going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential? If you give colleges the power to grant degrees, then do they have an incentive not to fail people? (I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.) If a hedge fund posts 20% returns, are they really that much better than the indices, or are they selling puts that will blow up in a down market?
If you have a verification method that can be gamed, the whole field adapts to game it, and loses its purpose. Colleges turn into tests of whether you can endure the classes. High schools do nothing but teach to statewide tests. Hedge funds sell puts to boost their returns.
On the other hand—we still manage to teach engineers, even though our organizational verification methods aren't perfect. So what perfect or imperfect methods could you use for verifying rationality skills, that would be at least a little resistant to gaming?
(Added: Measurements with high noise can still be used experimentally, if you randomly assign enough subjects to have an expectation of washing out the variance. But for the organizational purpose of verifying particular individuals, you need low-noise measurements.)
So I now put to you the question—how do you verify rationality skills? At any of the three levels? Brainstorm, I beg you; even a difficult and expensive measurement can become a gold standard to verify other metrics. Feel free to email me at sentience@pobox.com to suggest any measurements that are better off not being publicly known (though this is of course a major disadvantage of that method). Stupid ideas can suggest good ideas, so if you can't come up with a good idea, come up with a stupid one.
Reputational, experimental, organizational:
- Something the masters and schools can do to keep it real (realistically real);
- Something you can do to measure each of a hundred students;
- Something you could use as a test even if people have an incentive to game it.
Finding good solutions at each level determines what a whole field of study can be useful for—how much it can hope to accomplish. This is one of the Big Important Foundational Questions, so—
Think!
(PS: And ponder on your own before you look at the other comments; we need breadth of coverage here.)
I know of some other stupid tests for rationality, borrowed happily from Invader Zim.
On a less stupid note: Reputationally, I have an explicit agreement with one of my friends that we fact check each other. This was actually a one-way fact checking until fairly recently when he asked me why I didn't call him on something he later realized was total bullcrap. Note, this works best if you actually have a good memory and aren't pickling your brain with alcohol. It also seems to help check the mindkilling effects of disagreement.
A long time ago, I was reading about critical thinking, and was presented a relatively short list of questions to try and use to stimulate critical thought. Questions of this nature could be used in some form of standardized test; or could be used to build a portfolio of rationale behind opinions on all manner of things, which could be graded by peers or instructors (preferably ones who also aspire to rationality, and disagree). I suppose the portfolio would be more organizational than experimental, and almost as easy to game as cheating on essays. But those were my main thoughts before reading the cool ideas other people came up with.
In case you're interested, this was the list as I transcribed it:
Oh, and after reading the Logic of Failure, maybe running simulations like they did with the Sim City-like vibe, or the optimizing bug population or the refrigeration tests could be instructive. Even after learning about them, (especially the city planning and the African tribe) they may be sufficiently complicated to be of experimental or organizational value. On the other hand, they may turn out to be just as useless as chess for testing rationality if success strategies are posted and shared. Maybe some of the sims could have randomly assigned (Kirk resistant) Kobayashi Maru modes, but then I don't see how a predetermined loss would be very instructive unless the player didn't know it was rigged---and even then, only to illustrate Eliezer's point that even if you do everything right, you can still fail.