I strongly suspect that there is a possible art of rationality (attaining the map that reflects the territory, choosing so as to direct reality into regions high in your preference ordering) which goes beyond the skills that are standard, and beyond what any single practitioner singly knows. I have a sense that more is possible.
The degree to which a group of people can do anything useful about this will depend overwhelmingly on what methods we can devise to verify our many amazing good ideas.
I suggest stratifying verification methods into 3 levels of usefulness:
- Reputational
- Experimental
- Organizational
If your martial arts master occasionally fights realistic duels (ideally, real duels) against the masters of other schools, and wins or at least doesn't lose too often, then you know that the master's reputation is grounded in reality; you know that your master is not a complete poseur. The same would go if your school regularly competed against other schools. You'd be keepin' it real.
Some martial arts fail to compete realistically enough, and their students go down in seconds against real streetfighters. Other martial arts schools fail to compete at all—except based on charisma and good stories—and their masters decide they have chi powers. In this latter class we can also place the splintered schools of psychoanalysis.
So even just the basic step of trying to ground reputations in some realistic trial other than charisma and good stories has tremendous positive effects on a whole field of endeavor.
But that doesn't yet get you a science. A science requires that you be able to test 100 applications of method A against 100 applications of method B and run statistics on the results. Experiments have to be replicable and replicated. This requires standard measurements that can be run on students who've been taught using randomly assigned alternative methods, not just realistic duels fought between masters using all of their accumulated techniques and strength.
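To make the experimental level concrete, here's a minimal sketch in Python of the statistics involved. Everything in it is hypothetical: the 0-to-100 test scores, the group sizes, and the effect size are invented for illustration, and a simple permutation test stands in for whatever analysis you'd actually run.

```python
import random
import statistics

def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """How often does shuffling the group labels produce a difference
    in means at least as large as the one actually observed?"""
    rng = random.Random(seed)
    observed = statistics.mean(scores_a) - statistics.mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(shuffled_diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical data: 100 students per group, scored on some (invented)
# 0-to-100 test, with method A slightly better than method B.
rng = random.Random(42)
method_a = [rng.gauss(62, 10) for _ in range(100)]
method_b = [rng.gauss(58, 10) for _ in range(100)]

diff, p_value = permutation_test(method_a, method_b)
print(f"observed difference in means: {diff:+.2f}")
print(f"permutation p-value: {p_value:.4f}")
```

The random assignment is what lets you attribute any such difference to the methods, rather than to which students signed up for which class.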
The field of happiness studies was created, more or less, by realizing that asking people "On a scale of 1 to 10, how good do you feel right now?" was a measure that statistically validated well against other ideas for measuring happiness. And this, despite all skepticism, looks like it's actually a pretty useful measure of some things, if you ask 100 people and average the results.
But suppose you wanted to put happier people in positions of power—pay happy people to train other people to be happier, or employ the happiest at a hedge fund? Then you're going to need some test that's harder to game than just asking someone "How happy are you?"
This question of verification methods good enough to build organizations is a huge problem at all levels of modern human society. If you're going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential? If you give colleges the power to grant degrees, then do they have an incentive not to fail people? (I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.) If a hedge fund posts 20% returns, are they really that much better than the indices, or are they selling puts that will blow up in a down market?
If you have a verification method that can be gamed, the whole field adapts to game it, and loses its purpose. Colleges turn into tests of whether you can endure the classes. High schools do nothing but teach to statewide tests. Hedge funds sell puts to boost their returns.
On the other hand—we still manage to teach engineers, even though our organizational verification methods aren't perfect. So what perfect or imperfect methods could you use for verifying rationality skills, that would be at least a little resistant to gaming?
(Added: Measurements with high noise can still be used experimentally, if you randomly assign enough subjects to have an expectation of washing out the variance. But for the organizational purpose of verifying particular individuals, you need low-noise measurements.)
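A quick simulation makes the addendum concrete. This is a sketch with invented numbers: a fixed "true" value and measurement noise large relative to any difference we'd care about. The spread of the group average shrinks roughly as noise/sqrt(n), which is why a noisy measure can still power experiments on groups while staying useless for certifying one individual.

```python
import random
import statistics

rng = random.Random(0)

TRUE_VALUE = 5.0   # hypothetical quantity being measured
NOISE_SD = 3.0     # measurement noise, large relative to the signal

def noisy_measurement():
    """One noisy measurement of one subject."""
    return TRUE_VALUE + rng.gauss(0, NOISE_SD)

for n in (1, 10, 100, 1000):
    # Repeat the whole experiment many times and see how much the
    # group average bounces around from one run to the next.
    group_means = [
        statistics.mean(noisy_measurement() for _ in range(n))
        for _ in range(2000)
    ]
    print(f"n = {n:4d}   sd of group average = {statistics.stdev(group_means):.3f}")

# The spread shrinks like NOISE_SD / sqrt(n): fine for comparing groups,
# hopeless for verifying one particular person (the n = 1 row).
```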
So I now put to you the question—how do you verify rationality skills? At any of the three levels? Brainstorm, I beg you; even a difficult and expensive measurement can become a gold standard to verify other metrics. Feel free to email me at sentience@pobox.com to suggest any measurements that are better off not being publicly known (though this is of course a major disadvantage of that method). Stupid ideas can suggest good ideas, so if you can't come up with a good idea, come up with a stupid one.
Reputational, experimental, organizational:
- Something the masters and schools can do to keep it real (realistically real);
- Something you can do to measure each of a hundred students;
- Something you could use as a test even if people have an incentive to game it.
Finding good solutions at each level determines what a whole field of study can be useful for—how much it can hope to accomplish. This is one of the Big Important Foundational Questions, so—
Think!
(PS: And ponder on your own before you look at the other comments; we need breadth of coverage here.)
I doubt a few minutes of pondering will produce any especially insightful thoughts, but on the off chance that they do, here's what I've got:
A major pitfall of most tests is that they can end up examining a wide variety of confounding variables. For example, if the test for rationality is based on a written prompt, then it selects against those with dyslexia in spite of their rationality. If it's based on a spoken prompt, then it selects for those with accents similar to the test-giver's, or against those who had it read to them in a strange way. Ideally, since the thing we're selecting for is (I assume) practical reasoning skill, we would want the test to have some similarity to real life.
Thus the thought that comes to mind is an escape room which can be set up and run essentially identically for each participant, whose puzzling elements require you to make Bayesian updates on multiple propositions whose prior likelihoods you were given at the start. To avoid biasing the test in favor of those with more general knowledge, the propositions would ideally be totally fictitious. It occurs to me that the elements of real-world pressure and communication would bias the test against those prone to anxiety, but given that that's a common problem when you're called on to apply your rationality skills in reality, I think that may be an acceptable flaw if no other options are obviously superior.
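To sketch what the scoring for such a room might look like: give each fictitious proposition a stated prior, attach a likelihood ratio to each clue, and compare the participant's reported probabilities against the Bayesian answer using a proper scoring rule. All the numbers below are invented for illustration.

```python
import math

def bayes_update(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

def expected_log_score(true_p, reported_p):
    """Expected logarithmic score of reporting reported_p when the proposition
    is true with probability true_p. The log score is a proper scoring rule:
    the expectation is maximized by reporting the true probability."""
    return true_p * math.log(reported_p) + (1 - true_p) * math.log(1 - reported_p)

# Hypothetical proposition with a stated prior of 0.2. The participant then
# finds a clue 4x as likely to exist if the proposition is true than if false.
correct = bayes_update(0.2, 4.0)
print(f"correct posterior: {correct:.3f}")  # 0.500

# Score the correct answer against an over-updater and an under-updater.
for label, reported in [("correct", correct), ("over", 0.90), ("under", 0.25)]:
    print(f"{label:>7}: expected log score = {expected_log_score(correct, reported):+.3f}")
```

Because the scoring rule is proper, a participant can't improve their expected score by strategic hedging; their best play is to actually do the update, which is the skill the room is supposed to test.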