I strongly suspect that there is a possible art of rationality (attaining the map that reflects the territory, choosing so as to direct reality into regions high in your preference ordering) which goes beyond the skills that are standard, and beyond what any single practitioner singly knows. I have a sense that more is possible.
The degree to which a group of people can do anything useful about this will depend overwhelmingly on what methods we can devise to verify our many amazing good ideas.
I suggest stratifying verification methods into 3 levels of usefulness:
- Reputational
- Experimental
- Organizational
If your martial arts master occasionally fights realistic duels (ideally, real duels) against the masters of other schools, and wins or at least doesn't lose too often, then you know that the master's reputation is grounded in reality; you know that your master is not a complete poseur. The same would go if your school regularly competed against other schools. You'd be keepin' it real.
Some martial arts fail to compete realistically enough, and their students go down in seconds against real streetfighters. Other martial arts schools fail to compete at all—except based on charisma and good stories—and their masters decide they have chi powers. In this latter class we can also place the splintered schools of psychoanalysis.
So even just the basic step of trying to ground reputations in some realistic trial other than charisma and good stories has tremendous positive effects on a whole field of endeavor.
But that doesn't yet get you a science. A science requires that you be able to test 100 applications of method A against 100 applications of method B and run statistics on the results. Experiments have to be replicable and replicated. This requires standard measurements that can be run on students who've been taught using randomly-assigned alternative methods, not just realistic duels fought between masters using all of their accumulated techniques and strength.
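To make "run statistics on the results" concrete, here is a minimal sketch of one way it could be done: a permutation test on the difference in mean scores between two randomly assigned groups. Everything in it is an assumption for illustration: the function names, the simulated scores, and the choice of a permutation test rather than any other method. It is not a claim about which measurement or statistic a real rationality experiment should use.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(scores_a, scores_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value). The p-value estimates how often
    a random relabeling of students produces a gap at least as large as the
    one actually observed.
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the method labels were assigned differently
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical data: 100 students per method, scores on some standard test.
    rng = random.Random(42)
    method_a = [rng.gauss(52.0, 10.0) for _ in range(100)]
    method_b = [rng.gauss(50.0, 10.0) for _ in range(100)]
    difference, p = permutation_test(method_a, method_b)
    print(f"difference in means: {difference:.2f}, p-value: {p:.3f}")
```

The point of the shuffle is that random assignment is what licenses the comparison: if the method labels were irrelevant, any relabeling would be as likely as the one you got.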
The field of happiness studies was created, more or less, by realizing that asking people "On a scale of 1 to 10, how good do you feel right now?" was a measure that statistically validated well against other ideas for measuring happiness. And this, despite all skepticism, looks like it's actually a pretty useful measure of some things, if you ask 100 people and average the results.
But suppose you wanted to put happier people in positions of power—pay happy people to train other people to be happier, or employ the happiest at a hedge fund? Then you're going to need some test that's harder to game than just asking someone "How happy are you?"
This question of verification methods good enough to build organizations is a huge problem at all levels of modern human society. If you're going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential? If you give colleges the power to grant degrees, then do they have an incentive not to fail people? (I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.) If a hedge fund posts 20% returns, are they really that much better than the indices, or are they selling puts that will blow up in a down market?
If you have a verification method that can be gamed, the whole field adapts to game it, and loses its purpose. Colleges turn into tests of whether you can endure the classes. High schools do nothing but teach to statewide tests. Hedge funds sell puts to boost their returns.
On the other hand—we still manage to teach engineers, even though our organizational verification methods aren't perfect. So what perfect or imperfect methods could you use for verifying rationality skills, that would be at least a little resistant to gaming?
(Added: Measurements with high noise can still be used experimentally, if you randomly assign enough subjects to have an expectation of washing out the variance. But for the organizational purpose of verifying particular individuals, you need low-noise measurements.)
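A rough illustration of why noise washes out for groups but not for individuals, assuming independent, normally distributed measurement noise (an assumption of this sketch, as are the function names and numbers): the spread of a group average shrinks like the noise divided by the square root of the group size, while a single person's score keeps the full noise.

```python
import math
import random


def standard_error_of_mean(noise_sd, n):
    """Standard deviation of the average of n independent noisy measurements."""
    return noise_sd / math.sqrt(n)


def simulate_group_mean_spread(true_value, noise_sd, n, trials=500, seed=0):
    """Empirical standard deviation of a group mean across repeated trials."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(true_value, noise_sd) for _ in range(n)]
        means.append(sum(sample) / n)
    grand_mean = sum(means) / trials
    variance = sum((m - grand_mean) ** 2 for m in means) / (trials - 1)
    return math.sqrt(variance)


if __name__ == "__main__":
    noise_sd = 10.0  # hypothetical: the measurement is very noisy per person
    for n in (1, 100):
        predicted = standard_error_of_mean(noise_sd, n)
        observed = simulate_group_mean_spread(50.0, noise_sd, n)
        print(f"n={n}: predicted spread {predicted:.2f}, simulated {observed:.2f}")
```

With these made-up numbers, a 2-point difference between two methods is recoverable from groups of 100 but invisible in any one student's score, which is the gap between the experimental and organizational levels.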
So I now put to you the question—how do you verify rationality skills? At any of the three levels? Brainstorm, I beg you; even a difficult and expensive measurement can become a gold standard to verify other metrics. Feel free to email me at sentience@pobox.com to suggest any measurements that are better off not being publicly known (though this is of course a major disadvantage of that method). Stupid ideas can suggest good ideas, so if you can't come up with a good idea, come up with a stupid one.
Reputational, experimental, organizational:
- Something the masters and schools can do to keep it real (realistically real);
- Something you can do to measure each of a hundred students;
- Something you could use as a test even if people have an incentive to game it.
Finding good solutions at each level determines what a whole field of study can be useful for—how much it can hope to accomplish. This is one of the Big Important Foundational Questions, so—
Think!
(PS: And ponder on your own before you look at the other comments; we need breadth of coverage here.)
An idea that might be both unsustainable and potentially dangerous, but also potentially useful, is to have someone teach as a final test. Less an exam and more a project (with oversight?). Of course, these trainees could be authentic or disguised testers.
Problems with this idea (non-exhaustive):
- Rationality doesn't necessarily make you good at teaching.
- Teaching the basics badly is likely to have negative effects on the trainee.
- This could potentially be gamed by reformulated regurgitation.
So... What behaves differently in the presence of Rationality? I like Brennan's idea of time pressure, though he himself demonstrates that you don't need to have finished training for it, and it doesn't really hit the mark.
Or: What requires Rationality? Given Hidden Knowledge (may only require facts that are known, but not to them), one could present new true facts that need to be distinguished from new well-crafted falsehoods (QM anyone?^^). This still only indicates, but it may be part of the process. If they game this by studying everything, and thinking for themselves, and coming to correct conclusions, I think that counts as passing the test. Maybe I am currently not creative enough though. This test could also be performed in isolation, and since time would probably be a relevant component, it would likely not require huge amounts of resources to provide this isolation. Repeat tests could incorporate this (or seemingly incorporate it) too.
If you wanted to invest more effort, you could also specifically not isolate them, but put them in a pressured situation (again, I am being influenced by memories of a certain ceremony. But it is simply really good.) This doesn't have to be societal pressure, but this kind at least makes rash decisions less likely to be costly.
I can't really formulate the idea concretely, but: A test inspired by some of ye olden psychology experiments might provide double yield by both testing the rationality of the person in question and disabusing them of their trust. Though I can see a lot of ways this idea could go awry.
An issue that most if not all of my tests run into is that they limit what could be taught, since it is still part of the test. This is a problem that should be solved, not just because it irritates me, but because it also means that random chance could more easily change the results.
This is, I think, because so far all tests check for the correct answer. This, in itself, may be the wrong approach, since we try to test techniques which have an impact on the whole person, not "just" their problem solving. I would, for example, hope that a crisis situation would on average benefit from the people being trained in rationality, not just in regards to "the problem solving itself", but also the emotional response, the ability to see the larger picture, prioritization and initial reaction speed, and so on.
(Maybe having them devise a test is a good test...^^ Productive, too, on the whole.)
(I can think of at least one problem of yours that I still haven't solved, though I therefore can't say whether my not solving it actually shows a lack of rationality [though it's likely], or rather depends on something else. Not sure if I should mention it, but since you (thankfully) protect the answer, I don't think that I need to. This, still, is asking for a correct answer though.)
That's all I can think of for now. Though I am not really satisfied... Do I need to be "at a higher level" to be able to evaluate this, since I don't fully grasp what it is that should be tested yet? Seems like either an option or a stop sign.