3 Levels of Rationality Verification

24Eliezer_Yudkowsky15 March 2009 05:19PM

Previously in series: Schools Proliferating Without Evidence
Followup to: A Sense That More Is Possible

I strongly suspect that there is a possible art of rationality (attaining the map that reflects the territory, choosing so as to direct reality into regions high in your preference ordering) which goes beyond the skills that are standard, and beyond what any single practitioner singly knows.  I have a sense that more is possible.

The degree to which a group of people can do anything useful about this, will depend overwhelmingly on what methods we can devise to verify our many amazing good ideas.

I suggest stratifying verification methods into 3 levels of usefulness:

  • Reputational
  • Experimental
  • Organizational

If your martial arts master occasionally fights realistic duels (ideally, real duels) against the masters of other schools, and wins or at least doesn't lose too often, then you know that the master's reputation is grounded in reality; you know that your master is not a complete poseur.  The same would go if your school regularly competed against other schools.  You'd be keepin' it real.

Some martial arts fail to compete realistically enough, and their students go down in seconds against real streetfighters.  Other martial arts schools fail to compete at all - except based on charisma and good stories - and their masters decide they have chi powers.  In this latter class we can also place the splintered schools of psychoanalysis.

So even just the basic step of trying to ground reputations in some realistic trial other than charisma and good stories, has tremendous positive effects on a whole field of endeavor.

But that doesn't yet get you a science.  A science requires that you be able to test 100 applications of method A against 100 applications of method B and run statistics on the results.  Experiments have to be replicable and replicated.  This requires standard measurements that can be run on students who've been taught using randomly-assigned alternative methods, not just realistic duels fought between masters using all of their accumulated techniques and strength.
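
As a rough sketch of what "run statistics on the results" could look like for two randomly assigned teaching methods, here is a two-sided permutation test on the difference in mean scores. The choice of a permutation test, the function name `permutation_test`, and the idea of a single numeric score per student are all assumptions made for illustration; the post does not name a particular statistic.

```python
import random
from statistics import mean

def permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_resamples):
        # Re-deal the pooled scores into two groups at random, as if the
        # method labels carried no information.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_resamples + 1)
```

Calling it with the 100 scores from method A and the 100 scores from method B gives the observed gap and an estimate of how often a gap that large would show up if the two methods were interchangeable.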

The field of happiness studies was created, more or less, by realizing that asking people "On a scale of 1 to 10, how good do you feel right now?" was a measure that statistically validated well against other ideas for measuring happiness.  And this, despite all skepticism, looks like it's actually a pretty useful measure of some things, if you ask 100 people and average the results.

But suppose you wanted to put happier people in positions of power - pay happy people to train other people to be happier, or employ the happiest at a hedge fund?  Then you're going to need some test that's harder to game than just asking someone "How happy are you?"

This question of verification methods good enough to build organizations, is a huge problem at all levels of modern human society.  If you're going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential?  If you give colleges the power to grant degrees, then do they have an incentive not to fail people?  (I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.)  If a hedge fund posts 20% returns, are they really that much better than the indices, or are they selling puts that will blow up in a down market?

If you have a verification method that can be gamed, the whole field adapts to game it, and loses its purpose.  Colleges turn into tests of whether you can endure the classes.  High schools do nothing but teach to statewide tests.  Hedge funds sell puts to boost their returns.

On the other hand - we still manage to teach engineers, even though our organizational verification methods aren't perfect.  So what perfect or imperfect methods could you use for verifying rationality skills, that would be at least a little resistant to gaming?

(Added:  Measurements with high noise can still be used experimentally, if you randomly assign enough subjects to have an expectation of washing out the variance.  But for the organizational purpose of verifying particular individuals, you need low-noise measurements.)
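
To make the noise point concrete, here is a small sketch assuming independent measurements with a common noise standard deviation (the function names and the example numbers are hypothetical). The error of a group average shrinks with the square root of the group size, while a single individual's measurement keeps the full noise.

```python
import math

def standard_error_of_mean(noise_sd, n_subjects):
    """Standard error of a group average of independent noisy measurements."""
    return noise_sd / math.sqrt(n_subjects)

def standard_error_of_difference(noise_sd, n_per_group):
    """Standard error of the gap between two equally sized group averages."""
    return noise_sd * math.sqrt(2 / n_per_group)

# Hypothetical measurement with a noise SD of 3 points on a 10-point scale.
for n in (1, 100, 10_000):
    print(n, standard_error_of_mean(3.0, n))
```

With n = 1 (verifying one particular person) the full noise remains, which is why the organizational use needs a low-noise measurement in the first place.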

So I now put to you the question - how do you verify rationality skills?  At any of the three levels?  Brainstorm, I beg you; even a difficult and expensive measurement can become a gold standard to verify other metrics.  Feel free to email me at sentience@pobox.com to suggest any measurements that are better off not being publicly known (though this is of course a major disadvantage of that method).  Stupid ideas can suggest good ideas, so if you can't come up with a good idea, come up with a stupid one.

Reputational, experimental, organizational:

  • Something the masters and schools can do to keep it real (realistically real);
  • Something you can do to measure each of a hundred students;
  • Something you could use as a test even if people have an incentive to game it.

Finding good solutions at each level determines what a whole field of study can be useful for - how much it can hope to accomplish.  This is one of the Big Important Foundational Questions, so -

Think!

(PS:  And ponder on your own before you look at the other comments; we need breadth of coverage here.)

Comments (134)

RobinHanson01 April 2009 01:14:53AM2 points [-]

I just suggested a relevant rationality test here: http://www.overcomingbias.com/2009/03/how-spend-rationality-test.html

orthonormal22 March 2009 04:23:23PM5 points [-]

(haven't looked through comments, so this may have been suggested many times over)

In a college-level rationality course, it would be most appropriate for a portion of the grade to be determined by an artificial economy. That is, set up a currency and a (relatively even) starting distribution, add (probabilistic) opportunities for investment (perhaps linked to other important parts of the course) and, most importantly, make defection possible, anonymous and easy. Make it, as much as possible, like a vast array of one-shot (or known number of iterations) Prisoner's Dilemmas.

Then allow students to organize into institutions with rules. Well-taught rationalists should be able to construct a very strong economy along these lines; poorly-taught ones will be only rational enough not to cooperate out of an irrational sense of honor. A student's final grade on that component will be the logarithm of their final wealth, curved as little as possible.

It would take a well-designed setup, of course, to ensure that we're truly measuring rationality and not (say) merely group camaraderie; but I think it could be worked out in a satisfactory way.

The main upshot of this as regards rationality verification: if two different rationality curricula run the same economy setup, a consistently better growth rate of one class economy is evidence of the second kind that more complete rationality is being taught. The students have a much bigger incentive towards their own grade than towards the reputation of the class, so it should be a pretty decent test.
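
A minimal sketch of the two numbers this proposal relies on, under assumptions of my own: the grade uses the natural logarithm, and "growth rate" means the per-period geometric growth of total class wealth. The function names are hypothetical.

```python
import math

def wealth_grade(final_wealth):
    """Grade component: natural log of a student's final wealth."""
    if final_wealth <= 0:
        raise ValueError("log grading needs strictly positive wealth")
    return math.log(final_wealth)

def class_growth_rate(start_total, end_total, periods):
    """Per-period geometric growth rate of a whole class economy."""
    return (end_total / start_total) ** (1 / periods) - 1
```

One consequence of log grading is that doubling your wealth adds the same amount to your grade whether you started rich or poor, so a student gains little by gambling everything on one big win.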

patrissimo21 March 2009 10:09:06PM3 points [-]

I'm tempted to say "have them play poker", except it uses lots of domain-specific knowledge as well as general rationality. Perhaps if you could generate random games from a large enough space that people don't build up game-specific skills, and the games just end up testing general rationality? While poker-like games don't test all aspects of rationality, there are some things like "ability to keep making good decisions when frustrated / bored / angry" that these games test very well.

I think people would develop skill at the whole class of games...but at the same time, they would be improving their rationality.

steven046117 March 2009 02:54:27PM* 5 points [-]

Carry around a notepad, form probabilistic opinions on lots of little questions that you can find out the answer to soon after, record all the probabilities assigned to correct answers, where applicable add tags like "politics", "project completion", "my social status", "trivia", put into a spreadsheet or something and see if you're miscalibrated globally and for different tags.
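
The spreadsheet step could be sketched like this. The record layout (tag, stated probability, whether it came true) and the "gap" summary are assumptions for illustration, not something the comment specifies.

```python
from collections import defaultdict

def calibration_by_tag(records):
    """Compare average stated confidence with the actual hit rate.

    records: iterable of (tag, stated_probability, came_true).
    Returns a dict keyed by tag, plus "ALL" for the global figures.
    """
    sums = defaultdict(lambda: [0.0, 0, 0])  # total probability, hits, count
    for tag, prob, came_true in records:
        for key in (tag, "ALL"):
            entry = sums[key]
            entry[0] += prob
            entry[1] += 1 if came_true else 0
            entry[2] += 1
    report = {}
    for key, (total_prob, hits, count) in sums.items():
        confidence = total_prob / count
        hit_rate = hits / count
        # A positive gap means overconfidence on this tag.
        report[key] = {
            "mean_confidence": confidence,
            "hit_rate": hit_rate,
            "gap": confidence - hit_rate,
        }
    return report
```

A large positive gap on "project completion" next to a small one on "trivia" would be the kind of tag-specific miscalibration the notepad is meant to surface.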

haig16 March 2009 08:33:34PM5 points [-]

There is a recent trend of 'serious games' which use video games to teach and train people in various capacities, including military, health care, management, as well as the traditional schooling. I see no reason why this couldn't be applied to rationality training.

I always liked adventure style games as a kid, such as King's Quest or Myst, and wondered why they aren't around any more. They seemed to be testing rationality in that you would need to guide the character through many interconnected puzzles while figuring out the model of the world and how best to achieve the goals of the protagonist. It seems like the perfect video game genre for both developing and testing rationality skills.

Specifically, I've thought of a microcosm of the real world, taking place in a different setting yet similar enough to our real world that there would be analogues to religion, science, politics, etc. As you progress through the game, say from child to adult, you learn about the world and see how different beliefs and strategies affect the game. Players would encounter similar challenges to the real world but be disconnected enough not to put up a defense mechanism, yet involved enough to care about the outcome. Add MMO et al features to taste.

steven046117 March 2009 02:40:00PM* 1 point [-]

I always liked adventure style games as a kid, such as King's Quest or Myst, and wondered why they aren't around any more.

Google "interactive fiction".

Emile16 March 2009 09:03:35PM* 4 points [-]

"Piggyback" on other tests: ask people taking part in various tests (standardized exams, sport competitions, driving lessons, programming contests, art exhibitions - whatever) their chances of success (or their probability distribution over the range of results).

The other items should themselves be important enough, so it would fit well with a university curriculum, so that it can be "automated" for a lot of things. The way of asking for predictions should be made so as to maximize bad predictions: for example the students are asked to give estimations in front of their peers (if that's shown to get them to overestimate), but afterwards not reminded of the prediction they gave nor of whether it came true (so that they don't deliberately try to make it come true).

It could also be extended to other events like "when I'll turn in my thesis" or even "whether I'll be single in a year" or "how much I'll weigh in six months".

The more subjects they have to estimate on, the better. At the end, measure the Bayes-score.
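
For reference, a minimal sketch of the Bayes-score for yes/no predictions, taken here to mean the logarithmic scoring rule (the sum of log-probabilities assigned to what actually happened). The input layout and the function name are assumptions.

```python
import math

def bayes_score(forecasts):
    """Sum of log-probabilities assigned to the outcomes that occurred.

    forecasts: iterable of (probability_of_event, event_happened).
    The score is never positive; closer to zero is better.
    """
    total = 0.0
    for prob, happened in forecasts:
        assigned = prob if happened else 1.0 - prob
        if assigned <= 0.0:
            # Certainty in something that turned out false is unrecoverable.
            return float("-inf")
        total += math.log(assigned)
    return total
```

A student who says 70% on ten exams and passes seven scores better than one who says 99% on all ten and passes the same seven, which is the property that makes the score hard to improve by bluster.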

This could be combined with some more "dramatic" and explicit rationality tests (see the other comments) to constitute the scoring method of a university rationality course. The explicit rationality tests would also help take a bit of attention away from the day-to-day probability estimates on exams and stuff, to diminish the "only rational when deliberately thinking about it" phenomenon.

Oh, also - ask the students for an estimate before the exam and after the exam (but before they have a chance of talking to someone else). Maybe even a week before and a week after too.

zaph16 March 2009 07:33:42PM* 3 points [-]

Maybe something that tests "certainty faking"? I really don't know how to construct it, per se; maybe use a FACS test to see how much a person is trying to convey that they're very certain of something when they aren't. That would just be conscious faking, of course; you'd still need something to assess when someone is expressing their feeling of certainty vs. the data. Maybe something like Texas Hold 'Em, except with bets being placed on how accurate the probabilities are (e.g. randomized variations of situations like the cancer scenario at EY's Bayes page)?

Sorry if I'm not articulating this well, hopefully it's good enough to live up to the stupid idea criteria, if not the good idea. Oh, and I didn't read any of the comments, so I don't know if this has been suggested.

jimrandomh16 March 2009 04:29:04AM6 points [-]

There are two problems with measuring rationality, one of which is difficult but manageable, the other of which might be insurmountable. The first problem is that most conceivable tests of rationality require using information from other fields (such as finance, physics, or psychology), such that you can gain a considerable advantage on the test by studying things from that field which don't actually make you more rational. This can be solved with sufficient cleverness.

The second problem is that how rational someone is depends on how well they maintain it under stress. Pressure, fatigue, emotionally charged situations, alcohol, and/or deliberate manipulation, can make the best rationalists act completely insane. (About a year ago, I went on a reality television show, which was in a way like a series of rationality tests. I didn't do all that well, rationality-wise, but some people who should have known better did dramatically worse.)

patrissimo21 March 2009 10:10:56PM1 point [-]

Yes, the maintaining under stress aspect is key. This is a large part of why poker is hard - it has many characteristics which maximize stress by triggering bad primal instincts.

bentarm16 March 2009 03:47:23AM6 points [-]

I'm not sure if this has already been said, but does the "biases" literature not already contain a lot of perfectly good (although probably overly game-able) rationality tests? Just pick an experiment at random from Tversky and Kahneman and see how well the people in the school do.

Of course, there is a problem of people learning how to do some of these tests, but I'm pretty sure there are some that could be reworked so that they're pretty damned hard to pass even if you're well-acquainted with the literature. I'm thinking particularly those where half of the subjects are asked a different question to the other half, and the results compared - e.g., tests for the Lake Wobegon effect, for Social Attribution Bias, etc.

zaph16 March 2009 08:03:57PM1 point [-]

Shouldn't the rationality school suggested by Eliezer, though, be able to train someone to be able to do well on these tests, by essentially becoming very familiar with the literature? Just devil's advocating against your devil's advocation; it seems like this would actually be pretty ideal, as you have scientifically benchmarked tests that show what let's say "naive" individuals think when encountering these problems, from where you could then see progress from the "trained" rationalists. The problem with gaming this system would be with people who are studying rationality but plan to subvert it at some point; the rationalist community would need to have frequent re-certifications so that rationalists don't rest on their laurels and rely on status to convey an inferred rationality of the decision.

Eliezer_Yudkowsky16 March 2009 08:31:36PM3 points [-]

The problem is if they do well on written questions in classes but no better than average at applying the same knowledge to real life.

bogdanb28 March 2009 09:47:10PM1 point [-]

This is a problem with “class tests” of anything, of course. I've thought (more than five minutes) on your post, but I didn't come up with much specifically about rationality testing. (Except for “automatically build arbitrary but coherent «worlds», let students model them and then check how well their model fits «reality» afterwards”, which is an obvious application of the definition, and has been suggested already several times.)

I've come up with a few thoughts on testing in general:

1) As you say, cheap-but-game-able tests are often useful; we do have useful universities despite the problem of universities awarding diplomas to their own students. I think this is more than just “works well enough”, in some cases it's actually useful: (a) Having good tests (e.g., by a third party) requires defining well in advance exactly what you're testing. But in many cases it can be useful if a school experiments with what it teaches (and even why), and the only test needed is internal. (b) In many (most?) cases, you can't really test some ability until you really try using it. There are plausible cases where a quick-and-dirty (but cheap) test (e.g. university diplomas) is needed only to pre-select people (i.e., weed out most incompetents), and then get to real testing doing actual work (e.g., hiring interviews and tests, then probation period). If you make the initial test «better» (e.g., harder to game) but more expensive you may be actually losing if it's not «better» in the sense of accurate for whatever you need people to be good at.

OK, now I'm getting to what you're saying about doing well in class but badly in real life. It seems an obvious solution that you should actually be doing the testing in real life: first weed out the bad as well as you can with an approximate test (how well you do on this tests your map against reality), then “hire” (whatever that means in the context) people who look promising, make them do real work, and evaluate them there.

You don't have to evaluate everything they do, as long as you do it randomly (i.e., nobody knows when they're evaluated). The fact that random testing is done can be safely made public: if you don't know when it's done, the only way to “game” this is to actually be as good as you can be all the time.

The random testing can be passive (e.g. audits) or active (e.g. penetration testing). The only trick is that you have to do it often enough to give significant information, and that the tested can't tell when they're being tested. For instance, testing for biases can be very useful even in a context where everybody is extensively familiar with their existence, as long as you do it often enough to have a decent chance of catching people unawares. (This is hard to do, which is why such tests are difficult. Which is why university exams are still useful.)

Note that you don't have to make all tests undetectable; having some tests detected (especially if it's not obvious that they are detectable on purpose) both reminds testees of them, and allows detecting people who react differently when tested than in real life. (This can then allow you to notice when people detect tests you're trying to keep secret, assuming there's enough testing going on.)

bogdanb28 March 2009 09:50:50PM2 points [-]

Oh, and another thing that seems obvious: change tests often enough that they can't be gamed. This is of course hard and expensive, which is why it isn't done very often.

thomblake16 March 2009 08:37:19PM2 points [-]

You have to admit that's an empirical question, though. It could be that getting the competence to do well on rationality tests requires the same skill as applying the same knowledge to real life. There are some areas where 'fake it till you make it' works, and there are some things you can't pretend to do without actually succeeding in doing the thing.

talisman15 March 2009 10:33:37PM13 points [-]

Occasionally, well-respected community members could say things that are intentionally false, but persuasive and subtle, a la http://www.overcomingbias.com/2008/02/my-favorite-lia.html.

You get points for catching these mistakes. Perhaps you submit your busts privately to some arbiter so others have the same challenge.

Later, the error is revealed and discussed.

This would also have the benefit of causing everyone to read the most-respected members' writings ultra-critically, rather than sitting back and being spoon-fed.

One key thing this idea has is short term feedback. Frequent, rapid feedback is essential for getting good at this kind of thing. (IMO that's why economics is still so useless relative to the other sciences: the experiments take fifty years to run.)

MBlume15 March 2009 10:46:11PM1 point [-]

I can see the need for anonymity to avoid spoilers, but I think doing the thing publicly has benefits too -- that way there's the risk on the other side of having publicly denounced the Great Teacher when he was speaking truthfully.

Eliezer_Yudkowsky15 March 2009 11:19:30PM2 points [-]

You could have private points subtracted off and that gives you the same incentive not to make uncertain accusations. Attach confidence levels and take Bayes-score.

JGWeissman01 April 2009 05:21:06AM1 point [-]

With the Bayes-score being always negative, I don't see what incentive one would have to submit a mistake report. I think it would be better to test for better than, for example, 90% confidence, by awarding 1 point for a correct report and deducting 9 points for an incorrect report. This achieves the goal of detecting ability to detect bad arguments. Measuring calibration would have to be a separate test.
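
The arithmetic behind the 1-point / 9-point rule can be checked with a short sketch (function names are hypothetical). A report is worth submitting in expectation only when your confidence exceeds penalty / (reward + penalty), which is 9 / 10 here.

```python
def expected_points(p_correct, reward=1.0, penalty=9.0):
    """Expected score change from submitting one mistake report."""
    return p_correct * reward - (1.0 - p_correct) * penalty

def break_even_confidence(reward=1.0, penalty=9.0):
    """Confidence at which submitting a report has zero expected value."""
    return penalty / (reward + penalty)
```

Changing the penalty moves the threshold: a deduction of 19 points would demand 95% confidence, and a deduction of 1 point only 50%.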

Emile15 March 2009 10:32:39PM9 points [-]

Organize large games/contests where a lot of candidates are locked up in an area, and have a finite time to reach a certain point / find a certain object.

The exact rules would be specially designed each time for that year's challenge, by a group of rationalists and game designers. So the details would vary, but some common themes would be:

  • physical prowess does not come into play (beyond maybe moving around faster, not getting tired as easily etc.)
  • some people would be liars / saboteurs, and not real candidates

For example, the candidates are blindfolded and brought into a large underground circular room, whose only unlocked exits are twenty slides along the edge (so, one-way exit only). The goal is to take the exit that's due north.

Or, the players are dropped in a maze, and each player is given twenty balls with his name written on them. In the maze are tall glass tubes in which the player can drop their balls. The players know that at the end of the games everyone gets points for the balls with his name that are in "good" tubes (from 10 to 1 points, depending on whether his ball is at the bottom or top - only ten balls fit in a tube), and loses points for balls in "bad" tubes (whatever their position). There are also neutral tubes. On the tubes are various signs and portents, and on the walls are statements about the meanings of the signs ("about 10% of good tubes have red triangles", "two squares of the same color cancel out", "a blue triangle means that there's a bad tube close to this one"). The players have 30 minutes to place their balls.

Additional twists:

  • there are in fact several simultaneous games taking place, in the same place, but the rules are such that it's very difficult to tell who's part of which game (for example, if some players' goal is to unmask/identify other players)
  • the goal may not be reachable at all (no candidates accepted this year). The "global" rules of the contest might include that there must be a certain probability each year (10% ?) that the contest is impossible.
  • candidates are not alone but in teams

... well, there is plenty of inspiration to take from board games and TV shows. And many factors of those can be controlled by careful design (importance of luck or of trivia knowledge, how much "herd behaviour" can come into play, etc.). The games should be more complicated than what's said above, and contain many red herrings. The designers should try to introduce as many sources of bias and irrationality as possible.

Nebu16 March 2009 05:31:37PM3 points [-]

Voted up if only because this reads like a description for the first reality TV show I would actually want to watch.

MichaelHoward16 March 2009 10:43:52PM* 1 point [-]

Here you go :) (and here's the kids' version)

JulianMorrison16 March 2009 03:41:53AM3 points [-]

I'm reminded of your own introduction to Bayes. Even a really good test won't do a darn bit of good if rationalists are vanishingly rare.

swestrup15 March 2009 08:15:03PM* 11 points [-]

Well, you asked for DUMB ideas, so here's mine. It has the advantage that I'm sure no one else will suggest it. This is based on an accidental discovery (so far as I know, unpublished) that one can compare two arbitrary documents for similarity (even if they are in different word-processor formats) by running them both through a recognizer built out of a random state machine and comparing bit masks of all the states traversed. The more the documents have in common, the more states will be traversed in both.

So, let's assume we have a panel of highly rational individuals who are our control group. We generate a random multiple-choice questionnaire consisting of nonsensical questions and answers. Things like:

1) How Green is the Smell of Bacon?

a) 7.5

b) Neon

c) Introspection

d) Larger

You then do a correlation over how your panel of experts chose their answers and see if there is a common pattern. You then score students who take the test based on how similar to the common pattern they are.
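
One very simplified reading of "common pattern" is the most popular panel answer on each question, with students scored by how often they match it. That simplification, and both function names, are assumptions; a real version would presumably use a richer correlation across questions.

```python
from collections import Counter

def consensus_key(panel_answers):
    """Most common panel choice for each question.

    panel_answers: list of answer sheets, one list of choices per panelist.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*panel_answers)]

def similarity_score(student_answers, key):
    """Fraction of questions where the student matches the panel consensus."""
    matches = sum(s == k for s, k in zip(student_answers, key))
    return matches / len(key)
```

If the panel's answers show no agreement beyond chance, the key is arbitrary and the scores mean nothing, so checking panel agreement first would be a cheap sanity test of the whole idea.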

Assuming this idea works at all, the advantage of this is that it would be extremely difficult to game. The disadvantage would be that it would penalize those who are significantly more rational than the 'norm'. It would probably also require the panel to be similar to each other in cognition. There is also the general problem of not knowing if you're really testing for what you think you're testing.

Frankly, I don't know if I'd be more happy if this was tested and shown to be workable, or if it turned out to be a really stupid idea.

thomblake16 March 2009 08:05:19PM1 point [-]

I've actually proposed something like this to test for personality type. The main reason it never got implemented is there isn't really a good, workable theory of persistent personality.

MichaelVassar16 March 2009 05:25:31AM3 points [-]

I think that this resembles the MMPI methodology. http://en.wikipedia.org/wiki/Minnesota_Multiphasic_Personality_Inventory

Eliezer_Yudkowsky15 March 2009 08:18:26PM5 points [-]

NOT CRAZY ENOUGH! We need EVEN STUPIDER ideas!

(Voted up for being the best try so far, though.)

Comment deleted 18 March 2009 04:14:14AM[-]

swestrup21 March 2009 04:41:23PM1 point [-]

When I look at my question there, the only answer that seems appropriate is 'Introspection' as that's at least a step towards an answer.

MichaelHoward15 March 2009 09:06:07PM7 points [-]

Give the students sodium pentothal and ask if they're one of the top 50% of rationalists in their school. However many out of 200 say 'no', that's the school's percentage score. Schools scoring over 100% are thrown out for cheating.

JGWeissman01 April 2009 05:42:02AM1 point [-]

A school that reports to each student their class ranking easily games this test. The test could even favor schools that don't teach students enough to question an arbitrary class rank.

Also, this doesn't consider the possibility that students can be good rationalists, but don't interact with enough of the other students to make a good assessment of their relative strengths.

Eliezer_Yudkowsky01 April 2009 06:02:32AM1 point [-]

Also, this doesn't consider the possibility that students can be good rationalists, but don't interact with enough of the other students to make a good assessment of their relative strengths.

Good rationalists, taken as a group, shouldn't be systematically optimistic.

pjeby01 April 2009 02:54:58PM* 1 point [-]

Good rationalists, taken as a group, shouldn't be systematically optimistic.

They should be if they want to win in practice, as opposed to just getting theoretically-correct answers. See, e.g., the studies referenced in Seligman's "Learned Optimism", that show optimists consistently out-perform pessimists (i.e., realists) in a wide variety of fields and endeavors.

(Of course, Seligman's definition of optimism may be different from yours.)

JGWeissman01 April 2009 06:36:37AM1 point [-]

Perhaps we can still test for this systematic optimism, while filtering for the noise I objected to, by instead of asking a "yes" or "no" question, asking for the probability that the student is in the top 50%. Treat the sum of these probabilities as the count of "yes" answers in the original version. Then a rational student should be able to account for his ignorance of other students in his answer.
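
A sketch of that probabilistic version, assuming exactly half the students are in the top 50% (the function name is hypothetical). The sum of stated probabilities plays the role of the "yes" count, and anything above half the class size is collective optimism.

```python
def optimism_excess(self_estimates):
    """Expected 'yes' count minus the number who can actually be in the top half.

    self_estimates: each student's stated probability of being in the top 50%.
    Zero means no collective optimism; positive means systematic optimism.
    """
    expected_yes = sum(self_estimates)
    return expected_yes - len(self_estimates) / 2
```

A student with no idea how they compare can answer 0.5 and contribute nothing to the excess, which is how this version filters out the ignorance objection.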

CarlShulman15 March 2009 06:13:44PM* 11 points [-]

For 'hot' political and religious biases, create materials in which apparent advocates of different ideologies or parties are arguing for some particular empirical prediction, e.g. about the relationship between different tax rate changes and economic growth, with some predictions being right and some wrong. The subject then needs to make his or her own prediction about some easily-verifiable but obscure empirical fact related to the argument, e.g. whether a graph of GDP and tax rates matches Norway or Iceland.

Scoring would reflect the degree to which the ideological affiliation in the prompt biased the results. If it was being gamed you might need to add in scoring for accuracy. Challenges would be producing a large enough inventory of test items, keeping them secret, and the need to tailor tests to locally popular ideologies or ideologies of interest.

More surveys that study the relationship between knowledge about verifiable facts and values. What sorts of information do those with different values tend to have, and what are the values of those whose knowledge covers the pet facts of all camps? There is a fair amount of this literature in political science aimed at the electorate and its political knowledge, but it would be good to extend it to other topics, e.g. scientific ones.

Announced probability distributions (not just predictions, so as to enable better scoring) for the results of upcoming experiments. For instance, we know that in the next 2-3 years we are going to get a huge amount of genomic data that will answer a lot of questions about the genetic architecture of human diseases. Making public quantitative predictions about things like that could be quite informative.

Sebastian_Hagen15 March 2009 10:20:37PM* 5 points [-]

Use small-scale, limited-term betting markets with play money.

Put the group of people you want to rank relative to each other into a room - without internet access. Everyone starts with 0 points. People are ranked on how many points they have at the end of the test.

Participants make bets (for points) with each other. There's a time limit for settling those debts; all bets made have to be specified in a way that clearly determines the winner within a fixed period after the end of the test. Of course, bets that can be settled immediately (e.g. on current trivia, history or fiction) are also permissible.

Aside from that, there are no limits: Any time two participants agree they want to bet against each other, on whatever they specify for however many points they choose, they can register that bet.

For instance, Alice and Bob bet on the temperature as reported by <specific website> for <specific location> at 6:00 local time, Monday after the test:

  • Bob will pay Alice 5 points if the temperature is at most 20 degrees Celsius
  • Otherwise, Alice will pay Bob 20 points.

After enough time has passed for all bets to be settled, have a trusted third party determine the winner for each, tally up the points and rank participants by final score.
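
The tallying step could be sketched as follows. The `Bet` record and its field names are hypothetical; the example uses the Alice and Bob temperature bet from above, where Alice backs "at most 20 degrees".

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Bet:
    backer: str         # wins if the proposition turns out true
    opponent: str       # wins if the proposition turns out false
    backer_wins: int    # points the opponent pays the backer when true
    opponent_wins: int  # points the backer pays the opponent when false

def settle(bets_with_outcomes):
    """Tally final scores from (bet, came_true) pairs; everyone starts at 0."""
    scores = defaultdict(int)
    for bet, came_true in bets_with_outcomes:
        if came_true:
            winner, loser, amount = bet.backer, bet.opponent, bet.backer_wins
        else:
            winner, loser, amount = bet.opponent, bet.backer, bet.opponent_wins
        # Every transfer is taken from one player and given to another.
        scores[winner] += amount
        scores[loser] -= amount
    return dict(scores)

temperature_bet = Bet("Alice", "Bob", backer_wins=5, opponent_wins=20)
print(settle([(temperature_bet, True)]))
```

Because Alice risks 20 points to win 5, the bet only has positive expected value for her if she puts the chance of "at most 20 degrees" above 20 / 25 = 80%, and for Bob only if he puts it below 80%.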

This game is absolute zero-sum: the only way to earn points is by taking them from another participant. Test runs and outcomes can be published without obviously weakening the idea: If there's something to be learned from previous rounds, all participants have a chance to learn it.

Studying obsessively on certain subjects may help you, but only to the point that other participants don't know you've done it: If everyone knows that you are a major Higurashi no Naku Koro ni fan, they're unlikely to bet against you on that subject - or if they do, they won't bet very much.

Edit: Thinking about this some more, this kind of test has a failure mode: There's a strong incentive not to bet against people who are better at tests like this than you, so with sufficient information about the players the entire game may freeze up: For every possible bet, there's somebody who expects to end up worse off, no bets get made and everyone always walks out with 0 points.

Possible solution: Keep participants anonymous to each other during each test. If nobody knows who they're playing against, there's a higher chance they'll be willing to make some bets.

steven046117 March 2009 03:07:08PM1 point [-]

Good idea. It could work online if there's enough trust between participants.

Sebastian_Hagen18 March 2009 09:06:39AM* 1 point [-]

As an addendum, I think the whole thing could still work pretty well even if everyone is explicitly allowed to use the web (or any other data store) for research.

Bets that can be settled with immediately available information won't be very useful in that context, of course; but you could still bet on near future events. Speed research would be a valuable skill in this variant. Nevertheless, if you have any significant domain specific knowledge useful for making a short-term prediction, that should give you an advantage over someone speed-researching the topic before deciding if they want to make a specific bet on it against you.

The real problem is that access to the internet (or any nontrivial subset) also allows you to do realtime communication with other humans, so you might convince/hire a master rationalist to offer you advice during the test, which would be an extremely effective way to cheat.

MichaelHoward15 March 2009 06:33:37PM10 points [-]

People tend to compartmentalize. We need to bear in mind that anything we come up with that involves testing someone when they know they're being tested can only check how rational they can be if they put their mind to it, not how rational they are when they're not being tested.

swestrup15 March 2009 09:14:37PM3 points [-]

I agree. The only solutions to this that I can see are either to not let students know when they are being tested, or to have a system of continual testing.

Matt_Simpson15 March 2009 10:59:28PM* 3 points [-]

The key is probably to test someone without letting them know you are testing them. If I ran a martial arts dojo and wanted to make sure my students were really super badass ninjas, I would give them a convincing looking "test" that included things you would expect to see: strength, speed, form, technique, success in actual matches, etc.

This would have very little weighting in the actual grade, however. The real test would be some sort of surprise fight or fights where the student has no idea that the fight is actually one of the tests. Perhaps he (or she) is followed by the assailant until an opportunity to pick a fight arises.

The main advantage of the surprise test is that it is much harder to game. Imperfect metrics are much more likely to say something meaningful about the student in this surprise situation than if the student knows the test is coming.

When it comes to the rationality dojo, there are numerous normally easy-to-game heuristics that could be used, for example:

  • how susceptible the student is to group-think
  • what they do in some sort of strenuous situation (e.g., do they blow up the Huygens?) The situation must seem real to them.
  • are they willing to bet their beliefs even when no one important will notice?
  • What others can you guys think of?


I doubt that it would be practical to analyze all of the information and get a single number as a measure of the student's rationality. At the top of all of these tests would have to be someone whose judgment on matters of rationality can be trusted. This may be the most difficult part.

Also note that this form of testing would probably be expensive.

Johnicholas15 March 2009 11:26:47PM4 points [-]

Robert Mager, in various books, including "Preparing Instructional Objectives", suggests working backward from evidence that would make you conclude that someone is, e.g. a Bayesian Master Rationalist, to the tests (and instructional objectives) for a course of instruction intended to turn someone into a Bayesian Master Rationalist (or whatever you want to turn them into).

Eliezer_Yudkowsky15 March 2009 11:41:19PM2 points [-]

Example?

Johnicholas16 March 2009 11:09:40AM5 points [-]

Telephone operators were supposed to have good "tone of service". So then the education people asked "What does good tone of service mean? What evidence would help you conclude whether an operator has good tone of service?"

And drilling down, they found that there was an entire list of behaviors implicit in the phrase "tone of service", like inflection as the operator reads the standardized phrases, such as "I'm sorry". One of the behaviors amused me - no banging - that is, hitting the telephone handset against something, presumably in anger at a frustrating customer.

So you can test for "good tone of service" by testing the observable behaviors.

If your concept of a Master Rationalist includes an "aura of competence", then probably we can break that down into concrete evidence that would cause you to conclude that someone has an "aura of competence". The concrete items become instructional objectives. If evidence that someone failed a bias or calibration test would cause you to conclude that they're NOT a Master Rationalist, then passing the bias or calibration test can be one of the instructional objectives.

MichaelHoward16 March 2009 08:51:08PM3 points [-]

Bearing in mind the human tendency to favor authority over quality given a choice between the two, I think it's important when testing to distinguish between "aura of competence" and ability to achieve useful results, and after testing to connect the former to the latter.

Johnicholas17 March 2009 04:35:34PM3 points [-]

Right. EY has mentioned a couple of times that he expects graduates of the hypothetical Rationality Dojo to exude their abilities, like Taking a Level in Badass, or his hedge-fund elites.

I want to clarify that I do not agree with this notion, and I suspect that individuals who exude preternatural skills are primarily good at exuding, not at performing. The example was just an example.

pjeby20 March 2009 06:11:35PM0 points [-]

After skimming some of his stuff on Amazon, I bought the whole "Mager Six-Pack" and am eagerly devouring it. I can already tell it's going to make a huge difference in the way I teach mind-hacking.

One of the first ones I read, Goal Analysis, is particularly relevant to LW discussions: how to turn "fuzzies" (abstract qualities, adjectives, and adverbs) into concrete, measurable specifications of behavior. One minor catch: goal analysis can't make people magically agree on the True Meaning of a term, it can only expose the things they do or don't agree on...

...which probably makes it an incredibly valuable Rationality Tool in its own right.

Anyway, thanks for mentioning Mager's books -- I'd never heard of them before your comment.

Psy-Kosh15 March 2009 07:31:21PM7 points [-]

Hrm... Well, one initial notion I have is along the lines of this: Rationality training should improve how good one can become at other stuff, or at least improve ability to gain skills/etc in other fields.

So, maybe tests could be something along the lines of find various subjects/fields a student is unfamiliar with and basically assign them to "get some knowledge and skill in this field."

How efficiently students can basically bootstrap up into something they're unfamiliar with should vary with their rationality, right? So something like this may be a starting point.

(Yes, I can see a bunch of details that would need to be worked out, but it seems to me that this notion may at least be somewhere to start for developing rationality tests.)

MichaelVassar16 March 2009 05:31:22AM3 points [-]

I think Tim Ferriss was going to display this ability as the theme of a TV show.

MBlume15 March 2009 08:43:54PM* 5 points [-]

Ask a thousand married rationalists of a given school to estimate the probability that their spouses have cheated on them. Confidentially ask their spouses if they have. Measure group calibration.

ETA: This applies to any potentially painful, but verifiable question. Ask them to draw a probability distribution over their date of death, or the longevity of their marriages. Estimate the probability of various kinds of cancer appearing over the next (5,10,15) years, etc. etc.
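One way group calibration could be measured, sketched in Python: pool everyone's stated probabilities, sort them into bins, and compare the average stated probability in each bin with how often the event actually happened. The function names, the ten-bin default, and the size-weighted gap summary are assumptions made for illustration, not an established protocol for this test.

```python
def calibration_table(probabilities, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins.

    Returns one (mean stated probability, observed frequency, count)
    tuple per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, happened in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, happened))
    table = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frequency = sum(1 for _, happened in members if happened) / len(members)
        table.append((mean_p, frequency, len(members)))
    return table


def calibration_gap(table):
    """Size-weighted average of |stated probability - observed frequency|."""
    total = sum(count for _, _, count in table)
    return sum(count * abs(mean_p - freq) for mean_p, freq, count in table) / total
```

A perfectly calibrated group would show a gap near zero; a group that says "90%" about things that happen half the time would show a gap near 0.4.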

Comment deleted 15 March 2009 09:50:55PM* [-]
MBlume15 March 2009 09:59:18PM1 point [-]

You may better serve your preferences if you remain in blissful ignorance.

There is a difference between wanting not to be a cuckold and wanting not to believe that you are a cuckold. I want the former.

Also, human relationships have a very Newcomb-like feel to them, because other humans are very good at ascertaining your true beliefs. If you are entertaining the hypothesis seriously, your wife will probably detect it.

Presumably, if you are entertaining the hypothesis -- at least beyond a societal average, or some such -- there is a root problem already in play.

But yes, this does have some self-fulfilling aspects which make it rather hard to model well.

Comment deleted 15 March 2009 10:15:56PM[-]
MBlume15 March 2009 10:29:35PM2 points [-]

On introspection, this does agree with my preferences, yes.

That does complicate things -- I'm not sure how to resolve this one.

I think we are using the word "rationalist" to cover too many meanings. One highly socially useful meaning for the word would be "person who can be reliably expected to speak the truth". Whatever you choose to call those, it'd certainly be useful to have some around for any society you'd like to build. We would want to have some tests to identify them.

swestrup15 March 2009 09:21:38PM1 point [-]

You'd have to define 'cheated on'. A fair number of the most rational folks I know live in non-traditional marriage arrangements.

MBlume15 March 2009 09:33:35PM2 points [-]

This is entirely true. We're going for emotional effect, so on that test, I'd keep it to the self-identified monogamists.

Comment deleted 15 March 2009 08:37:14PM[-]
Rings_of_Saturn16 March 2009 02:12:23AM5 points [-]

Yeah... I can't think of any good actual examples either, but maybe we should be trying to falsify rationality, rather than verify it.

infotropism15 March 2009 10:14:53PM3 points [-]

I don't know if any of those particular suggestions would work, but the general idea is interesting; I think no one else suggested testing a negative correlate of rationality.

Comment deleted 15 March 2009 09:19:42PM* [-]
Johnicholas16 March 2009 01:26:18AM4 points [-]

The feature of "profitable in the real world" is very valuable. Keeps the test calibrated to what we're interested in measuring.

Real-money, real-world prediction markets also have this feature; I wonder what other examples exist.

rwallace15 March 2009 07:50:27PM5 points [-]

Compile a large enough database of historical events that nobody could memorize more than a fraction of it. For the test, choose a few events at random, describe the initial conditions and ask the candidate to predict the outcomes.

Kaj_Sotala15 March 2009 10:23:14PM* 3 points [-]

There are lots of proposals which basically say, let somebody predict the development of a situation they're previously unfamiliar with. But that'll probably be very heavily a test of IQ, and while rationality would certainly help your performance in such scenarios, it seems to me that IQ will regardless be a bigger factor. Same with using real-life performance as a factor.

I'm not opposed to using such scenarios, and I proposed something like that myself, but I do think that the scenarios have to be specifically designed so that they're likely to trigger known biases (even if in a subtle way). You can't just use totally random historical events or police cases.

Vladimir_Nesov15 March 2009 10:35:22PM1 point [-]

If the situation contains enough biasing factors, you'd need to be able to use the craft in order to correct for that, not just comprehend the situation. The situation should be simple enough for most people to notice the important details, if they know where to look.

Comment deleted 15 March 2009 08:40:00PM* [-]
simpleton16 March 2009 11:37:22PM* 4 points [-]

I strongly second the idea of using real science as a test. Jeffreyssai wouldn't be satisfied with feeding his students -- even the beginners -- artificial puzzles all day. Artificial puzzles are shallow.

It wouldn't even have to be historical science. Science is still young enough that there's a lot of low-hanging fruit. I don't think we have a shortage of scientific questions which are genuinely unanswered, but can be recognized as answerable in a moderate amount of time by a beginner or intermediate student.

infotropism15 March 2009 10:06:30PM3 points [-]

Just to mention in passing, when I read your particular example, my immediate thought was "right, I'd fail right away". Someone who sucks at math would probably find it very difficult to derive those solutions. Yet, I don't think that means they couldn't be rational. You'd have to take into account their personal skills and affinity in the scientific domain you're testing, and adjust for that.

Comment deleted 15 March 2009 05:37:28PM* [-]
Johnicholas15 March 2009 09:33:38PM3 points [-]

Here's a stupid idea: Evaluate people by auditing their domiciles. I've read (and from personal experience, I believe it) that you get really solid insight into someone's personal qualities by inspecting their home, as good as interviewing them and all of their friends and family. (I googled a bit, but I can't find the source.)

Anyway, it can probably be gamed.

Stefan_King15 March 2009 10:30:09PM2 points [-]

I totally agree with this.

It may even be hard enough to game, especially if the time of visit is sufficiently random. You don't go out buying furniture for your rationality test, and if you can change your housecleaning habits stably overnight, you deserve a rationality prize.

swestrup15 March 2009 09:24:16PM3 points [-]

Well, there's always the idea of using fMRI scans to determine if someone is thinking in 'rational' patterns. You stick them under the machine and give them a test. You ignore the results of the test, but score the student on what parts of their brains light up.

MBlume15 March 2009 07:49:13PM* 4 points [-]

Here's an immoral one: crack a rationalist

Most, if not all, human minds are vulnerable to hacking, eg by cults, religions, pseudoscience, etc. The minds of rationalists should be harder to hack than others.

Make a copy of a (would-be) rationalist, subject the copy to significant emotional stress, and then send missionaries his way.

The myths carried by the missionaries should be invented for the challenge so everyone can agree that they are false, but should, of course, be significantly more plausible than today's religions.

JGWeissman01 April 2009 06:03:38AM* 2 points [-]

"crack a rationalist" made me think of the AI-Box Experiment ("http://yudkowsky.net/singularity/aibox") Maybe a rationality test could be something like how long the subject lasts as the gatekeeper before letting the AI out.

gwern01 April 2009 03:59:31PM1 point [-]

What ciphergoth said. Also, we can't derive an 'ought' from an 'is' - we don't actually know whether letting the AI out is the right thing to do (unless the contest had a stipulation that the AI was evil and the box keeper knew it, which I don't remember being the case). Perhaps the rational thing is to let the AI out!

Further, this could also just be a test of stubbornness or patience, neither of which is rationality. But good try anyway.

JGWeissman02 April 2009 12:26:58AM2 points [-]

For the first objection, that the AI Box experiment has too many unknowns, let us instead construct an argument based on psychological tricks for any bad conclusion to try on the subject.

For the second objection, that this tests stubbornness rather than rationality, use a sequence of tests, some using tricks to argue for false conclusions, and some using Bayesian evidence for a good conclusion. The score should reward being convinced when, and only when, the subject should be convinced. Stubbornness can only meet half this requirement.

The task of compiling arguments of both types, which would not be readily available to the subject ahead of time, remains.
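The scoring rule described above can be sketched in Python as the average of two rates: how often the subject accepts the sound arguments, and how often they reject the trick ones. The name `conviction_score` and the equal weighting of the two halves are assumptions for illustration.

```python
def conviction_score(trials):
    """Score a subject on being convinced when, and only when, they should be.

    `trials` is a list of (argument_is_sound, subject_was_convinced) pairs.
    The score is the mean of the acceptance rate on sound arguments and the
    rejection rate on unsound ones, so it ranges from 0.0 to 1.0.
    """
    sound = [convinced for is_sound, convinced in trials if is_sound]
    unsound = [convinced for is_sound, convinced in trials if not is_sound]
    if not sound or not unsound:
        raise ValueError("need at least one sound and one unsound argument")
    accepted_sound = sum(1 for c in sound if c) / len(sound)
    rejected_unsound = sum(1 for c in unsound if not c) / len(unsound)
    return (accepted_sound + rejected_unsound) / 2
```

Under this rule a purely stubborn subject who rejects everything gets 0.5, exactly the same as a purely gullible one who accepts everything; only a subject who discriminates between the two kinds of argument can do better.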

ciphergoth01 April 2009 08:23:16AM3 points [-]

The means by which EY persuades people to let the AI out of the box are secret. We shouldn't draw any conclusions from that experiment except that it is plausible to think a boxed AI could talk its way out of the box.

MichaelHoward15 March 2009 07:57:24PM8 points [-]

Make a copy of a (would-be) rationalist, subject the copy to significant emotional stress, and then send missionaries his way.

Moral qualms aside, we should probably have a back-up plan just in case we don't solve human uploading before we want to start testing.

Comment deleted 15 March 2009 09:55:37PM[-]
MBlume15 March 2009 10:19:03PM14 points [-]

I'll be honest -- my life has taken a sharp downturn since I deconverted. My theist girlfriend, with whom I was very much in love, couldn't deal with this change in me, and after six months of painful vacillation, she left me for a co-worker. That was another six months ago, and I have been heartbroken, miserable, unfocused, and extremely ineffective since.

Perhaps this is an example of the valley of bad rationality of which PhilGoetz spoke, but I still hold my current situation higher in my preference ranking than happiness with false beliefs.

Eliezer_Yudkowsky15 March 2009 11:17:04PM4 points [-]

You have my sympathy and my praise.

If anyone's unusually good at deconversions, there might be a market for deconversion attempts aimed at the friends and family of atheists.

MBlume16 March 2009 09:40:54AM* 11 points [-]

Thank you. You taught me (a large chunk of) everything I know, so that means a lot.

Honestly, thinking back, I suspect the best opportunity I ever had to deconvert her was when I myself did not yet identify as atheist -- when the crisis of faith was still in full swing. I'd have been perceived as sharing my doubts, rather than as "attacking" her with arguments.

Of course, back then I feared atheism -- I saw it as something terrible happening to me, that I should avoid doing to her. If I'd done a better job of leaving a line of retreat, I might have made better choices -- I might have shared each doubt as it occurred to me, instead of winding up 30 inferential steps removed from the woman I loved.

(And no, explaining that there is an inferential distance between you greater than is likely to be encountered in the ancestral environment really does not help in a fight)

I've been thinking lately of trying to write something addressed specifically to those beginning to question their religions. Life doesn't come with save points, but standing at the spot you went wrong, calling out advice to passers-by seems like the next best thing.

PaulWright15 March 2009 11:54:56PM* 3 points [-]

My empathies: that happened to me about 6 years ago (though thankfully without as much visible vacillation).

My sister, who had some Cognitive Behaviour Therapy training, reminded me that relationships are forming and breaking all the time, and given I wasn't unattractive and hadn't retreated into monastic seclusion, it wasn't rational to think I'd be alone for the rest of my life (she turned out to be right). That was helpful at the times when my feelings hadn't completely got the better of me. I suppose we can be haunted by stuff that is real.

MBlume21 March 2009 04:40:34AM* 1 point [-]

Thank you. I've been struggling with that haunting myself. I think part of the problem is that when you're in a relationship long enough, you wind up with a term in your utility function for that person. And even if you know you could wind up with someone objectively better, better suited, the outcome doesn't seem like good news to your mind. A job for self-modification, I suppose, even if it's the slow, manual kind.

Very glad to hear she was right =)

Comment deleted 15 March 2009 05:58:15PM* [-]
MichaelBishop15 March 2009 06:29:36PM2 points [-]

You're thinking of Asch's experiments. Apparently, they are widely misrepresented: http://webpage.pace.edu/yrafferty/Yvonne/AschConformityStudy.pdf See also: http://www.hss.caltech.edu/~jkg/Conformity.pdf (I don't remember where I found these... possibly through OB)

Comment deleted 15 March 2009 06:42:28PM* [-]
MichaelHoward15 March 2009 06:52:22PM3 points [-]

Since my comment has been downvoted to 0, I assume that the LW community likes...

Hasty generalization/Belief in the law of small numbers

MichaelHoward15 March 2009 06:17:39PM* 1 point [-]

...and here's the video (the one in the OB link is dead).

CarlShulman15 March 2009 06:13:25PM1 point [-]
infotropism15 March 2009 09:52:10PM2 points [-]

Give them a motivation that is higher than the drive to game the test. I'm an immortalist. I don't want to die. I could deceive myself and others in many ways about my skills, purposes, beliefs, but in the end I can't do that at the expense of my chances of not dying. Find a similarly important purpose, something that might even be gamed, but for which gaming means you lose. Some real life test.

Maybe, measuring someone's capability to win. I have often wondered if being rational correlates with being successful in society. I can't be sure, though it seems to me it should; if it doesn't, then I suppose it either means there's a problem with a rationality that would leave you worse off, or more likely, that you aren't being rational enough, or do not have enough mental resources to use that rationality to make a difference. Bounded rationality, always an issue.

Capability to win could be measured in many ways: economic success, for instance, or any other existing societal position of power or prestige. Of course any single one of those may be gamed, but it's ok to cheat; if cheating brings you closer to what you want, then it is rational to game. However, the goal that you hold, and for which you are vying, may not be very interesting. Empty fame, etc.

It would be best to have a personal goal set, and known, and measure how a person fares as to that goal; a goal difficult enough to require the proper use of rationality to win in society, one that would require applying rationality to a very large and diverse bunch of situations, a goal that you'd want to preserve.

Can't help much to determine what that would be; I have my own thing to protect, as I said, and I'm not sure what it might be for other people. It doesn't work all the time either. Sometimes short-term goals vie for dominance over my actions, and I'll give in to them, even if it means getting farther from my own personal long-term goal. That's a lack of willpower, not a lack of rationality at work there, I think.

CarlShulman15 March 2009 09:42:16PM2 points [-]

Experimental methods for measuring rationality can be converted into organizational tools through the measurement of biological traits that are minimally malleable. For instance, you could map genomic and brain structure information to experimental tests of particular biases or bias-promoting traits, and then use those biological markers as ungameable indicators. Unfortunately, while this could help organizations get more rational employees (possibly deriving economies of scale), it would be much less useful for measuring improvement.

swestrup15 March 2009 09:11:29PM2 points [-]

A friend of mine, the most consistently rational person I know of, once told me that his major criterion for whether a piece of information is useful is if it can allow him to forget multiple other pieces of information, because they are now derivable from his corpus of information, given this new fact.

I have a vague feeling that there should be a useful test of rationality based on this. Some sort of information modeling test whereby one is given a complex set of interrelated but random data, and a randomly-generated data-expression language. Scoring is based on how close to optimal one gets on writing a generator for the given data in the given language.

Unfortunately, I think this is something one could explicitly train for, and someone with knowledge of data compression theory would probably be at an advantage.

Comment deleted 15 March 2009 09:27:49PM[-]
swestrup31 March 2009 06:22:24PM2 points [-]

I'm only now replying to this, since I've only just figured out what it was that I was groping for in the above.

The important thing is not compression, but integration of new knowledge so that it affects future cognition, and future behaviour. The ability to change one's methodologies and approaches based on new knowledge would seem to be key to rationality. The more subtle the influence (i.e., a new bit of math changes how you approach buying meat at the supermarket), the better the evidence for deep integration of new knowledge.

Vladimir_Golovin15 March 2009 09:39:55PM* 3 points [-]

Yes, "not equals", but compression is necessary for reality-mapping, which is one of the key components of rationality as defined at the beginning of this post. There's a great quote on this:

“We can take this huge universe, and put it inside a very tiny head -- you fold it.”

Comment deleted 15 March 2009 09:44:20PM* [-]
Eliezer_Yudkowsky15 March 2009 10:11:59PM2 points [-]

That's, um, hardly my own innovation...

Stefan_King15 March 2009 08:36:00PM* 2 points [-]

A humble attempt at a possible rationality dojo:

Start with a varied battery of about 20 tests of some intellectual ability. Include things like:

Turn-based logic games, debating contests, predicting the next part of a graph of a real (but unknown) quantity with a confidence level, predicting the next thing someone will do from a short movie, field dependency (personality test), writing a text that is automatically scored on readability and persuasiveness, etc.

The games must have unstable worlds, with changing incentives and probabilities, which require constant hypothesis testing of the environmental conditions. The advantage of games, writing and prediction is that you can administer them online. Even the debating test could be a chat.

Have a sample of people take all the tests, and compute the percentile scores. Then check which tests correlate heavily in how they rank the participants.
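As a rough sketch of this step in Python: convert each test's raw scores to percentiles, then compare how two tests rank the same participants using a rank correlation. Spearman's coefficient is used here as one reasonable choice, not something the proposal specifies, and the function names are illustrative.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def percentiles(values):
    """Percentile score of each participant within the sample."""
    n = len(values)
    return [100 * (r - 0.5) / n for r in ranks(values)]


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(test_a, test_b):
    """Rank correlation between two tests taken by the same participants."""
    return pearson(ranks(test_a), ranks(test_b))
```

Running `spearman` on every pair of the 20 or so tests gives a correlation matrix; clusters of tests with high mutual correlation are the candidates for the aggregate game.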

Construct a game that is an aggregate of the best few correlating tests. This game is a test of someone's general rationality. Set up a chess-like website that tracks the rating (percentile score) of each player. This guarantees visible levels in awesome.

Keep track of which quickly administered test correlates well with the game. This can be used to predict general rationality outside of the game.

To keep it ecologically valid, you would have to check if game rating correlates with positive life outcomes. You also need to check if increased performance on the test correlates with improving real-life outcomes, like income and happiness.

I have some more ideas about the details of candidate logic games. In short, unpredictable and changing values of fixed, known dimensions reward players for monitoring their risk-taking. A player would learn when to be cautious and when to be audacious, which is probably a big part of real-world rationality.

patrissimo21 March 2009 10:16:37PM1 point [-]

I like the idea of using games, but I worry that people would learn to get good at the specific games or game-space, especially if there are few of them. Specializing in a certain logic puzzle != being rational. Also there is the issue others mentioned that performance under stress is a big part of rationality.

MichaelHoward15 March 2009 08:12:32PM2 points [-]

Vladimir Gritsenko mentioned Rational Debating on an old post. It looks like it would be a useful addition to the list.

steven046117 March 2009 03:04:35PM4 points [-]

As the post mentions, RD participants have an incentive to argue dishonestly. They also have little incentive to say anything informative at all. To solve this, I'd propose Paranoid Debating: everyone is scored on the correctness of a team estimate, except for one participant who's secretly designated an Advocate and one participant who's secretly designated a Naysayer. The Advocate gets more points for higher team estimates and the Naysayer gets more points for lower team estimates. Variants: give points for figuring out who the A and N are, or let it be known publicly.
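A sketch of how one round of Paranoid Debating might be scored, in Python. The proposal above leaves the exact point values open, so the linear scoring here (honest players lose points equal to the team's absolute error, the Advocate gains the signed overshoot, the Naysayer gains the signed undershoot) is purely an assumption for illustration, as is the function name.

```python
def paranoid_debate_scores(players, advocate, naysayer, team_estimate, true_value):
    """Score one round under an assumed linear rule.

    Honest players are penalized by the team's absolute error.
    The Advocate earns more the higher the team estimate is,
    the Naysayer earns more the lower it is.
    """
    if advocate == naysayer:
        raise ValueError("Advocate and Naysayer must be different players")
    error = abs(team_estimate - true_value)
    scores = {}
    for name in players:
        if name == advocate:
            scores[name] = team_estimate - true_value
        elif name == naysayer:
            scores[name] = true_value - team_estimate
        else:
            scores[name] = -error
    return scores
```

The variant that rewards figuring out who the Advocate and Naysayer are could be layered on top by adding a bonus for each correct guess.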

Kaj_Sotala15 March 2009 06:00:26PM* 3 points [-]

Hmm. Some off the top of my head:

  • Look for studies that have recognized a certain bias, then use that information to come up with reasoning problems where the participants have to reach the correct answer without falling prey to the biases. Somewhat vulnerable to people studying to beat the test, though this can potentially be defeated by creatively combining several different biases and applying them to new situations. Downside: coming up with lots of different scenarios where one may fall victim to biases is a lot of work. Perhaps come up with suitable computer games where success depends on avoiding biased behavior, and the scenarios can be automatically generated?
  • Calibration tests. These could be auto-generated, drawing on a far wider field of information than the current ones.
  • As the above two, but subjects are forced to write down their reasoning. This may be more helpful in making them reflect more on their reasoning, than for actual verification - somebody's train of thought can be very hard to interpret, since they'll never write down everything that influenced their decision.
ciphergoth15 March 2009 10:02:19PM* 1 point [-]

Somewhat vulnerable to people studying to beat the test

If the test is, say, a battery of experiments already performed that demonstrate the existence of various well-known cognitive biases, most people could not study to beat the test, even if they tried, without improving their rationality to a significant extent.

Comment deleted 15 March 2009 06:02:05PM[-]
Kaj_Sotala15 March 2009 06:07:54PM1 point [-]

I'm not sure I understand this comment.

Comment deleted 15 March 2009 06:29:26PM* [-]
Eliezer_Yudkowsky15 March 2009 07:15:34PM3 points [-]

Use Bayes-score (log of final joint probability) as primary outcome, measure calibration only secondarily.
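For yes/no questions, the Bayes-score can be sketched in Python as the sum of the logs of the probabilities assigned to what actually happened, which equals the log of the joint probability if the questions are treated as independent (an assumption of this sketch). The function name is illustrative.

```python
import math


def bayes_score(forecasts, outcomes):
    """Log of the joint probability assigned to the observed outcomes.

    `forecasts` are stated probabilities that each event happens;
    `outcomes` are booleans saying whether it did. Questions are
    treated as independent, so the logs simply add.
    The score is at most 0, and -inf if any outcome was given probability 0.
    """
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        assigned = p if happened else 1.0 - p
        if assigned == 0.0:
            return float("-inf")
        total += math.log(assigned)
    return total
```

Because the log score is a proper scoring rule, a subject maximizes their expected score by reporting their actual probabilities, and a single answer given with certainty that turns out wrong sinks the whole score.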

Comment deleted 15 March 2009 07:25:23PM* [-]
MBlume15 March 2009 07:40:52PM* 4 points [-]

You maximize Bayes Score iff you use all your knowledge as well as possible. This seems to indicate that any perturbation will introduce an incentive not to do so.

Ask completely ridiculous things. Estimate the probability that the yearly rainfall in Ghana exceeds that of Switzerland. Ask questions like that, and you will learn something about how much true general knowledge a person has gained (and why not -- a rationalist should absorb more true general knowledge in X years on earth than a non-rationalist), but much more about the subject's ability to honestly estimate their own ignorance.

Comment deleted 15 March 2009 07:50:03PM[-]
Eliezer_Yudkowsky15 March 2009 08:21:33PM3 points [-]

The test would also work statistically to measure the effect of an intervention, if you had more subjects than variance. A test with too much variance can't be organizational, but it can be experimental.
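One way to read "more subjects than variance", sketched in Python: the uncertainty in the difference between the mean scores of a trained group and a control group shrinks as the groups grow, so a noisy test can still detect an intervention's average effect. The function name and the choice of an unpooled standard error are assumptions for illustration.

```python
import statistics


def effect_with_standard_error(treated, control):
    """Difference in mean test scores and its (unpooled) standard error.

    The standard error falls roughly as 1/sqrt(group size), which is why a
    test too noisy to rank individuals can still compare two methods.
    Each group needs at least two scores.
    """
    difference = statistics.mean(treated) - statistics.mean(control)
    standard_error = (
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    ) ** 0.5
    return difference, standard_error
```

An effect several standard errors away from zero would be evidence that the intervention did something, even though no single subject's score is informative on its own.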

MBlume15 March 2009 08:01:06PM3 points [-]

If you are asked about Pokemon, AI design, 13th century Chinese history, Martian geology, German literature, Yankees batting averages, lyrics to popular songs from the 1820s, etc. you would be forced to get maximal mileage out of whatever knowledge you can bring to bear on each question, which would in most cases be slim to none.

If the questions are chosen randomly and eclectically enough, there should be no way to game the system, and scores should average out for people knowledgeable in different areas.

If you dependably know more than I do across a broad spectrum of subject areas, then I would assume that you have learned more than I have during your life so far, which seems to me to be symptomatic of good rationality.

Eliezer_Yudkowsky15 March 2009 07:26:01PM1 point [-]

Then use more obscure questions.

daedalus2u26 July 2010 04:08:21PM* 0 points [-]

Test for data, factual knowledge and counterfactual knowledge. True rationalists will have less counterfactual knowledge than non-rationalists because they will have filtered it out. Non-rationalists will have more false data because their counterfactual knowledge will feed back and cause them to believe that false things are actually true, for example that Iraq or Iran was involved in 9/11.

What you really want to measure is the relative proportion of factual and counterfactual knowledge someone has, and in what particular areas. Then including areas like religion, medicine, alternative medicine, and politics in the testing space is advantageous because then you can see where the idea space is that the individuals are most non-rational in.

This can be tricky because many individuals are extremely invested in their counterfactual knowledge and will object to it being identified as counterfactual. A lot of fad-driven science is based on counterfactual knowledge, but the faddists don't want to acknowledge that.

A way to test this would be to see how well people can differentiate correct facts (data) from factual knowledge (based on and consistent with only data) from counterfactual knowledge (based on false facts and not consistent with correct facts) from opinion consistent with facts or opinion consistent with false facts.

An example: in Alzheimer's, a neurodegenerative disease, the accumulation of amyloid is associated with dementia. It has not been established whether amyloid is a cause, an effect, or merely associated with dementia. However, there have been studies in which amyloid was removed via vaccination against amyloid, with the immune system clearing it, and no improvement followed.

I imagine a list of a very large number of statements to be labeled as:

1. true (>99% likelihood)
2. false (>99% likelihood to be false) [edited to improve definition of false]
3. opinion based on true facts
4. opinion based on false ideas
5. no one knows
6. I don't know

A list of some examples

  • Iraq caused 9/11: 2
  • WMD were found in Iraq: 2
  • Amyloid is found in Alzheimer's: 1
  • Amyloid causes Alzheimer's: 2 (this happens to be a field I am working in so I have non-public knowledge as to the real cause)
  • Greenhouse gases are causing GW: 1
  • Vaccines cause autism: 2
  • Acupuncture is a placebo: 1
  • There is life on Mars: 5
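
One way such a labeled list might be scored is sketched below in Python. The answer key simply copies the labels given in the examples above. The function name `knowledge_profile`, the four tallies, and the treatment of unanswered statements as label 6 are all assumptions made for illustration.

```python
# Answer key copied from the commenter's examples (labels 1-6 as defined above).
ANSWER_KEY = {
    "Iraq caused 9/11": 2,
    "WMD were found in Iraq": 2,
    "Amyloid is found in Alzheimer's": 1,
    "Amyloid causes Alzheimer's": 2,
    "Greenhouse gases are causing GW": 1,
    "Vaccines cause autism": 2,
    "Acupuncture is a placebo": 1,
    "There is life on Mars": 5,
}

def knowledge_profile(responses, key=ANSWER_KEY):
    """Tally factual vs. counterfactual answers for one test-taker.

    responses maps statement -> label chosen (1-6); a missing statement
    is treated as label 6 ("I don't know").
    """
    profile = {"factual": 0, "counterfactual": 0, "admitted_ignorance": 0, "other": 0}
    for statement, correct in key.items():
        given = responses.get(statement, 6)
        if given == 6:
            profile["admitted_ignorance"] += 1
        elif given == correct and correct in (1, 2):
            profile["factual"] += 1
        elif {given, correct} == {1, 2}:
            # Called a true statement false, or a false statement true.
            profile["counterfactual"] += 1
        else:
            profile["other"] += 1
    return profile
```

The ratio of the "counterfactual" tally to the "factual" tally, computed separately per topic area, would be one candidate for the relative proportion described above.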

You don't want to test for obscure things, you want to test for common things that are believed but which are wrong. I think you also want to explicitly tell people that you are testing them for rationality, so they can put themselves into “rational-mode” (a state that is not always socially acceptable).

The table-like lists look fine in the edit box but not fine once I post. :(

arundelo26 July 2010 10:28:37PM0 points [-]

http://daringfireball.net/projects/markdown/syntax

    I'm not sure what effect you're     !
    going for, but indenting by four    !
    spaces allows you to do things like !
    this.                               !
daedalus2u26 July 2010 11:15:44PM0 points [-]

Thanks, I was trying to make a list; maybe I will figure it out. I just joined and am trying to focus on getting up to speed on the ideas. The syntax of formatting things is more difficult for me and less rewarding.

Mario15 March 2009 05:58:55PM3 points [-]

I get the feeling that the real problem here is repeatability. It's one thing to design a test for rationality, it's another to design a test that could not be gamed once the particulars are known. Since it probably isn't possible to control the flow of information in that way, the next-best option might be to design a test so that the testing criteria would not be understood except by those who pass.

I'm thinking of a test I heard about years ago. The teacher passes out the test, stressing to the students to read the instructions before beginning. The instructions specify that the answer to every question is C. The actual questions on the test don't matter, of course, but it's a great test of reading comprehension and the ability to follow instructions. Plus, the test is completely repeatable. All of the test questions could leak out, and still only those who deserve to pass would do so. If you are willing to assume that people who pass would not be willing to cheat (unlikely in this test, possible in a rationality test), then you would have an ungameable test.

A rationality test in this model might be one where an impossible task is given, and the correct response would be to not play.

HA215 March 2009 09:01:00PM3 points [-]

I don't think that it's reasonable to expect that secret criteria would stay secret once such a test was actually used for anything. Sure, it could be kept a secret if there were a dozen people taking the test, of whom the four who passed would get admitted to an exclusive club.

If there were ten thousand people taking the test, a thousand of whom passed, I'd bet there'd be at least one who accidentally leaks it on the internet, from where it would immediately become public knowledge. (And at least a dozen who would willingly give up the answer if offered money for it, as would happen if there were anything at stake in this test.) It might work if such a test is obscure enough or not widely used, but not if it was used for anything that mattered to the test-takers and was open to many.

Mario15 March 2009 10:03:17PM1 point [-]

True, but I think that would be a problem with any test. I'm just trying to find a way around it since I think that as you add ways to avoid gaming, you both complicate and weaken the test. Perhaps a solution would be to test people without their knowledge, and reveal whether they succeeded or not at a later date.

MBlume15 March 2009 07:35:33PM2 points [-]

A rationality test in this model might be one where an impossible task is given, and the correct response would be to not play.

Kobayashi Maru?

MichaelHoward15 March 2009 07:48:30PM3 points [-]
MBlume21 March 2009 04:44:09AM* 1 point [-]

Well, only because the computer's search tree didn't include the "teleport giant psychic squid" action ;)

(spoilers behind link)

Comment deleted 15 March 2009 06:20:15PM[-]
Mario15 March 2009 06:39:21PM1 point [-]

Yes. I wasn't offering that particular formulation as a rationality test, just the idea that you should hide from the testee the nature of the test.

Comment deleted 15 March 2009 08:10:36PM[-]
MBlume15 March 2009 08:15:03PM1 point [-]

you're assuming LW karma is itself a good test of rationality...

gjm15 March 2009 08:39:03PM1 point [-]

Which Marshall is on record as not believing, so I guess he's poking fun here.

Yvain15 March 2009 07:42:07PM2 points [-]

Here is a stupid one: Detective stories. Like Encyclopedia Brown, but subtler. And with false leads. I don't think normal mass-market detective stories would work, because they may try to deliberately choose an irrational answer to surprise you. But special ones written by rationalists for rationalists could be a fun distraction if nothing else.

rwallace15 March 2009 08:25:49PM2 points [-]

That still has the problem that it doesn't test for lack of bias, but for having bias that matches that of the people who wrote the stories. I suggest instead using real cases - and not taken from the media, because that means selection bias, but taking all the cases from the files of a particular police department during a particular span of time.

Eliezer_Yudkowsky15 March 2009 08:27:11PM1 point [-]

What true stories can we use besides police cases? (Also, note that in this case you're only testing for being as smart as the police or making the same judgments as the jury - even using cases with a confession may get you false confessions.)

CarlShulman15 March 2009 09:10:58PM4 points [-]

You can take cases with enough evidence to overdetermine the result, and then subtract pieces.

rwallace15 March 2009 08:39:27PM* 2 points [-]

Point. Still, we've been recording lots of different kinds of events for a long time. Off the top of my head, other kinds of historical data that could be useful here:

Medical cases, minor scientific controversies, engineering projects, battles, the stock market, markets in general, expeditions.

CannibalSmith15 March 2009 07:13:45PM2 points [-]

Role play. Build a corpus of fictional scenarios too big to memorize and present a random subset in the test.

Also, standard tests on rationality lore and mathematics would work to a degree because they're correlated with actual rationality.
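
The random-subset idea can be sketched in a few lines of Python. The corpus contents and the name `draw_test` are placeholders. The only technique shown is sampling without replacement from a pool much larger than any single test.

```python
import random

def draw_test(corpus, n_questions, seed=None):
    """Pick n_questions distinct scenarios at random from the corpus.

    A fixed seed reproduces the same test, which is useful for auditing;
    leaving it as None gives each sitting a fresh draw.
    """
    rng = random.Random(seed)
    return rng.sample(corpus, n_questions)
```

With, say, a thousand scenarios and twenty per sitting, memorizing answers to past tests covers only a small part of what the next draw might contain.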

Eliezer_Yudkowsky15 March 2009 07:17:13PM1 point [-]

If they're fictional scenarios, then you're matching the taste of the students (in fictional answers) against the taste of the teachers; that may work to propagate a school, but how do you keep the tastes real?

Yvain15 March 2009 10:02:57PM* 4 points [-]

Then use scenarios that actually happened. From history, business, people's personal lives, et cetera. For example: "Here is a brief description of the Byzantine Empire in 1200. The Emperor decided to change the tax policy in the following way. Predict what happened." Gives an unfair advantage to anyone who knows a lot of history (or in this case economics), but if you vary the cases enough and use little-known enough examples you might be able to control for that.

Another example: "Here's a psych profile of my friend John, and a psych profile of his girlfriend Sally. They started dating ten years ago. Predict what happened."

CannibalSmith15 March 2009 09:07:35PM1 point [-]

I'm sorry, I don't understand your question.

Eliezer_Yudkowsky15 March 2009 09:15:01PM3 points [-]

The "right answer" in the fictional scenarios is determined by the teacher. So you're testing the degree to which the student matches the teacher, not the degree to which the student matches reality.

Lightwave16 March 2009 02:27:34PM* 1 point [-]

How can you be sure that in the historical scenario, the Byzantine Emperor actually did the "right thing", i.e. he wouldn't have done better by doing something else? It's the teachers who have to decide that. Also, what if the Emperor got the "right answer" for the wrong reasons, and the student also got the "right answer" for the wrong reasons? It's up to the teacher to decide that as well. The best thing you can do is have several groups of rationalists selecting the scenarios and verifying the students' answers, but ultimately, when using either real life or fictional scenarios, you're comparing the teachers to the students.

Same thing with measuring "success" of people in real life. They could've arrived at the correct answer for the wrong reasons, it's up to the teachers to decide whether the reasons were right or wrong, i.e. whether they were actually rational or just lucky.

In order to assess the rationality of the students you need to use the sort of things/tests that convinced you that the teachers are rational in the first place. The same things that make the teacher's tastes real can be matched against the student's tastes.

vizikahn15 March 2009 06:57:19PM1 point [-]

What we need is a rationality equivalent of a katana or a machine gun. One for each student, some basic training and even ninja masters go down pretty quickly (unless they really can dodge bullets). Occupatio "weapon of mass rationality".

Tom_Talbot17 March 2009 05:05:59PM2 points [-]

Perhaps the notion of an 'art of rationality' is completely misguided. Why are we relying on the skills of individual people who evolved to be irrational when systems can be built for the purpose of giving rational answers? Why walk to the answer when you can drive?

MichaelHoward17 March 2009 07:36:19PM1 point [-]

Following this analogy, you still need people to get good at driving, at choosing between vehicles, building vehicles, knowing when a particular vehicle is or isn't appropriate and not driving the school bus when drunk.

Some spectacular crashes have been caused by driving systems built for the purpose of giving rational answers without due care and attention.

Johnicholas16 March 2009 01:39:11AM* 2 points [-]

Software tools for rationality, decision support systems, might very well be more valuable than extensive personal training in rationality.

swestrup15 March 2009 09:38:40PM1 point [-]

This has been voted into the negatives, but I'm not sure it's so basically bad as an idea. If we can set up a system where all of the students, teachers, and any other staff are in continuous rationality competitions with each other, then this would quickly cause everyone to hone their skills.

For example, maybe the teacher of a class is chosen from within a class and has to fight (metaphorically) to maintain that position. Maybe the choice of whether you are teacher, student, principal, cafeteria cook, or janitor depends on the outcomes of numerous rationality contests between members.

And note that I don't necessarily mean that cafeteria cook or janitor would be positions that go to the losers...

biochem0692115 March 2009 05:41:27PM* 1 point [-]

As R.A.W. has said, "The more you see yourself acting like a cosmic schmuck, the less of a cosmic schmuck you will become". I think it is very important that the environment stresses awareness of moment-to-moment actions and thoughts. If not, I think decent application of the knowledge of rationality will be very hard indeed.

If this is an important aspect of your 'school', then I think it would be hard to game the system without actually learning what is supposed to be learned. This would especially be true when it is a part of the reputation hierarchy. Sure, some could mimic to gain status but others with actual awareness would see through them easily.

Stefan_King15 March 2009 10:25:24PM* 0 points [-]

Some stupid rationality metrics:

How long you can sit on a chair in an empty room.

Marrying age (later is better).

The total number of words published.

Degree of equality of percentage of income spent on books and percentage of income spent on club memberships.

The average academic achievement of your 5 best friends.

How often you stop speaking before you are interrupted.

Total number of dentist visits divided by total number of treated cavities.

John_Maxwell_IV17 March 2009 04:38:55AM2 points [-]

Degree of equality of percentage of income spent on books and percentage of income spent on club memberships.

I'm pretty sure Rational Man never buys a book he can borrow for free from the local library.

MBlume17 March 2009 06:20:25AM3 points [-]

I certainly don't mean to refer to myself as a candidate for Rational Man, but I do like owning books. Especially textbooks, I would not want to go down to the library every time I wanted to go through my copy of Sakurai. But even old favorite novels, it's good to have them on the shelf, ready to throw in a saddlebag at a moment's notice before a long train ride.

MichaelVassar16 March 2009 05:16:56AM5 points [-]

I rate fairly poorly by these metrics. That makes me suspect that people like me also do. I see that this comment has been poorly rated and hope that people haven't rated it poorly for being unflattering. If you have done this, please rate it back up, OK.

beriukay26 July 2010 01:20:47PM* 0 points [-]

I know of some other stupid tests for rationality, borrowed happily from Invader Zim. 1. Absorbency 2. Electrical Conductivity 3. Something involving a beaver and a toy taxi.

On a less stupid note: Reputationally, I have an explicit agreement with one of my friends that we fact check each other. This was actually a one-way fact checking until fairly recently when he asked me why I didn't call him on something he later realized was total bullcrap. Note, this works best if you actually have a good memory and aren't pickling your brain with alcohol. It also seems to help check the mindkilling effects of disagreement.

A long time ago, I was reading about critical thinking, and was presented a relatively short list of questions to try and use to stimulate critical thought. Questions of this nature could be used in some form of standardized test; or could be used to build a portfolio of rationale behind opinions on all manner of things, which could be graded by peers or instructors (preferably ones who also aspire to rationality, and disagree). I suppose the portfolio would be more organizational than experimental, and almost as easy to game as cheating on essays. But those were my main thoughts before reading the cool ideas other people came up with.

In case you're interested, this was the list as I transcribed it:

  • What do you mean by _ ?
  • How did you come to that conclusion?
  • What is the source of your information?
  • What is the source of their (opponents') source of information?
  • What assumptions led you to that conclusion?
  • Suppose you are wrong. What implications are there?
  • Why did you make your inference?
  • Is there another inference more consistent with the data?
  • Why is this issue significant?
  • How do I know what you say is true?
  • What is an alternate explanation for this phenomenon?

Oh, and after reading The Logic of Failure, maybe running simulations like they did with the Sim City-like vibe, or the optimizing bug population or the refrigeration tests could be instructive. Even after learning about them (especially the city planning and the African tribe), they may be sufficiently complicated to be of experimental or organizational value. On the other hand, they may turn out to be just as useless as chess for testing rationality if success strategies are posted and shared. Maybe some of the sims could have randomly assigned (Kirk-resistant) Kobayashi Maru modes, but then I don't see how a predetermined loss would be very instructive unless the player didn't know it was rigged, and even then, only to illustrate Eliezer's point that even if you do everything right, you can still fail.