I strongly suspect that there is a possible art of rationality (attaining the map that reflects the territory, choosing so as to direct reality into regions high in your preference ordering) which goes beyond the skills that are standard, and beyond what any single practitioner singly knows. I have a sense that more is possible.
The degree to which a group of people can do anything useful about this will depend overwhelmingly on what methods we can devise to verify our many amazing good ideas.
I suggest stratifying verification methods into 3 levels of usefulness:
- Reputational
- Experimental
- Organizational
If your martial arts master occasionally fights realistic duels (ideally, real duels) against the masters of other schools, and wins or at least doesn't lose too often, then you know that the master's reputation is grounded in reality; you know that your master is not a complete poseur. The same would go if your school regularly competed against other schools. You'd be keepin' it real.
Some martial arts fail to compete realistically enough, and their students go down in seconds against real streetfighters. Other martial arts schools fail to compete at all—except based on charisma and good stories—and their masters decide they have chi powers. In this latter class we can also place the splintered schools of psychoanalysis.
So even just the basic step of trying to ground reputations in some realistic trial other than charisma and good stories has tremendous positive effects on a whole field of endeavor.
But that doesn't yet get you a science. A science requires that you be able to test 100 applications of method A against 100 applications of method B and run statistics on the results. Experiments have to be replicable and replicated. This requires standard measurements that can be run on students who've been taught using randomly-assigned alternative methods, not just realistic duels fought between masters using all of their accumulated techniques and strength.
The field of happiness studies was created, more or less, by realizing that asking people "On a scale of 1 to 10, how good do you feel right now?" was a measure that statistically validated well against other ideas for measuring happiness. And this, despite all skepticism, looks like it's actually a pretty useful measure of some things, if you ask 100 people and average the results.
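As a toy illustration (the true mean, noise level, and sample size here are all made-up numbers, not real survey data), this is the statistical point: a single noisy 1-to-10 self-report is nearly useless, but the average of 100 of them tracks the underlying quantity well.

```python
import random

random.seed(0)  # deterministic toy data

def ask_happiness(true_mean, noise_sd=2.0):
    """One person's noisy 1-to-10 self-report around their true happiness."""
    return min(10.0, max(1.0, random.gauss(true_mean, noise_sd)))

# Any single answer is noisy, but averaging 100 answers tracks the true mean.
samples = [ask_happiness(6.0) for _ in range(100)]
estimate = sum(samples) / len(samples)
print(round(estimate, 1))  # close to 6.0
```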
But suppose you wanted to put happier people in positions of power—pay happy people to train other people to be happier, or employ the happiest at a hedge fund? Then you're going to need some test that's harder to game than just asking someone "How happy are you?"
This question of verification methods good enough to build organizations is a huge problem at all levels of modern human society. If you're going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential? If you give colleges the power to grant degrees, then do they have an incentive not to fail people? (I consider it drop-dead obvious that the task of verifying acquired skills, and hence the power to grant degrees, should be separated from the institutions that do the teaching, but let's not go into that.) If a hedge fund posts 20% returns, are they really that much better than the indices, or are they selling puts that will blow up in a down market?
If you have a verification method that can be gamed, the whole field adapts to game it, and loses its purpose. Colleges turn into tests of whether you can endure the classes. High schools do nothing but teach to statewide tests. Hedge funds sell puts to boost their returns.
On the other hand—we still manage to teach engineers, even though our organizational verification methods aren't perfect. So what perfect or imperfect methods could you use for verifying rationality skills, that would be at least a little resistant to gaming?
(Added: Measurements with high noise can still be used experimentally, if you randomly assign enough subjects to have an expectation of washing out the variance. But for the organizational purpose of verifying particular individuals, you need low-noise measurements.)
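A toy simulation of that added point (effect size, noise level, and group size are all made-up numbers): with heavy noise, randomized group averages still resolve a small difference between two teaching methods, while any individual's score is dominated by the noise and so is nearly useless for certifying that individual.

```python
import random
import statistics

random.seed(1)  # deterministic toy data

NOISE_SD = 10.0   # measurement noise (made-up)...
EFFECT = 2.0      # ...dwarfs the true effect of the better method (made-up)

def score(method):
    """One student's noisy test score under teaching method 'A' or 'B'."""
    base = 50.0 + (EFFECT if method == "B" else 0.0)
    return random.gauss(base, NOISE_SD)

a = [score("A") for _ in range(1000)]
b = [score("B") for _ in range(1000)]

# Experimentally, the group difference survives the noise:
print(statistics.mean(b) - statistics.mean(a))  # near 2.0

# Organizationally, one person's score is swamped by noise:
print(statistics.stdev(a))  # near 10, five times the effect size
```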
So I now put to you the question—how do you verify rationality skills? At any of the three levels? Brainstorm, I beg you; even a difficult and expensive measurement can become a gold standard to verify other metrics. Feel free to email me at sentience@pobox.com to suggest any measurements that are better off not being publicly known (though this is of course a major disadvantage of that method). Stupid ideas can suggest good ideas, so if you can't come up with a good idea, come up with a stupid one.
Reputational, experimental, organizational:
- Something the masters and schools can do to keep it real (realistically real);
- Something you can do to measure each of a hundred students;
- Something you could use as a test even if people have an incentive to game it.
Finding good solutions at each level determines what a whole field of study can be useful for—how much it can hope to accomplish. This is one of the Big Important Foundational Questions, so—
Think!
(PS: And ponder on your own before you look at the other comments; we need breadth of coverage here.)
How about a test that requires people to build and use mental models and formulas? People are asked to estimate facts, mostly numeric ones, based on other facts. Each question gives a set of "measured facts"* and asks for estimates of further relevant facts or consequences via back-of-envelope calculations (or a computer program, for more precision).

But unlike a normal math word problem, set up the test so that, say, 2/3 of the questions cannot be accurately estimated from the given information alone. Among that 2/3, half can be accurately estimated by adding some common-sense information (e.g. that most people work about 40 hours a week, that life expectancy is about 80 years, that almost half of American voters vote Republican), and the other half require more esoteric information that test-takers will rarely have.

For every question, test-takers must build a simple mental model or formula that would allow them to do the calculation, state any information they need that is missing, and briefly try to look up that information online in order to compute a reasonable estimate. If they can't, they must express the answer in terms of unknown variables and then guess the values of those variables. They must also state any relevant assumptions.
This is a means both to improve rationality and to test it.
Example question set:
Background: in some types of accidents at some types of nuclear plants, radioactive substances can be released into the atmosphere (radioactive substances emit ionizing radiation). It is medically plausible that there is no perfectly safe dose of ionizing radiation in human tissue, and that radiation damage to DNA is cumulative, because cells repair some DNA damage very slowly, or never, and this damage can lead to cancer years after radiation exposure. This is known as the linear no-threshold hypothesis: that the health risk is proportional to exposure and there is no safe dose. If residents are promptly evacuated during an accident, the primary risk to their health upon returning will be from long-term exposure to radioactive cesium, which mainly causes a type of cancer called non-CLL leukemia.**
• A metastudy reports that the excess relative risk (ERR) of non-CLL leukemia from 100 mGy of radiation is about 19% (this means that people get non-CLL leukemia 19% more often than normal).**
• The normal rate of leukemia in the U.S. is about 14 diagnoses per 100,000 people per year. About 1.5% of people are diagnosed with leukemia at some point in their lifetime
• The normal death rate of leukemia is about 6.4 per 100,000 people per year in the U.S.
• One third of leukemia cases are CLL leukemia cases.
• Another study estimates that in the U.S. there are about 16,000 excess deaths annually due to electricity generation emissions, which is a low rate compared to some developing countries. The researchers estimate that 91% of these deaths were the result of emissions from coal-fired power plants.
• There are 328 million people in the U.S. and 7.5 billion in the world
• About 65% of all electricity worldwide is produced by burning fossil fuels. About 10% of electricity is from nuclear plants and 38.3% is from coal.
• Assume two-thirds of cancer cases and deaths from a nuclear accident occur outside the city where the accident occurred***
Scenario: suppose that another nuclear accident were to happen, one somewhat more serious than Fukushima, inside a city of one million people, in a developed country. Suppose that all evacuated persons return to their homes after one month and, as a result, are exposed to 100 mGy of radiation on average, mostly from cesium. Assume that half of this radiation dose occurs in the first 10 years and that most of it has occurred within 40 years***.
Questions:
1. Estimate the chance that the radiation will cause non-CLL leukemia in a particular, random person in the city at some point in their lives.
2. Estimate the chance that the radiation will kill a particular, random person in the city after they move back.
3. Estimate the total number of non-CLL leukemia cases caused by the radiation (over 40+ years).
4. Estimate the total number of people that will die as a result of the radiation (over 40+ years).
5. Assume that all nuclear accidents worldwide, combined, cause this number of deaths once every 20 years (e.g. in a 20-year period there might be two accidents, each half as serious as this one). What is the expected number of deaths per year in a randomly selected city of about one million people?
6. Estimate the number of excess deaths caused by power generation in that same city (i) per year, and (ii) over a 40-year period, if all its electricity came from fossil fuels instead of the nuclear plant.
7. Brainstorm additional factors that might change your estimates above.
8. Brainstorm other considerations that would be relevant to evaluating safety of nuclear power compared to alternatives.
Example answers:
1. Assumptions: All people have lives of average length (80 years). Age distribution in the city is uniform from 0 to 80. Leukemia risk is elevated uniformly after exposure for the rest of the person's life. All developed countries have similar leukemia rates. Leukemia is diagnosed very soon after it develops. Leukemia risk does not vary by age (this is not true, but on the other hand, I question whether it was appropriate for the metastudy to use ERR instead of excess absolute risk (EAR)). Radiation exposure probably drops off mostly according to cesium's half-life, but to simplify the calculation, assume 50% of the 100 mGy dose is delivered linearly in the first 10 years and the other 50% linearly over the following 30 years.
• Normal non-CLL leukemia risk is 14*2/3 = 9.333 per 100,000 per year
• A random person has on average 40 years of life left (50% of an 80-year lifetime)
• Excess risk of non-CLL leukemia is 19%, so 9.333*0.19 = 1.773 per 100,000 once the full dose happens.
• But there's a long delay before reaching the full dose. Integrating over my approximate exposure function, excess incidence averages 1.773/2/2 ≈ 0.44 per 100,000 per year in the first 10 years and 1.773*0.75 ≈ 1.33 over the next 30. Neglecting young and old people to simplify the calculation, the risk over 40 years is about 1.773*0.25*10 + 1.773*0.75*30 = 44.3 per 100,000, so the lifetime risk is about 0.0443%, or 1 in 2260.
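To sanity-check that napkin arithmetic (same numbers as above, just in code):

```python
# Numbers from the problem statement and the answer above
base_non_cll = 14 * 2 / 3 / 100_000   # normal non-CLL incidence per person-year
full_excess = base_non_cll * 0.19     # excess incidence at the full 100 mGy dose

# Average dose fraction: 0.25 over years 0-10, 0.75 over years 10-40
lifetime_excess = full_excess * 0.25 * 10 + full_excess * 0.75 * 30
print(lifetime_excess * 100_000)  # ~44.3 per 100,000
print(1 / lifetime_excess)        # ~1 in 2260
```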
Fun fact 1: Before writing this LessWrong post, I did a calculation like this to learn about the risks of radiation, because I couldn't find any research estimating what I wanted to know. Radiation risks seem to be among the world's best-kept secrets. I'd rather see a peer-reviewed paper answer "how likely is X amount of radiation to kill me" than rely on my "napkin" model, but I haven't found any such research.
Fun fact 2: the answer increases if your starting point is "1.5% of people are diagnosed with leukemia at some point in their lifetime" since "14 per 100,000 people per year" only adds up to 1.12% per 80-year lifetime. I don't know why these numbers don't match up.
Fun fact 3: I should really be using a simple (Monte Carlo) computer model for this with exponential decay of radiation exposure... no idea if it would raise or lower my estimate.
2. (Further) Assumptions: Non-CLL leukemia is the only cause of death driven by radiation. Years of life left after the first cell turns cancerous is negligible. Probably both assumptions are significantly wrong, but the first assumption underestimates deaths and the second overestimates them so it seems like a wash.
• 6.4/14 = 45.7% of cases are fatal so the risk is 0.0443%*0.457 = 0.0202% or 1 in 4939.
3. Assumption: cancer screenings do not increase as a result of the accident (I'm sure this is wrong). There will be about 0.000443*1,000,000 = 443 excess cases in the city and about 443*3 = 1329 excess cases total
4. There will be about 1329*6.4/14 = about 607 excess deaths total
5. There will be 607/20 = 30.3 deaths worldwide per year from all nuclear accidents. Given a world population of 7.5 billion, that's about 0.004 deaths in a city of one million. The risk increases somewhat in cities that contain their own nuclear plant, if the plant is one of the more hazardous (read: old) models.
6. In a random U.S. city, expected deaths from power-generation emissions are 16,000/328 = 48.8 per million people per year. (i) Assuming air pollution's effects are mainly local and 100% of power generation comes from fossil fuels, the expectation for a city of one million is 16,000/328/0.65 = 75 deaths per year due to fossil fuels. (ii) That is about 3,000 deaths over a 40-year period, roughly 5x the ~607 total deaths in the nuclear meltdown scenario.
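The whole chain from answer 2 through answer 6(i) can be checked mechanically, starting from the (rounded) answer-1 result:

```python
lifetime_excess = 44.3 / 100_000               # answer 1 (rounded)
city_pop = 1_000_000

death_fraction = 6.4 / 14                      # answer 2: fraction of cases that are fatal
death_risk = lifetime_excess * death_fraction  # ~0.0202%, or 1 in ~4940
city_cases = lifetime_excess * city_pop        # answer 3: ~443 in the city
total_cases = city_cases * 3                   # ~1329 (two-thirds occur outside the city)
total_deaths = total_cases * death_fraction    # answer 4: ~607
deaths_per_year_world = total_deaths / 20      # answer 5: ~30 per year worldwide
per_city_of_million = deaths_per_year_world / 7_500  # ~0.004 per million people
fossil_per_year = 16_000 / 328 / 0.65          # answer 6(i): ~75 per year

print(total_deaths, per_city_of_million, fossil_per_year)
```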
7.
• Increased screening due to concern about the risk will increase the rate of cancer diagnoses, but not rates of cancer, and cancer death rates may be reduced by early detection.
• Radiation could cause other types of cancer deaths (I heard, for example, that short-lived iodine isotopes can cause thyroid cancer, but that this can be mitigated with iodine pills).
• Etc.: I'm getting tired, but you get the idea.
8.
• Regulations passed after Three Mile Island probably increase safety a lot in newer reactors (but make new plants cost-prohibitive to certify and build)
• Nuclear waste increases long-term risk (but less than most people think, I would add)
• It has been suggested that terrorists could steal nuclear fuel and build a bomb with it. (I don't know if this is remotely plausible, but I do know that reactor-grade uranium is not directly usable in a bomb.)
• Deaths during plant construction and related mining should be similar between nuclear and fossil fuel plants; solar plant construction seems like it should be safer than nuclear, oil/coal, and wind.
• Though deaths from fossil fuels are more numerous, each death is expected to be less bad because it should happen near the end of a person's life due to many years of lung damage, whereas in the nuclear case, some young people will be affected. It's strange to me that fossil fuel deaths are not measured as "years of life lost" instead.
* The "facts" can be real or based on back-of-envelope calculations, but the test-taker is to assume the information is factual. If it is not factual and concerns the real world, it mustn't be excessively off the mark: humans can't simply erase misinformation from our minds, so it's best not to intentionally mess with us.
** This is roughly correct AFAIK, but I'm not an expert. Also, the metastudy strangely neglects to model time, e.g. it does not say that the risk is elevated for the rest of people's lives, or that it is elevated for X years, or anything time-related like that. I don't see why risk would be elevated for life—if damage will cause a cell to turn cancerous, why would it wait 20 years to do so?—but conservatively this is my mental model anyway. I've seen a study that indicates 100 mGy is more than the average dose avoided by relocating residents of Fukushima; note also that mGy and mSv are the same SI units, so I don't understand the difference.
*** This datum is made-up as I haven't found information about it.
After going through this exercise I think the formulas need to be more explicit... really we should write a program for nontrivial models, e.g. (in Python; the initial yearly dose is chosen so the lifetime total comes to about 100 mGy, per the TODO in my earlier draft):

```python
# TODO: turn into a Monte Carlo simulation
cesium_half_life_years = 30.17
yearly_decay = 0.5 ** (1 / cesium_half_life_years)
err_per_mGy = 0.19 / 100                      # 19% excess relative risk per 100 mGy
base_non_cll_risk = (14.0 * 2 / 3) / 100_000  # normal incidence per person-year
years_of_life_left = 40

# Pick the initial dose so the lifetime total is about 100 mGy
initial_yearly_dose = 100 * (1 - yearly_decay) / (1 - yearly_decay ** years_of_life_left)

dose = 0.0
excess_lifetime_chance_of_cancer = 0.0
for year in range(years_of_life_left):
    dose += initial_yearly_dose
    initial_yearly_dose *= yearly_decay
    excess_lifetime_chance_of_cancer += base_non_cll_risk * err_per_mGy * dose

print(dose)                            # total dose in mGy (~100)
print(excess_lifetime_chance_of_cancer)
```
We would also need numerous exercises easier than this one.
To make things more interesting, measure the pre-existing biases of the test-taker and then give bonus points for assumptions and issues the test-taker mentions that run contrary to their own bias. E.g. if they are predisposed against nuclear power, then a comment like "regulations passed after Three Mile Island probably increase safety a lot in newer reactors" would count in their favor, whereas if they are predisposed in favor of nuclear power, mentioning the risks of nuclear waste would count in their favor.
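A sketch of what such a scoring rule might look like (the function name, labels, and one-point weight are all hypothetical, just to make the idea concrete):

```python
def bias_contrary_bonus(predisposition, considerations):
    """Award one hypothetical bonus point per consideration that leans
    against the test-taker's own predisposition ('pro' or 'anti')."""
    return sum(1 for _, leaning in considerations if leaning != predisposition)

answers = [
    ("regulations passed after Three Mile Island improve new-reactor safety", "pro"),
    ("nuclear waste adds long-term risk", "anti"),
]
print(bias_contrary_bonus("anti", answers))  # 1: credit for the pro-nuclear point
print(bias_contrary_bonus("pro", answers))   # 1: credit for the anti-nuclear point
```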