Why do you write "Flaws in mainstream science" if you mean only specific parts of science?
Some other mainstream areas have replication rates of more than 95%.
Interesting article, thanks.
I agree with the general concept. I would be a bit more careful with the conclusions, however:
No visible correlation does not mean no causation - it is just a strong hint. In this specific example, the hint comes from a single parameter - the lack of a significant correlation between internet and overweight once both exercise categories are added - together with the significant correlations of internet usage with the other two parameters.
With the proposed diagram, I get:
p(Internet)=.141
p(not Internet)=.859
p(Overweight)=.209
p(not Overweight)=.791
p(Ex|Int & Ov)=.10
p(Ex|Int & no Ov)=.62
p(Ex|no Int & Ov)=.27
p(Ex|no Int & no Ov)=.85
This model has 6 free parameters - the insignificant correlation between overweight and internet is the only constraint. It is true that other models have to be more complex to explain the data, but we know that our world is not a small toy simulation - there are causal connections everywhere; the question is just "are they negligible or not?".
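For concreteness, here is a minimal Python sketch of those numbers. It assumes the proposed diagram is the collider Internet -> Exercise <- Overweight, with Internet and Overweight marginally independent (that independence being the one constraint); the six parameters above fully specify the joint distribution.

```python
# Sketch of the assumed model: Internet -> Exercise <- Overweight,
# with Internet and Overweight marginally independent.
# The six free parameters are the ones listed above.
from itertools import product

p_int = 0.141   # p(Internet)
p_ov = 0.209    # p(Overweight)
p_ex = {        # p(Exercise | Internet, Overweight)
    (True, True): 0.10,
    (True, False): 0.62,
    (False, True): 0.27,
    (False, False): 0.85,
}

# Joint distribution p(Internet, Overweight, Exercise).
joint = {}
for i, o, e in product([True, False], repeat=3):
    p = (p_int if i else 1 - p_int) * (p_ov if o else 1 - p_ov)
    p *= p_ex[(i, o)] if e else 1 - p_ex[(i, o)]
    joint[(i, o, e)] = p

def prob(pred):
    """Marginal probability of the event described by pred."""
    return sum(p for k, p in joint.items() if pred(*k))

p_ex_m = prob(lambda i, o, e: e)
print(prob(lambda i, o, e: i and o) - p_int * p_ov)    # 0: independent by construction
print(prob(lambda i, o, e: i and e) - p_int * p_ex_m)  # ~ -0.026: internet vs. exercise
print(prob(lambda i, o, e: o and e) - p_ov * p_ex_m)   # ~ -0.095: overweight vs. exercise
```

By construction the internet-overweight covariance is zero, while both variables still correlate (negatively) with exercise - which is the pattern described above.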
How can we quantify "difference to expectations of others" and include it in the score?
You're getting this from the "refinement" part of the calibration/refinement decomposition of the Brier score. Over time, your score will end up much better (lower, since the Brier score is a penalty) than others' if you have better refinement (e.g. from "inside information", or from a superior methodology), even if everyone is identically (perfectly) calibrated.
This is the difference between a weather forecast derived from looking at a climate model (e.g. "I assign 68% probability to the proposition that the temperature today in your city is within one standard deviation of its average October temperature") and one derived from looking out the window.
ETA: what you say about my using an assumption is not correct - I've only been making the forecast well-specified, so that the way you said you allocated your probability mass gives us a proper loss function, and simplifying the calculation by using a uniform distribution for the rest of your 90%. You can compute the loss function for any allocation of probability among outcomes that you care to name - the math just becomes more complicated. I'm not making any assumptions about the probability distribution of the actual events. The math doesn't, either. It's quite general.
I can still make 100,000 lottery predictions and get a good score. I'm looking for a system that cannot be gamed that way. OK, for each prediction, you could subtract the average score from your score. That should work: assuming that all the other predictions are rational too, the expected difference on the lottery predictions is 0.
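A rough sketch of that "subtract the average" idea; brier() and relative_score() are my own illustrative names, not an established API. Lower Brier is better, so a negative relative score means beating the crowd, and bulk lottery predictions contribute nothing either way.

```python
# Score each forecaster relative to the crowd by subtracting the
# average Brier score on the same question.
def brier(p, outcome):
    """Squared-error penalty for a binary forecast p; outcome is 0 or 1."""
    return (p - outcome) ** 2

def relative_score(my_probs, crowd_probs, outcomes):
    """Mean of (my Brier - average crowd Brier), question by question."""
    diffs = []
    for p_me, crowd, o in zip(my_probs, crowd_probs, outcomes):
        crowd_avg = sum(brier(p, o) for p in crowd) / len(crowd)
        diffs.append(brier(p_me, o) - crowd_avg)
    return sum(diffs) / len(diffs)

# 100 lottery-style questions: everyone, including me, says 10%,
# and 10% of the events happen. The difference is exactly 0.
outcomes = [1] * 10 + [0] * 90
crowd = [[0.10, 0.10, 0.10]] * 100   # three hypothetical crowd forecasters
print(relative_score([0.10] * 100, crowd, outcomes))   # 0.0
```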
I've only been making the forecast well-specified
I think "impact here (10% confidence), no impact at that place (90% confidence)" is quite specific. It is a binary event.
Interesting, thanks, but not exactly what I was looking for.
Help me understand what you're describing? Below is a stab at working out the math (I'm horrible at math - I have to laboriously work things out with a bc-like program - but I'm more confident in my grasp of the concepts).
The salient feature of your meteorite predictions is location. We can score these forecasts exactly as GJP scores multiple-choice forecasts, as long as they're well-specified. Let's refine "hit position X" to "within 10 miles of X". That translates to roughly a one-in-a-million chance of calling the location correctly (the surface area of the Earth divided by the area of a 10-mile-radius circle is about 10^6). We can make a similar calculation for the probability that a meteorite hits at all; it comes out to roughly one per day on average, so we can simplify and assume exactly one hits every day.
So a forecast that "a meteorite will hit location X tomorrow at 10% confidence" is equivalent to dividing Earth into one million cells, each cell being one possible outcome in a multiple-outcome forecast, and putting 10% probability mass into one cell. Let's say you distribute the remaining probability evenly among the 999,999 remaining cells. We can now compute your Brier loss function, the sum of squared errors.
Either the meteorite hits X, and your score is .81 (the penalty for predicting at 10% confidence an event that turns out to happen), plus a negligible epsilon-squared contribution from each of the other 999,999 cells. Or the meteorite hits a different cell, and your Brier score is about 1.01: roughly 1 for the cell that was hit, which you had predicted at a probability close to 0, plus .01 for missing on X, plus the negligible contributions of the remaining cells.
So, over 100 such events, the expected value of your score ranges from 81 if you have laser-like accuracy to 101 if you're just guessing at random. Intermediate values reflect intermediate accuracies. The range of scores is fairly narrow, because your probability mass isn't very concentrated - only a 10% bump on the "jackpot" cell, the rest spread around the surface of the Earth.
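Here is the same arithmetic worked out as a sketch in Python (cell count and epsilon as described above):

```python
N = 10**6                 # one million cells
p_x = 0.10                # probability mass on the predicted cell X
eps = (1 - p_x) / (N - 1) # mass on each of the remaining cells

def brier_multi(hit_x):
    """Sum of squared errors over all N cells for one meteorite strike."""
    if hit_x:
        # cell X: (0.1 - 1)^2 = .81; each other cell: eps^2 (negligible)
        return (p_x - 1) ** 2 + (N - 1) * eps ** 2
    # the hit cell: (eps - 1)^2 ~ 1; cell X: (0.1 - 0)^2 = .01;
    # the remaining N - 2 cells: eps^2 each (negligible)
    return (eps - 1) ** 2 + p_x ** 2 + (N - 2) * eps ** 2

print(brier_multi(True))   # ~0.81: the meteorite hit X
print(brier_multi(False))  # ~1.01: it hit some other cell
# Over 100 events: ~81 with perfect accuracy, ~101 when guessing at random.
```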
If any of the above is wrong (math-wise) or stupid, or misrepresents your model, I'd appreciate knowing. :)
To calculate the Brier score, you used *your* assumption that meteorites have a 1-in-a-million chance of hitting a specific area. What about events without a natural way to get those assumptions?
Let's use another example:
Assume that I predict, with 95% confidence, that neither Obama nor Romney will be elected. If that prediction comes true, it is amazing and indicates high predictive power (especially if I make multiple similar predictions and most of them come true).
Assume that I predict, with 95% confidence, that either Obama or Romney will be elected. If that prediction comes true, it is not surprising.
Where is the difference? The second event is expected by others. How can we quantify "difference to expectations of others" and include it in the score? Maybe with an additional weight - weight each prediction by its difference from the expectations of others (as the mean of the log ratio, or something like that).
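One illustrative way to cash that out (a sketch of the log-ratio idea, not an established method): score the log ratio of your probability to the crowd's for the outcome that actually happened. Matching the consensus scores roughly 0, while a confirmed surprise scores highly. The crowd numbers below are invented.

```python
import math

def log_ratio_vs_crowd(p_me, p_crowd, happened):
    """log(my probability / crowd probability) of the realized outcome."""
    if not happened:
        p_me, p_crowd = 1 - p_me, 1 - p_crowd
    return math.log(p_me / p_crowd)

# "Neither Obama nor Romney" at 95%, crowd at 1%, and it comes true:
print(log_ratio_vs_crowd(0.95, 0.01, True))   # ~ +4.55, very informative
# "Obama or Romney" at 95%, crowd at 99%, and it comes true:
print(log_ratio_vs_crowd(0.95, 0.99, True))   # ~ -0.04, no new information
```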
Is there any established method
Yes: use a scoring rule to rate your predictions, giving you an overall evaluation of their quality. If you use, say, the Brier score, it admits decompositions into separate components, for instance "calibration" and "refinement"; if your refinement was exceptionally good on the lottery drawings - meaning that you'd assigned higher probabilities of winning to the people who did in fact win (as opposed to merely calling the overall probability of winning correctly) - you'd be a suspect for game-rigging or psi powers. ;)
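A minimal sketch of that decomposition (the DeGroot-Fienberg split, Brier = calibration + refinement, grouping forecasts by the probability issued; lower is better for both terms). The toy data is invented for illustration.

```python
from collections import defaultdict

def decompose_brier(forecasts, outcomes):
    """Return (brier, calibration, refinement) for binary forecasts."""
    n = len(forecasts)
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)
    calibration = refinement = 0.0
    for f, obs in bins.items():
        n_k = len(obs)
        o_bar = sum(obs) / n_k            # observed frequency in the bin
        calibration += n_k * (f - o_bar) ** 2
        refinement += n_k * o_bar * (1 - o_bar)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return brier, calibration / n, refinement / n

# Perfectly calibrated but unrefined: always 50% on a fair coin.
print(decompose_brier([0.5] * 8, [1, 0, 1, 0, 1, 0, 1, 0]))
# -> (0.25, 0.0, 0.25)
# Also perfectly calibrated, but more refined (probabilities track outcomes):
print(decompose_brier([0.75] * 4 + [0.25] * 4, [1, 1, 1, 0, 0, 0, 0, 1]))
# -> (0.1875, 0.0, 0.1875)
```

Note how the second forecaster has the same (perfect) calibration but better refinement, and therefore a lower overall Brier score.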
Interesting, thanks, but not exactly what I was looking for. As an example, take a simplified lottery: 1 number is drawn out of 10. I can predict "number X will have a probability of 10%" 100 times in a row - this is correct, and will give a good score under any scoring rule. However, those predictions are not interesting.
If I make 100 predictions "a meteorite will hit position X tomorrow (10% confidence)" and 10% of them are correct, those predictions are very interesting - you would expect that I have some additional knowledge (for example, that I observed an approaching asteroid).
The difference between the examples is the quality of the predictions: everybody can make correct (unbiased) 10% predictions for the lottery, but getting enough evidence to justify a 10% probability for an asteroid impact at a specific position is hard - most forecasters would assign far lower probabilities to those positions.
I'm talking about probability estimates. The actual probability of what happened is 1, because it is what happened. However, we don't know what happened - that's why we make a probability estimate in the first place!
Forcing yourself to commit to only one of two possibilities in the real world (which is what all of these analogies are supposed to tie back to), when there are a lot of low-probability possibilities that are initially ignored (and rightly so), seems incredibly foolish.
Also, your analogy doesn't fit brazil84's murder example. What evidence does the lottery win give that allows us to adjust our probability estimate for how the gun was fired? I'm not sure where you're going with that, at all.
The real probability of however the bullet was fired is 100%. All we've been talking about are our probability estimates based on the limited evidence we have. They are necessarily incomplete. If new evidence makes both of our hypotheses less likely, then it's probably smart to check and see if a third hypothesis is now feasible where it wasn't before.
brazil84 stated that there are just two options, so let's stick to that example first.
"[rifle] no bullet will be find in or around the person's body 0.01% of the time" is contradictory evidence against the rifle (and for the handgun). But "[handgun] no bullet will be find in or around the person's body 0.001% of the time" is even stronger evidence against the handgun (and for the rifle). In total, we have some evidence for the rifle.
Now let's add a .001% probability that it was not a gunshot wound - in this case, the probability of finding no bullet is (close to) 100%. Rifle gets an initial probability of 60% and handgun gets 40% (+ rounding error).
So let's update (prior × likelihood):
No gunshot: 0.001 -> 0.001
Rifle: 60 -> 0.006
Handgun: 40 -> 0.0004
Of course, the probability that one of those 3 happened has to be 1 (counting all guns as "handgun" or "rifle"), so let's normalize back to probabilities: 0.001 + 0.006 + 0.0004 = 0.0074
No gunshot: 0.001/0.0074 = 13.5%
Rifle: 0.006/0.0074 = 81.1%
Handgun: 0.0004/0.0074 = 5.4%
The rifle and handgun likelihoods together increased the probability of a rifle shot, since the prior probability of "no gunshot" was very small. All numbers are our estimates, of course.
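The same update can be done in a few lines of code (a sketch; numbers exactly as above, with the priors kept in percent):

```python
# Bayesian update: posterior proportional to prior * likelihood of "no bullet found".
priors = {"no gunshot": 0.001, "rifle": 60.0, "handgun": 40.0}     # in percent
p_no_bullet = {"no gunshot": 1.0, "rifle": 0.0001, "handgun": 0.00001}

unnormalized = {h: priors[h] * p_no_bullet[h] for h in priors}
total = sum(unnormalized.values())   # 0.0074
for h, u in unnormalized.items():
    print(f"{h}: {u / total:.1%}")
# no gunshot: 13.5%, rifle: 81.1%, handgun: 5.4%
```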
The probability of both, in that case, plummets, and you should start looking at other explanations. Like, say, that the victim was shot with a rifle at close range, which only leaves a bullet in the body 1% of the time (or whatever).
It might be true that, between two hypotheses, one is now more likely to be true than the other, but the probability of both still dropped, and your confidence in your pet hypothesis should still drop right along with its probability of being correct.
So say you have hypothesis X at 60% confidence and hypothesis Y at 40%. New evidence comes along that shifts your confidence in X down to 20% and in Y down to 35%. Y didn't just "win". Y is now even more likely to be wrong than it was before the new evidence came in. The only substantive difference is that now X is probably wrong too. If you notice, there's 45% probability there that we haven't accounted for. If this is all bound up in a single hypothesis Z, then Z is the one that is most likely to be correct.
Contradictory evidence shouldn't make you more confident in your hypothesis.
If either X or Y has to be true, you cannot have 20% for X and 35% for Y. The remaining 45% would be a contradiction (neither X nor Y, even though "X or Y" holds). While you can work with those numbers (20 and 35), they are not probabilities any more - they are relative probabilities.
It is very unlikely that the murderer won the lottery. However, if a suspect did win the lottery, this does not reduce the probability that he is guilty - he has the same (low) probability as everyone else.
That reminds me of a question about judging predictions: is there any established method to say "x made n predictions, was underconfident / properly calibrated / overconfident, and the quality of the predictions was z"? Assume the predictions are given as "x will happen (y% confidence)".
It is easy to make 1000 unbiased predictions about lottery drawings, but this does not mean you are good at making predictions.
You get an infinite set of texts with a finite set of characters and texts of finite length merely by letting the lengths be unbounded. Proof: Consider the set of characters {a}, which has but a single character. We are restricted to the following texts: a, aa, aaa, aaaa, aaaaa,... We nevertheless spot an obvious bijection to the positive integers. (Just count the 'a's) So there are infinitely many texts.
Sorry, I was a bit imprecise. "You need texts without a size limit" would be correct. The issue is: your memory (and probably your lifetime) is finite, even if you convert the whole observable universe into extended memory.
A specific part of science is part of mainstream science - or is a white horse not a horse?
If something applies only to white horses, I would write "white horses" instead of "horses". Otherwise it might suggest (at least to some readers) that it applies to many, most, or even all horses. It is not wrong, but it can be misleading.