Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Recent updates to gwern.net (2014-2015)

21 gwern 02 November 2015 12:06AM

Previously: 2011; 2012-2013; 2013-2014

“Receive my instruction, and not silver; and knowledge rather than choice gold. / For wisdom is better than rubies; and all the things that may be desired are not to be compared to it.”

Sorted by topic:

Darknet market related:

Statistics & decision theory:

QS related:


  • “Effective Use of arbtt”: My window tracker/time-logger of choice is arbtt which records X window info for later classification and analysis; but one of the challenges is you don’t know how to set up arbtt or improve your environment or write classifications rules. So I wrote a tutorial.
  • Time-lock crypto: wrote a Bash implementation of serial hashing time-lock crypto, link to all known implementations of hash time-lock crypto; discuss recent major theoretical breakthroughs involving Bitcoin



  • switched to Patreon for donations
  • continued sending out my newsletter; up to 24 issues now
  • rewrote gwern.net CSS to be mobile-friendly; should now be readable in an iPhone 6 browser
  • optimized website loading (removed Custom Search Engine, A/B testing, non-validating XML, outbound link-tracking; simplified Disqus; minified JS, and fully async/deferred JS loading)
  • A/B testing:

Why the tails come apart

116 Thrasymachus 01 August 2014 10:41PM

[I'm unsure how much this rehashes things 'everyone knows already' - if old hat, feel free to downvote into oblivion. My other motivation for the cross-post is the hope it might catch the interest of someone with a stronger mathematical background who could make this line of argument more robust]

[Edit 2014/11/14: mainly adjustments and rewording in light of the many helpful comments below (thanks!). I've also added a geometric explanation.]

Many outcomes of interest have pretty good predictors. It seems that height correlates to performance in basketball (the average height in the NBA is around 6'7"). Faster serves in tennis improve one's likelihood of winning. IQ scores are known to predict a slew of factors, from income, to chance of being imprisoned, to lifespan.

What's interesting is what happens to these relationships 'out on the tail': extreme outliers of a given predictor are seldom similarly extreme outliers on the outcome it predicts, and vice versa. Although 6'7" is very tall, it lies within a couple of standard deviations of the median US adult male height - there are many thousands of US men taller than the average NBA player, yet are not in the NBA. Although elite tennis players have very fast serves, if you look at the players serving the fastest serves ever recorded, they aren't the very best players of their time. It is harder to look at the IQ case due to test ceilings, but again there seems to be some divergence near the top: the very highest earners tend to be very smart, but their intelligence is not in step with their income (their cognitive ability is around +3 to +4 SD above the mean, yet their wealth is much higher than this) (1).

The trend seems to be that even when two factors are correlated, their tails diverge: the fastest servers are good tennis players, but not the very best (and the very best players serve fast, but not the very fastest); the very richest tend to be smart, but not the very smartest (and vice versa). Why?

Too much of a good thing?

One candidate explanation would be that more isn't always better, and the correlations one gets looking at the whole population doesn't capture a reversal at the right tail. Maybe being taller at basketball is good up to a point, but being really tall leads to greater costs in terms of things like agility. Maybe although having a faster serve is better all things being equal, but focusing too heavily on one's serve counterproductively neglects other areas of one's game. Maybe a high IQ is good for earning money, but a stratospherically high IQ has an increased risk of productivity-reducing mental illness. Or something along those lines.

I would guess that these sorts of 'hidden trade-offs' are common. But, the 'divergence of tails' seems pretty ubiquitous (the tallest aren't the heaviest, the smartest parents don't have the smartest children, the fastest runners aren't the best footballers, etc. etc.), and it would be weird if there was always a 'too much of a good thing' story to be told for all of these associations. I think there is a more general explanation.

The simple graphical explanation

[Inspired by this essay from Grady Towers]

Suppose you make a scatter plot of two correlated variables. Here's one I grabbed off google, comparing the speed of a ball out of a baseball pitchers hand compared to its speed crossing crossing the plate:

It is unsurprising to see these are correlated (I'd guess the R-square is > 0.8). But if one looks at the extreme end of the graph, the very fastest balls out of the hand aren't the very fastest balls crossing the plate, and vice versa. This feature is general. Look at this data (again convenience sampled from googling 'scatter plot') of this:

Or this:

Or this:

Given a correlation, the envelope of the distribution should form some sort of ellipse, narrower as the correlation goes stronger, and more circular as it gets weaker: (2)

The thing is, as one approaches the far corners of this ellipse, we see 'divergence of the tails': as the ellipse doesn't sharpen to a point, there are bulges where the maximum x and y values lie with sub-maximal y and x values respectively:

So this offers an explanation why divergence at the tails is ubiquitous. Providing the sample size is largeish, and the correlation not too tight (the tighter the correlation, the larger the sample size required), one will observe the ellipses with the bulging sides of the distribution. (3)

Hence the very best basketball players aren't the very tallest (and vice versa), the very wealthiest not the very smartest, and so on and so forth for any correlated X and Y. If X and Y are "Estimated effect size" and "Actual effect size", or "Performance at T", and "Performance at T+n", then you have a graphical display of winner's curse and regression to the mean.

An intuitive explanation of the graphical explanation

It would be nice to have an intuitive handle on why this happens, even if we can be convinced that it happens. Here's my offer towards an explanation:

The fact that a correlation is less than 1 implies that other things matter to an outcome of interest. Although being tall matters for being good at basketball, strength, agility, hand-eye-coordination matter as well (to name but a few). The same applies to other outcomes where multiple factors play a role: being smart helps in getting rich, but so does being hard working, being lucky, and so on.

For a toy model, pretend that wealth is wholly explained by two factors: intelligence and conscientiousness. Let's also say these are equally important to the outcome, independent of one another and are normally distributed. (4) So, ceteris paribus, being more intelligent will make one richer, and the toy model stipulates there aren't 'hidden trade-offs': there's no negative correlation between intelligence and conscientiousness, even at the extremes. Yet the graphical explanation suggests we should still see divergence of the tails: the very smartest shouldn't be the very richest.

The intuitive explanation would go like this: start at the extreme tail - +4SD above the mean for intelligence, say. Although this gives them a massive boost to their wealth, we'd expect them to be average with respect to conscientiousness (we've stipulated they're independent). Further, as this ultra-smart population is small, we'd expect them to fall close to the average in this other independent factor: with 10 people at +4SD, you wouldn't expect any of them to be +2SD in conscientiousness.

Move down the tail to less extremely smart people - +3SD say. These people don't get such a boost to their wealth from their intelligence, but there should be a lot more of them (if 10 at +4SD, around 500 at +3SD), this means one should expect more variation in conscientiousness - it is much less surprising to find someone +3SD in intelligence and also +2SD in conscientiousness, and in the world where these things were equally important, they would 'beat' someone +4SD in intelligence but average in conscientiousness. Although a +4SD intelligence person will likely be better than a given +3SD intelligence person (the mean conscientiousness in both populations is 0SD, and so the average wealth of the +4SD intelligence population is 1SD higher than the 3SD intelligence people), the wealthiest of the +4SDs will not be as good as the best of the much larger number of +3SDs. The same sort of story emerges when we look at larger numbers of factors, and in cases where the factors contribute unequally to the outcome of interest.

When looking at a factor known to be predictive of an outcome, the largest outcome values will occur with sub-maximal factor values, as the larger population increases the chances of 'getting lucky' with the other factors:

So that's why the tails diverge.


A parallel geometric explanation

There's also a geometric explanation. The R-square measure of correlation between two sets of data is the same as the cosine of the angle between them when presented as vectors in N-dimensional space (explanations, derivations, and elaborations here, here, and here). (5) So here's another intuitive handle for tail divergence:

Grant a factor correlated with an outcome, which we represent with two vectors at an angle theta, the inverse cosine equal the R-squared. 'Reading off the expected outcome given a factor score is just moving along the factor vector and multiplying by cosine theta to get the distance along the outcome vector. As cos theta is never greater than 1, we see regression to the mean. The geometrical analogue to the tails coming apart is the absolute difference in length along factor versus length along outcome|factor scales with the length along the factor; the gap between extreme values of a factor and the less extreme values of the outcome grows linearly as the factor value gets more extreme. For concreteness (and granting normality), an R-square of 0.5 (corresponding to an angle of sixty degrees) means that +4SD (~1/15000) on a factor will be expected to be 'merely' +2SD (~1/40) in the outcome - and an R-square of 0.5 is remarkably strong in the social sciences, implying it accounts for half the variance.(6) The reverse - extreme outliers on outcome are not expected to be so extreme an outlier on a given contributing factor - follows by symmetry.


Endnote: EA relevance

I think this is interesting in and of itself, but it has relevance to Effective Altruism, given it generally focuses on the right tail of various things (What are the most effective charities? What is the best career? etc.) It generally vindicates worries about regression to the mean or winner's curse, and suggests that these will be pretty insoluble in all cases where the populations are large: even if you have really good means of assessing the best charities or the best careers so that your assessments correlate really strongly with what ones actually are the best, the very best ones you identify are unlikely to be actually the very best, as the tails will diverge.

This probably has limited practical relevance. Although you might expect that one of the 'not estimated as the very best' charities is in fact better than your estimated-to-be-best charity, you don't know which one, and your best bet remains your estimate (in the same way - at least in the toy model above - you should bet a 6'11" person is better at basketball than someone who is 6'4".)

There may be spread betting or portfolio scenarios where this factor comes into play - perhaps instead of funding AMF to diminishing returns when its marginal effectiveness dips below charity #2, we should be willing to spread funds sooner.(6) Mainly, though, it should lead us to be less self-confident.

1. Given income isn't normally distributed, using SDs might be misleading. But non-parametric ranking to get a similar picture: if Bill Gates is ~+4SD in intelligence, despite being the richest man in america, he is 'merely' in the smartest tens of thousands. Looking the other way, one might look at the generally modest achievements of people in high-IQ societies, but there are worries about adverse selection.

2. As nshepperd notes below, this depends on something like multivariate CLT. I'm pretty sure this can be weakened: all that is needed, by the lights of my graphical intuition, is that the envelope be concave. It is also worth clarifying the 'envelope' is only meant to illustrate the shape of the distribution, rather than some boundary that contains the entire probability density: as suggested by homunq: it is an 'pdf isobar' where probability density is higher inside the line than outside it. 

3. One needs a large enough sample to 'fill in' the elliptical population density envelope, and the tighter the correlation, the larger the sample needed to fill in the sub-maximal bulges. The old faithful case is an example where actually you do get a 'point', although it is likely an outlier.


4. It's clear that this model is fairly easy to extend to >2 factor cases, but it is worth noting that in cases where the factors are positively correlated, one would need to take whatever component of the factors which are independent of one another.

5. My intuition is that in cartesian coordinates the R-square between correlated X and Y is actually also the cosine of the angle between the regression lines of X on Y and Y on X. But I can't see an obvious derivation, and I'm too lazy to demonstrate it myself. Sorry!

6. Another intuitive dividend is that this makes it clear why you can by R-squared to move between z-scores of correlated normal variables, which wasn't straightforwardly obvious to me.

7. I'd intuit, but again I can't demonstrate, the case for this becomes stronger with highly skewed interventions where almost all the impact is focused in relatively low probability channels, like averting a very specified existential risk.

Confound it! Correlation is (usually) not causation! But why not?

44 gwern 09 July 2014 03:04AM

It is widely understood that statistical correlation between two variables ≠ causation. But despite this admonition, people are routinely overconfident in claiming correlations to support particular causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimating the prevalence of confounds/common-causation. I speculate that in realistic causal networks or DAGs, the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in DAGs, the imbalance also explains overconfidence.

Full article: http://www.gwern.net/Causality

Recent updates to gwern.net (2013-2014)

26 gwern 08 July 2014 01:44AM

Previous: 2011, 2012-2013

“It cannot be gotten for gold, neither shall silver be weighed for the price thereof. / It cannot be valued with the gold of Ophir, with the precious onyx, nor the sapphire. / The gold and the crystal cannot equal it: and the exchange of it shall not be for vessels of fine gold. / No mention shall be made of coral, or of pearls: for the price of wisdom is above rubies.”

Another 477 days are past, so what have I been up to? In roughly topical & chronological order, here are some major additions to gwern.net:









  • I began A/B testing my site design to try to improve readability:

    • no difference between 4 fonts
    • no difference between lineheights
    • no difference between the null hypothesis & the null hypothesis
    • a pure black/white foreground/background performed better than mixes of off-colors
    • font size 100-120%: default of 100% was best
    • blockquote formatting: Readability-style bad, zebra-stripes good
    • header capitalization: best result was to upcase title & all section headers
    • tested font size & number size & table of contents background: status quo of all was best
    • BeeLine Reader: no color variant performed better than no-highlighting
  • anonymous feedback analysis (feedback turned out to be useful)
  • deleted Flattr, trying out Gittip for donations; Gittip turns out to work much better
  • I began a newsletter/mailing-list; the back-issues are online:

Recent updates to gwern.net (2012-2013)

63 gwern 18 March 2013 07:54PM

Previous: Recent updates to gwern.net (2011)

“But where shall wisdom be found? / And where is the place of understanding? / Man knoweth not the price thereof; neither is it found in the land of the living…for the price of wisdom is above rubies.”

As before, here is material I’ve worked on in the 477 days since my last update which LWers may find interesting. In roughly chronological & topical order, here are the major additions to gwern.net:

Transcribed or translated:

More technical:


Case Study: the Death Note Script and Bayes

25 gwern 04 January 2013 04:33AM

"Who wrote the Death Note script?"

I give a history of the 2009 leaked script, discuss internal & external evidence for its authenticity including stylometrics; and then give a simple step-by-step Bayesian analysis of each point. We finish with high confidence in the script's authenticity, discussion of how this analysis was surprisingly enlightening, and what followup work the analysis suggests would be most valuable.

continue reading »

Using degrees of freedom to change the past for fun and profit

41 CarlShulman 07 March 2012 02:51AM

Follow-up to: Follow-up on ESP study: "We don't publish replications"Feed the Spinoff Heuristic!

Related to: Parapsychology: the control group for scienceDealing with the high quantity of scientific error in medicine

Using the same method as in Study 1, we asked 20 University of Pennsylvania undergraduates to listen to either “When I’m Sixty-Four” by The Beatles or “Kalimba.” Then, in an ostensibly unrelated task, they indicated their birth date (mm/dd/yyyy) and their father’s age. We used father’s age to control for variation in baseline age across participants. An ANCOVA revealed the predicted effect: According to their birth dates, people were nearly a year-and-a-half younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) rather than to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040

That's from "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant," which runs simulations of a version of Shalizi's "neutral model of inquiry," with random (null) experimental results, augmented with a handful of choices in the setup and analysis of an experiment. Even before accounting for publication bias, these few choices produced a desired result "significant at the 5% level" 60.7% of the time, and at the 1% level 21.5% at the time.

I found it because of another paper claiming time-defying effects, during a search through all of the papers on Google Scholar citing Daryl Bem's precognition paper, which I discussed in a past post about the problems of publication bias and selection over the course of a study. For Bem, Richard Wiseman established a registry for the methods, and tests of the registered studies could be set prior to seeing the data (in addition to avoiding the file drawer).

Now a number of purported replications have been completed, with several available as preprints online, including a large "straight replication" carefully following the methods in Bem's paper, with some interesting findings discussed below. The picture does not look good for psi, and is a good reminder of the sheer cumulative power of applying a biased filter to many small choices.

continue reading »

How to Fix Science

50 lukeprog 07 March 2012 02:51AM

Like The Cognitive Science of Rationality, this is a post for beginners. Send the link to your friends!

Science is broken. We know why, and we know how to fix it. What we lack is the will to change things.


In 2005, several analyses suggested that most published results in medicine are false. A 2008 review showed that perhaps 80% of academic journal articles mistake "statistical significance" for "significance" in the colloquial meaning of the word, an elementary error every introductory statistics textbook warns against. This year, a detailed investigation showed that half of published neuroscience papers contain one particular simple statistical mistake.

Also this year, a respected senior psychologist published in a leading journal a study claiming to show evidence of precognition. The editors explained that the paper was accepted because it was written clearly and followed the usual standards for experimental design and statistical methods.

Science writer Jonah Lehrer asks: "Is there something wrong with the scientific method?"

Yes, there is.

This shouldn't be a surprise. What we currently call "science" isn't the best method for uncovering nature's secrets; it's just the first set of methods we've collected that wasn't totally useless like personal anecdote and authority generally are.

As time passes we learn new things about how to do science better. The Ancient Greeks practiced some science, but few scientists tested hypotheses against mathematical models before Ibn al-Haytham's 11th-century Book of Optics (which also contained hints of Occam's razor and positivism). Around the same time, Al-Biruni emphasized the importance of repeated trials for reducing the effect of accidents and errors. Galileo brought mathematics to greater prominence in scientific method, Bacon described eliminative induction, Newton demonstrated the power of consilience (unification), Peirce clarified the roles of deduction, induction, and abduction, and Popper emphasized the importance of falsification. We've also discovered the usefulness of peer review, control groups, blind and double-blind studies, plus a variety of statistical methods, and added these to "the" scientific method.

In many ways, the best science done today is better than ever — but it still has problems, and most science is done poorly. The good news is that we know what these problems are and we know multiple ways to fix them. What we lack is the will to change things.

This post won't list all the problems with science, nor will it list all the promising solutions for any of these problems. (Here's one I left out.) Below, I only describe a few of the basics.

continue reading »

Funnel plots: the study that didn't bark, or, visualizing regression to the null

47 gwern 04 December 2011 11:05AM

Marginal Revolution linked a post at Genomes Unzipped, "Size matters, and other lessons from medical genetics", with the interesting centerpiece graph:

a funnel plot of genetic studies showing null result approached as sample size increasese

This is from pg 3 of an Ioannidis 2001 et al article (who else?) on what is called a funnel plot: each line represents a series of studies about some particularly hot gene-disease correlations, plotted where Y =  the odds ratio (measure of effect size; all results are 'statistically significant', of course) and X = the sample size. The 1 line is the null hypothesis, here. You will notice something dramatic: as we move along the X-axis and sample sizes increase, everything begins to converge on 1:

Readers familiar with the history of medical association studies will be unsurprised by what happened over the next few years: initial excitement (this same polymorphism was associated with diabetes! And longevity!) was followed by inconclusive replication studies and, ultimately, disappointment. In 2000, 8 years after the initial report, a large study involving over 5,000 cases and controls found absolutely no detectable effect of the ACE polymorphism on heart attack risk. In the meantime, the same polymorphism had turned up in dozens of other association studies for a wide range of traits ranging from obstet­ric cholestasis to menin­go­­coccal disease in children, virtually none of which have ever been convincingly replicated.

(See also "Why epidemiology will not correct itself" or the DNB FAQ.)

continue reading »

MSF Theory: Another Explanation of Subjectively Objective Probability

14 potato 30 July 2011 07:46PM

Before I read Probability is in the Mind and Probability is Subjectively Objective I was a realist about probabilities; I was a frequentest. After I read them, I was just confused. I couldn't understand how a mind could accurately say the probability of getting a heart in a standard deck of playing cards was not 25%. It wasn't until I tried to explain the contrast between my view and the subjective view in a comment on Probability is Subjectively Objective that I realized I was a subjective Bayesian all along. So, if you've read Probability is in the Mind and read Probability is Subjectively Objective but still feel a little confused, hopefully, this will help.

I should mention that I'm not sure that EY would agree with my view of probability, but the view to be presented agrees with EY's view on at least these propositions:

  • Probability is always in a mind, not in the world.
  • The probability that an agent should ascribe to a proposition is directly related to that agent's knowledge of the world.
  • There is only one correct probability to assign to a proposition given your partial knowledge of the world.
  • If there is no uncertainty, there is no probability.

And any position that holds these propositions is a non-realist-subjective view of probability. 



Imagine a pre-shuffled deck of playing cards and two agents (they don't have to be humans), named "Johnny" and "Sally", which are betting 1 dollar each on the suit of the top card. As everyone knows, 1/4 of the cards in a playing card deck are hearts. We will name this belief F1; F1 stands for "1/4 of the cards in the deck are hearts.". Johnny and Sally both believe F1. F1 is all that Johnny knows about the deck of cards, but sally knows a little bit more about this deck. Sally also knows that 8 of the top 10 cards are hearts. Let F2 stand for "8 out of the 10 top cards are hearts.". Sally believes F2. John doesn't know whether or not F2. F1 and F2 are beliefs about the deck of cards and they are either true or false.

So, sally bets that the top card is a heart and Johnny bets against her, i.e., she puts her money on "Top card is a heart." being true; he puts his money on "~The top card is a heart." being true. After they make their bets, one could imagine Johnny making fun of Sally; he might say something like: "Are you nuts? You know, I have a 75% chance of winning. 1/4 of the cards are hearts; you can't argue with that!" Sally might reply: "Don't forget that the probability you assign to '~The top card is a heart.' depends on what you know about the deck. I think you would agree with me that there is an 80% chance that 'The top card is a heart' if you knew just a bit more about the state of the deck."

To be undecided about a proposition is to not know which possible world you are in; am I in the possible world where that proposition is true, or in the one where it is false? Both Johnny and Sally are undecided about "The top card is a heart."; their model of the world splits at that point of representation. Their knowledge is consistent with being in a possible world where the top card is a heart, or in a possible world where the top card is not a heart. The more statements they decide on, the smaller the configuration space of possible worlds they think they might find themselves in; deciding on a proposition takes a chunk off of that configuration space, and the content of that proposition determines the shape of the eliminated chunk; Sally's and Johnny's beliefs constrain their respective expected experiences, but not all the way to a point. The trick when constraining one's space of viable worlds, is to make sure that the real world is among the possible worlds that satisfy your beliefs. Sally still has the upper hand, because her space of viably possible worlds is smaller than Johnny's. There are many more ways you could arrange a standard deck of playing cards that satisfies F1 than there are ways to arrange a deck of cards that satisfies F1 and F2. To be clear, we don't need to believe that possible worlds actually exist to accept this view of belief; we just need to believe that any agent capable of being undecided about a proposition is also capable of imagining alternative ways the world could consistently turn out to be, i.e., capable of imagining possible worlds.

For convenience, we will say that a possible world W, is viable for an agent A, if and only if, W satisfies A's background knowledge of decided propositions, i.e., A thinks that W might be the world it finds itself in.

Of the possible worlds that satisfy F1, i.e., of the possible worlds where "1/4 of the cards are hearts" is true, 3/4 of them also satisfy "~The top card is a heart." Since Johnny holds that F1, and since he has no further information that might put stronger restrictions on his space of viable worlds, he ascribes a 75% probability to "~The top card is a heart." Sally, however, holds that F2 as well as F1. She knows that of the possible worlds that satisfy F1 only 1/4 of them satisfy "The top card is a heart." But she holds a proposition that constrains her space of viably possible worlds even further, namely F2. Most of the possible worlds that satisfy F1 are eliminated as viable worlds if we hold that F2 as well, because most of the possible worlds that satisfy F1 don't satisfy F2. Of the possible worlds that satisfy F2 exactly 80% of them satisfy "The top card is a heart." So, duh, Sally assigns an 80% probability to "The top card is a heart." They give that proposition different probabilities, and they are both right in assigning their respective probabilities; they don't disagree about how to assign probabilities, they just have different resources for doing so in this case. P(~The top card is a heart|F1) really is 75% and P(The top card is a heart|F2) really is 80%.

This setup makes it clear (to me at least) that the right probability to assign to a proposition depends on what you know. The more you know, i.e., the more you constrain the space of worlds you think you might be in, the more useful the probability you assign. The probability that an agent should ascribe to a proposition is directly related to that agent's knowledge of the world.

This setup also makes it easy to see how an agent can be wrong about the probability it assigns to a proposition given its background knowledge. Imagine a third agent, named "Billy", that has the same information as Sally, but say's that there's a 99% chance of "The top card is a heart." Billy doesn't have any information that further constrains the possible worlds he thinks he might find himself in; he's just wrong about the fraction of possible worlds that satisfy F2 that also satisfy "The top card is a heart.". Of all the possible worlds that satisfy F2 exactly 80% of them satisfy "The top card is a heart.", no more, no less. There is only one correct probability to assign to a proposition given your partial knowledge.

The last benefit of this way of talking I'll mention is that it makes probability's dependence on ignorance clear. We can imagine another agent that knows the truth value of every proposition, lets call him "FSM". There is only one possible world that satisfies all of FSM's background knowledge; the only viable world for FSM is the real world. Of the possible worlds that satisfy FSM's background knowledge, either all of them satisfy "The top card is a heart." or none of them do, since there is only one viable world for FSM. So the only probabilities FSM can assign to "The top card is a heart." are 1 or 0. In fact, those are the only probabilities FSM can assign to any proposition. If there is no uncertainty, there is no probability.

The world knows whether or not any given proposition is true (assuming determinism). The world itself is never uncertain, only the parts of the world that we call agents can be uncertain. Hence, Probability is always in a mind, not in the world. The probabilities that the universe assigns to a proposition are always 1 or 0, for the same reasons FSM only assigns a 1 or 0, and 1 and 0 aren't really probabilities.

In conclusion, I'll risk the hypothesis that: Where 0≤x≤1, "P(a|b)=x" is true, if and only if, of the possible worlds that satisfy "b", x of them also satisfy "a". Probabilities are propositional attitudes, and the probability value (or range of values) you assign to a proposition is representative of the fraction of possible worlds you find viable that satisfy that proposition. You may be wrong about the value of that fraction, and as a result you may be wrong about the probability you assign.

We may call the position summarized by the hypothesis above "Modal Satisfaction Frequency theory", or "MSF theory".

View more: Next