Circular belief updating

6 irrational 11 December 2013 06:26AM

This article is going to be in the form of a story, since I want to lay out all the premises in a clear way. There's a related question about religious belief.

 

Let's suppose that there's a country called Faerie. I have a book about this country which describes all people living there as rational individuals (in a traditional sense). Furthermore, it states that some people in Faerie believe that there may be some individuals there known as sorcerers. No one has ever seen one, but they may or may not interfere in people's lives in subtle ways. Sorcerers are believed to be such that there can't be more than one of them around and they can't act outside of Faerie. There are 4 common belief systems present in Faerie:

  1. Some people believe there's a sorcerer called Bright who (among other things) likes people to believe in him and may be manipulating people or events to do so. He is not believed to be universally successful.
  2. Or, there may be a sorcerer named Invisible, who interferes with people only in such ways as to provide no information about whether he exists or not.
  3. Or, there may be an (obviously evil) sorcerer named Dark, who would prefer that people don't believe he exists, and interferes with events or people for this purpose, likewise not universally successfully.
  4. Or, there may be no sorcerers at all, or perhaps some other sorcerers that no one knows about, or perhaps some other state of things holds, such as that there are multiple sorcerers, or that these sorcerers don't obey the above rules. However, everyone who lives in Faerie and is in this category simply believes there's no such thing as a sorcerer.

This is completely exhaustive, because everyone believes there can be at most one sorcerer. Of course, some individuals within each group have different ideas about what their sorcerer is like, but within each group they all absolutely agree with their dogma as stated above.

Since I don't believe in sorcery, a priori I assign very high probability to case 4, and very low (and equal) probability to each of the other 3.

I can't visit Faerie, but I am permitted to do a scientific phone poll. I call a random person, named Bob. It turns out he believes in Bright. Since P(Bob believes in Bright | case 1 is true) is higher than the unconditional probability P(Bob believes in Bright), I believe I should adjust the probability of case 1 up, by Bayes' rule. Does everyone agree? Likewise, the probability of case 3 should go up, since disbelief in Dark is evidence for the existence of Dark in exactly the same way, although perhaps to a smaller degree. I also think case 2 and case 4 have to lose some probability, since the four probabilities must add up to 1. If I then call a second person, Daisy, who turns out to believe in Dark, I should adjust all the probabilities in the opposite direction. I am not asking either of them about the actual evidence they have, just what they believe.
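To make the update after Bob's answer concrete, here is a minimal sketch in Python. All the numbers are assumptions chosen for illustration only: the prior (1% for each of cases 1 to 3, 97% for case 4) and the table `p_believes_bright` of P(a random person believes in Bright | case) are hypothetical, and `update` is just a helper name for Bayes' rule over the four cases.

```python
from fractions import Fraction

def update(prior, likelihood):
    """Bayes' rule over a finite set of mutually exclusive hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())  # unconditional P(observation)
    return {h: weight / total for h, weight in unnormalized.items()}

# Hypothetical prior: case 4 dominates; cases 1-3 are low and equal.
prior = {
    "bright": Fraction(1, 100),
    "invisible": Fraction(1, 100),
    "dark": Fraction(1, 100),
    "none": Fraction(97, 100),
}

# Hypothetical P(a random person believes in Bright | case).
# Bright promotes belief in himself; Dark suppresses belief in Dark,
# which nudges people toward the other belief systems, Bright's included.
p_believes_bright = {
    "bright": Fraction(40, 100),
    "invisible": Fraction(30, 100),
    "dark": Fraction(32, 100),
    "none": Fraction(30, 100),
}

posterior = update(prior, p_believes_bright)
for case in prior:
    direction = "up" if posterior[case] > prior[case] else "down"
    print(f"{case:9s} {float(prior[case]):.4f} -> {float(posterior[case]):.4f} ({direction})")
```

Under these assumed numbers, cases 1 and 3 gain probability while cases 2 and 4 lose it, because a case moves up exactly when its likelihood exceeds the unconditional probability of the observation. Daisy's answer could be handled by calling `update` again on `posterior` with a second, equally hypothetical likelihood table.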

I think this is straightforward so far. Here's the confusing part. It turns out that both Bob and Daisy are themselves aware of this argument. So, Bob says, one of the reasons he believes in Bright is that his belief is itself positive evidence for Bright's existence. And Daisy believes in Dark despite her belief being evidence against his existence (presumably because there's some other evidence that's overwhelming).

Here are my questions:

  1. Is it sane for Bob and Daisy to be in such a positive or negative feedback loop? How is this resolved?
  2. If Bob and Daisy took the evidence provided by their belief into account already, how does this affect my own evidence updating? Should I take it into account regardless, or not at all, or to a smaller degree?

I am looking forward to your thoughts.

Case Study: the Death Note Script and Bayes

25 gwern 04 January 2013 04:33AM

"Who wrote the Death Note script?"

I give a history of the 2009 leaked script, discuss internal & external evidence for its authenticity including stylometrics; and then give a simple step-by-step Bayesian analysis of each point. We finish with high confidence in the script's authenticity, discussion of how this analysis was surprisingly enlightening, and what followup work the analysis suggests would be most valuable.

continue reading »

(Subjective Bayesianism vs. Frequentism) VS. Formalism

27 potato 26 November 2011 05:05AM

One of the core aims of the philosophy of probability is to explain the relationship between frequency and probability. The frequentist proposes identity as the relationship. This use of identity is highly dubious. We know how to check for identity between numbers, or even how to check for the weaker copula relation between particular objects; but how would we test the identity of frequency and probability? It is not immediately obvious that there is some simple value out there which is modeled by probability, like position and mass are values that are modeled by Newton's Principia. You can actually check if density * volume = mass, by taking separate measurements of mass, density and volume, but what would you measure to check a frequency against a probability?

Frequentist philosophy has a certain appeal: we would like to say that if a bag has 100 balls in it, only 1 of which is white, then the probability of drawing the white ball is 1/100, and that if we take a non-white ball out, the probability of drawing the white ball is now 1/99. Frequentism would make the philosophical justification of that inference trivial. But of course, anything a frequentist can do, a Bayesian can do (better). I mean that literally: it's the stronger magic.

A Subjective Bayesian, more or less, says that the reason frequencies are related to probabilities is because when you learn a frequency you thereby learn a fact about the world, and one must update one's degrees of belief on every available fact. The subjective Bayesian actually uses the copula in another strange way:

Probability is subjective degree of belief.

and subjective Bayesians also claim:

Probabilities are not in the world, they are in your mind.

These two statements are brilliantly championed in Probability is Subjectively Objective. But ultimately, the formalism which I would like to suggest denies both of these statements. Formalists do not ontologically commit themselves to probabilities, just as they do not say that numbers exist; hence we don't allocate probabilities in the mind or anywhere else; we only commit ourselves to number theory, and probability theory. Mathematical theories are simply repeatable processes which construct certain sequences of squiggles called "theorems", by changing the squiggles of other theorems, according to certain rules called "inferences". Inferences always take as input certain sequences of squiggles called premises, and output a sequence of squiggles called the conclusion. The only thing an inference ever does is add squiggles to a theorem, take away squiggles from a theorem, or both. It turns out that these squiggle sequences mixed with inferences can talk about almost anything, certainly any computable thing. The formalist does not need to ontologically commit to numbers to assert that "There is a prime greater than 10000.", even though "There is x such that" is a flat assertion of existence; because for the formalist "There is a prime greater than 10000." simply means that number theory contains a theorem which is interpreted as "there is a prime greater than 10000." When you say a mathematical fact in English, you are interpreting a theorem from a formal theory. If under your suggested interpretation, all of the theorems of the theory are true, then whatever system/mechanism your interpretation of the theory talks about, is said to be modeled by the theory.

So, what is the relation between frequency and probability proposed by formalism? Theorems of probability may be interpreted as true statements about frequencies, when you assign certain words to certain squiggles and assert the resulting natural-language sentence. Or for short we can say: "Probability theory models frequency." It is trivial to show that Kolmogorov's theory models frequency, since it also models fractions; it is an algebra after all. More interestingly, probability theory models rational distributions of subjective degree of belief, and the optimal updating of degree of belief given new information. This is somewhat harder to show; Dutch-book arguments do nicely to at least provide some intuitive understanding of the relation between degree of belief, betting, and probability, but there is still work to be done here. If Bayesian probability theory really does model rational belief, which many believe it does, then that is likely the most interesting thing we are ever going to be able to model with probability. But probability theory also models spatial measurement. Why not add the position that probability is volume to the debating lines of the philosophy of probability?

Why are frequentism's and subjective Bayesianism's misuses of the copula not as obvious as volumeism's? Because what the Bayesian and the frequentist are really arguing about is statistical methodology; they've just disguised the argument as an argument about what probability is. Your interpretation of probability theory will determine how you model uncertainty, and hence determine your statistical methodology. Volumeism cannot handle uncertainty in any obvious way; however, the Bayesian and frequentist interpretations of probability theory imply two radically different ways of handling uncertainty.

The easiest way to understand the philosophical dispute between the frequentist and the subjective Bayesian is to look at the classic biased coin:

A subjective Bayesian and a frequentist are at a bar, and the bartender (being rather bored) tells the two that he has a biased coin, and asks them "what is the probability that the coin will come up heads on the first flip?" The frequentist says that for the coin to be biased means for it not to have a 50% chance of coming up heads, so all we know is that it has a probability that is not equal to 50%. The Bayesian says that any evidence I have for it coming up heads is also evidence for it coming up tails, since I know nothing about one outcome that doesn't also hold for its negation, and the only value which represents that symmetry is 50%.

I ask you. What is the difference between these two, and the poor souls engaged in endless debate over realism about sound in the beginning of Making Beliefs Pay Rent?

If a tree falls in a forest and no one hears it, does it make a sound? One says, "Yes it does, for it makes vibrations in the air." Another says, "No it does not, for there is no auditory processing in any brain."

One is being asked: "Are there pressure waves in the air if we aren't around?"; the other is being asked: "Are there auditory experiences if we are not around?" The problem is that "sound" is being used to stand for both "auditory experience" and "pressure waves through air". They are both giving the right answers to these respective questions. But they are failing to Replace the Symbol with the Substance and they're using one word with two different meanings in different places. In the exact same way, "probability" is being used to stand for both "frequency of occurrence" and "rational degree of belief" in the dispute between the Bayesian and the frequentist. The correct answer to the question: "If the coin is flipped an infinite number of times, how frequently would we expect it to land on heads?" is "All we know is that it wouldn't be 50%," because that is what it means for the coin to be biased. The correct answer to the question: "What is the optimal degree of belief that we should assign to the first trial being heads?" is "Precisely 50%," because of the symmetrical evidential support the results get from our background information. How we should actually model the situation as statisticians depends on our goal. But remember that Bayesianism is the stronger magic, and the only contender for perfection in the competition.
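The two answers to the coin question can sit side by side in one small model. The sketch below is only an illustration under stated assumptions: the grid of candidate biases and the equal weights on them are hypothetical stand-ins for "we know the coin is biased, but nothing favors heads over tails".

```python
from fractions import Fraction

# Hypothetical candidate long-run frequencies of heads.
# 1/2 is excluded, because the coin is known to be biased.
biases = [Fraction(k, 10) for k in range(1, 10) if k != 5]

# Symmetric ignorance: equal weight on every candidate, so each bias b
# is exactly as plausible as its mirror image 1 - b.
prior = {b: Fraction(1, len(biases)) for b in biases}

# Degree of belief that the first flip lands heads:
# the prior-weighted average of the candidate frequencies.
p_heads_first_flip = sum(weight * b for b, weight in prior.items())

# Every single candidate frequency differs from 1/2.
no_fair_candidate = all(b != Fraction(1, 2) for b in biases)

print("P(heads on first flip):", p_heads_first_flip)
print("long-run frequency is never 1/2:", no_fair_candidate)
```

In this toy model the degree of belief for the first flip is exactly 1/2 even though no candidate long-run frequency equals 1/2, which is the sense in which the two drinkers are answering different questions.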

For us formalists, probabilities are not anywhere. We do not even believe in probability technically, we only believe in probability theory. The only coherent uses of "probability" in natural language are purely syncategorematic. We should be very careful when we colloquially use "probability" as a noun or verb, and be very careful and clear about what we mean by this word play. Probability theory models many things, including degree of belief, and frequency. Whatever we may learn about rationality, frequency, measure, or any of the other mechanisms that probability models, through the interpretation of probability theorems, we learn because probability theory is isomorphic to those mechanisms. When you use the copula like the frequentist or the subjective Bayesian, it makes it hard to notice that probability theory modeling both frequency and degree of belief, is not a contradiction. If we use "is" instead of "model", it is clear that frequency is not degree of belief, so if probability is belief, then it is not frequency.  Though frequency is not degree of belief, frequency does model degree of belief, so if probability models frequency, it must also model degree of belief.

A History of Bayes' Theorem

53 lukeprog 29 August 2011 07:04AM

Sometime during the 1740s, the Reverend Thomas Bayes made the ingenious discovery that bears his name but then mysteriously abandoned it. It was rediscovered independently by a different and far more renowned man, Pierre Simon Laplace, who gave it its modern mathematical form and scientific application — and then moved on to other methods. Although Bayes’ rule drew the attention of the greatest statisticians of the twentieth century, some of them vilified both the method and its adherents, crushed it, and declared it dead. Yet at the same time, it solved practical questions that were unanswerable by any other means: the defenders of Captain Dreyfus used it to demonstrate his innocence; insurance actuaries used it to set rates; Alan Turing used it to decode the German Enigma cipher and arguably save the Allies from losing the Second World War; the U.S. Navy used it to search for a missing H-bomb and to locate Soviet subs; RAND Corporation used it to assess the likelihood of a nuclear accident; and Harvard and Chicago researchers used it to verify the authorship of the Federalist Papers. In discovering its value for science, many supporters underwent a near-religious conversion yet had to conceal their use of Bayes’ rule and pretend they employed something else. It was not until the twenty-first century that the method lost its stigma and was widely and enthusiastically embraced.

So begins Sharon McGrayne's fun new book, The Theory That Would Not Die, a popular history of Bayes' Theorem. Instead of reviewing the book, I'll summarize some of its content below. I skip the details and many great stories from the book, for example the (Bayesian) search for a lost submarine that inspired Hunt for Red October. Also see McGrayne's Google Talk here. She will be speaking at the upcoming Singularity Summit, too, which you can register for here (price goes up after August 31st).

continue reading »

Illustrator needed: Intuitive Bayes 2.0

15 Eliezer_Yudkowsky 19 July 2011 12:24AM

A huge revision of The Intuitive Explanation of Bayesian Reasoning is in progress, aimed at being considerably more accessible and hence with a lot more graphics.  I need someone who can turn out versions of the illustrations that are technically accurate enough to try on beta readers, fast enough and reliably enough that I can ask for revised versions of the illustrations a day later if the beta reader says they didn't understand it.  There is a Google Doc in progress which you would be given permission to edit, containing some hand-drawn attempts on my part to indicate what the illustrations should look like, and a number of finished illustrations from an illustrator who unfortunately cannot put in any further work on this job.

For this job, technical accuracy (i.e., if a ratio is 3:4, it should not look like 1:9), understandability, and speed are much more important than beauty - the idea is to get as quickly as possible to a working version with understandable illustrations that has been verified by the beta readers.

If interested, email me at yudkowsky@gmail.com.

Information theory and the symmetry of updating beliefs

45 Academian 20 March 2010 12:34AM

Contents:

1.  The beautiful symmetry of Bayesian updating
2.  Odds and log odds: a short comparison
3.  Further discussion of information

Rationality is all about handling this thing called "information".  Fortunately, we live in an era after the rigorous formulation of Information Theory by C.E. Shannon in 1948, a basic understanding of which can actually help you think about your beliefs, in a way similar but complementary to probability theory. Indeed, it has flourished as an area of research exactly because it helps people in many areas of science to describe the world.  We should take advantage of this!

The information theory of events, which I'm about to explain, is about as difficult as high school probability.  It is certainly easier than the information theory of multiple random variables (which right now is explained on Wikipedia), even though the equations look very similar.  If you already know it, this can be a linkable source of explanations to save you writing time :)

So!  To get started, what better way to motivate information theory than to answer a question about Bayesianism?

The beautiful symmetry of Bayesian updating

The factor by which observing A increases the probability of B is the same as the factor by which observing B increases the probability of A.  This factor is P(A and B)/(P(A)·P(B)), which I'll denote by pev(A,B) for reasons to come.  It can vary from 0 to +infinity, and allows us to write Bayes' Theorem succinctly in both directions:

     P(A|B)=P(A)·pev(A,B),   and   P(B|A)=P(B)·pev(A,B)

What does this symmetry mean, and how should it affect the way we think?

A great way to think of pev(A,B) is as a multiplicative measure of mutual evidence, which I'll call mutual probabilistic evidence to be specific.  If pev=1, A and B are independent; if pev>1, they make each other more likely; and if pev<1, they make each other less likely.
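As a quick numerical check of the symmetry, here is a minimal sketch in Python. The probabilities P(A)=3/10, P(B)=2/10 and P(A and B)=12/100 are made-up values for illustration, and `pev` is simply the formula above written as a function.

```python
from fractions import Fraction

def pev(p_a, p_b, p_a_and_b):
    """Mutual probabilistic evidence: P(A and B) / (P(A) * P(B))."""
    return p_a_and_b / (p_a * p_b)

# Hypothetical probabilities for two events A and B.
p_a = Fraction(3, 10)
p_b = Fraction(2, 10)
p_a_and_b = Fraction(12, 100)

factor = pev(p_a, p_b, p_a_and_b)

# Conditional probabilities computed directly from their definitions.
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# The same factor scales both priors into their posteriors.
print("pev(A,B) =", factor)
print("P(A|B) =", p_a_given_b, "and P(A)*pev =", p_a * factor)
print("P(B|A) =", p_b_given_a, "and P(B)*pev =", p_b * factor)
```

With these assumed numbers pev(A,B) is greater than 1, so each event makes the other more likely, and by the same multiplicative factor in both directions.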

But two ways to think are better than one, so I will offer a second explanation, in terms of information, which I often find quite helpful in analyzing my own beliefs:

continue reading »