Updating, part 1: When can you change your mind? The binary model

11 PhilGoetz 13 May 2010 05:55PM

I was recently disturbed by my perception that, despite years of studying and debating probability problems, the LessWrong community as a whole has not markedly improved its ability to get the right answer on them.

I had expected that people would read posts and comments by other people, and take special note of comments by people who had a prior history of being right, and thereby improve their own accuracy.

But can that possibly work?  How can someone who isn't already highly accurate identify other people who are?

Aumann's agreement theorem (allegedly) says that Bayesians with the same priors agree.  But it doesn't say that doing so helps.  Under what circumstances does revising your opinions, by updating in response to people you consider reliable, actually improve your accuracy?

To find out, I built a model of updating in response to the opinions of others.  It did, eventually, show that Bayesians improve their collective opinions by updating in response to the opinions of other Bayesians.  But this turns out not to depend on them satisfying the conditions of Aumann's theorem, or on doing Bayesian updating.  It depends only on a very simple condition, established at the start of the simulation.  Can you guess what it is?

I'll write another post describing and explaining the results if this post receives a karma score over 10.

The Cameron Todd Willingham test

3 Kevin 05 May 2010 12:11AM

In 2004, the State of Texas executed Cameron Todd Willingham by lethal injection for the crime of murdering his young children by setting fire to his house.

In 2009, David Grann wrote an extended examination of the evidence in the Willingham case for The New Yorker, which has called into question Willingham's guilt. One of the prosecutors in the Willingham case, John Jackson, wrote a response summarizing the evidence from his current perspective. I am not summarizing the evidence here so as to not give the impression of selectively choosing the evidence.

A prior probability estimate for Willingham's guilt (certainly not a close to optimal prior probability) is the probability that a fire resulting in the fatalities of children was intentionally set. The US Fire Administration puts this probability at 13%. The prior probability could be made more accurate by breaking down that 13% of intentionally set fires into different demographic sets, or looking at correlations with other things such as life insurance data.
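As a minimal sketch of how such a prior would be used, the update can be written in odds form: posterior odds equal prior odds times a likelihood ratio. The function name `update` and the likelihood ratios below are placeholders chosen for illustration; they are not estimates of the strength of any evidence in this case.

```python
def update(prior, likelihood_ratio):
    """Bayesian update in odds form.

    likelihood_ratio = P(evidence | guilty) / P(evidence | innocent).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = 0.13  # the US Fire Administration figure quoted above

# Placeholder likelihood ratios, for illustration only.
for lr in (0.1, 1.0, 10.0):
    print(f"likelihood ratio {lr}: posterior {update(prior, lr):.3f}")
```

A likelihood ratio of 1 leaves the 13% prior unchanged; ratios below 1 favor innocence and ratios above 1 favor guilt.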

My question for Less Wrong: Just how innocent is Cameron Todd Willingham? Intuitively, it seems to me that the evidence for Willingham's innocence is of higher magnitude than the evidence for Amanda Knox's innocence. But the prior probability of Willingham being guilty given his children died in a fire in his home is higher than the probability that Amanda Knox committed murder given that a murder occurred in Knox's house.

Challenge question: What does an idealized form of Bayesian Justice look like? I suspect as a start that it would result in a smaller percentage of defendants being found guilty at trial. This article has some examples of the failures to apply Bayesian statistics in existing justice systems.

Frequentist Magic vs. Bayesian Magic

41 Wei_Dai 08 April 2010 08:34PM

[I posted this to open thread a few days ago for review. I've only made some minor editorial changes since then, so no need to read it again if you've already read the draft.]

This is a belated reply to cousin_it's 2009 post Bayesian Flame, which claimed that frequentists can give calibrated estimates for unknown parameters without using priors:

And here's an ultra-short example of what frequentists can do: estimate 100 independent unknown parameters from 100 different sample data sets and have 90 of the estimates turn out to be true to fact afterward. Like, fo'real. Always 90% in the long run, truly, irrevocably and forever.

And indeed they can. Here's the simplest example that I can think of that illustrates the spirit of frequentism:

Suppose there is a machine that produces biased coins. You don't know how the machine works, except that each coin it produces is either biased towards heads (in which case each toss of the coin will land heads with probability .9 and tails with probability .1) or towards tails (in which case each toss of the coin will land tails with probability .9 and heads with probability .1). For each coin, you get to observe one toss, and then have to state whether you think it's biased towards heads or tails, and what is the probability that's the right answer.

Let's say that you decide to follow this rule: after observing heads, always answer "the coin is biased towards heads with probability .9" and after observing tails, always answer "the coin is biased towards tails with probability .9". Do this for a while, and it will turn out that 90% of the time you are right about which way the coin is biased, no matter how the machine actually works. The machine might always produce coins biased towards heads, or always towards tails, or decide based on the digits of pi, and it wouldn't matter—you'll still be right 90% of the time. (To verify this, notice that in the long run you will answer "heads" for 90% of the coins actually biased towards heads, and "tails" for 90% of the coins actually biased towards tails.) No priors needed! Magic!
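The claim can be checked with a quick simulation. This is only a sketch: `run_trials` and the three example machines are names and choices made up for illustration, and any other rule for picking the bias could be substituted.

```python
import random

def run_trials(choose_bias, n_coins=100_000, seed=0):
    """Return the fraction of coins whose bias is guessed correctly.

    choose_bias(i) is a hypothetical stand-in for the machine: it returns
    "H" or "T", the direction in which coin number i is biased.
    """
    rng = random.Random(seed)
    correct = 0
    for i in range(n_coins):
        bias = choose_bias(i)
        # A coin lands on its favoured side with probability .9.
        lands_on_bias = rng.random() < 0.9
        toss = bias if lands_on_bias else ("T" if bias == "H" else "H")
        # The rule: guess that the coin is biased towards the observed side.
        guess = toss
        correct += guess == bias
    return correct / n_coins

machines = {
    "always heads": lambda i: "H",
    "always tails": lambda i: "T",
    "alternating": lambda i: "H" if i % 2 == 0 else "T",
}
for name, machine in machines.items():
    print(name, round(run_trials(machine), 3))
```

Each machine should give a fraction close to .9, since the guess is right exactly when the single toss lands on the coin's favoured side.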

Bayesian Collaborative Filtering

14 JGWeissman 03 April 2010 11:29PM

I present an algorithm I designed to predict which position a person would report for an issue on TakeOnIt, through Bayesian updates on the evidence of other people's positions on that issue. Additionally, I will point out some potential areas of improvement, in the hopes of inspiring others here to expand on this method.


For those not familiar with TakeOnIt, the basic idea is that there are issues, represented by yes/no questions, on which people can take the positions Agree (A), Mostly Agree (MA), Neutral (N), Mostly Disagree (MD), or Disagree (D). (There are two types of people tracked by TakeOnIt: users who register their own opinions, and Experts/Influencers whose opinions are derived from public quotations.)

The goal is to predict what position a person S would take on an issue, based on the positions registered by other people on that issue. To do this, we will use Bayes' Theorem to update the probability that person S takes the position X on issue I, given that person T has taken position Y on issue I:

P(S takes X on I | T takes Y on I) = P(S takes X on I)*P(T takes Y on I | S takes X on I)/P(T takes Y on I)

Really, we will be updating on several people Tj taking positions Yj on I:

P(S takes X on I | for all j, Tj takes Yj on I) = P(S takes X on I)*Product over j of (P(Tj takes Yj on I | S takes X on I)/P(Tj takes Yj on I))
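The product formula above can be sketched in a few lines. This is an illustration under stated assumptions, not the algorithm as implemented for TakeOnIt: the function `predict_position`, the data layout, and the example numbers are all hypothetical. Instead of dividing by each P(Tj takes Yj on I), which does not depend on X, the sketch normalizes over X at the end; this gives the same answer whenever the independence assumptions behind the product hold.

```python
POSITIONS = ["A", "MA", "N", "MD", "D"]

def predict_position(prior, likelihoods, observed):
    """Posterior over S's position, given other people's positions.

    prior:       dict X -> P(S takes X on I)
    likelihoods: dict Tj -> dict (Yj, X) -> P(Tj takes Yj on I | S takes X on I)
    observed:    dict Tj -> the position Yj that Tj actually took
    """
    unnormalized = {}
    for x in POSITIONS:
        weight = prior[x]
        for person, y in observed.items():
            weight *= likelihoods[person][(y, x)]
        unnormalized[x] = weight
    total = sum(unnormalized.values())
    return {x: w / total for x, w in unnormalized.items()}

# Hypothetical data: each other person matches S's position 60% of the
# time and otherwise picks each of the four other positions 10% of the time.
agreeable = {(y, x): 0.6 if y == x else 0.1
             for y in POSITIONS for x in POSITIONS}
prior = {x: 0.2 for x in POSITIONS}
likelihoods = {"T1": agreeable, "T2": agreeable}

one_vote = predict_position(prior, likelihoods, {"T1": "A"})
two_votes = predict_position(prior, likelihoods, {"T1": "A", "T2": "A"})
print(one_vote["A"], two_votes["A"])
```

With these made-up numbers, a second person taking the same position moves the prediction further in that direction, as the product over j suggests.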

SIA won't doom you

8 Stuart_Armstrong 25 March 2010 05:43PM

Katja Grace has just presented an ingenious model, claiming that SIA combined with the great filter generates its own variant of the doomsday argument. Robin echoed this on Overcoming Bias. We met soon after Katja had come up with the model, and I signed up to it, saying that I could see no flaw in the argument.

Unfortunately, I erred. The argument does not work in the form presented.

First of all, there is the issue of time dependence. We are not just a human-level civilization drifting through the void in blissful ignorance about our position in the universe. We know (approximately) the age of our galaxy, and the time elapsed since the big bang.

How is this relevant? It is relevant because all arguments about the great filter are time-dependent. Imagine we had just reached consciousness and human-level civilization, by some fluke, two thousand years after the creation of our galaxy, by an evolutionary process that took two thousand years. We see no aliens around us. In this situation, we have no reason to suspect any great filter; if we asked ourselves "are we likely to be the first civilization to reach this stage?" then the answer is probably yes. No evidence for a filter.

Imagine, instead, that we had reached consciousness a trillion years into the life of our galaxy, again via an evolutionary process that took two thousand years, and we see no aliens or traces of aliens. Then the evidence for a filter is overwhelming; something must have stopped all those previous likely civilizations from emerging into the galactic plane.

So neither of these civilizations can be included in our reference class (indeed, the second one can only exist if we ourselves are filtered!). So the correct reference class to use is not "the class of all potential civilizations in our galaxy that have reached our level of technological advancement and seen no aliens", but "the class of all potential civilizations in our galaxy that have reached our level of technological advancement at around the same time as us and seen no aliens". Indeed, SIA, once we update on the present, cannot tell us anything about the future.

But there's more.

Information theory and the symmetry of updating beliefs

45 Academian 20 March 2010 12:34AM

Contents:

1.  The beautiful symmetry of Bayesian updating
2.  Odds and log odds: a short comparison
3.  Further discussion of information

Rationality is all about handling this thing called "information".  Fortunately, we live in an era after the rigorous formulation of Information Theory by C.E. Shannon in 1948, a basic understanding of which can actually help you think about your beliefs, in a way similar but complementary to probability theory. Indeed, it has flourished as an area of research exactly because it helps people in many areas of science to describe the world.  We should take advantage of this!

The information theory of events, which I'm about to explain, is about as difficult as high school probability.  It is certainly easier than the information theory of multiple random variables (which right now is explained on Wikipedia), even though the equations look very similar.  If you already know it, this can be a linkable source of explanations to save you writing time :)

So!  To get started, what better way to motivate information theory than to answer a question about Bayesianism?

The beautiful symmetry of Bayesian updating

The factor by which observing A increases the probability of B is the same as the factor by which observing B increases the probability of A.  This factor is P(A and B)/(P(A)·P(B)), which I'll denote by pev(A,B) for reasons to come.  It can vary from 0 to +infinity, and allows us to write Bayes' Theorem succinctly in both directions:

     P(A|B)=P(A)·pev(A,B),   and   P(B|A)=P(B)·pev(A,B)

What does this symmetry mean, and how should it affect the way we think?

A great way to think of pev(A,B) is as a multiplicative measure of mutual evidence, which I'll call mutual probabilistic evidence to be specific.  If pev=1, A and B are independent; if pev>1, they make each other more likely; and if pev<1, they make each other less likely.
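The symmetry is easy to verify on a small example. The sketch below uses a hypothetical fair six-sided die, with A = "the roll is even" and B = "the roll is at least 4"; the events are chosen purely for illustration, and exact fractions avoid rounding noise.

```python
from fractions import Fraction

def pev(p_a, p_b, p_a_and_b):
    """Mutual probabilistic evidence: P(A and B) / (P(A) * P(B))."""
    return p_a_and_b / (p_a * p_b)

# Hypothetical example: one roll of a fair die.
p_a = Fraction(3, 6)        # A: even, {2, 4, 6}
p_b = Fraction(3, 6)        # B: at least 4, {4, 5, 6}
p_a_and_b = Fraction(2, 6)  # A and B: {4, 6}

factor = pev(p_a, p_b, p_a_and_b)
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# The same factor updates both directions.
assert p_a_given_b == p_a * factor
assert p_b_given_a == p_b * factor
print(factor)
```

Here pev is greater than 1, so observing either event makes the other more likely.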

But two ways to think are better than one, so I will offer a second explanation, in terms of information, which I often find quite helpful in analyzing my own beliefs:

"Life Experience" as a Conversation-Halter

11 Seth_Goldin 18 March 2010 07:39PM

Sometimes in an argument, an older opponent might claim that perhaps as I grow older, my opinions will change, or that I'll come around on the topic.  Implicit in this claim is the assumption that age or quantity of experience is a proxy for legitimate authority.  In and of itself, such "life experience" is necessary for an informed rational worldview, but it is not sufficient.

The claim that more "life experience" will completely reverse an opinion indicates that the person making such a claim believes that opinions from others are based primarily on accumulating anecdotes, perhaps derived from extensive availability bias.  It actually is a pretty decent assumption that other people aren't Bayesian, because for the most part, they aren't.  Many can confirm this, including Haidt, Kahneman, and Tversky.

When an opponent appeals to more "life experience," it's a last resort, and it's a conversation halter.  This tactic is used when an opponent is cornered.  The claim is nearly an outright acknowledgment of moving to exit the realm of rational debate.  Why stick to rational discourse when you can shift to trading anecdotes?  It levels the playing field, because anecdotes, while Bayesian evidence, are easily abused, especially for complex moral, social, and political claims.  As rhetoric, this is frustratingly effective, but it's logically rude.

Although it might be rude and rhetorically weak, it would be authoritatively appropriate for a Bayesian to be condescending to a non-Bayesian in an argument.  Conversely, it can be downright maddening for a non-Bayesian to be condescending to a Bayesian, because the non-Bayesian lacks the epistemological authority to warrant such condescension.  E.T. Jaynes wrote in Probability Theory about the arrogance of the uninformed, "The semiliterate on the next bar stool will tell you with absolute, arrogant assurance just how to solve the world's problems; while the scholar who has spent a lifetime studying their causes is not at all sure how to do this."

Omega's subcontracting to Alpha

7 Stuart_Armstrong 16 March 2010 06:52PM

This is a variant built on Gary Drescher's xor problem for timeless decision theory.

You get an envelope from your good friend Alpha, and are about to open it, when Omega appears in a puff of logic.

Being completely trustworthy as usual (don't you just hate that?), he explains that Alpha flipped a coin (or looked at the parity of a sufficiently high digit of pi), to decide whether to put £1000 000 in your envelope, or put nothing.

He, Omega, knows what Alpha decided, has also predicted your own actions, and you know these facts. He hands you a £10 note and says:

"(I predicted that you will refuse this £10) if and only if (there is £1000 000 in Alpha's envelope)."

What to do?

EDIT: to clarify, Alpha will send you the envelope anyway, and Omega may choose to appear or not appear as he and his logic deem fit. Nor is Omega stating a mathematical theorem: that one can deduce from the first premise the truth of the second. He is using XNOR, but using 'if and only if' seems a more understandable formulation. You get to keep the envelope whatever happens, in case that wasn't clear.
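Read as XNOR, Omega's statement only says that the two sides have the same truth value. A minimal sketch, assuming Omega's statement is true and his prediction is accurate, enumerates which combinations remain possible; the function and variable names are made up for illustration, and the sketch does not settle what you should do.

```python
from itertools import product

def omega_statement(predicted_refuse, money_in_envelope):
    """'If and only if' read as XNOR: both sides share a truth value."""
    return predicted_refuse == money_in_envelope

# Keep only the worlds in which Omega's statement is true.
consistent_worlds = [
    (refuse, money)
    for refuse, money in product([True, False], repeat=2)
    if omega_statement(refuse, money)
]

for refuse, money in consistent_worlds:
    action = "refuse the £10" if refuse else "take the £10"
    envelope = "full" if money else "empty"
    print(f"You {action}; Alpha's envelope is {envelope}.")
```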

What is Bayesianism?

81 Kaj_Sotala 26 February 2010 07:43AM

This article is an attempt to summarize basic material, and thus probably won't have anything new for the hardcore posting crowd. It'd be interesting to know whether you think there's anything essential I missed, though.

You've probably seen the word 'Bayesian' used a lot on this site, but may be a bit uncertain of what exactly we mean by that. You may have read the intuitive explanation, but that only seems to explain a certain math formula. There's a wiki entry about "Bayesian", but that doesn't help much. And the LW usage seems different from just the "Bayesian and frequentist statistics" thing, too. As far as I can tell, there's no article explicitly defining what's meant by Bayesianism. The core ideas are sprinkled across a large number of posts, and 'Bayesian' has its own tag, but there's not a single post that explicitly comes out to make the connections and say "this is Bayesianism". So let me try to offer my definition, which boils Bayesianism down to three core tenets.

We'll start with a brief example, illustrating Bayes' theorem. Suppose you are a doctor, and a patient comes to you, complaining about a headache. Further suppose that there are two reasons why people get headaches: they might have a brain tumor, or they might have a cold. A brain tumor always causes a headache, but exceedingly few people have a brain tumor. In contrast, a headache is rarely a symptom of a cold, but most people manage to catch a cold every single year. Given no other information, do you think it more likely that the headache is caused by a tumor, or by a cold?

If you thought a cold was more likely, well, that was the answer I was after. Even if a brain tumor caused a headache every time, and a cold caused a headache only one per cent of the time (say), having a cold is so much more common that it's going to cause a lot more headaches than brain tumors do. Bayes' theorem, basically, says that if cause A might be the reason for symptom X, then we have to take into account both the probability that A caused X (found, roughly, by multiplying the frequency of A with the chance that A causes X) and the probability that anything else caused X. (For a thorough mathematical treatment of Bayes' theorem, see Eliezer's Intuitive Explanation.)
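To make the arithmetic concrete, here is a minimal sketch of the headache example. The two likelihoods (a tumor always causes a headache, a cold does so one per cent of the time) come from the text; the base rates are made-up numbers for illustration, and the sketch assumes that tumors and colds are the only two causes and never occur together.

```python
from fractions import Fraction

# Likelihoods from the example in the text.
p_headache_given_tumor = Fraction(1, 1)
p_headache_given_cold = Fraction(1, 100)

# Base rates: hypothetical numbers, chosen only for illustration.
p_tumor = Fraction(1, 10_000)
p_cold = Fraction(1, 20)

# Frequency of each cause times the chance that it produces a headache.
headache_from_tumor = p_tumor * p_headache_given_tumor
headache_from_cold = p_cold * p_headache_given_cold

# Bayes' theorem: divide by the total probability of a headache.
p_headache = headache_from_tumor + headache_from_cold
p_tumor_given_headache = headache_from_tumor / p_headache
p_cold_given_headache = headache_from_cold / p_headache

print("P(tumor | headache) =", p_tumor_given_headache)
print("P(cold | headache)  =", p_cold_given_headache)
```

With these assumed base rates the cold is five times as likely an explanation as the tumor, even though a tumor is a hundred times more likely to produce a headache when present.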

Case study: abuse of frequentist statistics

25 Cyan 21 February 2010 06:35AM

Recently, a colleague was reviewing an article whose key justification rested on some statistics that seemed dodgy to him, so he came to me for advice. (I guess my boss, the resident statistician, was out of his office.) Now, I'm no expert in frequentist statistics. My formal schooling in frequentist statistics comes from my undergraduate chemical engineering curriculum -- I wouldn't rely on it for consulting. But I've been working for someone who is essentially a frequentist for a year and a half, so I've had some hands-on experience. My boss hired me on the strength of my experience with Bayesian statistics, which I taught myself in grad school, and one thing reading the Bayesian literature voraciously will equip you for is critiquing frequentist statistics. So I felt competent enough to take a look.1
