Comment author:Perplexed
29 August 2010 06:55:36PM
2 points

I think that the problem is that EY has introduced non-standard terminology here. Worse, he blames it on Jaynes, who makes no such mistake. I just looked it up.

There are two concepts here which must not be confused.

a priori information, aka prior information, aka background information

prior probabilities, aka priors (so called by everyone except EY; Jaynes dislikes the term but acquiesces)

Prior information does indeed constitute a hypothesis in which you have complete confidence. I agree this is something of a weakness - a weakness which is recognized implicitly in such folklore as "Cromwell's rule". Prior information cannot be updated.

Prior probabilities (frequently known simply as priors) can be updated. In a sense, being updated is their whole purpose in life.

Comment author:Perplexed
29 August 2010 09:05:11PM
2 points

This is exactly what's going on. Thank you.

You are welcome. Unfortunately, I was wrong. Or at least incomplete.

I misinterpreted what EY was saying in the posting you cited. He was not, as I mistakenly assumed, saying that prior probabilities should not be called priors. He was instead talking about a third kind of entity which should not be confused with either of the other two.

Prior distributions over hypotheses, which Eliezer wishes to call simply "priors"

But there is no confusion in referring to both prior probabilities and prior distributions simply as priors, because a prior probability is just a special case of a prior distribution. A probability is simply a distribution over a set of two competing hypotheses - only one of which can be true.

Bayes's theorem in its usual form applies only to simple prior probabilities. It tells you how to update the probability. To update a prior distribution, you effectively need to apply Bayes's theorem multiple times - once for each hypothesis in your set of hypotheses.
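To make the "once for each hypothesis" point concrete, here is a minimal sketch. The three bag hypotheses and their names (`all_white`, `mixed`, `all_black`) are my own illustration, not anything from the original post, and the `update` helper is hypothetical.

```python
from fractions import Fraction

def update(prior, likelihood):
    """Apply Bayes's theorem to every hypothesis in the prior distribution.

    prior:      hypothesis -> prior probability
    likelihood: hypothesis -> probability of the observed evidence under it
    """
    # Numerator of Bayes's theorem, computed once per hypothesis.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Shared denominator: total probability of the evidence.
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}

# Assumed example: three equally likely bags (not in the original thread).
prior = {
    "all_white": Fraction(1, 3),
    "mixed": Fraction(1, 3),
    "all_black": Fraction(1, 3),
}
# Probability of drawing a white bean under each hypothesis.
p_white = {
    "all_white": Fraction(1),
    "mixed": Fraction(1, 2),
    "all_black": Fraction(0),
}

posterior = update(prior, p_white)
```

With only two hypotheses, the same function reduces to the usual single-number form of the theorem, since the second probability is just one minus the first.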

So what is that 1/2 number which Eliezer says is definitely not a prior? It is none of the above three things. It is something harder to describe. A statistic over a distribution. I am not even going to try to explain what that means.
Sorry for any confusion I may have created. And thx to Sniffnoy and timtyler for calling my attention to my mistake.

This can easily be "flattened" into a single, more complex, probability distribution:

25% draw white bean from mixed bag.

25% draw black bean from mixed bag.

50% draw white bean from unmixed bag.

If we wish to consider multiple draws, we can again flatten the total event into a single distribution:

1/8 mixed bag, black and black

1/8 mixed bag, black and white

1/8 mixed bag, white and black

1/8 mixed bag, white and white

1/2 unmixed bag, white and white

Translating the "what is that number" question into this situation, we can ask: what do we mean when we say that we are 5/8 sure that we will draw two white beans? I would say that it is a confidence; the "event" that has 5/8 probability is a partial event, a lossy description of the total event.
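The flattening above can be sketched in a few lines. I am assuming what the numbers imply: each bag is chosen with probability 1/2, the mixed bag is half white and half black, and draws are independent. The names `bags`, `flatten` and `marginal` are just illustrative.

```python
from fractions import Fraction
from itertools import product

# bag -> (probability of having this bag, colour -> probability per draw)
bags = {
    "mixed": (Fraction(1, 2), {"white": Fraction(1, 2), "black": Fraction(1, 2)}),
    "unmixed": (Fraction(1, 2), {"white": Fraction(1)}),
}

def flatten(bags, draws):
    """Return the single joint distribution over (bag, sequence of draws)."""
    joint = {}
    for bag, (p_bag, colours) in bags.items():
        for seq in product(colours, repeat=draws):
            p = p_bag
            for colour in seq:
                p *= colours[colour]  # independent draws (assumption)
            joint[(bag, seq)] = p
    return joint

def marginal(joint, seq):
    """Probability of a draw sequence, summed over which bag it came from."""
    return sum(p for (_, s), p in joint.items() if s == seq)

one_draw = flatten(bags, 1)
two_draws = flatten(bags, 2)
p_two_white = marginal(two_draws, ("white", "white"))  # 1/8 + 1/2
```

The 5/8 is the sum of two entries of the total distribution (mixed bag with two whites, unmixed bag with two whites), which is why it reads as a lossy summary of the total event.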

Comment author:Perplexed
29 August 2010 09:47:17PM
3 points

I'm not convinced that there's a meaningful difference between prior distributions and prior probabilities.

There isn't when you have only two competing hypotheses. Add a third hypothesis and you really do have to work with distributions. Chapter 4 of Jaynes explains this wonderfully. It is a long chapter, but fully worth the effort.

But the issue is also nicely captured by your own analysis. As you show, any possible linear combination of the two hypotheses can be characterized by a single parameter, which is itself the probability that the next ball will be white. But when you have three hypotheses, you have two degrees of freedom. A single probability number no longer captures all there is to be said about what you know.
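A small sketch of the degrees-of-freedom point, under assumptions of my own: three hypothetical bags (`all_white`, `mixed`, `all_black`) and two made-up distributions `a` and `b` over them. Both give the same probability for the next draw being white, yet they respond differently to evidence, so the single number cannot be the whole state of knowledge.

```python
from fractions import Fraction

# Probability of a white draw under each hypothetical bag (assumed setup).
P_WHITE = {"all_white": Fraction(1), "mixed": Fraction(1, 2), "all_black": Fraction(0)}

def predictive(dist):
    """Probability that the next draw is white, averaged over hypotheses."""
    return sum(dist[h] * P_WHITE[h] for h in dist)

def after_white(dist):
    """Posterior distribution over hypotheses after observing one white draw."""
    evidence = predictive(dist)
    return {h: dist[h] * P_WHITE[h] / evidence for h in dist}

# Two different distributions over the same three hypotheses.
a = {"all_white": Fraction(1, 2), "mixed": Fraction(0), "all_black": Fraction(1, 2)}
b = {"all_white": Fraction(0), "mixed": Fraction(1), "all_black": Fraction(0)}

same_now = predictive(a) == predictive(b)           # both are 1/2
a_next = predictive(after_white(a))                 # certain of the all-white bag
b_next = predictive(after_white(b))                 # still the mixed bag
```

Here `a` and `b` agree on the next draw but disagree on the draw after that, which is the extra degree of freedom a lone probability misses.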

## Comments (257)


This is exactly what's going on. Thank you.

I apologize for my confused terminology.


I'm not convinced that there's a meaningful difference between prior distributions and prior probabilities.

Going back to the beans problem, we have this:


In retrospect, it's obvious that "probability" should refer to a real scalar on the interval [0,1].

Everyone calls prior probabilities "priors" - including: http://yudkowsky.net/rational/bayes