paper-machine comments on Putting in the Numbers - Less Wrong

Post author: Manfred 30 January 2014 06:41AM


Comments (32)


Comment author: Oscar_Cunningham 30 January 2014 09:20:17AM  2 points

If we have additional knowledge that the average roll of our die is 3, then we want to maximize -P(1)·Log(P(1)) - P(2)·Log(P(2)) - P(3)·Log(P(3)) - P(4)·Log(P(4)), given that the sum is 1 and the average is 3. We can either plug in the constraints and set partial derivatives to zero, or we can use a maximization technique like Lagrange multipliers.
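The constrained maximization described above has a well-known closed form: the Lagrange-multiplier conditions force the maximum-entropy distribution to be exponential in the face value, p(i) ∝ exp(λ·i), with λ chosen so the mean comes out right. A minimal sketch (the helper name `maxent_mean` and the bisection bounds are my own choices, not from the post):

```python
import math

# Maximum-entropy distribution on faces {1,2,3,4} subject to a mean
# constraint. The Lagrange conditions give p_i = exp(lam*i)/Z; we find
# lam by bisection, since the mean is monotonically increasing in lam.

def maxent_mean(target, faces=(1, 2, 3, 4)):
    def mean_for(lam):
        weights = [math.exp(lam * i) for i in faces]
        z = sum(weights)
        return sum(i * w for i, w in zip(faces, weights)) / z

    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_for(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    weights = [math.exp(lam * i) for i in faces]
    z = sum(weights)
    return [w / z for w in weights]

p = maxent_mean(3.0)
print(p)  # successive ratios p[i+1]/p[i] are constant: a geometric shape
```

Note the geometric (exponential) shape of the result; this is what the "straight lines" comparison below is contrasting against.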

I've never been able to understand this.

Surely the correct course of action in this situation is to have a prior over the possible biases of the die, say the uniform prior on {x in R^4 : x1+x2+x3+x4=1, xi>=0 for all i}, and then update Bayesianly by restricting to the subset where the average is 3. Then, to find the predictive distribution for the outcomes of the die, we integrate over this posterior.
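This procedure can be approximated numerically by rejection sampling: draw bias vectors uniformly from the simplex (a Dirichlet(1,1,1,1) draw, via normalized Gamma(1) variates), keep only those whose mean is within a small tolerance of 3, and average the survivors. The function name, tolerance, and sample count below are illustrative choices of mine, not anything from the comment:

```python
import random

# Monte Carlo sketch of the Bayesian alternative: uniform prior on the
# simplex, conditioned on the die's mean being (approximately) 3. The
# averaged surviving bias vectors approximate the predictive
# distribution for the next roll.

def bayes_predictive(target=3.0, tol=0.02, n_samples=200_000, seed=0):
    rng = random.Random(seed)
    total = [0.0] * 4
    kept = 0
    for _ in range(n_samples):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(4)]
        s = sum(g)
        x = [gi / s for gi in g]  # uniform draw from the 3-simplex
        mean = sum((i + 1) * xi for i, xi in enumerate(x))
        if abs(mean - target) < tol:
            kept += 1
            for i in range(4):
                total[i] += x[i]
    return [t / kept for t in total]

pred = bayes_predictive()
print(pred)  # empirically close to linear in the face value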

I'm pretty sure this doesn't give the same distribution as maxent, and I can't think of a prior that would. (I think my suggested prior gives the "straight lines" distribution that you wanted!)

So when is each of these procedures appropriate? I agree that maxent is a good way to assign priors, but I think that when you have data you should use it by updating, rather than by remaking your prior.

Comment author: [deleted] 30 January 2014 09:29:48AM  3 points

I don't think there's any guarantee that a maximum-entropy prior is what you get if you construct a maximum-entropy prior for a weaker subset of the constraints and then update on the rest.

EDIT: Jaynes elaborates on the relationship between Bayes and maximum entropy priors here (warning, pdf).