RichardKennaway comments on The trouble with Bayes (draft) - Less Wrong Discussion

Post author: snarles 19 October 2015 08:50PM


Comment author: RichardKennaway 23 October 2015 06:53:00PM 0 points

The example I gave in my first reply (where g(x) is known to be either one of two known functions h(x) or j(x)) can easily be extended into the kind of fully specified counterexample you are looking for

That looks like a parametric model. There is one parameter, a binary variable that chooses h or j. A belief about that parameter is a probability p that h is the function. Yes, I can see that updating p on sight of the data may give a better estimate of E[Y|f(X)=1], which is known a priori to be either h(1) or j(1).
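This one-parameter case can be made concrete with a short sketch. Everything here is illustrative: the particular h and j, the Gaussian noise model, and the simplification of taking f(X)=1 to mean X=1 are all my assumptions, not anything specified in the thread.

```python
import math

# Hypothetical two-function example: g is known to be one of h or j.
# We update the prior probability p = P(g = h) on observations
# (x_i, y_i) with y_i ~ Normal(g(x_i), sigma^2), then estimate
# E[Y | f(X)=1] as the posterior mixture p*h(1) + (1-p)*j(1).

def h(x):  # one candidate for g (illustrative choice)
    return 2.0 * x

def j(x):  # the other candidate (illustrative choice)
    return x + 0.5

def normal_pdf(y, mean, sigma):
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def posterior_p(data, p_prior=0.5, sigma=1.0):
    """Posterior probability that g = h after seeing (x, y) pairs."""
    lik_h = lik_j = 1.0
    for x, y in data:
        lik_h *= normal_pdf(y, h(x), sigma)
        lik_j *= normal_pdf(y, j(x), sigma)
    return p_prior * lik_h / (p_prior * lik_h + (1 - p_prior) * lik_j)

def estimate_at_one(data):
    # Posterior mean of E[Y | f(X)=1], known a priori to be h(1) or j(1).
    p = posterior_p(data)
    return p * h(1.0) + (1 - p) * j(1.0)
```

On data generated from h, the posterior mass shifts toward h and the estimate moves between j(1) and h(1) accordingly; note the data points used for updating need not lie anywhere near X=1.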

I expect the same would hold for models with a small number of parameters, such as a linear relationship between X and Y. Using the whole sample should then improve on looking only at the subsample around f(X)=1.
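As a concrete sketch of the linear case (again assuming, purely for illustration, that f(X)=1 corresponds to X=1): fitting the line to the whole sample lets every observation sharpen the prediction at that point.

```python
# Hypothetical sketch: fit y = a + b*x by least squares on the whole
# sample, then read off the prediction at x = 1.
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b

def predict_at_one(xs, ys):
    a, b = fit_line(xs, ys)
    return a + b * 1.0     # model-based estimate of E[Y | X = 1]
```

Here observations far from X=1 still pin down the slope and intercept, which is exactly the sense in which the whole sample helps.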

However, in the nonparametric case (I think you are arguing) this goes wrong. The sample size is not large enough to estimate a model that gives a narrow estimate of E[Y|f(X)=1]. Am I understanding you yet?

It seems to me that the problem arises even before getting to the nonparametric case. If a parametric model has too many parameters to estimate from the sample, and its predictions are everywhere sensitive to all of the parameters (so it cannot be approximated by any simpler model), then trying to estimate E[Y|f(X)=1] by first fitting the model and then predicting from it will also not work.

It so clearly will not work that it must be a wrong thing to do. It is not yet clear to me that a Bayesian statistician must do it anyway. The set {Y|f(X)=1} conveys information about E[Y|f(X)=1] directly, independently of the true model (assumed for the purpose of this discussion to be within the model space being considered). Estimating it via fitting a model ignores that information. Is there no Bayesian method of using it?
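The direct, model-free use of that information is just the mean of Y over the selected subsample. A minimal sketch (the treatment of f as a 0/1 selection function is my reading of the notation):

```python
# Model-free estimate of E[Y | f(X)=1]: average Y over the subsample
# where f(X) = 1, ignoring the model entirely.
def subsample_mean(xs, ys, f):
    selected = [y for x, y in zip(xs, ys) if f(x) == 1]
    if not selected:
        raise ValueError("no observations with f(x) == 1")
    return sum(selected) / len(selected)
```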

A partial answer to your question:

So if the prior is weak (as it is in my main post) you don't model the function, and if the prior is strong, you model the function (and therefore make use of all the observations)? Where do you draw the line?

would be that the less the model helps, the less attention you pay it relative to calculating Mean{Y|f(X)=1}. I don't have a mathematical formulation of how to do that though.
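One hypothetical way to formalize "paying less attention" to an unhelpful model is a precision-weighted average of the model-based estimate and the raw subsample mean, so that a high-variance model prediction contributes little. This is only an illustrative formulation, not one given in the thread:

```python
# Hypothetical combination rule: weight each estimate by its inverse
# variance (precision). A model whose prediction at f(X)=1 is very
# uncertain is effectively ignored in favour of Mean{Y | f(X)=1}.
def combine(model_est, model_var, direct_est, direct_var):
    w_model = 1.0 / model_var
    w_direct = 1.0 / direct_var
    return (w_model * model_est + w_direct * direct_est) / (w_model + w_direct)
```

With equal variances this splits the difference; as the model's variance grows, the combined estimate approaches the subsample mean, which matches the intuition in the comment.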