snarles comments on The trouble with Bayes (draft) - Less Wrong Discussion

Post author: snarles, 19 October 2015 08:50PM

Comment author: snarles, 20 October 2015 07:29:02PM

By "importance sampling distribution" do you mean the distribution that tells you whether Y is missing or not?

Right. You could say the cases of Y1|D=1 you observe in the population are an importance sample from Y1, the hypothetical population that would result if everyone in the population were treated. E[Y1], the quantity to be estimated, is the mean of this hypothetical population. The importance sampling distribution is the conditional density of X|D=1, p(x|D=1) = Pr[D=1|x] p(x)/Pr[D=1], where p(x) is the marginal density of X. Its ratio to the target density is q(x) = p(x|D=1)/p(x) = Pr[D=1|x]/Pr[D=1], and you invert these ratios to get the weights for the average.
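A minimal Python sketch of the reweighting described above, under assumptions that are not part of the original setup: a covariate X ~ N(0, 1), an outcome Y1 = X + noise (so E[Y1] = 0), and a known logistic treatment probability Pr[D=1|x] (the function `propensity` is hypothetical). Because Pr[D=1] is a constant, weighting treated units by 1/Pr[D=1|x] and dividing by the total weight gives the same estimate as weighting by p(x)/p(x|D=1).

```python
import math
import random


def propensity(x):
    """Hypothetical known treatment probability Pr[D=1 | x] (logistic; an assumption)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed=0):
    """Draw (x, d, y1) triples; y1 is observed only when d == 1, else None."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # covariate X ~ N(0, 1) (assumed)
        y1 = x + rng.gauss(0.0, 1.0)       # outcome under treatment, so E[Y1] = 0
        d = 1 if rng.random() < propensity(x) else 0
        data.append((x, d, y1 if d == 1 else None))
    return data


def naive_mean(data):
    """Average of observed Y1 among treated units: estimates E[Y1 | D=1], not E[Y1]."""
    ys = [y for _, d, y in data if d == 1]
    return sum(ys) / len(ys)


def ipw_mean(data):
    """Self-normalized importance sampling estimate of E[Y1].

    Treated units are draws from X | D=1. Weighting each by 1 / Pr[D=1 | x]
    is proportional to p(x) / p(x | D=1); the unknown constant Pr[D=1]
    cancels when we divide by the total weight.
    """
    num = 0.0
    den = 0.0
    for x, d, y in data:
        if d == 1:
            w = 1.0 / propensity(x)
            num += w * y
            den += w
    return num / den


data = simulate(100_000, seed=1)
print(f"naive: {naive_mean(data):.3f}  ipw: {ipw_mean(data):.3f}")
```

Under these assumptions treated units tend to have larger x, so `naive_mean` should sit well above 0 (around E[X|D=1], roughly 0.4), while `ipw_mean` should be close to the true E[Y1] = 0.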

Comment author: IlyaShpitser, 21 October 2015 12:18:06AM

Still slightly confused.

I think Robins and Ritov have a theorem (cited in your blog link) stating that to estimate E[Y] when Y is MAR, you need to incorporate information about 1/p(x) somewhere in your procedure (the prior?), or you don't get uniform consistency. Is your claim that you can get around this via some hierarchical model, e.g.:

How about a hierarchical model, where first we draw a parameter p from the uniform distribution, and then draw g(x) from the uniform distribution over smooth functions with mean value equal to p? This gets you non-constant g(x) in the posterior, while your posteriors of E[g(X)] converge to the truth as quickly as in the Binomial example. Arguing backwards, I would say that such a prior comes closer to capturing my beliefs.

Is this just intuition or did you write this up somewhere? That sounds very interesting.


Why did you start thinking about conditional sampling at all? If estimating E[Y] via importance sampling/inverse weights/covariate adjustment is already something of a difficulty for Bayesians, why think about E[Y | event]? Isn't that trivially at least as hard?

Comment author: snarles, 21 October 2015 12:55:56AM

The confusion may come from mixing up my setup and Robins/Ritov's setup. There is no missing data in my setup.

I could write up my intuition for the hierarchical model. It's an almost trivial result if you don't assume smoothness, since for any x1, ..., xn the parameters g(x1), ..., g(xn) are conditionally independent given p and distributed as F(p), where F(p) is the maximum-entropy Beta distribution with mean p (I don't know the form of its parameters alpha(p) and beta(p) offhand). Smoothness makes the proof much more difficult, but based on high-dimensional intuition one can be sure that it won't change the result substantially.
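A minimal Python sketch of the non-smooth case described above, under stated assumptions: Y is binary with Pr[Y=1|x] = g(x), the x_i are distinct, and F(p) is replaced by a Beta(c*p, c*(1-p)) with a hypothetical concentration `c`, since the maximum-entropy parameters alpha(p), beta(p) are left unspecified. It illustrates that integrating out each g(x_i) leaves Y_i | p ~ Bernoulli(p), so the posterior for p is a Beta and contracts at the Binomial rate even though the sampled g values are far from constant.

```python
import random


def draw_g_values(p, n, c, rng):
    """Draw g(x_1), ..., g(x_n) i.i.d. Beta(c*p, c*(1-p)) given p, so each has mean p.

    c is a hypothetical concentration parameter standing in for the
    unspecified alpha(p), beta(p); smoothness across x is ignored.
    """
    return [rng.betavariate(c * p, c * (1.0 - p)) for _ in range(n)]


def simulate_outcomes(p_true, n, c=2.0, seed=0):
    """Simulate binary outcomes Y_i ~ Bernoulli(g(x_i)) (binary Y is an assumption)."""
    rng = random.Random(seed)
    g = draw_g_values(p_true, n, c, rng)
    y = [1 if rng.random() < gi else 0 for gi in g]
    return g, y


def posterior_for_p(y):
    """Posterior mean and sd of p under the uniform prior on p.

    Integrating out each g(x_i) (which has mean p) leaves Y_i | p ~ Bernoulli(p)
    independently, so the update is the ordinary Beta-Binomial one.
    """
    s, n = sum(y), len(y)
    a, b = 1 + s, 1 + n - s
    mean = a / (a + b)
    sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd


g, y = simulate_outcomes(0.3, 10_000, seed=1)
mean, sd = posterior_for_p(y)
print(f"spread of g: {min(g):.2f}..{max(g):.2f}  posterior for p: {mean:.3f} +/- {sd:.3f}")
```

Changing `c` alters how spread out the g values are, but not the posterior for p in this non-smooth version, because only the mean of F(p) enters the marginal likelihood of each Y_i.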

It is quite possible that estimating E[Y] and E[Y|event] are "equivalently hard", but they are both interesting problems with quite different real-world applications. I chose to write about estimating E[Y|event] because I think it is easier to explain than importance sampling.