othercriteria comments on [QUESTION]: Academic social science and machine learning - Less Wrong

11 Post author: VipulNaik 19 July 2014 03:13PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (17)

You are viewing a single comment's thread. Show more comments above.

Comment author: othercriteria 28 July 2014 01:32:39AM 1 point [-]

If you're working with composite hypotheses, replace "your statistic" with "the supremum of your statistic over the relevant set of hypotheses".

Comment author: jsteinhardt 29 July 2014 02:04:40AM 1 point [-]

If there are infinitely many hypotheses in the set then the algorithm in the grandparent doesn't terminate :).

Comment author: othercriteria 29 July 2014 05:02:05PM *  1 point [-]

What I was saying was sort of vague, so I'm going to formalize here.

Data is coming from some random process X(θ,ω), where θ parameterizes the process and ω captures all the randomness. Let's suppose that for any particular θ, living in the set Θ of parameters where the model is well-defined, it's easy to sample from X(θ,ω). We don't put any particular structure (in particular, cardinality assumptions) on Θ. Since we're being frequentists here, nature's parameter θ' is fixed and unknown. We only get to work with the realization of the random process that actually happens, X' = X(θ',ω').

We have some sort of analysis t(⋅) that returns a scalar; applying it to the random data gives us the random variables t(X(θ,ω)), which is still parameterized by θ and still easy to sample from. We pick some null hypothesis Θ0 ⊂ Θ, usually for scientific or convenience reasons.

We want some measure of how weird/surprising the value t(X') is if θ' were actually in Θ0. One way to do this, if we have a simple null hypothesis Θ0 = { θ0 }, is to calculate the p-value p(X') = P(t(X(θ0,ω)) ≥ t(X')). This can clearly be approximated using samples from t(X(θ0,ω)).

For composite null hypotheses, I guessed that using p(X') = sup{θ0 ∈ Θ0} P(t(X(θ0,ω)) ≥ t(X')) would work. Paraphrasing jsteinhardt, if Θ0 = { θ01, ..., θ0n }, you could approximate p(X') using samples from t(X(θ01,ω)), ... t(X(θ01,ω)), but it's not clear what to do when Θ0 has infinite cardinality. I see two ways forward. One is approximating p(X') by doing the above computation over a finite subset of points in Θ0, chosen by gridding or at random. This should give an approximate lower bound on the p-value, since it might miss θ where the observed data look unexceptional. If the approximate p-value leads you to fail to reject the null, you can believe it; if it leads you to reject the null, you might be less sure and might want to continue trying more points in Θ0. Maybe this is what jsteinhardt means by saying it "doesn't terminate"? The other way forward might be to use features of t and Θ0, which we do have some control over, to simplify the expression sup{θ0 ∈ Θ0} P(t(X(θ0,ω)) ≥ c). Say, if t(X(θ,ω)) is convex in θ for any ω and Θ0 is a convex bounded polytope living in some Euclidean space, then the supremum only depends on how P(t(X(θ0,ω)) ≥ c) behaves at a finite number of points.

So yeah, things are far more complicated than I claimed and realize now working through it. But you can do sensible things even with a composite null.

Comment author: jsteinhardt 30 July 2014 05:09:40AM 0 points [-]

Yup I agree with all of that. Nice explanation!