
philh comments on Open thread, Sep. 14 - Sep. 20, 2015 - Less Wrong Discussion

3 Post author: MrMind 14 September 2015 07:10AM


Comment author: satt 16 September 2015 01:17:42AM * 3 points

Even if you know only the generating process and not an estimation procedure, you might be able to get away with just feeding a parametrization of the generating process into an MCMC sampler, and seeing whether the sampler converges on sensible posterior distributions for the parameters.

I like Stan for this; you write a file telling Stan the data's structure, the parameters of the generating process, and how the generating process produced the data, and Stan turns it into an MCMC sampling program you can run.

If the model isn't fully identified, you can get problems like the sampler bouncing around the parameter space indefinitely without ever converging on a decent posterior. This could be a problem here; to illustrate, suppose I write out my version of skeptical_lurker's formulation of the model in the obvious naive way —

sales(city, widget) = α × β(city) × γ(widget) + noise(city, widget)

— where the parenthesized arguments are city and widget-type indices, I have a β for every city and a γ for every widget type, and I assume there are no odd correlations between the different parameters.

This version of the model won't have a single optimal solution! If the model finds a promising set of parameter values, it can always produce another equally good set of parameter values by halving all of the β values and doubling all of the γ values; or by halving α and the γ values while quadrupling the β values; or by...you get the idea. A sampler might end up pulling a Flying Dutchman, swooping back and forth along a hyper-hyperbola in parameter space.
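The scaling symmetry is easy to check numerically. Here's a minimal sketch (all parameter values are made up for illustration) showing that halving every β and doubling every γ leaves the model's noiseless predictions unchanged, so the likelihood can't distinguish the two parameter sets:

```python
# Hedged sketch of the non-identifiability: rescaling the parameters
# leaves the model's predicted (noiseless) sales unchanged.
# All names and values are illustrative, not taken from any real data.

alpha = 2.0
beta = [0.5, 1.5, 3.0]   # one beta per city
gamma = [1.0, 2.0]       # one gamma per widget type

def predicted_sales(alpha, beta, gamma):
    """Noiseless sales(city, widget) = alpha * beta[city] * gamma[widget]."""
    return [[alpha * b * g for g in gamma] for b in beta]

original = predicted_sales(alpha, beta, gamma)

# Halve every beta, double every gamma: every product is identical.
rescaled = predicted_sales(alpha,
                           [b / 2 for b in beta],
                           [g * 2 for g in gamma])

assert original == rescaled  # the likelihood can't tell these apart
```

Since every rescaling along this family fits the data equally well, a sampler has no reason to settle anywhere on it.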

I think this sort of under-identification isn't necessarily a killer in Stan if your parameter priors are unimodal and not too diffuse, because the priors end up as a lodestar for the sampler, but I'm not an expert. To be safe, I could avoid the issue by picking a specific city as a reference city and a specific widget type as a reference type, with the other cities' β and other widget types' γ effectively defined as proportional to those:

if city == 1 and widget == 1: sales(city, widget) = α + noise(city, widget)

else, if city == 1: sales(city, widget) = α × γ(widget) + noise(city, widget)

else, if widget == 1: sales(city, widget) = α × β(city) + noise(city, widget)

else: sales(city, widget) = α × β(city) × γ(widget) + noise(city, widget)

Then run the sampler and back out estimates of the overall city-level sales fractions from the parameter estimates: since city 1's β is implicitly 1, its fraction is 1 / (1+sum(β)), city 2's is β(2) / (1+sum(β)), city 3's is β(3) / (1+sum(β)), and so on.
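Backing out the fractions is a one-liner once the β estimates are in hand. A minimal sketch, with made-up β values standing in for posterior estimates:

```python
# Hedged sketch of recovering city-level sales fractions from beta
# estimates, with city 1 as the reference (implicit beta of 1).
# The beta values below are invented for illustration.

beta = [0.5, 0.25]  # estimated betas for cities 2 and 3

denominator = 1 + sum(beta)  # 1 for the reference city plus the other betas
fractions = [1 / denominator] + [b / denominator for b in beta]

assert abs(sum(fractions) - 1) < 1e-12  # fractions sum to one by construction
```

In practice you'd compute this for each posterior draw of β, giving a posterior distribution over the fractions rather than a single point estimate.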

And I'd probably make the noise term multiplicative and non-negative, instead of additive, to prevent the sampler from landing on a negative sales figure, which is presumably nonsensical in this context.
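One standard way to get multiplicative, non-negative noise is a lognormal factor: multiplying the predicted sales by a lognormal draw keeps every simulated figure strictly positive. A sketch using only the standard library, with illustrative parameter values:

```python
# Hedged sketch of multiplicative noise: a lognormal noise factor can
# never push a positive predicted-sales value below zero.
# All parameter values are illustrative.
import random

random.seed(0)

alpha, beta_c, gamma_w = 2.0, 1.5, 0.8  # made-up point estimates

samples = [alpha * beta_c * gamma_w * random.lognormvariate(0, 0.3)
           for _ in range(1000)]

assert all(s > 0 for s in samples)  # multiplicative lognormal noise stays positive
```

Equivalently, modelling log(sales) with additive normal noise gives the same guarantee.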

Apologies if I'm rambling at you about something you already know about, or if I've focused so much on one specific version of the toy example that this is basically useless. Hopefully this is of some interest...

Comment author: philh 17 September 2015 05:47:07PM 1 point

Thanks to both you and gwern. It doesn't look like this is the direction I'm going in for this problem, but it's something I'm glad to know about.