Open thread, Sep. 14 - Sep. 20, 2015

MrMind

Even if you know only the generating process and not an estimation procedure, you might be able to get away with just feeding a parametrization of the generating process into an MCMC sampler, and seeing whether the sampler converges on sensible posterior distributions for the parameters.

I like Stan for this; you write a file telling Stan the data's structure, the parameters of the generating process, and how the generating process produced the data, and Stan turns it into an MCMC sampling program you can run.
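To make the "does the sampler converge on something sensible" check concrete, here's a minimal sketch in plain Python rather than Stan. It is a hand-rolled random-walk Metropolis sampler, which is much cruder than what Stan does, and the toy generating process (normal data with unknown mean `mu`, a wide normal prior) and all names and numbers are my own assumptions for illustration. The idea is just to start two chains far apart and see whether they end up agreeing.

```python
import math
import random

def log_posterior(mu, data, prior_sd=10.0):
    """Log posterior, up to a constant, for y_i ~ Normal(mu, 1) with mu ~ Normal(0, prior_sd)."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum((y - mu) ** 2 for y in data)
    return log_prior + log_lik

def metropolis(data, start, n_steps=5000, step_sd=0.5, seed=0):
    """Random-walk Metropolis; returns the list of sampled mu values."""
    rng = random.Random(seed)
    mu, lp = start, log_posterior(start, data)
    draws = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step_sd)
        lp_new = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_new - lp:
            mu, lp = proposal, lp_new
        draws.append(mu)
    return draws

def chain_means(data, starts=(-20.0, 20.0), burn_in=1000):
    """Run one chain per starting point and return each chain's post-burn-in mean."""
    means = []
    for i, start in enumerate(starts):
        draws = metropolis(data, start, seed=i)[burn_in:]
        means.append(sum(draws) / len(draws))
    return means

# Hypothetical data simulated from the generating process with mu = 3.
data_rng = random.Random(42)
data = [data_rng.gauss(3.0, 1.0) for _ in range(50)]
print(chain_means(data))  # chains started far apart should report similar means
```

If the two chain means disagree badly, or wander as you lengthen the run, that's the kind of warning sign I mean. Stan reports more principled diagnostics, but the logic is the same.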

If the model isn't fully identified you can get problems like the sampler bouncing around the parameter space indefinitely without ever converging on a decent posterior. This could be a problem here; to illustrate, suppose I write out my version of skeptical_lurker's formulation of the model in the obvious naive way:

sales(city, widget) = α × β(city) × γ(widget) + noise(city, widget)

Here the parentheses hold the city and widget-type indices, I have a β for every city and a γ for every widget type, and I assume there are no odd correlations between the different parameters.

This version of the model won't have a single optimal solution! If the sampler finds a promising set of parameter values, there is always another equally good set: halve all of the β values and double all of the γ values; or halve α and the γ values while quadrupling the β values; or... you get the idea. The product α × β(city) × γ(widget) is unchanged each time, so the data can't tell these sets apart. A sampler might end up pulling a Flying Dutchman, swooping back and forth along a hyper-hyperbola in parameter space.
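A quick way to see the problem is to compute the noise-free sales table for a few rescaled parameter sets. This is only a sketch; the parameter values and the helper names `expected_sales` and `same_table` are made up for illustration.

```python
import math

def expected_sales(alpha, beta, gamma):
    """Noise-free sales table: alpha * beta[city] * gamma[widget]."""
    return [[alpha * b * g for g in gamma] for b in beta]

def same_table(a, b):
    """True if two tables agree entry by entry (up to float rounding)."""
    return all(
        math.isclose(x, y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )

# Hypothetical parameter values: 3 cities, 2 widget types.
alpha = 100.0
beta = [1.0, 0.5, 2.0]
gamma = [1.0, 3.0]

base = expected_sales(alpha, beta, gamma)
# Halve every beta, double every gamma.
rescaled_1 = expected_sales(alpha, [b / 2 for b in beta], [g * 2 for g in gamma])
# Halve alpha and every gamma, quadruple every beta.
rescaled_2 = expected_sales(alpha / 2, [b * 4 for b in beta], [g / 2 for g in gamma])

print(same_table(base, rescaled_1), same_table(base, rescaled_2))  # True True
```

All three parameter sets predict exactly the same table, so no amount of sales data can prefer one over the others.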

I think this sort of under-identification isn't necessarily a killer in Stan if your parameter priors are unimodal and not too diffuse, because the priors end up as a lodestar for the sampler, but I'm not an expert. To be safe, I could avoid the issue by picking a specific city as the reference city and a specific widget type as the reference widget type, with the other cities' β and other widgets' γ effectively defined relative to those:

if city == 1 and widget == 1: sales(city, widget) = α + noise(city, widget)

else, if city == 1: sales(city, widget) = α × γ(widget) + noise(city, widget)

else, if widget == 1: sales(city, widget) = α × β(city) + noise(city, widget)

else: sales(city, widget) = α × β(city) × γ(widget) + noise(city, widget)
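In code, the four cases collapse into "the reference city and reference widget get a multiplier of 1". Here's a rough sketch, with the caveat that the function names are hypothetical and I've used 0-based indices (so index 0 plays the role of city 1 / widget 1 above). `to_reference` shows how any naive parameter set maps onto the pinned-down version.

```python
def to_reference(alpha, beta, gamma):
    """Absorb beta[0] and gamma[0] into alpha so the reference city and widget have multiplier 1."""
    alpha_ref = alpha * beta[0] * gamma[0]
    beta_ref = [b / beta[0] for b in beta]
    gamma_ref = [g / gamma[0] for g in gamma]
    return alpha_ref, beta_ref, gamma_ref

def expected_sales_ref(alpha, beta_free, gamma_free, city, widget):
    """Mean sales under the four-case scheme.

    beta_free and gamma_free hold only the non-reference multipliers;
    index 0 is the reference city / widget type and is fixed at 1.
    """
    b = 1.0 if city == 0 else beta_free[city - 1]
    g = 1.0 if widget == 0 else gamma_free[widget - 1]
    return alpha * b * g

# Hypothetical values: two naive parameter sets that predict identical sales
# (set_b is set_a with alpha and gamma halved and beta quadrupled).
set_a = (100.0, [2.0, 1.0, 4.0], [1.0, 3.0])
set_b = (50.0, [8.0, 4.0, 16.0], [0.5, 1.5])
print(to_reference(*set_a))
print(to_reference(*set_b))
```

Both naive sets land on the same reference parameters, which is the point: once the reference multipliers are pinned at 1, there's only one parameter set per sales table.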

Then run the sampler and back out estimates of the overall city-level sales fractions from the parameter estimates (1 / (1+sum(β)), β(2) / (1+sum(β)), β(3) / (1+sum(β)), etc., where sum(β) runs over the non-reference cities only).
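The back-out step is just a normalization, with the reference city contributing the implicit 1. A small sketch, where `city_fractions` and the β values are hypothetical:

```python
def city_fractions(beta_free):
    """Sales fractions per city, given multipliers for the non-reference cities.

    The reference city has an implicit multiplier of 1 and comes first in the result.
    """
    total = 1.0 + sum(beta_free)
    return [1.0 / total] + [b / total for b in beta_free]

# Hypothetical estimates for cities 2 and 3, relative to reference city 1.
print(city_fractions([0.5, 2.5]))  # [0.25, 0.125, 0.625]
```

With actual sampler output I'd probably apply this to each posterior draw of the β values and then summarize the resulting fractions, rather than plugging in posterior means, since the fraction is a nonlinear function of the β values.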

And I'd probably make the noise term multiplicative and non-negative, instead of additive, to prevent the sampler from landing on a negative sales figure, which is presumably nonsensical in this context.
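One way to do that, as an assumption on my part rather than the only option, is log-normal noise: multiply the mean by exp(ε) with ε normally distributed. Here's a sketch of simulating from that version of the model; `simulate_sales`, `sigma`, and the parameter values are all made up for illustration.

```python
import math
import random

def simulate_sales(alpha, beta, gamma, sigma, seed=0):
    """Simulate a sales table with multiplicative log-normal noise.

    Each entry is alpha * beta[city] * gamma[widget] * exp(eps), eps ~ Normal(0, sigma),
    so every simulated figure is strictly positive.
    """
    rng = random.Random(seed)
    return [
        [alpha * b * g * math.exp(rng.gauss(0.0, sigma)) for g in gamma]
        for b in beta
    ]

# Hypothetical parameter values, with the reference city and widget fixed at 1.
table = simulate_sales(100.0, [1.0, 0.5, 2.0], [1.0, 3.0], sigma=0.3)
for row in table:
    print([round(x, 1) for x in row])
```

A side benefit is that the model becomes additive on the log scale: log sales = log α + log β(city) + log γ(widget) + ε.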

Apologies if I'm rambling at you about something you already know about, or if I've focused so much on one specific version of the toy example that this is basically useless. Hopefully this is of some interest...
