satt comments on Open thread, Sep. 14 - Sep. 20, 2015 - Less Wrong

Post author: MrMind 14 September 2015 07:10AM


Comment author: satt 19 September 2015 12:33:20PM

It does. However...

I see now I could've described the model better. In Stan, I don't think you can literally write the observed data as the sum of a signal term and a noise term; the data always has to enter the model as something sampled from a probability distribution. So you'd actually translate the simplest additive model into Stan-speak as something like

data {
  int<lower=1> N;
  int<lower=1> Ncities;
  int<lower=1> Nwidgets;
  int<lower=1> city[N];
  int<lower=1> widget[N];
  real<lower=0> sales[N];
}
parameters {
  real<lower=0> alpha;
  real beta[Ncities];
  real theta[Nwidgets];
  real<lower=0> sigma;
}
model {
  // put code here to define explicit prior distributions for parameters
  for (n in 1:N) {
    // the tilde means the left side's sampled from the right side
    sales[n] ~ normal(alpha + beta[city[n]] + theta[widget[n]], sigma);
  }
}

which could give you a headache because a normal distribution puts nonzero probability density on negative sales values, so the sampler might occasionally try to give sales[n] a negative value. When this happens, Stan notices that's inconsistent with sales[n]'s zero lower bound and generates a warning message. (I'd guess the quality of the sampling suffers too.)

And I don't know a way to tell Stan, "ah, the normal error has to be non-negative", since the error isn't explicitly broken out into a separate term on which one can set bounds; the error's folded into the procedure of sampling from a normal distribution.
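[Editor's note: one thing that might help here, though I haven't tested it against the model above, is Stan's truncation syntax, which renormalizes a likelihood over a restricted support. A truncated-normal sketch of the sampling statement:]

for (n in 1:N) {
  // T[0,] truncates the normal at zero, renormalizing its density over [0, inf),
  // so the model itself assigns zero probability to negative sales
  sales[n] ~ normal(alpha + beta[city[n]] + theta[widget[n]], sigma) T[0,];
}

This doesn't break the error out into a separate bounded term, but it does encode the non-negativity constraint directly in the likelihood.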

The way to avoid this that clicks most with me is to bake the non-negativity into the model's heart by sampling sales[n] from a distribution with non-negative support:

for (n in 1:N) {
  sales[n] ~ lognormal(log(alpha * beta[city[n]] * theta[widget[n]]), sigma);
}

Of course, bearing in mind the last time I indulged my lognormal fetish, this is likely to have trouble too, for a different reason: a lognormal excludes the possibility of exactly zero sales, so you'd want to either zero-inflate the model or add a fixed nonzero offset to sales before putting the data into Stan. But a lognormal does eliminate the problem of sampling negative values for sales[n], and it aligns nicely with multiplicative city & widget effects.
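[Editor's note: a minimal sketch of the zero-inflation option, assuming a point mass at exactly zero mixed with the lognormal, with a hypothetical mixing weight pi_zero not in the model above; the target-increment syntax is per newer Stan versions:]

parameters {
  // hypothetical: probability of an exact-zero sale
  real<lower=0, upper=1> pi_zero;
  // ... alpha, beta, theta, sigma as before
}
model {
  for (n in 1:N) {
    if (sales[n] == 0)
      // zero observations come from the point mass
      target += log(pi_zero);
    else
      // positive observations come from the lognormal component
      target += log1m(pi_zero)
                + lognormal_lpdf(sales[n] | log(alpha * beta[city[n]] * theta[widget[n]]), sigma);
  }
}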