You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

gwern comments on Open thread, Sep. 14 - Sep. 20, 2015 - Less Wrong Discussion

3 Post author: MrMind 14 September 2015 07:10AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (192)

You are viewing a single comment's thread. Show more comments above.

Comment author: gwern 17 September 2015 06:25:16PM 2 points [-]

And I'd probably make the noise term multiplicative and non-negative, instead of additive, to prevent the sampler from landing on a negative sales figure, which is presumably nonsensical in this context.

I know JAGS lets you put interval limits onto terms which lets you specify that some variable must be non-negative (looks something like dist(x,y)[0,∞]), so maybe STAN has something similar.

Comment author: satt 19 September 2015 12:33:20PM *  2 points [-]

It does. However...

I see now I could've described the model better. In Stan I don't think you can literally write the observed data as the sum of the signal and the noise; I think the data always has to be incorporated into the model as something sampled from a probability distribution, so you'd actually translate the simplest additive model into Stan-speak as something like

data {
int<lower=1> N;
int<lower=1> Ncities;
int<lower=1> Nwidgets;
int<lower=1> city[N];
int<lower=1> widget[N];
real<lower=0> sales[N];
}
parameters {
real<lower=0> alpha;
real beta[Ncities];
real gamma[Nwidgets];
real<lower=0> sigma;
}
model {
// put code here to define explicit prior distributions for parameters
for (n in 1:N) {
// the tilde means the left side's sampled from the right side
sales[n] ~ normal(alpha + beta[city[n]] + theta[widget[n]], sigma);
}
}

which could give you a headache because a normal distribution puts nonzero probability density on negative sales values, so the sampler might occasionally try to give sales[n] a negative value. When this happens, Stan notices that's inconsistent with sales[n]'s zero lower bound, and generates a warning message. (The quality of the sampling probably gets hurt too, I'd guess.)

And I don't know a way to tell Stan, "ah, the normal error has to be non-negative", since the error isn't explicitly broken out into a separate term on which one can set bounds; the error's folded into the procedure of sampling from a normal distribution.

The way to avoid this that clicks most with me is to bake the non-negativity into the model's heart by sampling sales[n] from a distribution with non-negative support:

for (n in 1:N) {
sales[n] ~ lognormal(log(alpha * beta[city[n]] * theta[widget[n]]), sigma);
}

Of course, bearing in mind the last time I indulged my lognormal fetish, this is likely to have trouble too, for the different reason that a lognormal excludes the possibility of exactly zero sales, and you'd want to either zero-inflate the model or add a fixed nonzero offset to sales before putting the data into Stan. But a lognormal does eliminate the problem of sampling negative values for sales[n], and aligns nicely with multiplicative city & widget effects.