
skeptical_lurker comments on Open thread, Sep. 14 - Sep. 20, 2015 - Less Wrong Discussion

3 Post author: MrMind 14 September 2015 07:10AM




Comment author: philh 14 September 2015 10:56:43PM *  1 point

The problem - at least the one I'm currently focusing on, which might not be the one I need to focus on - is converting percentages-by-city on a collection of subsets into percentages-by-city in general. I'm currently assuming that there's no structure beyond what I specified, partly because I'm not currently able to take advantage of it if there is.

A toy example, with no randomness, would be - widget A sold 2/3 in city X and 1/3 in city Y. Widget B sold 6/7 in city X and 1/7 in city Z. Widget C sold 3/4 in city Y and 1/4 in city Z. Widget D is to be sold in cities X, Y and Z. What fraction of its sales should I expect to come from each city?

The answer here is 0.6 from X, 0.3 from Y and 0.1 from Z, but I'm looking for some way to generate these in the face of randomness. (My first thought was to take averages - e.g. city X got an average of (2/3 + 6/7)/2 = 16/21 of the sales - and then normalize those averages. But none of the arithmetic, geometric or harmonic means (AM, GM, HM) gave the correct results on the toy version, so I don't expect them to do well with high randomness. It might be that with more data they come closer to being correct, so that's something I'll look into if no one can point me to existing literature.)
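[Editorial note: the toy answer can be checked by hand. Within each widget's pair of cities, the observed split pins down the ratio of the two city weights, and here the three ratios are mutually consistent. A quick sketch in exact arithmetic, using the stdlib fractions module:]

```python
from fractions import Fraction

# Each widget's within-subset split fixes a ratio of city weights:
#   widget A ({X, Y}): 2/3 vs 1/3  =>  X : Y = 2 : 1
#   widget B ({X, Z}): 6/7 vs 1/7  =>  X : Z = 6 : 1
#   widget C ({Y, Z}): 3/4 vs 1/4  =>  Y : Z = 3 : 1
z = Fraction(1)
y = 3 * z      # from widget C
x = 6 * z      # from widget B; consistent with x = 2*y from widget A
total = x + y + z
print([x / total, y / total, z / total])
# [Fraction(3, 5), Fraction(3, 10), Fraction(1, 10)]  i.e. 0.6, 0.3, 0.1
```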

Comment author: skeptical_lurker 15 September 2015 01:03:20PM *  3 points

So, there's some sort of function mapping (cities, widgets) -> sales, plus randomness. In general, I would say use some standard machine learning technique, but if you know the function is linear you can do it directly.

So:

sales = constant x cityvalue x widgetvalue + noise

d sales/d cityvalue = constant x widgetvalue

d sales/d widgetvalue = constant x cityvalue

(all vectors)

So then you pick random starting values of cityvalue and widgetvalue, calculate the error, and do gradient descent.
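[Editorial note: a minimal numpy sketch of this gradient-descent fit, assuming the multiplicative model sales ~ widgetvalue x cityvalue, with the constant absorbed into the two factors. Fitting the per-widget fractions directly is fine here, since each widget's row equals its absolute sales up to a per-widget scale that widgetvalue soaks up. The split of scale between the two factors is arbitrary, but the normalized city shares are not.]

```python
import numpy as np

# Toy data: rows = widgets A, B, C; columns = cities X, Y, Z.
# np.nan marks (widget, city) pairs with no observations.
sales = np.array([[2/3, 1/3, np.nan],
                  [6/7, np.nan, 1/7],
                  [np.nan, 3/4, 1/4]])
mask = ~np.isnan(sales)
target = np.where(mask, sales, 0.0)

rng = np.random.default_rng(0)
widgetvalue = rng.random(3)   # one entry per widget
cityvalue = rng.random(3)     # one entry per city

lr = 0.05
for _ in range(20000):
    pred = np.outer(widgetvalue, cityvalue)
    resid = np.where(mask, pred - target, 0.0)  # error only on observed cells
    # Gradients of 0.5 * sum(resid**2) w.r.t. each factor:
    grad_widget = resid @ cityvalue
    grad_city = resid.T @ widgetvalue
    widgetvalue -= lr * grad_widget
    cityvalue -= lr * grad_city

print(cityvalue / cityvalue.sum())  # ~ [0.6, 0.3, 0.1]
```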

Or just plug

Error = sum((constant x cityvalue x widgetvalue - sales)^2)

into an optimisation function, which will be slower but quicker to code.
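[Editorial note: the "plug it into an optimiser" route might look like the sketch below, assuming scipy is available. The 6-element parameter vector packs the three widgetvalues followed by the three cityvalues; the constant is again absorbed into the factors.]

```python
import numpy as np
from scipy.optimize import minimize

# Same toy data: rows = widgets, columns = cities,
# np.nan for unobserved (widget, city) pairs.
sales = np.array([[2/3, 1/3, np.nan],
                  [6/7, np.nan, 1/7],
                  [np.nan, 3/4, 1/4]])
mask = ~np.isnan(sales)

def error(params):
    widgetvalue, cityvalue = params[:3], params[3:]
    pred = np.outer(widgetvalue, cityvalue)
    return np.sum((pred[mask] - sales[mask]) ** 2)

res = minimize(error, x0=np.full(6, 0.5))
cityvalue = res.x[3:]
print(cityvalue / cityvalue.sum())  # ~ [0.6, 0.3, 0.1]
```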

Comment author: philh 15 September 2015 01:57:34PM 2 points

Thank you! This seems like the conceptual shift I needed.