The problem - at least the one I'm focusing on now, which might not be the one I need to focus on - is converting percentages-by-city measured on a collection of subsets of cities into percentages-by-city in general. I'm assuming there's no structure beyond what I specified, partly because I wouldn't be able to take advantage of it if there were.
A toy example, with no randomness: widget A sold 2/3 in city X and 1/3 in city Y. Widget B sold 6/7 in city X and 1/7 in city Z. Widget C sold 3/4 in city Y and 1/4 in city Z. Widget D is to be sold in cities X, Y and Z. What fraction of its sales should I expect to come from each city?
The answer here is 0.6 from X, 0.3 from Y and 0.1 from Z, but I'm looking for a way to generate these in the face of randomness. (My first thought was to take averages - e.g. city X got an average of (2/3 + 6/7)/2 = 16/21 of the sales - and then normalize those averages. But none of the arithmetic, geometric or harmonic means gave the correct results on the toy version, so I don't expect them to do well with high randomness. It might be that with more data they come closer to being correct, so that's something I'll look into if no one can point me to existing literature.)
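A small check of both claims on the toy data, using exact fractions. It assumes (my reading, not stated in the question) that each city has a fixed overall weight and each widget's split is just those weights renormalised over the cities it is sold in; under that assumption the ratio of two cities' shares within one widget equals the ratio of their weights, which is how the 0.6/0.3/0.1 answer can be chained together. The averaging part then normalises the AM, GM and HM of each city's shares.

```python
from fractions import Fraction as F
from statistics import mean, geometric_mean, harmonic_mean

# Share of each widget's sales by city, from the toy example.
widget_shares = {
    "A": {"X": F(2, 3), "Y": F(1, 3)},
    "B": {"X": F(6, 7), "Z": F(1, 7)},
    "C": {"Y": F(3, 4), "Z": F(1, 4)},
}

# Exact answer (assumption: within a widget, share ratio = city weight ratio).
# Fix Z's weight at 1 and chain the ratios through widgets B and C.
z = F(1)
x = z * widget_shares["B"]["X"] / widget_shares["B"]["Z"]  # X weighs 6 times Z
y = z * widget_shares["C"]["Y"] / widget_shares["C"]["Z"]  # Y weighs 3 times Z
# Widget A is a consistency check: it says X weighs twice Y.
assert x / y == widget_shares["A"]["X"] / widget_shares["A"]["Y"]
weight_sum = x + y + z
exact = {"X": x / weight_sum, "Y": y / weight_sum, "Z": z / weight_sum}
print("exact", {c: float(v) for c, v in exact.items()})

# Averaging attempt: collect each city's shares across the widgets sold there.
by_city = {}
for split in widget_shares.values():
    for city, share in split.items():
        by_city.setdefault(city, []).append(share)


def normalised(averages):
    """Rescale per-city averages so they sum to 1."""
    norm = sum(averages.values())
    return {city: float(v / norm) for city, v in averages.items()}


for name, average in [("AM", mean), ("GM", geometric_mean), ("HM", harmonic_mean)]:
    estimate = normalised({c: average(v) for c, v in by_city.items()})
    print(name, {c: round(v, 3) for c, v in estimate.items()})
```

On this data the normalised AM, GM and HM put X at roughly 0.51, 0.52 and 0.54, against the exact 0.6, which matches the observation that none of the three averages is right on the toy version.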
So there's some function mapping (city, widget) -> sales, plus randomness. In general I'd say use a standard machine learning technique, but if you know the function's form (here, a constant times a city term times a widget term, which is linear in each factor separately) you can fit it directly.
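So, with cityvalue and widgetvalue as vectors holding one entry per city and one per widget:
sales(c, w) = constant × cityvalue(c) × widgetvalue(w) + noise
∂sales(c, w)/∂cityvalue(c) = constant × widgetvalue(w)
∂sales(c, w)/∂widgetvalue(w) = constant × cityvalue(c)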
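Gradient descent needs the derivative of the squared error rather than of the sales themselves; the chain rule turns the derivatives of sales into it. Writing a_c for cityvalue(c), b_w for widgetvalue(w), k for the constant and s_{cw} for observed sales (notation introduced here for brevity), and assuming the sums run only over (city, widget) pairs that were actually observed:

```latex
\begin{aligned}
\mathrm{Error} &= \sum_{(c,w)} \bigl(k\,a_c\,b_w - s_{cw}\bigr)^2 \\
\frac{\partial\,\mathrm{Error}}{\partial a_c} &= \sum_{w} 2\bigl(k\,a_c\,b_w - s_{cw}\bigr)\,k\,b_w \\
\frac{\partial\,\mathrm{Error}}{\partial b_w} &= \sum_{c} 2\bigl(k\,a_c\,b_w - s_{cw}\bigr)\,k\,a_c
\end{aligned}
```

The trailing factors k b_w and k a_c are exactly the derivatives of sales with respect to cityvalue and widgetvalue.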
Then pick random starting values for cityvalue and widgetvalue, calculate the error, and do gradient descent.
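A minimal sketch of that recipe, run on the question's toy data. Assumptions (mine, not the comment's): each observed percentage stands in for "sales", so a widget's normalising total is absorbed into its widgetvalue; the constant is folded into the widget values, since the product cannot tell them apart; the error is summed only over observed (city, widget) pairs; and the learning rate, step count and seed are hand-picked. The toy data admits a zero-error fit with cityvalue proportional to (0.6, 0.3, 0.1), so normalising the fitted city values should recover that split.

```python
import random

# Toy data from the question, used as "sales": the share of each widget's
# sales that came from each city. Only observed (city, widget) pairs appear.
observed = {
    ("X", "A"): 2 / 3, ("Y", "A"): 1 / 3,
    ("X", "B"): 6 / 7, ("Z", "B"): 1 / 7,
    ("Y", "C"): 3 / 4, ("Z", "C"): 1 / 4,
}


def fit_by_gradient_descent(observed, learning_rate=0.05, steps=20_000, seed=0):
    """Fit observed[(c, w)] ~ city_value[c] * widget_value[w] by gradient descent.

    The overall constant is folded into widget_value (it is not identifiable).
    """
    cities = sorted({c for c, _ in observed})
    widgets = sorted({w for _, w in observed})
    rng = random.Random(seed)
    # Random positive starting values.
    city_value = {c: rng.uniform(0.5, 1.5) for c in cities}
    widget_value = {w: rng.uniform(0.5, 1.5) for w in widgets}
    for _ in range(steps):
        grad_city = dict.fromkeys(cities, 0.0)
        grad_widget = dict.fromkeys(widgets, 0.0)
        for (c, w), sales in observed.items():
            residual = city_value[c] * widget_value[w] - sales
            # Chain rule: d(residual^2) = 2 * residual * d(prediction).
            grad_city[c] += 2 * residual * widget_value[w]
            grad_widget[w] += 2 * residual * city_value[c]
        # Update all parameters simultaneously from the gradients above.
        for c in cities:
            city_value[c] -= learning_rate * grad_city[c]
        for w in widgets:
            widget_value[w] -= learning_rate * grad_widget[w]
    return city_value, widget_value


city_value, widget_value = fit_by_gradient_descent(observed)
total = sum(city_value.values())
shares = {c: v / total for c, v in city_value.items()}
print({c: round(s, 3) for c, s in shares.items()})
```

With noisy data the fit will no longer be exact, and the normalised city values become the least-squares estimate under this multiplicative model.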
Or just plug

Error = sum over (city, widget) pairs of (constant × cityvalue × widgetvalue − sales)^2

into an optimisation function, which will run slower but is quicker to code.
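A sketch of the "plug it into an optimiser" route on the same toy data, assuming NumPy and SciPy are installed and using `scipy.optimize.minimize` with BFGS as the optimisation function (my choice; the comment doesn't name one). As before, the constant is folded into the widget values and only observed pairs contribute to the error. No gradient is supplied, so BFGS estimates it by finite differences, which is part of why this route is slower.

```python
import numpy as np
from scipy.optimize import minimize

cities = ["X", "Y", "Z"]
widgets = ["A", "B", "C"]
# Toy data: observed share of each widget's sales by city (observed pairs only).
observed = [
    ("X", "A", 2 / 3), ("Y", "A", 1 / 3),
    ("X", "B", 6 / 7), ("Z", "B", 1 / 7),
    ("Y", "C", 3 / 4), ("Z", "C", 1 / 4),
]
city_idx = np.array([cities.index(c) for c, _, _ in observed])
widget_idx = np.array([widgets.index(w) for _, w, _ in observed])
sales = np.array([s for _, _, s in observed])


def error(params):
    """Sum of squared errors of city_value[c] * widget_value[w] against sales."""
    city_value = params[: len(cities)]
    widget_value = params[len(cities):]
    predicted = city_value[city_idx] * widget_value[widget_idx]
    return np.sum((predicted - sales) ** 2)


# Start every parameter at 1; BFGS uses finite-difference gradients here.
result = minimize(error, x0=np.ones(len(cities) + len(widgets)), method="BFGS")
city_value = result.x[: len(cities)]
shares = city_value / city_value.sum()
for city, share in zip(cities, shares):
    print(f"{city}: {share:.3f}")
```

Passing the analytic gradient through the `jac` argument would remove the finite-difference overhead, at the cost of writing the derivatives out.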
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.