endoself comments on Rationalist horoscopes: A low-hanging utility generator. - Less Wrong

62 Post author: AdeleneDawner 22 May 2011 09:37AM


Comments (77)


Comment author: endoself 23 May 2011 01:20:11AM * 3 points

If a horoscope's score becomes negative, it is removed from the pool of active horoscopes; otherwise, its chance of being chosen is based on the average value of the votes it has received compared to the other horoscopes, disregarding recently-used ones.

Why not use a Bayesian method, like the one described here?

EDIT: I misremembered; this is not actually a Bayesian method.

Comment author: Cyan 23 May 2011 02:39:25AM * 2 points

That's not a Bayesian method -- it's a frequentist shrinkage method. That said, it's a good one to use for this purpose.

Comment author: endoself 23 May 2011 02:45:22AM * 0 points

Sorry, I read the page a while ago and linked it without rereading.

Comment author: AdeleneDawner 23 May 2011 01:37:25AM 2 points

Is that compatible with the idea of having multiple 'strengths' of upvotes and downvotes? I'm not following the math well at all, but it seems not to be, and having things that are rated harmful disappear quickly and automatically seems important.

Comment author: endoself 23 May 2011 01:46:01AM * 2 points

Rereading the article, I'm not sure whether the method is actually Bayesian. If it is, you could do that by saying that the probability of a 'harmful' rating is very low if the horoscope is not actually harmful.

Comment author: AdeleneDawner 23 May 2011 01:51:46AM 2 points

So we'd track 'chance that it's harmful', 'chance that it's not useful', etc. separately, and combine those probabilities somehow to get the score that's used to determine the horoscope's chance of being picked?

Sounds convoluted, but I could see that working.

Comment author: endoself 23 May 2011 02:48:21AM * 1 point

Essentially. It wouldn't be that complicated. You'd use a multinomial distribution for your conditional probabilities and the factors of usefulness and harmfulness should combine linearly if your utilities sum over people.
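A minimal sketch of what this could look like, assuming a symmetric Dirichlet prior over the five vote categories. The prior strength `alpha` and the function name `expected_score` are hypothetical choices, not something specified in this thread; the weights are the ones from the original post. Under those assumptions, the posterior mean probability of each category is (count + alpha) / (total votes + 5 * alpha), and the expected score is a linear combination of those probabilities.

```python
# Weights from the original post, in the order:
# harmful, not useful, sort-of useful, useful, awesome.
WEIGHTS = [-15, -1, 1, 3, 10]


def expected_score(counts, alpha=1.0):
    """Posterior-mean score under an assumed symmetric Dirichlet(alpha) prior.

    counts: vote counts per category, in the same order as WEIGHTS.
    alpha:  hypothetical prior pseudo-count added to every category.
    """
    total = sum(counts) + alpha * len(counts)
    probs = [(c + alpha) / total for c in counts]
    # Utilities combine linearly: weight each category by its probability.
    return sum(w * p for w, p in zip(WEIGHTS, probs))


# Sample data: 2 'not useful', 3 'sort-of useful', 3 'useful'.
print(round(expected_score([0, 2, 3, 3, 0]), 3))  # 8/13, about 0.615
```

With `alpha=1.0` a horoscope with no votes starts from the average of the weights, and each real vote pulls the score toward the observed average. A smaller `alpha` would make the prior matter less.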

Comment author: AdeleneDawner 23 May 2011 03:02:08AM * 0 points

There's a reasonable chance that what you just said will be parseable to Peer, but it goes over my head.

Alternately, some non-us person could do the relevant coding, since it is open-source.

Comment author: AdeleneDawner 24 May 2011 03:19:21AM * 1 point

I think I've figured this out. I still can't implement it, since I don't write PHP, but I'm going to drop it here anyway, to get a sanity check and in case seeing it written out will be useful for anyone else. (Hint, hint.)

Edit: On further examination, I think my conclusion is wrong... or rather, the first set of numbers I got was right, and just not useful for this purpose.

I started with a set of sample data: A horoscope with 2 votes for 'not useful' and 3 votes each for 'sort-of useful' and 'useful', totaling 8 votes.

I then used the Wolfram Alpha Wilson score interval calculator to get a pair of numbers for each of the 5 vote types, given 95% confidence:

  • 'Harmful' and 'awesome' have a lower bound of 0 and an upper bound of .324

  • 'Not useful' has a lower bound of .071 and an upper bound of .590

  • 'Sort of useful' and 'useful' have a lower bound of .137 and an upper bound of .694
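For anyone who wants to check these without the calculator, here is a rough sketch of the standard Wilson score interval formula. The function name `wilson_interval` and the use of z = 1.96 for 95% confidence are assumptions for illustration; on the sample data it should give approximately the bounds listed above.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of votes of the type being examined.
    n:         total number of votes on the horoscope.
    z:         normal quantile; 1.96 is the usual value for ~95% confidence.
    """
    if n == 0:
        # No votes at all: the proportion could be anything.
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return (max(0.0, center - half), min(1.0, center + half))


votes = {"harmful": 0, "not useful": 2, "sort-of useful": 3,
         "useful": 3, "awesome": 0}
n = sum(votes.values())
for label, count in votes.items():
    lo, hi = wilson_interval(count, n)
    print(f"{label}: {lo:.3f} to {hi:.3f}")
```

Each vote type is treated as its own yes/no proportion out of all 8 votes, which is why the five intervals do not add up to 1.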

I messed around with using those numbers directly to get weighted scores for the horoscopes, but they didn't work very well that way, so I adjusted each set to add up to 100%. For example, in the lower bound set, 'not useful' has (0.071)/(0.071+0.137+0.137)=0.206 of the adjusted votes. Multiplying these adjusted numbers by the weightings that I gave in the original post (-15, -1, +1, +3, +10) and summing the results gave what look to me like sane numbers: my sample data got an adjusted lower bound score of 1.382 and an adjusted upper bound score of 0.216. (The adjusted upper bound score is low because 'harmful' votes are weighted more strongly than 'awesome' votes, and the upper bounds for those two are closer to the upper bounds of the other types of votes. In other words, the 'upper bound' score is more pessimistic because of how things are weighted.)
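The adjustment described above can be sketched as follows. The function name `adjusted_score` is hypothetical, and the bounds are typed in from the numbers in this comment rather than recomputed. Because the sketch does not round the intermediate shares to three places, the lower bound result can differ from the hand-worked figure in the last digit.

```python
# Weights from the original post, in the order:
# harmful, not useful, sort-of useful, useful, awesome.
WEIGHTS = [-15, -1, 1, 3, 10]


def adjusted_score(bounds, weights=WEIGHTS):
    """Rescale the Wilson bounds to sum to 1, then take the weighted sum."""
    total = sum(bounds)
    if total == 0:
        # No information at all; treat the score as neutral (an assumption).
        return 0.0
    shares = [b / total for b in bounds]
    return sum(w * s for w, s in zip(weights, shares))


# 95% Wilson bounds for the sample data, as quoted in this comment.
lower = [0.000, 0.071, 0.137, 0.137, 0.000]
upper = [0.324, 0.590, 0.694, 0.694, 0.324]

print(round(adjusted_score(lower), 3))
print(round(adjusted_score(upper), 3))
```

Rescaling first and weighting second gives the same result as weighting first and dividing the total by the sum of the bounds, since the division is by the same constant either way.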

I could easily have done something wrong there, but assuming not, and assuming that it's easy to code a Wilson score interval calculator, this should be simple enough to add to the code once someone gets to it. (Peer appears to have become busy with something else.) The decision on whether to use upper or lower bounds seems to depend on how we want newer horoscopes to act in comparison to older ones, and I think using the lower bound (optimistic) one makes sense: new horoscopes should ideally be given a few opportunities to prove themselves at a higher rate of visibility before settling into their accurate place in the hierarchy, rather than having to claw their way up from enforced obscurity.


Numbers:

actual votes

0 harmful
2 useless
3 s/o useful
3 useful
0 awesome

score 1.25

-

lower bound 95% confidence

0.000 harmful (*-15 = -0.000)
0.071 useless (*-1 = -0.071)
0.137 s/o useful (*1 = 0.137)
0.137 useful (*3 = 0.411)
0.000 awesome (*10 = 0.000)

score (total weighted score divided by sum of Wilson numbers): 1.3826

-

upper bound 95% confidence

0.324 harmful (*-15 = -4.860)
0.590 useless (*-1 = -0.590)
0.694 s/o useful (*1 = 0.694)
0.694 useful (*3 = 2.082)
0.324 awesome (*10 = 3.240)

score: 0.2155

-

adjusted lower bound of 95% confidence (keep wilson numbers proportional to each other but make 'em total 1.0)

0.000 harmful (*-15 = -0.000)
0.206 useless (*-1 = -0.206)
0.397 s/o useful (*1 = 0.397)
0.397 useful (*3 = 1.191)
0.000 awesome (*10 = 0.000)

score: 1.382

-

adjusted upper bound 95% confidence

0.123 harmful (*-15 = -1.845)
0.225 useless (*-1 = -0.225)
0.264 s/o useful (*1 = 0.264)
0.264 useful (*3 = 0.792)
0.123 awesome (*10 = 1.230)

score: 0.216

Comment author: endoself 24 May 2011 10:58:20PM 0 points

I just realized that I didn't mention this in a direct reply to you, so I should mention now that this method was not actually Bayesian. It seems to work well enough for whatever sites use it, but if you want, I could do a Bayesian analysis of the data similar to the one you did.

Comment author: AdeleneDawner 24 May 2011 11:56:51PM 1 point

I don't really care whether a given method is Bayesian or frequentist so long as it works in context. But it looks to me like the point of any predictive method is to give a higher score to something with more votes, and I don't think that makes sense here - there's a risk of ending up in a situation where a few dozen horoscopes (based on how often the horoscopes are allowed to repeat; out of hopefully at least a few hundred) have many more votes than the others, because they happened to be at the top of the heap at one point and started getting picked more often by the RNG, which got them more votes, which widened the gap in how often they were chosen, which got them even more votes....

Comment author: Cyan 25 May 2011 01:42:39AM 2 points

To ensure churn, fix a lower bound on the probability that a horoscope will be picked, at least until it has been picked enough times to accurately rank it against horoscopes that have been picked more often.
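One way this could look in code, as a sketch only: every horoscope gets a fixed minimum probability, and the remaining probability mass is shared out in proportion to score. The names `selection_probabilities` and `pick` and the default `floor` value are hypothetical, and for simplicity the floor here applies to every horoscope regardless of how often it has been picked, rather than being lifted once a horoscope has enough votes.

```python
import random


def selection_probabilities(scores, floor=0.01):
    """Score-proportional probabilities with a per-item minimum of `floor`."""
    n = len(scores)
    if n == 0:
        return []
    if floor * n > 1:
        raise ValueError("floor is too large for this many horoscopes")
    # Negative scores contribute nothing beyond the floor.
    positive = [max(s, 0.0) for s in scores]
    total = sum(positive)
    if total == 0:
        # Nothing to rank by yet: fall back to a uniform choice.
        return [1.0 / n] * n
    remaining = 1.0 - floor * n
    return [floor + remaining * s / total for s in positive]


def pick(horoscopes, scores, floor=0.01, rng=random):
    """Choose one horoscope at random using the floored probabilities."""
    probs = selection_probabilities(scores, floor)
    return rng.choices(horoscopes, weights=probs, k=1)[0]


print(selection_probabilities([3.0, 1.0, 0.0], floor=0.1))
```

Making the floor depend on pick count would mean passing in the counts as well and applying the minimum only to horoscopes below some threshold.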