Comment author: Yvain 22 March 2009 11:05:36PM 1 point

Added your acronym to main post. Please do not doci. Adboc instead.

Comment author: Meni_Rosenfeld 25 July 2013 09:12:07PM 9 points

I don't suppose it's possible to view the version history of the post, so can you state for posterity what "DOCI" used to stand for?

Comment author: [deleted] 02 January 2013 11:23:31PM 1 point

This is the way I would do it, also taking into account EY's point of not hiding away new comments:

Assume each comment has an ‘upvote rate’ U, such that the number of upvotes u a comment has at age t follows a Poisson distribution with parameter Ut,

  • P(u|U, t) = (Ut)^u exp(−Ut)/u!

and similarly for downvotes,

  • P(d|D, t) = (Dt)^d exp(−Dt)/d!

If the prior probability distribution for U and D is P(U, D), their posterior probability distribution will be

  • P(U, D|u, d, t) = D^d U^u exp(−(D + U)t) P(U, D)/(a normalization constant).
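A minimal sketch of this posterior, evaluated numerically on a grid so that an arbitrary prior P(U, D) can be plugged in. The grid bounds, the resolution, and the flat prior used in the example are illustrative assumptions, not part of the comment.

```python
import math

def grid_posterior(u, d, t, log_prior=lambda U, D: 0.0, rate_max=20.0, n=300):
    """Normalized posterior weights for (U, D) on an n-by-n grid of cell midpoints.

    Unnormalized log posterior: u*log U + d*log D - (U + D)*t + log P(U, D).
    rate_max and n are illustrative choices; the default prior is flat on the grid.
    """
    h = rate_max / n
    rates = [(i + 0.5) * h for i in range(n)]  # midpoints avoid log(0)
    log_w = {}
    for U in rates:
        for D in rates:
            log_w[(U, D)] = u * math.log(U) + d * math.log(D) - (U + D) * t + log_prior(U, D)
    top = max(log_w.values())  # subtract the max before exponentiating to avoid underflow
    w = {k: math.exp(v - top) for k, v in log_w.items()}
    z = sum(w.values())  # numerical stand-in for the normalization constant
    return {k: v / z for k, v in w.items()}

def posterior_expectation(post, f):
    """Posterior expectation of f(U, D) under the grid posterior."""
    return sum(p * f(U, D) for (U, D), p in post.items())

post = grid_posterior(u=3, d=1, t=2.0)
mean_U = posterior_expectation(post, lambda U, D: U)
up_frac = posterior_expectation(post, lambda U, D: U / (U + D))
print(mean_U, up_frac)
```

With the flat prior, the marginal posterior of U is proportional to U^u exp(−Ut), a Gamma distribution with mean (u + 1)/t, so mean_U should come out close to 2.0 for these numbers.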

Then, you sort comments according to a functional of the posterior pdf of U and D; in analogy with expected utility maximization you could use the posterior expectation value of some function f(U, D), but other choices would be possible. (This reduces to your proposal when you take f(U, D) = U/(U + D).)
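If one assumes independent Gamma priors on U and D with a common rate parameter (an assumption for illustration; the comment leaves P(U, D) general), both posteriors are again Gamma with the same rate, so U/(U + D) is Beta distributed and E[U/(U + D)] = (a_U + u)/(a_U + u + a_D + d), a Beta posterior mean. That is one concrete setting in which the remark about f(U, D) = U/(U + D) holds. The sketch checks the closed form by Monte Carlo; the prior values and function names are hypothetical.

```python
import random

def gamma_posterior(shape, rate, count, exposure):
    """Conjugate update: Gamma(shape, rate) prior on a Poisson rate,
    `count` events observed over `exposure` time."""
    return shape + count, rate + exposure

def mc_expected_up_fraction(u, d, t, a_u=1.0, a_d=1.0, rate=1.0, n=20000, seed=0):
    """Monte Carlo estimate of the posterior expectation of U / (U + D)."""
    rng = random.Random(seed)
    su, ru = gamma_posterior(a_u, rate, u, t)
    sd, rd = gamma_posterior(a_d, rate, d, t)
    total = 0.0
    for _ in range(n):
        # random.gammavariate takes (shape, scale), so pass scale = 1 / rate
        U = rng.gammavariate(su, 1.0 / ru)
        D = rng.gammavariate(sd, 1.0 / rd)
        total += U / (U + D)
    return total / n

def closed_form_up_fraction(u, d, a_u=1.0, a_d=1.0):
    """Mean of Beta(a_u + u, a_d + d); valid because both posteriors share the rate."""
    return (a_u + u) / (a_u + u + a_d + d)

print(mc_expected_up_fraction(5, 0, t=10.0), closed_form_up_fraction(5, 0))
```

In this conjugate, equal-rate setting the exposure t drops out of E[U/(U + D)]; it still matters for functionals of the absolute rates, such as E[U − D].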

Of course this model isn't entirely realistic because U and D ought to vary with time (according to timezone, how old the thread is and whether it's currently linked to from the main page, etc.), but the main effect of disregarding this (pretending that a comment has the same probability of getting upvoted in the 10,000th hour after its publication as in the 1st hour) would be to cause very recent comments to be sorted higher, which IMO is a Good Thing anyway.

Comment author: Meni_Rosenfeld 03 January 2013 10:02:34AM 0 points

I think some factor for decreasing votes over time should be included. Exponentially decaying rates seem reasonable, and the decay time constant can be calibrated with the overall data in the domain (assuming we have data on voting times available).
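One way to sketch exponentially decaying rates, assuming the upvote rate at age s is U·exp(−s/τ) (and likewise for downvotes, with the same τ): the vote count by age t is then Poisson with mean U·τ(1 − exp(−t/τ)), so in the posterior formula above t is simply replaced by this "effective exposure". The Gamma prior and the value of τ are hypothetical; in practice τ would be calibrated from voting-time data, as suggested here.

```python
import math

def effective_exposure(t, tau):
    """Integral of exp(-s / tau) for s from 0 to t: the 'effective age' of an item
    whose vote rates decay exponentially with time constant tau."""
    return tau * (1.0 - math.exp(-t / tau))

def posterior_rate_mean(count, t, tau, shape=1.0, rate=1.0):
    """Posterior mean of the initial rate (U or D) under a Gamma(shape, rate) prior
    (an illustrative assumption), using the decayed exposure instead of raw age."""
    return (shape + count) / (rate + effective_exposure(t, tau))

# Hypothetical numbers: a comment 10,000 hours old versus one 1 hour old, tau = 48 hours.
old = effective_exposure(10_000, 48.0)  # saturates at tau
new = effective_exposure(1, 48.0)       # close to the raw age for young comments
print(old, new)
```

Without decay, the 10,000-hour-old comment would be treated as having 10,000 hours of exposure; with decay its exposure saturates at τ, so an old comment with few votes is no longer read as strong evidence of a very low rate.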

Comment author: EHeller 03 January 2013 04:45:28AM 1 point

Why a Poisson distribution? It seems fairly clear we are looking at Bernoulli trials (people who look either upvote or not). I doubt it's a rare enough event (though it depends on the site, I suppose) that a Poisson is a better approximation than a normal.

Comment author: Meni_Rosenfeld 03 January 2013 09:58:23AM 2 points

I think it's reasonable to model this as a Poisson process. There are many people who could in theory vote, only few of them do, at random times.
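This is the usual Poisson limit of the binomial: if n viewers each vote independently with a small probability p, the vote count is Binomial(n, p), which is close to Poisson with mean np. A quick numerical check, with hypothetical values of n and p:

```python
import math

def binomial_pmf(k, n, p):
    # math.comb(n, k) is 0 when k > n, so the pmf is 0 there as well
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

def max_pmf_gap(n, p, k_max=30):
    """Largest pointwise difference between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1))

# 2,000 viewers, each upvoting with probability 0.002 (about 4 expected votes)
print(max_pmf_gap(2000, 0.002))
# Same expected count, but voting is no longer rare: 10 viewers with p = 0.4
print(max_pmf_gap(10, 0.4))
```

The gap is small in the first case and much larger in the second, so whether the Poisson model is adequate depends on how rare voting is among viewers on a given site.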

Comment author: EHeller 03 January 2013 03:22:59AM 2 points

It's not obvious to me that your method improves upon the Wilson score. Certainly, the traditional Bayesian approach (the Jeffreys interval) is rarely that different from the Wilson score. Have you played with values to see what the largest differences would look like?

Comment author: Meni_Rosenfeld 03 January 2013 06:02:26AM 1 point

Given that a and b are arbitrary, I think the differences can be large. Whether they actually are large for typical datasets I can't readily answer.
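To see how far apart the two approaches can be, here is a sketch comparing the Wilson score lower bound (95% level, z ≈ 1.96) with the posterior mean under a Beta(a, b) prior. The choice a = b = 1 (a uniform prior) is only an illustrative assumption; in the proposal a and b would be fit to the domain.

```python
import math

def wilson_lower_bound(pos, neg, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of positive votes."""
    n = pos + neg
    if n == 0:
        return 0.0
    phat = pos / n
    centre = phat + z * z / (2 * n)
    spread = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return (centre - spread) / (1 + z * z / n)

def beta_posterior_mean(pos, neg, a=1.0, b=1.0):
    """Posterior mean of p under a Beta(a, b) prior."""
    return (pos + a) / (pos + neg + a + b)

items = {"(5, 0)": (5, 0), "(40, 10)": (40, 10)}
for name, (pos, neg) in items.items():
    print(name, round(wilson_lower_bound(pos, neg), 3), round(beta_posterior_mean(pos, neg), 3))
```

With these numbers the two rules order the items oppositely: the Wilson bound prefers (40, 10), while the uniform-prior posterior mean prefers (5, 0).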

In any case the advantages are:

  1. Simplicity. Tuning the parameters is a bit involved, but once you do, the formula to apply to each item is very simple. In many (not all) cases, a complicated formula reflects insufficient understanding of the problem.

  2. Motivation. Taking the lower bound of a confidence/credible interval makes some sense, but it's not that obvious. The need for it arises because we don't model the prior mean, so we don't want to take risks on unproven items. A posterior mean of the quality is more natural, and won't cause many problems because unproven items default to the true population mean.

  3. Parametrization. The interval methods have a parameter for the confidence level (the probability mass the interval must cover), but it's not at all clear how to choose it. My method has parameters for the mean and variance of quality, which are estimated from the data.

  4. Generalization. This framework makes it easier to think clearly about what we want, and to replace the posterior mean of p with the posterior mean of some other quantity of interest. For example, the suggested "explore vs. exploit" approach tends to give something closer to an interval upper bound than a lower bound, and other methods have been suggested.

Comment author: GuySrinivasan 02 January 2013 06:20:27PM 0 points

Some users' votes are not independent of the current net vote count. http://lesswrong.com/lw/7x/voting_etiquette/5ic

Comment author: Meni_Rosenfeld 02 January 2013 07:30:56PM 0 points

True. This is a problem since the current net vote count is mutable, while an individual vote, once cast, is not. You could try fitting a much more complicated model that can reproduce this behavior, calibrate it with A/B testing, etc. Or maybe try to prevent it by sorting according to quality, but not actually displaying the metrics.

Comment author: dbaupp 01 January 2013 10:21:33PM 5 points

(Link to How Not To Sort By Average Rating.)

Something of interest: the Jeffreys interval. Using the lower bound of a credible interval based on that distribution (which is the same as yours) will probably give better results than just using the mean, since it handles small sample sizes more gracefully. (I think, but I'm certainly willing to be corrected.)

But I fear that it would cause irreparable damage if the world settles on this solution.

This is probably vastly exaggerating the possible consequences; it's just a method of sorting, and both the Wilson interval method and a Bayesian method are definitely far better than the naive methods.

Comment author: Meni_Rosenfeld 02 January 2013 05:03:02PM 1 point

But I fear that it would cause irreparable damage if the world settles on this solution.

This is probably vastly exaggerating the possible consequences; it's just a method of sorting, and both the Wilson interval method and a Bayesian method are definitely far better than the naive methods.

I just feel that it will place this low-hanging fruit out of reach. For example:

Me: Hey Reddit, I have this cool new sorting method for you to try!

Reddit: What do you mean? We've already moved beyond the naive methods into the correct method. Here, see Miller's paper. No further changes are needed.

Maybe I'm exaggerating - I mean, things can be improved again after being improved once - but I just feel that if the world had a "naive rating method" itch to scratch, and something like Miller's method became the go-to method, something is wrong.

Comment author: IlyaShpitser 02 January 2013 10:23:10AM 5 points

But the "correct" method is complicated, poorly motivated, insufficiently parameterized, and founded on frequentist statistics.

A quick question. It seems from this sentence that you view "insufficiently parameterized" as a flaw in a method. Can you explain what that phrase means and why it is a flaw? Is this a statement of preference for parametric stat or something else?

Comment author: Meni_Rosenfeld 02 January 2013 11:35:14AM 7 points

It means that the model used per item doesn't have enough parameters to encode what we know about the specific domain (where domain is "Reddit comments", "Urban dictionary definitions", etc.)

The formulas discussed define a certain mapping from pairs (positive votes, negative votes) to a quality score. In Miller's model, the same mapping is used everywhere without consideration of the characteristics of the specific domain. In my model, there are parameters a and b (or alternatively, a/(a+b) and a+b) that we first train per domain, and then apply per item.
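A rough sketch of the per-domain training step, assuming method-of-moments fitting: estimate the mean m and variance v of item quality from items with plenty of votes, then solve m = a/(a + b) and v = m(1 − m)/(a + b + 1) for a and b. The helper names and sample data are hypothetical. Raw vote fractions include sampling noise, which inflates v, so a careful fit would correct for that (for example with a beta-binomial likelihood).

```python
from statistics import fmean, pvariance

def fit_beta_prior(fractions):
    """Method-of-moments fit of Beta(a, b) to observed quality fractions.

    Requires 0 < variance < mean * (1 - mean)."""
    m = fmean(fractions)
    v = pvariance(fractions)
    if not 0 < v < m * (1 - m):
        raise ValueError("variance must lie strictly between 0 and mean*(1-mean)")
    s = m * (1 - m) / v - 1  # s = a + b, the prior 'concentration'
    return m * s, (1 - m) * s

def score(pos, neg, a, b):
    """Per-item step: posterior mean of p under the trained Beta(a, b) prior."""
    return (pos + a) / (pos + neg + a + b)

# Hypothetical per-domain data: vote fractions of well-voted items.
a, b = fit_beta_prior([0.55, 0.7, 0.8, 0.9, 0.95, 0.6])
print(a, b, score(5, 0, a, b), score(40, 10, a, b))
```

The expensive part (fitting a and b) happens once per domain; scoring an item is then a single division.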

For example, let's say you want to decide the order of a (5, 0) item and a (40, 10) item. Miller's model just gives one answer. My model gives different answers depending on:

The average quality - if the overall item quality is high (say, most items have 100% positive votes), the (5,0) item should be higher because it's likely one of those 100% items, while (40,10) has proven itself to be of lower quality. If, however, most items have low quality, (40,10) will be higher because it has proven itself to be one of the rare high-quality items, while (5,0) is more likely to be a low-quality item which lucked out.

The variance in quality - say the average quality is 50%. If the variance in quality is low, (5,0) will be lower because it is likely to be an average item which lucked out, while (40, 10) has proven to be of high quality. If the variance is high (with most items being either 100% or 0%), (5,0) will be higher because in all likelihood it is one of the 100% items, while (40, 10) has proven to be only 80%.

In short, using a cookie-cutter model without any domain-specific parameters doesn't make the most efficient use of the data possible.
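The four scenarios above can be checked with the Beta posterior mean, parametrizing the prior by its mean m = a/(a + b) and concentration s = a + b (a larger s means lower variance in quality). The specific values of m and s are illustrative assumptions chosen to match each scenario.

```python
def posterior_mean(pos, neg, m, s):
    """Posterior mean of p with a Beta prior of mean m and concentration s = a + b."""
    a = m * s
    return (pos + a) / (pos + neg + s)

def preferred(m, s):
    """Which of the two example items ranks higher under the given prior."""
    return "(5, 0)" if posterior_mean(5, 0, m, s) > posterior_mean(40, 10, m, s) else "(40, 10)"

scenarios = {
    "high average quality": (0.9, 10.0),
    "low average quality": (0.2, 10.0),
    "mean 0.5, low variance": (0.5, 100.0),
    "mean 0.5, high variance": (0.5, 0.2),
}
for label, (m, s) in scenarios.items():
    print(label, "->", preferred(m, s))
```

A fixed mapping such as the Wilson bound gives one ordering regardless of m and s, whereas here the ordering follows the domain's prior.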

Comment author: Eliezer_Yudkowsky 02 January 2013 12:15:57AM 8 points

The main thing you want to calculate here is expected-value-of-information. Otherwise new posts drop into the void. Trying to maximize upvotes in the long run means showing new posts that might have a high-upvoting parameter.

Comment author: Meni_Rosenfeld 02 January 2013 10:26:45AM 2 points

This is interesting, especially considering that it favors low-data items, as opposed to both the confidence-interval-lower-bound and the notability adjustment factor, which penalize low-data items.

You can try to optimize it in an explore-vs-exploit framework, but there would be a lot of modeling parameters, and additional kinds of data will need to be considered. Specifically, a measure of how many of those who viewed the item bothered to vote at all. Some comments will not get any votes simply because they are not that interesting; so if you keep placing them on top hoping to learn more about them, you'll end up with very few total votes because you show people things they don't care about.
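One concrete explore-vs-exploit rule, offered only as an illustration (it is not proposed in the thread), is Thompson sampling: draw p for each item from its Beta posterior and sort by the draws. Items with little data have wide posteriors, so they sometimes land on top, which is the "favors low-data items" behaviour noted above. The uniform prior and vote counts are hypothetical, and this sketch ignores the caveat about items that few viewers bother to vote on.

```python
import random

def thompson_order(items, a=1.0, b=1.0, rng=random):
    """Order item ids by one posterior draw each; items maps id -> (pos, neg)."""
    draws = {i: rng.betavariate(a + pos, b + neg) for i, (pos, neg) in items.items()}
    return sorted(draws, key=draws.get, reverse=True)

def top_share(items, item_id, trials=2000, seed=0):
    """Fraction of orderings in which item_id is placed first."""
    rng = random.Random(seed)
    return sum(thompson_order(items, rng=rng)[0] == item_id for _ in range(trials)) / trials

items = {"new": (1, 0), "proven": (40, 10)}
print(top_share(items, "new"))
```

Under the plain posterior mean the proven (40, 10) item would always rank first; here the one-vote item is on top in a sizeable fraction of orderings, which is how it gets the exposure needed to learn more about it.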

Comment author: TrE 02 January 2013 09:53:30AM 0 points

Specifically, on a continuous quality scale p from 0 to 1, starting from a uniform prior density on this interval, after k upvotes and n downvotes one obtains the (unnormalized) posterior measure p^k (1 − p)^n dp.

A Gaussian-like prior might be more suited here, though.

Knowing the actual probability distribution and not just the average can be useful if, for some reason, you're not interested in the comments with the best average, but in those which are least or most controversial.
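With a uniform prior, k upvotes and n downvotes, the posterior density is proportional to p^k (1 − p)^n. A quick numerical check: normalizing it on a grid reproduces the mean and standard deviation of a Beta(k + 1, n + 1) distribution, and the full distribution gives its spread, not just its mean. The grid size is an arbitrary choice.

```python
def grid_posterior_stats(k, n, points=10_000):
    """Mean and standard deviation of the density proportional to p**k * (1 - p)**n
    on [0, 1], computed with the midpoint rule."""
    h = 1.0 / points
    ps = [(i + 0.5) * h for i in range(points)]
    w = [p**k * (1 - p) ** n for p in ps]
    z = sum(w)
    mean = sum(p * wi for p, wi in zip(ps, w)) / z
    var = sum((p - mean) ** 2 * wi for p, wi in zip(ps, w)) / z
    return mean, var**0.5

def beta_mean_sd(alpha, beta):
    """Closed-form mean and standard deviation of Beta(alpha, beta)."""
    s = alpha + beta
    return alpha / s, (alpha * beta / (s * s * (s + 1))) ** 0.5

print(grid_posterior_stats(8, 2), beta_mean_sd(9, 3))
```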

Comment author: Meni_Rosenfeld 02 January 2013 10:21:41AM 0 points

The beta distribution is a conjugate prior for Bernoulli trials, so if you start with such a prior the posterior is also beta, which greatly simplifies the calculations. It also converges to a normal distribution for large alpha and beta, and it can be fit to any mean in (0, 1) together with any variance below mean·(1 − mean), so it's a good choice.

Whatever your target function is, you'll want the item with the greatest posterior mean of that target. To compute this in general you'll need the full posterior distribution of p, not just the posterior mean of p. But the distribution only describes what you know about p; it doesn't itself encode properties such as "controversial".
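A minimal sketch of ranking by the posterior mean of a target function: draw p from the Beta posterior and average g(p). The target g(p) = p(1 − p), which peaks for evenly split items, is a hypothetical stand-in for a "controversy" score; under Beta(a, b) it has the closed form ab/((a + b)(a + b + 1)), which serves as a check. The uniform prior is also an assumption.

```python
import random

def posterior_target_mean(pos, neg, g, a=1.0, b=1.0, draws=50_000, seed=0):
    """Monte Carlo estimate of E[g(p)] under the Beta(a + pos, b + neg) posterior."""
    rng = random.Random(seed)
    alpha, beta = a + pos, b + neg
    return sum(g(rng.betavariate(alpha, beta)) for _ in range(draws)) / draws

def controversy(p):
    """Hypothetical target: largest when upvotes and downvotes are equally likely."""
    return p * (1 - p)

def controversy_closed_form(pos, neg, a=1.0, b=1.0):
    """Exact E[p * (1 - p)] under the Beta(a + pos, b + neg) posterior."""
    alpha, beta = a + pos, b + neg
    s = alpha + beta
    return alpha * beta / (s * (s + 1))

print(posterior_target_mean(20, 20, controversy), controversy_closed_form(20, 20))
```

A (2, 2) item scores lower than a (20, 20) item even though both have posterior mean 0.5, because E[p(1 − p)] = E[p](1 − E[p]) − Var(p): uncertainty about p lowers the expected target, which the posterior mean of p alone cannot show.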

Comment author: jsteinhardt 02 January 2013 03:16:58AM 1 point

How wide a range did you have in mind? It's certainly not the case that Bayesian methods are universally better than frequentist ones.

Comment author: Meni_Rosenfeld 02 January 2013 06:39:57AM 0 points

Well, I think there is some sense of Bayesianism as a meta-approach, without regard to specific methods, which most of us would consider healthier than the frequentist mindset.

There are surely papers showing the superiority of frequentism over Bayesianism, and papers showing the differences between various flavors of Bayesianism and various flavors of frequentism. But that's not what I'm after right now (with the understanding that a paper can be on the "Bayesian" side and be correct).
