The confusion arises because you are trying to do two contradictory things with your "vote". If the goal is to inform future readers as to whether a post is high quality or low quality, clearly you should just give your personal assessment. The only reason you'd take priors into account would be if your "vote" were actually a "bet" and there were some advantage to "betting" correctly.
Let's say we strive to vote according to our personal judgment. Should we vote strategically or not?
For example, let's say I read a post that seems marginally good. It has a score that's significantly higher than other posts which seem superior. Should I downvote the post to indicate that I think its score should be lower, or upvote to indicate that I think the post was marginally good?
Annoyance wrote, "There's no point to the rating system if we don't exercise our evaluations independently. Taking other people's opinions into consideration may be a good way to reach a final conclusion, but it's a terrible way to form your own opinion."
It's also a terrible way to contribute to a good group evaluation. One of the points Surowiecki makes in "The Wisdom of Crowds" is the necessity of aggregating independent judgements. If the judgements to be aggregated are not independent, you get bubbles and runs.
One way to discourage people from voting stuff up just because it is already popular would be to occasionally put up a subtly nonsensical post, preferably by someone high-status like EY or Robin, seed it with an already fairly high score (such as 10), and then heavily penalize the karma of people who vote it up. One might call this a "honeypot for phoney rationalists".
This would require some incentive for people to vote, to compensate for the small probability of being hit with a penalty. Overall this would make the Karma system more complex.
Should we be worried that people will vote stuff up just because it is already popular? There is currently no penalty for voting against the crowd, so wouldn't people (rightly) be willing to do so?
(Of course, we assume people are voting based on their personal impressions. It's clear that votes based on Bayesian beliefs are not as useful here.)
I may end up linking this from the About page when it comes time to explain suggested voting policies.
Of course, your personal vote up or down should not be influenced by that.
(Also, I did a quick edit to insert a page break / continuation thingy, hope you don't mind.)
When moderating comments, the goal is not to vote good posts up and bad posts down, but to make the vote total most accurately reflect the signals of all the people who voted on it. Since voters don't gain or lose anything by voting accurately, besides the satisfaction of knowing that their votes help the scores more accurately reflect post quality, they should always vote according to their private signal, and ignore the signals that others have given.
On the other hand, when signaling is tied together with some other choice, then information cascades can happen. The example that was given in my networks class was a case of two restaurants next to each other, where each potential patron can see how busy each restaurant is. In that case, people don't care about their signal, but just want to visit the better restaurant, and an information cascade is likely to occur. A similar occurrence happens with book purchases: if a book appears on a best-seller list, then that signals to everyone that it's good, but it may only be there because people bought it based on that signal. There are documented examples of clever publishers have buying copies of their own books to kick-start this effect.
Hold on, Johnicholas, isn't there a slip in the calculation concerning the third reader, case 4? You say
...but shouldn't this produce the answer (3:8) rather than (6:1)? The conclusion seems to be that as long as either the score is tied or "down" leads by one, readers will keep on voting according to their judgement, while as soon as either "up" leads by one or "down" leads by two, the next reader and all the following will ignore their judgements and follow suit.
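A quick sketch to check this, using only the numbers from the post (prior odds 6:4, per-signal bayes factor 4:1); "net" counts only the informative votes cast before any cascade starts:

```python
# Check which net scores leave a bayesian reader voting their own signal,
# assuming the post's numbers: prior odds 6:4, bayes factor 4:1 per signal.

PRIOR = 6 / 4        # prior odds in favor of "high quality"
FACTOR = 4 / 1       # bayes factor of one private signal

def vote(net, signal_high):
    """Vote of a reader who has seen `net` informative votes and then
    applies their own private signal."""
    posterior = PRIOR * FACTOR ** net * (FACTOR if signal_high else 1 / FACTOR)
    return "up" if posterior > 1 else "down"

for net in (-2, -1, 0, 1):
    independent = vote(net, True) == "up" and vote(net, False) == "down"
    print(f"net {net:+d}: reader votes their own judgement? {independent}")
# Prints False for net -2 and +1 (cascade), True for net -1 and 0.
```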
Slightly more complicated, but still a great example!
You are ENTIRELY CORRECT! I am embarrassed and I apologize.
I juggled the numbers repeatedly, trying to get a brief example that only uses numbers, not symbols; when it seemed like I had succeeded, I stopped.
I'll think about how to correct the post.
It is interesting to observe the distribution of scores on recent posts.
0, 0, 2, 3, 3, 14, 19, 23
These are fairly obviously clustered into "high scoring" and "very low scoring", indicating that a nonlinear effect is in play, perhaps something like an information cascade.
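To make "fairly obviously clustered" slightly more concrete, here's a trivial largest-gap split (just eyeballing with code, not a statistical test):

```python
# Split the observed scores at the largest gap to make the clusters explicit.
scores = sorted([0, 0, 2, 3, 3, 14, 19, 23])
gap, i = max((scores[j + 1] - scores[j], j) for j in range(len(scores) - 1))
print(f"largest gap: {gap}, between {scores[i]} and {scores[i + 1]}")
print("low cluster: ", scores[: i + 1])   # [0, 0, 2, 3, 3]
print("high cluster:", scores[i + 1 :])   # [14, 19, 23]
```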
If karma was hidden, would you expect it to be linear?
Plus, as far as I know, we can't see the totals of up votes and down votes separately, which would be more relevant information.
Well I'll have to be careful here that I don't say something stupid, because my grasp of statistics needs work. I would certainly not expect the distribution to have two clear peaks unless the underlying quality of the posts was similarly two-peaked.
I think the explanation for this on LessWrong is the same as the explanation on Reddit (which, from what I understand, served as the code base for LW).
People don't have unlimited time, and they are willing to spend time on LW reading good posts, but unwilling to waste time reading bad posts. Thus many people will somehow filter out lower-rated posts (I do so by simply sorting the posts from highest rated to lowest rated, and reading until I run out of time or get bored).
If many people do this, then posts which are "generally agreed as good" will tend to shoot up as everyone votes them up. Posts which are bad will generally drop down to the filter point (usually around 0), and then stay there rather than go further below.
If everyone voted on every post, you would expect bad posts to continue dropping, and a post which was "generally agreed as bad" would have just as large a magnitude as posts which are "generally agreed as good", except in the negative direction.
But we don't see posts with a score of -23. They seem to be either neutral or good. So my theory, that people filter so as to only see (and thus only vote on) neutral-or-better articles, seems able to predict what actually happens with the scores.
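Here's a toy simulation of that story (every number is invented for illustration): readers only read, and therefore only vote on, the posts near the top of the sort.

```python
import random

# A toy model of the filtering story above. Readers sort by score and only
# look at the top few posts, so bad posts stop collecting downvotes once
# they sink out of view.

random.seed(0)
posts = [{"good": random.random() < 0.5, "score": 0} for _ in range(8)]

for _ in range(200):                        # 200 readers arrive in turn
    visible = sorted(posts, key=lambda p: p["score"], reverse=True)[:4]
    post = random.choice(visible)           # each reads one visible post
    post["score"] += 1 if post["good"] else -1

print(sorted(p["score"] for p in posts))
# Expected shape: good posts keep climbing, while bad posts sit at the
# visibility cutoff (around 0 or -1) instead of falling to -23.
```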
Great example. In general, when there have been many votes, most people would do best to believe the consensus (assuming it is meaningful). But if everyone just then votes based on their updated opinion, they will just reinforce the consensus, making it meaningless. So I think your suggestion is right that you should vote your personal judgement, as it was before you were informed by the voting data.
I have occasionally argued that this is kind of a prisoner's dilemma game. Cooperating means trying not to be persuaded by the consensus and voting your personal opinion. Defecting means updating from the consensus, thereby losing the ability to make an unbiased contribution to the collective.
A related issue is that comments on older posts are less likely to be read, and less likely to be voted on. Earlier comments are, on average, somewhat better comments (See one analysis of this here - http://www.marginalrevolution.com/marginalrevolution/2008/02/does-the-qualit.html ). But I still think the status quo is sub-optimal, especially when most people view comments sorted by popularity or in chronological order.
Now the fun part:
If this comment receives positive votes, how will that affect your assessment of its quality?
Perhaps LW could randomly hide scores of some articles for a while after they're posted. If this were done with enough articles that the sample included a wide range of article types and qualities, we could easily see just how significant an effect there is from having scores visible.
If we were serious about this, I'd suggest a double-blind experiment where, for a randomly selected minority of posts or comments, half of us see a score higher than the real score and half see a lower score. Something like a small fixed offset, so the scores still look believable and change as expected when the user votes. We then see how this affected voting, and whether being influenced by scoring correlates with other factors. While it's on, users would be asked not to discuss specific scores.
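One way the assignment could work (a sketch; the function names, offset size, and selection fraction are all invented for illustration):

```python
import hashlib

# A deterministic hash of (post, user) puts each user in the "shown higher"
# or "shown lower" half, so a given user always sees the same perturbed
# score for a given post.

OFFSET = 3             # hypothetical size of the perturbation
SELECTED = 0.1         # hypothetical fraction of posts in the experiment

def displayed_score(post_id: int, user_id: int, true_score: int) -> int:
    post_hash = hashlib.sha256(f"post:{post_id}".encode()).digest()[0]
    if post_hash / 255 > SELECTED:
        return true_score                   # post not in the experiment
    pair_hash = hashlib.sha256(f"{post_id}:{user_id}".encode()).digest()[0]
    return true_score + (OFFSET if pair_hash % 2 == 0 else -OFFSET)

print(displayed_score(42, 7, 10))           # 10, or 13 / 7 if selected
```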
Great idea. One potential problem though for these sorts of experiments is that knowledge (or reasonable suspicion) of the experiments would alter users' behavior.
Yes, but I'm hoping that using a randomly selected minority of posts or comments would help, and I'd expect our guesses as to which posts have been raised or lowered would be interestingly inaccurate. Maybe we could submit our guesses along with the probability we assign to each, and then the calibration test results could be posted... :-)
I don't agree that rehashing old ideas is a bad thing, especially when good old ideas or their implications are being ignored. Novelty is valuable, but highly overrated.
There's no point to the rating system if we don't exercise our evaluations independently. Taking other people's opinions into consideration may be a good way to reach a final conclusion, but it's a terrible way to form your own opinion.
What precisely is the difference between being a "good Bayesian" as you describe it, and being a groupthinker? Is it only that the Bayesian has an explicit equation while the groupthinker probably doesn't?
The concluding points 1, 2 and 4 are more or less equivalent; they are worth repeating, though. There isn't really any worth in a score built from votes on the true quality, at least not for bayesians. A score built from votes on individual judgements would contain all the useful information.
A thought experiment: you could use a double voting system: you make one vote based on your belief before updating on the consensus, and another vote, in a separate count, based on your updated belief. The point would be to update on the consensus of the first vote count and use the second vote count for all other purposes, e.g. promoting to the front page. This would allow broadcasting each person's novel evidence (their individual judgement) as well as keeping some kind of aggregate score for the site's algorithms to work with. It would probably be easy to create an algorithm that makes full use of the first count, though, and as long as one can't think of a good use for the second count, one shouldn't vote one's updated beliefs in a single-vote system, I guess.
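A minimal sketch of what that double count might look like (the class and its fields are my own invention, not anything the site actually implements):

```python
# Each voter records a "raw" vote from their independent judgement and an
# "updated" vote made after seeing the raw tally.

class DoubleCount:
    def __init__(self):
        self.raw = 0      # votes from independent judgement, cast before
                          # looking at any score: the signals worth pooling
        self.updated = 0  # votes cast after updating on the raw tally:
                          # used for display and front-page promotion

    def vote(self, raw_up: bool, updated_up: bool):
        self.raw += 1 if raw_up else -1
        self.updated += 1 if updated_up else -1

post = DoubleCount()
post.vote(raw_up=True, updated_up=True)    # judged good, still thinks so
post.vote(raw_up=False, updated_up=True)   # judged bad, deferred to consensus
print(post.raw, post.updated)              # -> 0 2
```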
A minor point about the calculations: an ideal bayesian wouldn't do the calculation you did. Knowing the voting procedure, they would dismiss any votes not contributing new information. As the order of the votes isn't public, they would have to keep a prior over the different orders and update on that. This is of course a minor quibble, as this would require far too many calculations to be a reasonable model for any real reader.
"An ideal bayesian wouldn't..." I apologize, I'm not following.
I was dismissing votes not contributing new information. The order of the votes is partly deduced. Regarding the part that isn't deduced, there is no evidence to update on, and the prior is included - it's the (6:4) factor.
Would you mind posting what the ideal bayesian's calculations would look like?
[Sorry for not answering earlier, I didn't find the inbox until recently.]
I perhaps was a bit unclear, but when I say "ideal bayesian" I mean a mathematical construct that does full bayesian updating i.e. incorporates all prior knowledge into its calculations. This is of course impossible for anyone not extremely ignorant of the world, which is why I called it a minor point.
An ideal bayesian calculation would include massive deductive work on e.g. the psychology of voting, knowledge of the functioning of this community in particular etc.
My comment wasn't really an objection. To do a full bayesian calculation of a real world problem is comparable to using quantum mechanics for macroscopic systems. One must use approximations; the hard part is knowing when they break down.
We already have sections for both popular (up - down > threshold) and controversial (up + down > threshold). Is it that posts are automatically elevated to these states, or does that still need to be done by moderators? Is the throttling of post elevation that EY recently mentioned handled automatically or manually?
If elevation is handled manually by moderators, it seems to make the most sense to keep the tallies private and let the moderators use bayesian math to adjust for priors. (I personally think that's overkill - might make a fun automation task, however.)
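For what it's worth, a rough sketch of that automation task, reusing the post's numbers (prior odds 6:4, bayes factor 4:1 per vote) with an arbitrary promotion threshold; note it naively treats every vote as an independent signal, which the cascade discussion suggests is optimistic:

```python
# Posterior odds of "high quality" from private tallies, then a simple
# elevation rule. The threshold of 20:1 is an arbitrary assumption.

PRIOR_ODDS = 6 / 4
BAYES_FACTOR = 4 / 1

def posterior_odds(up_votes: int, down_votes: int) -> float:
    return PRIOR_ODDS * BAYES_FACTOR ** (up_votes - down_votes)

def elevate(up_votes: int, down_votes: int, threshold: float = 20.0) -> bool:
    return posterior_odds(up_votes, down_votes) > threshold

print(posterior_odds(3, 1))    # 6/4 * 4**2 = 24.0
print(elevate(3, 1))           # True, under the assumed threshold
```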
The only reason to leave them public is so people can decide which posts to read. There's not enough time in the day for me to keep up with all the posts here -- hell, I can barely keep up with Eliezer's posts on OB.
Instead, it seems they should be kept private to avoid the biases pointed out.
Those like myself will likely only ever read popular posts - at which point it's too late to vote, since elevation has already happened - and only occasionally dip into the new or controversial sections when particularly bored. I expect I'm in the majority. (Are we keeping stats of views vs. votes? Probably a bit early to tell at this point.)
"it seems they should be kept private to avoid the biases pointed out."
Personally I find the scores quite interesting. I like having a sense of what other aspiring rationalists reading alongside me are thinking.
I'm in favor of an option not to see the scores, but surely those here gathered to overcome bias should be allowed to strive to do that themselves without having to be protected from the information.
This article is a good explanation of information cascades:
http://www.starcitygames.com/magic/fundamentals/12201_Information_Cascades_in_Magic.html
Let's talk about how the very first reader would vote. If they judged the post high quality, then they would multiply the prior likelihood ratio (6:4) times the bayes factor for a high private signal (4:1), get (6*4:4*1) = (6:1) and vote the post up.
Voting is a decision, so you need a utility function in order to go from 6:1 to "vote up".
I agree. I should have said something about it, but at that point I'm using the assumption "Voters strive to vote the true quality of the post."
Isn't this voting business all rather... juvenile? Rationality (whatever it is) is not based on the simple addition of votes! 70 people can vote up and 69 vote down, resulting in an "objectively measured" mediocre post. Or 100 can read the post and move on without voting while only two vote down - resulting in negative karma. We have a list of boasts on the sidebar - Yudkowsky is 6 times more rational than Yvain and 15 times more rational than me, and Hanson is lagging behind. Now we know whom we need to affiliate with or cheer on - come on Hanson, you need my vote. This is silly and rife with cognitive bias.
I don't think that the idea was that the higher your score the more rational you are, but I do agree that the "Top Contributors" thing seems to be more trouble than it's worth.
An information cascade is a problem in group rationality. Wikipedia has excellent introductions and links about the phenomenon, but here is a meta-ish example using likelihood ratios.
Suppose in some future version of this site, there are several well-known facts: 60% of posts are high quality and 40% are low quality, so the prior odds that a given post is high quality are (6:4); each reader privately judges the post, and that judgement is right 80% of the time, so a "high" signal has a bayes factor of (4:1) and a "low" signal a factor of (1:4); readers see the current score before voting, vote in sequence, and strive to vote the true quality of the post.
Let's talk about how the very first reader would vote. If they judged the post high quality, then they would multiply the prior likelihood ratio (6:4) times the bayes factor for a high private signal (4:1), get (6*4:4*1) = (6:1) and vote the post up. If they judged the post low quality then they would instead multiply by the bayes factor for a low private signal (1:4), get (6*1:4*4) = (3:8) and vote the post down.
There were two scenarios for the first reader (private information high or low). If we speculate that the first reader did in fact vote up, then there are two scenarios for the second reader. Having seen the single up vote, the second reader can deduce the first reader's signal, so they start from (6:4) times (4:1) = (6:1). If their own private signal is high, they multiply (6:1) times (4:1), get (24:1), and vote the post up. If their own private signal is low, they multiply (6:1) times (1:4), get (6:4) = (3:2), which still favors high quality, so they vote the post up anyway.
Note that now there are two explanations for ending up two votes up. It could be that the second reader actually agreed, or it could be that the second reader was following the first reader and the prior against their personal judgement. That means that the third reader gets zero information from the second reader's personal judgement! The two scenarios for the third reader, and every future reader, are exactly analogous to the two scenarios for the second reader.
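For anyone who'd rather run the model than follow the ratios, here's a small simulation of the scenario above, a sketch using only the numbers already stated (60% high quality, signals right 80% of the time):

```python
import random

# Simulate sequential bayesian voters: prior odds 6:4 that the post is
# high quality, private signals with bayes factor 4:1, and each reader
# voting their posterior after seeing the informative votes so far.

random.seed(1)
PRIOR, FACTOR = 6 / 4, 4.0

def run(n_readers=10):
    high = random.random() < 0.6      # the post's true quality
    informative = 0                   # net votes that revealed a signal
    votes = []
    for _ in range(n_readers):
        odds = PRIOR * FACTOR ** informative
        # A reader is in a cascade if their vote comes out the same
        # whichever signal they got; such a vote carries no information.
        cascading = odds / FACTOR > 1 or odds * FACTOR < 1
        signal_high = random.random() < (0.8 if high else 0.2)
        vote_up = odds * (FACTOR if signal_high else 1 / FACTOR) > 1
        votes.append("+" if vote_up else "-")
        if not cascading:
            informative += 1 if vote_up else -1
    return ("high" if high else "low"), "".join(votes)

for _ in range(5):
    print(run())
# A single early "+" locks in an all-"+" run, even for low-quality posts;
# only a tie or a one-vote "down" lead leaves readers voting their signal.
```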
This has been a nightmare scenario of groupthink afflicting even diligent bayesians. Possible conclusions:
Note: Olle found an error that necessitated a rewrite. I apologize.