Opinion piece on the Swedish Network for Evidence-Based Policy
Cross-posted from the Effective Altruism Forum
Follow up to: The effectiveness-alone strategy and evidence-based policy
A translation of the opinion piece can be found here.
I
Effective altruism is a great concept, but it's not trivial to sell. There are therefore good reasons to ally ourselves with other rationalist memes to increase the level of rationality and effectiveness in the world. One powerful such rationalist meme is "evidence-based policy", which is inspired by the "evidence-based medicine" movement.
The exact meaning of evidence-based policy is somewhat disputed, but generally proponents of evidence-based policy demand that the standards on which policy is based should be raised. Many believe strongly in randomized controlled trials (RCTs) and in the "hierarchy of evidence", but there is not complete agreement on the strength of RCTs relative to other kinds of studies.
In the US and the UK, there are several organizations which work on evidence-based policy, such as the British What Works Network and the American Coalition for Evidence-Based Policy. Inspired by them, I took the initiative to start a Swedish network for evidence-based policy at the start of this year. We are by now around 50 (depending on how you count) researchers, civil servants, journalists, consultants, students and other activists in the network. Only myself and a few others are EA members, so it's not an EA organization, but as I argued in my previous post, I do believe working on this nevertheless is an effectively altruistic cause.
One difference between us and What Works is that we aim to be a broad campaigning organization. We believe that policy not being evidence-based is not only due to a lack of knowledge, but also due to a lack of will, especially among politicians. Politicians often disregard expert advice (on what policies are the most effective to reach a given set of goals) which goes against their political prejudices. Therefore we need to put pressure on politicians - not least in the media - rather than just work behind the scenes as an expert organization.
II (Most linked replies below are in Swedish)
Our activities were fairly modest until last Sunday, when we wrote an opinion piece calling for evidence-based policy (English). The opinion piece was published in the most widely-read broadsheet, Dagens Nyheter, on DN Debatt - a sort of op-ed forum. DN Debatt has a special standing in Swedish politics. Everybody reads it and it's well-respected.
Hence we had expected a lot of attention, but the results still exceeded our expectations. Ours was the second most shared DN Debatt article in the month of May. We got seven replies in Dagens Nyheter, were strongly criticized in the other main broadsheet, Svenska Dagbladet (conservative), parodied in a popular public service (equivalent of the BBC) podcast, and were also commented on in a number of smaller newspapers. The discussion on Twitter was pretty intense. Subsequently, we also published two replies to replies in Dagens Nyheter and Svenska Dagbladet.
It's hard to tell what the majority opinion on our piece was. Certainly, there was a lot of praise and a lot of Facebook likes, but also some fierce criticism. That criticism was almost exclusively down to misunderstandings. I won't bog you down with all of the details here, but will rather summarize my general conclusions. They could be useful for anyone trying to write on evidence-based policy or related concepts in other countries.
I should say that "evidence-based policy" isn't as entrenched a concept in Sweden as it is in the US and the UK, which probably played to our disadvantage.
1) You need to be very clear about the means-ends distinction. Evidence-based policy is about making the methods for reaching your political goals (happiness, equality, liberty, etc) more effective by the use of evidence. It is not about propagating any particular set of political goals. We tried to be clear about this, but partly failed for two reasons. First, Dagens Nyheter set the headline, which was misleading. Second, we only clarified this distinction at the end. It should have been at the top.
2) There is a straw man conception of evidence-based policy, or expert-informed policy and political rationality more generally, akin to Julia Galef's "Straw Vulcan" conception of rationality. Perhaps this varies a bit from country to country, but in Sweden it's strong. Let's call it "Straw Soviet" for now (suggestions for a better name are welcome!).
According to this conception, evidence-based policy means technocracy (of a dictatorial form, according to the more extreme interpretations), disregard of non-quantifiable values (cf "Straw Vulcan"), disregard of emotions, a "Mad Scientist" conception of society as a laboratory, etc, etc. You need to do everything you can to counter such interpretations. I certainly underestimated the power of this straw man meme. I should also say that the Straw Soviet is probably more vicious than the Straw Vulcan, who seems more innocent (perhaps this is partly down to Julia's playful presentation of it, though).
For instance, Svenska Dagbladet's criticism was all about the "Straw Soviet". We were said to want to "design voter behaviour" (this was also partly due to the article having been signed by a few nudgers who call themselves "behavioural engineers" - a big trigger of the Straw Soviet). Here are some more quotes:
It is perhaps not the “enlightened despot” who is called for in the opinion piece, but rather Dr Despot. Today’s most frightening reading came from the recently formed “Network for Evidence-Based Policy” (Dagens Nyheter 1 June).
//
Since there probably are very few citizens who base their votes on research reports, free elections yield results which are not evidence-based. According to the argument in the opinion piece, that means that since we “see the world through partisan lenses”, the election results are as a rule problematic or directly harmful.
//
Now if the network were correct, true evidence-based policies would lead to a single proposal, a solution “free from ideology and populism”. That would in turn mean that all parties arrived at the same answer, and it is absolutely impossible to see why that – though ever so full of evidence – would be desirable.
A vibrant democracy is based on the existence of conflicts of opinion and value, intellectual diversity and the citizen’s right to freely express it. The complete and rational citizen is an anomaly, and based on the unpleasant idea that enlightened powers can raise, design, a new man.
Paradoxically, it is precisely highly ideological regimes which have attempted just that. The results have been devastating.
We got several other replies along these lines, though we also got a much more positive one from Dagens Nyheter itself. A large group of replies treated more technical and humdrum issues concerning RCTs, practical policy-making, etc.
3) Connected to the Straw Vulcan and the Straw Soviet, there is a "Straw Naive Positivist Scientist" (again, suggestions for better terms are welcome), who thinks that knowledge is easily obtainable even in messy fields like economics, that it's easy to reach consensus if you just don't let political misconceptions mislead you, that you can always easily infer policy advice from research, etc. We got a lot of criticism which was based on the Straw Naive Positivist. Obviously, we don't hold any of those views.
4) People read very superficially. This is not only true of the man in the street, but also of many journalists, politicians, etc. At some level I know this, having myself written about research on this on my blog, but it's harder to make full use of that knowledge when you write.
Also lots of people don't use the principle of charity at all. Some of the replies - including one from a philosophy professor - were exceedingly uncharitable. Thus don't expect people to use the principle of charity - especially when emotional memes like the Straw Soviet are around.
When you fight such powerful memes, you need to be extremely clear. You need to say the things you really want to get across early, to repeat them, and to give examples. If at all possible, you should control the title, since that sets so much of the tone of the piece (give the publishers a juicy suggestion and they might buy it). Don't say too much, but focus on getting the central message across.
This is so different from writing an academic paper. Of course that's obvious, but it's one thing to get it on an intellectual level, quite another to really internalize it. If you could get a skilled public communicator on board, that would be very useful.
I also think it would be good to pre-test major articles (e.g. on Mechanical Turk) to get a clearer picture of whether the message gets across. If you don't want the content to leak beforehand, that might not be doable, though.
5) We were probably a bit too extreme regarding RCTs, which triggered the Straw Soviet and the Straw Naive Positivist (for epistemological and ethical reasons). It would have been more tactical to emphasize other stuff.
6) We would have come off as more concrete if we had based our opinion piece on a research report on the state of Swedish policy-making. It's great if you can do that, but I don't think it would have been rational for us (see below).
7) We should have stressed how big the movement on evidence-based policy is in the US and the UK. For instance, we could have mentioned that "Obama's 2016 budget calls for an emphasis on evidence-based approaches at all levels of government". Obama being popular and respected in Sweden, that would have done much to disarm the Straw Soviet.
8) It was a mistake to mention legal means as a way of making politics more evidence-based, since it strongly triggers the Soviet meme. Even those who otherwise supported us criticized this suggestion.
III
In our replies, we focused on rectifying the misunderstandings, in particular the claim that we are calling for "Dr Despot". Such replies normally get much less attention, and so it was with ours as well. However, the reception was also more unanimously positive, especially from academics and civil servants who know the field.
I don't regret writing this opinion piece at this early stage. Before I started writing it (I wrote the body of the text, and the others then made minor tweaks) there wasn't much activity in our network. Now, we have many more members, including more senior ones. Also, those who already were in the network grew much more enthusiastic after the publication. Thus, all in all, it's been a major success. Still, I think you can learn a lot from things we could have done better.
I'll write more later on how the network is developing more generally. Also I should add that I'm still digesting what I've learnt, so my conclusions aren't set in stone. Any comments are welcome.
Could auto-generated troll scores reduce Twitter and Facebook harassment?
There's been a lot of discussion in the last few years on the problem of hateful behaviour on social media such as Twitter and Facebook. How can this problem be solved? Twitter and Facebook could of course start adopting stricter policies towards trolls and haters. They could remove more posts and tweets, and ban more users. So far, they have, however, been relatively reluctant to do that. Another, more principled, problem with this approach is that it could be seen as a restriction on the freedom of speech (especially if Twitter and Facebook were ordered to do this by law).
There's another possible solution, however. Using sentiment analysis, you could give Twitter and Facebook users a "troll score". Users whose language is hateful, offensive, racist, etc, would get a high troll score.* This score would in effect work as a (negative) reputation/karma score. That would in itself probably incentivize trolls to improve. However, if users were allowed to block (and make invisible the writings by) any user whose troll score is above a certain cut-off point (of their choice), that would presumably incentivize trolls to improve even more.
Could this be done? Well, it's already been shown to be possible to infer your big five personality traits, with great accuracy, from what you've written and liked, respectively, on Facebook. The tests are constructed on the basis of correlations between data from standard personality questionnaires (more than 80,000 Facebook users filled in such tests on behalf of YouAreWhatYouLike, which constructed one of the Facebook tests) and Facebook writings or likes. Once it's been established that, e.g., extraverted people tend to like certain kinds of posts, or use certain kinds of words, this knowledge can be used to predict the level of extraversion of Facebook users who haven't taken the questionnaire.
This suggests that there is no principled reason why a reliable troll score couldn't be constructed with today's technology. However, a problem is that while there are agreed criteria for what counts as an extraverted person, there are no agreed criteria for what counts as a troll. Also, it seems you couldn't use questionnaires, since people who actually do behave like trolls online would be disinclined to admit it in a questionnaire.
One way to proceed could instead be this. First, you could define in rather general and vague terms what is to count as trolling - say "racism", "vicious attacks", "threats of violence", etc. You could then use two different methods to go from this vague definition to a precise score. The first is to let a number of sensible people assign troll scores to different Facebook posts and tweets (using the general and vague definition of what is to count as trolling). You would feed this into your algorithms, which would learn which combinations of words are characteristic of trolls (as judged by these people), and which aren't. The second is to simply list a number of words or phrases which would count as characteristic of trolls, in the sense of the general and vague definition. This latter method is probably less costly - particularly if you can generate the troll-lexicon automatically, say from existing dictionaries of offensive words - but also probably less accurate.
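The second, lexicon-based method can be sketched in a few lines. The word list, the scoring rule (share of a user's words that match the lexicon) and the cut-off below are purely illustrative assumptions on my part, not a claim about how Facebook or Twitter would actually implement this:

```python
# Toy lexicon - a real one would be far larger and could be generated
# from existing dictionaries of offensive words, as suggested above.
TROLL_LEXICON = {"idiot", "moron", "scum", "pathetic", "vermin"}

def troll_score(post: str) -> float:
    """Fraction of a post's words that appear in the troll lexicon."""
    words = [w.strip(".,!?\"'()").lower() for w in post.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in TROLL_LEXICON)
    return hits / len(words)

def user_troll_score(posts: list[str]) -> float:
    """A user's overall score: the average over all their posts."""
    if not posts:
        return 0.0
    return sum(troll_score(p) for p in posts) / len(posts)

def should_block(posts: list[str], cutoff: float = 0.1) -> bool:
    """Each user picks their own cut-off; anyone scoring above it is hidden."""
    return user_troll_score(posts) > cutoff
```

A real system would also need to handle obfuscated spellings, context (quoting an insult is not the same as making one) and deliberate gaming of the word list - which is where the first, supervised method would likely do better, at a higher labeling cost.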
In any case, I expect it to be possible to solve this problem. The next problem is: who would do this? Facebook and Twitter should be able to construct the troll score, and to add the option of blocking all trolls, but do they want to? The risk is that they will think that the possible downside to this is greater than the possible upside. If people start disliking this rather radical plan, they might leave en masse, whereas if they like it, well, then trolls could potentially disappear, but it's unlikely that this will affect their bottom line drastically. Thus it's not clear that they will be more positive toward this idea than they are toward conventional banning/moderating methods.
Another option is for an outside company to create a troll score using Facebook or Twitter data. I don't know whether that's possible at present - whether you'd need Facebook and Twitter's consent, and whether they'd then be willing to give it. It seems you definitely would need it in order for the troll score to show up on your standard Facebook/Twitter account, and in order to enable users to block all trolls.
This second problem is thus much harder. A troll score could probably be constructed by Facebook and Twitter, but they may well not want to do it. Any suggestions on how to get around this problem would be appreciated.
My solution is very similar to the LessWrong solution to the troll problem. Just like you can make low karma users invisible on LessWrong, you would be able to block (and make invisible the writings by) Facebook and Twitter users with a high troll score. A difference is, though, that whereas karma is manually generated (by voting), the troll score would be automatically generated from your writings (for more on this distinction, see here).
One advantage of this method, as opposed to conventional moderation methods, is that it doesn't restrict freedom of speech in the same way. If trolls were blocked by most users, you'd achieve much the same effect as you would from bans (the trolls wouldn't be able to speak to anyone), but in a very different way: it would result from lots of blockings by individual users, who presumably have a full right to block anyone, rather than from the actions of a central admin.
Let me finish with one last caveat. You could of course extend this scheme, and construct all sorts of scores - such as a "liberal-conservative score", with whose help you could block anyone whose political opinions are insufficiently close to yours. That would be a very bad idea, in my view. Scores of this sort should only be used to combat harassment, threats and other forms of anti-social behaviour, and not to exclude any dissenter from discussion.
* I here use "troll" in the wider sense which "equate[s] trolling with online harassment" rather than in the narrower (and original) sense according to which a troll is "a person who sows discord on the Internet by starting arguments or upsetting people, by posting inflammatory, extraneous, or off-topic messages in an online community (such as a newsgroup, forum, chat room, or blog) with the deliberate intent of provoking readers into an emotional response or otherwise disrupting normal on-topic discussion" (Wikipedia).
[Link] Algorithm aversion
It has long been known that algorithms out-perform human experts on a range of topics (here's a LW post on this by lukeprog). Why, then, is it that people continue to mistrust algorithms, in spite of their superiority, and instead cling to human advice? A recent paper by Dietvorst, Simmons and Massey suggests it is due to a cognitive bias which they call algorithm aversion. We judge less-than-perfect algorithms more harshly than less-than-perfect humans. They argue that since this aversion leads to poorer decisions, it is very costly, and that we therefore must find ways of combating it.
Abstract:
Research shows that evidence-based algorithms more accurately predict the future than do human forecasters. Yet when forecasters are deciding whether to use a human forecaster or a statistical algorithm, they often choose the human forecaster. This phenomenon, which we call algorithm aversion, is costly, and it is important to understand its causes. We show that people are especially averse to algorithmic forecasters after seeing them perform, even when they see them outperform a human forecaster. This is because people more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake. In 5 studies, participants either saw an algorithm make forecasts, a human make forecasts, both, or neither. They then decided whether to tie their incentives to the future predictions of the algorithm or the human. Participants who saw the algorithm perform were less confident in it, and less likely to choose it over an inferior human forecaster. This was true even among those who saw the algorithm outperform the human.
General discussion:
The results of five studies show that seeing algorithms err makes people less confident in them and less likely to choose them over an inferior human forecaster. This effect was evident in two distinct domains of judgment, including one in which the human forecasters produced nearly twice as much error as the algorithm. It arose regardless of whether the participant was choosing between the algorithm and her own forecasts or between the algorithm and the forecasts of a different participant. And it even arose among the (vast majority of) participants who saw the algorithm outperform the human forecaster.
The aversion to algorithms is costly, not only for the participants in our studies who lost money when they chose not to tie their bonuses to the algorithm, but for society at large. Many decisions require a forecast, and algorithms are almost always better forecasters than humans (Dawes, 1979; Grove et al., 2000; Meehl, 1954). The ubiquity of computers and the growth of the “Big Data” movement (Davenport & Harris, 2007) have encouraged the growth of algorithms but many remain resistant to using them. Our studies show that this resistance at least partially arises from greater intolerance for error from algorithms than from humans. People are more likely to abandon an algorithm than a human judge for making the same mistake. This is enormously problematic, as it is a barrier to adopting superior approaches to a wide range of important tasks. It means, for example, that people will more likely forgive an admissions committee than an admissions algorithm for making an error, even when, on average, the algorithm makes fewer such errors. In short, whenever prediction errors are likely—as they are in virtually all forecasting tasks—people will be biased against algorithms.
More optimistically, our findings do suggest that people will be much more willing to use algorithms when they do not see algorithms err, as will be the case when errors are unseen, the algorithm is unseen (as it often is for patients in doctors’ offices), or when predictions are nearly perfect. The 2012 U.S. presidential election season saw people embracing a perfectly performing algorithm. Nate Silver’s New York Times blog, Five Thirty Eight: Nate Silver’s Political Calculus, presented an algorithm for forecasting that election. Though the site had its critics before the votes were in— one Washington Post writer criticized Silver for “doing little more than weighting and aggregating state polls and combining them with various historical assumptions to project a future outcome with exaggerated, attention-grabbing exactitude” (Gerson, 2012, para. 2)—those critics were soon silenced: Silver’s model correctly predicted the presidential election results in all 50 states. Live on MSNBC, Rachel Maddow proclaimed, “You know who won the election tonight? Nate Silver,” (Noveck, 2012, para. 21), and headlines like “Nate Silver Gets a Big Boost From the Election” (Isidore, 2012) and “How Nate Silver Won the 2012 Presidential Election” (Clark, 2012) followed. Many journalists and popular bloggers declared Silver’s success a great boost for Big Data and statistical prediction (Honan, 2012; McDermott, 2012; Taylor, 2012; Tiku, 2012).
However, we worry that this is not such a generalizable victory. People may rally around an algorithm touted as perfect, but we doubt that this enthusiasm will generalize to algorithms that are shown to be less perfect, as they inevitably will be much of the time.
The Argument from Crisis and Pessimism Bias
Many people have argued that the public seems to have an overly negative view of society's development. For instance, this survey shows that the British public think that the crime rate has gone up, even though it has gone down. Similarly, Hans Rosling points out that the public has an overly negative view of developing world progress.
If we have such a pessimism bias, what might explain it? One standard explanation is that good news isn't news - only bad news is. A murder or a famine is news; their absence isn't. Hence people listening to the news get a skewed picture of the world.
No doubt there is something to that. In this post I want, however, to point to another mechanism that gives rise to a pessimism bias, namely the compound effect of many uses of what I call the Argument from Crisis. (Please notify me if you've seen this idea somewhere else.)
The Argument from Crisis says that some social problem - say crime, poverty, inequality, etc - has worsened and that we therefore need to do something about it. This way of arguing is effective primarily because we are loss averse - because we think losing is worse than failing to win. By arguing that inequality was not as bad ten years ago and that we have now "lost" some degree of equality, your argument will be rhetorically stronger. The reason is that in that case more equality will eradicate a loss, whereas if inequality hasn't worsened, more equality will simply be a gain, which we value less. Hence we will be more inclined to act against inequality in the former case.
Even though the distinction between a gain and an eradication of a loss is important from a rhetorical point of view, it does not seem very relevant from a logical point of view. Whatever the level of crime or inequality is, it would seem that the value of reducing it is the same regardless of whether it has gone up or down the past ten years.
Another reason why the Argument from Crisis is rhetorically effective is of course that we believe (rightly or wrongly) that whatever trend there is will continue. Hence if we think that crime or inequality is increasing, we believe that it will continue to do so unless we do something about it.
Both of these factors make the Argument from Crisis rhetorically effective. For this reason, many people argue that social problems which they want to alleviate are getting worse, even though in fact they are not.
I'd say the vast majority of people who use this argument are not conscious of doing it, but rather persuade themselves into believing that the problem they want to alleviate is getting worse. Indeed, I think that the subconscious use of this argument is a major reason why radicals often think the world is on a downward slope. The standard view is of course that they want radical change because they believe that the world has got worse, but I think that to some extent, the causality is reversed: they believe that the world has got worse because they want radical change.
Since the Argument from Crisis is so rhetorically effective, it gets used a lot. The effect of this is to create, among the public at large, a pessimism bias - an impression that the world is getting worse rather than better, in face of evidence to the contrary. This in turn helps various backward-looking political movements. Hence I think that we should do more to combat the Argument from Crisis, even though it can sometimes be a rhetorically effective means to persuade people to take action on important social problems.
Reverse engineering of belief structures
(Cross-posted from my blog.)
Since some belief-forming processes are more reliable than others, learning by what processes different beliefs were formed is very useful, for several reasons. Firstly, if we learn that someone's belief that p (where p is a proposition such as "the cat is on the mat") was formed by a reliable process, such as visual observation under ideal circumstances, we have reason to believe that p is probably true. Conversely, if we learn that the belief that p was formed by an unreliable process, such as motivated reasoning, we have no particular reason to believe that p is true (though it might be - by luck, as it were). Thus we can use knowledge about the process that gave rise to the belief that p to evaluate the chance that p is true.
Secondly, we can use knowledge about belief-forming processes in our search for knowledge. If we learn that some alleged expert's beliefs are more often than not caused by unreliable processes, we are better off looking for other sources of knowledge. Or, if we learn that the beliefs we acquire under certain circumstances - say under emotional stress - tend to be caused by unreliable processes such as wishful thinking, we should cease to acquire beliefs under those circumstances.
Thirdly, we can use knowledge about others' belief-forming processes to try to improve them. For instance, if it turns out that a famous scientist has used outdated methods to arrive at their experimental results, we can announce this publicly. Such "shaming" can be a very effective means of scaring people into using more reliable methods, and will typically not only have an effect on the shamed person, but also on others who learn about the case. (Obviously, shaming also has its disadvantages, but my impression is that it has played a very important historical role in the spreading of reliable scientific methods.)
A useful way of inferring by what process a set of beliefs was formed is by looking at its structure. This is a very general method, but in this post I will focus on how we can infer that a certain set of beliefs most probably was formed by (politically) motivated cognition. Another use is covered here and more will follow in future posts.
Let me give two examples. Firstly, suppose that we give American voters the following four questions:
- Do expert scientists mostly agree that genetically modified foods are safe?
- Do expert scientists mostly agree that radioactive wastes from nuclear power can be safely disposed of in deep underground storage facilities?
- Do expert scientists mostly agree that global temperatures are rising due to human activities?
- Do expert scientists mostly agree that the "intelligent design" theory is false?
The answer to all of these questions is "yes".* Now suppose that a disproportionate number of Republicans answer "yes" to the first two questions, and "no" to the third and the fourth questions, and that a disproportionate number of Democrats answer "no" to the first two questions, and "yes" to the third and the fourth questions. In the light of what we know about motivated cognition, these are very suspicious patterns or structures of beliefs, since they are precisely the patterns we would expect given the hypothesis that people acquire whatever beliefs on empirical questions suit their political preferences. Since no other plausible hypothesis seems able to explain these patterns as well, this confirms the hypothesis. (Obviously, if we were to give the voters more questions and their answers retained their one-sided structure, that would confirm the hypothesis even more strongly.)
Secondly, consider a policy question - say minimum wages - on which a number of empirical claims have bearing. For instance, these empirical claims might be that minimum wages significantly decrease employers' demand for new workers, that they cause inflation and that they significantly reduce workers' tendency to use public services (since they now earn more). Suppose that there are five such claims which tell in favour of minimum wages and five that tell against them, and that you think that each of them has a roughly 50 % chance of being true. Also, suppose that they are probabilistically independent of each other, so that learning that one of them is true does not affect the probabilities of the other claims.
Now suppose that in a debate, all proponents of minimum wages defend all of the claims that tell in favour of minimum wages, and reject all of the claims that tell against them, and vice versa for the opponents of minimum wages. Now this is a very surprising pattern. It might of course be that one side is right across the board, but given your prior probability distribution (that the claims are independent and have a 50 % probability of being true) a more reasonable interpretation of the striking degree of coherence within both sides is, according to your lights, that they are both biased; that they are both using motivated cognition. (See also this post for more on this line of reasoning.)
The difference between the first and the second case is that in the former, your hypothesis that the test-takers are biased is based on the fact that they are provably wrong on certain questions, whereas in the second case, you cannot point to any issue where either side is provably wrong. However, the patterns of their claims are so improbable given the hypothesis that they have reviewed the evidence impartially, and so likely given the hypothesis of bias, that they nevertheless strongly confirm the latter. What they are saying is simply "too good to be true".
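Just how improbable perfect alignment is can be checked directly from the stated prior. This is only an illustration of the arithmetic, under the minimum-wage example's own assumptions (ten independent claims, each with a 50% chance of being true):

```python
from math import comb

N_CLAIMS = 10   # five empirical claims pro minimum wages, five contra
P_TRUE = 0.5    # prior probability that any given claim is true

# Chance that impartial evaluation of all ten claims happens to come out
# exactly in line with one side's policy preference:
p_perfect_alignment = P_TRUE ** N_CLAIMS
print(p_perfect_alignment)        # 0.0009765625 - about 1 in 1000

# For contrast: the chance that an impartial reasoner ends up endorsing
# exactly five of the ten claims, in any combination:
p_any_five = comb(N_CLAIMS, 5) * P_TRUE ** N_CLAIMS
print(round(p_any_five, 3))       # 0.246
```

So under the stated prior, a debater whose conclusions match their policy preference on every single claim is about a one-in-a-thousand event per side; observing it on both sides at once is what licenses the inference to bias.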
These kinds of arguments, in which you infer a belief-forming process from a structure of beliefs (i.e. you reverse-engineer the beliefs), have of course always been used. (A salient example is Marxist interpretations of "bourgeois" belief structures, which, Marx argued, supported the bourgeoisie's material interests to a suspiciously high degree.) Recent years have, however, seen a number of developments that should make them less speculative and more reliable and useful.
Firstly, psychological research such as Tversky and Kahneman's has given us a much better picture of the mechanisms by which we acquire beliefs. Experiments have shown that we fall prey to an astonishing list of biases, and have identified the circumstances most likely to trigger them.
Secondly, a much greater portion of our behaviour is now being recorded, especially on the Internet (where we spend an increasing share of our time). This obviously makes it much easier to spot suspicious patterns of beliefs.
Thirdly, our algorithms for analyzing behaviour are quickly improving. FiveLabs recently launched a tool that analyzes your big five personality traits on the basis of your Facebook posts. Granted, this tool does not seem completely accurate, and inferring bias promises to be a harder task (since the correlations are more complicated than that between usage of exclamation marks and extraversion, or that between using words such as "nightmare" and "sick of" and neuroticism). Nevertheless, better algorithms and more computing power will take us in the right direction.
In my view, there is thus a large untapped potential to infer bias from the structure of people's beliefs, which in turn would be inferred from their online behaviour. In coming posts, I intend to flesh out my ideas on this in some more detail. Any comments are welcome and might be incorporated in future posts.
* The second and the third questions are taken from a paper by Dan Kahan et al., which refers to the US National Academy of Sciences (NAS) assessment of expert scientists' views on these questions. Their study shows that many conservatives don't believe that experts agree on climate change, whereas a fair number of liberals think experts don't agree that nuclear storage is safe, confirming the hypothesis that people let their political preferences influence their empirical beliefs. The assessments of expert consensus on the first and fourth questions are taken from Wikipedia.
Asking people what they think about the expert consensus on these issues, rather than about the issues themselves, is a good idea, since it's much easier to come to an agreement on what the true answer is to the former sort of question. (Of course, you could deny that professors from prestigious universities count as expert scientists, but that would be a quite extreme position that few people hold.)
Three methods of attaining change
Say that you want to change some social or political institution: the educational system, the monetary system, research on AGI safety, or what not. When trying to reach this goal, you may use one of the following broad strategies (or some combination of them):
1) You may directly try to lobby (i.e. influence) politicians to implement this change, or try to influence voters to vote for parties that promise to implement these changes.
2) You may try to build an alternative system and hope that it eventually becomes so popular that it replaces the existing system.
3) You may try to develop tools that a) appeal to users of existing systems and b) whose widespread use is bound to change those existing systems.
Let me give some examples of what I mean. Trying to persuade politicians that we should replace conventional currencies with a private currency or, for that matter, starting a pro-Bitcoin party, falls under 1), whereas starting a private currency and hoping that it spreads falls under 2). (This post was inspired by a great comment by Gunnar Zarncke on precisely this topic. I take it that he was there talking of strategy 2.) Similarly, trying to lobby politicians to reform academia falls under 1), whereas starting new research institutions which use new and hopefully more effective methods falls under 2). I take it that this is what, e.g., Leverage Research is trying to do, in part. Similarly, libertarians who vote for Ron Paul are taking the first course, while at least one possible motivation for the Seasteading Institute is to construct an alternative system that proves to be more efficient than existing governments.
Efficient Voting Advice Applications (VAAs), which advise you how to vote on the basis of your views on different policy matters, can be an example of 3) (they are discussed here). Suppose that voters started to use them on a grand scale. This could potentially force politicians to adhere very closely to the views of the voters on each particular issue, since a politician who failed to do this would stand little chance of winning. This may or may not be a good thing, but the point is that it would be a change caused not by lobbying of politicians or by building an alternative system, but simply by constructing a tool whose widespread use could change the existing system.
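For concreteness, the core of such a matching tool can be sketched in a few lines: score each party by its share of agreement with the voter's answers. The party names and positions below are invented for illustration.

```python
# A toy sketch of how a Voting Advice Application might match a voter to
# parties: count agreement on yes/no policy questions. All party names
# and positions here are hypothetical.

def match_scores(voter, parties):
    """Return each party's share of issues on which it agrees with the voter."""
    return {
        name: sum(v == p for v, p in zip(voter, positions)) / len(voter)
        for name, positions in parties.items()
    }

voter = [True, False, True, True]          # answers to four policy questions
parties = {
    "Party A": [True, False, False, True],
    "Party B": [False, True, True, False],
}
print(match_scores(voter, parties))  # Party A agrees on 3 of 4 issues
```

Real VAAs weight issues by salience and use graded rather than binary answers, but the mechanism is the same: the tool, not a lobbyist, ties politicians' fortunes to issue-by-issue agreement with voters.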
Another similar tool is reputation or user review systems. Suppose that you're dissatisfied with the general standards of some institution: say university education, medical care, or what not. You may try to raise these standards by lobbying politicians to implement new regulations intended to ensure quality (1), or by starting your own, superior, universities or hospitals (2), hoping that others will follow. Another method, however, is to create a reliable reputation/review system which, if widely used, would guide students and patients to the best universities and hospitals, thereby incentivizing institutions to improve.
Now of course, when you're trying to get people to use such review systems, you are, in effect, building an evaluation system that competes with existing systems (e.g. the Guardian university ranking), so on one level you are using the second strategy. Your ultimate goal is, however, to create better universities, to which a better evaluation system is just a means (a tool). Hence you're following the third strategy here, in my terms.
Strategy 1) is of course a "statist" one, since what you're doing here is that you're trying to get the government to change the institution in question for you. Strategies 2) and 3) are, in contrast, both "non-statist", since when you use them you're not directly trying to implement the change through the political system. Hence libertarians and other anti-statists should prefer them.
My hunch is that when people are trying to change things, many of them unthinkingly go for 1), even regarding issues where it is unlikely that they are going to succeed that way. (For instance, it seems to me that advocates of direct democracy who try to persuade voters to vote for direct-democratic parties are unlikely to succeed, but that widespread use of VAAs might get us considerably closer to their ideal, and that they therefore should opt for the third strategy.) A plausible explanation of this is availability bias: our tendency to focus on what we most often see around us. Attempts to change social institutions through politics get a lot of attention, which makes people think of this strategy first. Even though this strategy is often efficient, I'd guess it is, for this reason, generally overused and that people sometimes instead should go for 2) or 3). (Possibly, Europeans have an even stronger bias in favour of this strategy than Americans.)
I also suspect, though, that people go for 2) a bit too often relative to 3). I think that people find it appealing, for its own sake, to create an entirely alternative structure. If you're a perfectionist, it might be satisfying to build what you consider "the perfect institution", even if it is very small and has little impact on society. Also, sometimes small groups of devotees flock to these alternatives, and a strong group identity is therefore created. Moreover, I think that availability bias may play a role here too. Even though this sort of strategy gets less attention than lobbying, most people know what it is. It is quite clear what it means to do something like this, and being part of a project like this therefore gives you a clear identity. For these reasons, I think that we might sometimes fool ourselves into believing that these alternative structures are more likely to be successful than they actually are.
Conversely, people might be biased against the third strategy because it's less obvious. Also, it has perhaps something vaguely manipulative about it which might bias idealistic people against it. What you're typically trying to do is to get people to use a tool (say VAAs) a side-effect of which is the change you wish to attain (in this case, correspondence between voters' views and actual policies). I don't think that this kind of manipulation is necessarily vicious (though it would need to be discussed on a case-by-case basis), but the point is that people tend to think that it is. Also, even those who don't think that it is manipulative in an unethical sense might still think that it is somehow "unheroic". Starting your own environmental party or creating your own artificial libertarian island clearly has something heroic about it, but developing efficient VAAs, which as a side-effect change the political landscape, does not.
I'd thus argue that people should start looking more closely at the third strategy. One group that does use a strategy similar to this is of course for-profit companies. They try to analyze what products would appeal to people, and in so doing, carefully consider how existing institutions shape people's preferences. For instance, companies like Uber, Airbnb and LinkedIn have been successful because they realized that, given the structure of the taxi, hotel and recruitment businesses, their products would be appealing.
Of course, these companies' primary goal, profit, is very different from the political goals I'm talking about here. At the same time, I think it is useful to compare the two cases. Generally, when we're trying to attain political change, we're not "actually trying" (in CFAR's terminology) as hard as we do when we're trying to maximize profit. It is very easy to fall into a mode where you're focusing on making symbolic gestures (which express your identity) rather than on trying to change things in politics. (This is, in effect, what many traditional charities are doing, if the EA movement is right.)
Instead, we should think as hard as profit-maximizing companies do about what new tools are likely to catch on. Any kind of tool could in principle be used, but the most obvious ones are various kinds of social media and other internet-based tools (such as those mentioned in this post). Technical progress gives us enormous opportunities to construct new tools that could re-shape people's behaviour in ways that would impact existing social and political institutions on a large scale.
Developing such tools is not easy. Even very successful companies again and again fail to predict which new products will appeal to people. Not least, you need a profound understanding of human psychology in order to succeed. That said, political organizations have certain advantages vis-à-vis for-profit companies. More often than not, they can develop ideas publicly, whereas for-profit companies often have to keep them secret until the product is launched. This facilitates wisdom-of-the-crowd reasoning, where many different kinds of people come up with solutions together. Such methods can, in my opinion, be very powerful.
Any input regarding, e.g., the taxonomy of methods, my speculations about biases, and, in particular, examples of institution-changing tools is welcome. I'm also interested in comments on efficient methods for coming up with useful tools (e.g. tests of them). Finally, if anything's unclear I'd be happy to provide clarifications (it's a very complex topic).
Multiple Factor Explanations Should Not Appear One-Sided
In Policy Debates Should Not Appear One-Sided, Eliezer Yudkowsky argues that arguments on questions of fact should be one-sided, whereas arguments on policy questions should not:
On questions of simple fact (for example, whether Earthly life arose by natural selection) there's a legitimate expectation that the argument should be a one-sided battle; the facts themselves are either one way or another, and the so-called "balance of evidence" should reflect this. Indeed, under the Bayesian definition of evidence, "strong evidence" is just that sort of evidence which we only expect to find on one side of an argument.
But there is no reason for complex actions with many consequences to exhibit this onesidedness property.
The reason for this is primarily that natural selection has caused all sorts of observable phenomena. With a bit of ingenuity, we can infer that natural selection has caused them, and hence they become evidence for natural selection. The evidence for natural selection thus has a common cause, which means that we should expect the argument to be one-sided.
In contrast, even if a certain policy, say lower taxes, is the right one, the rightness of this policy does not cause its evidence (or the arguments for this policy, which is a more natural expression), the way natural selection causes its evidence. Hence there is no common cause of all of the valid arguments of relevance for the rightness of this policy, and hence no reason to expect that all of the valid arguments should support lower taxes. If someone nevertheless believes this, the best explanation of their belief is that they suffer from some cognitive bias such as the affect heuristic.
(In passing, I might mention that I think that the fact that moral debates are not one-sided indicates that moral realism is false, since if moral realism were true, moral facts should provide us with one-sided evidence on moral questions, just like natural selection provides us with one-sided evidence on the question how Earthly life arose. This argument is similar to, but distinct from, Mackie's argument from relativity.)
Now consider another kind of factual issues: multiple factor explanations. These are explanations which refer to a number of factors to explain a certain phenomenon. For instance, in his book Guns, Germs and Steel, Jared Diamond explains the fact that agriculture first arose in the Fertile Crescent by reference to no less than eight factors. I'll just list these factors briefly without going into the details of how they contributed to the rise of agriculture. The Fertile Crescent had, according to Diamond (ch. 8):
- big seeded plants, which were
- abundant and occurring in large stands whose value was obvious,
- and which were to a large degree hermaphroditic "selfers".
- It had a higher percentage of annual plants than other Mediterranean climate zones.
- It had a higher diversity of species than other Mediterranean climate zones.
- It had a higher range of elevations than other Mediterranean climate zones.
- It had a great number of domesticable big mammals.
- The hunter-gatherer lifestyle was not that appealing in the Fertile Crescent.
(Note that all of these factors have to do with geographical, botanical and zoological facts, rather than with facts about the humans themselves. Diamond's goal is to prove that agriculture arose in Eurasia due to geographical luck rather than because Eurasians are biologically superior to other humans.)
Diamond does not mention any mechanism that would make it less likely for agriculture to arise in the Fertile Crescent. Hence the score of pro-agriculture vs anti-agriculture factors in the Fertile Crescent is 8-0. Meanwhile, no other area in the world has nearly as many advantages. Diamond does not provide us with a definite list of how other areas of the world fared, but no non-Eurasian alternative seems to score better than about 5-3 (he is primarily interested in comparing Eurasia with other parts of the world).
Now suppose that we didn't know anything about the rise of agriculture, but that we knew that there were eight factors which could influence it. Since these factors would not be caused by the fact that agriculture first arose in the Fertile Crescent, the way the evidence for natural selection is caused by natural selection, there would be no reason to believe that these factors were on average positively probabilistically dependent on each other. Under these conditions, one area having all the advantages and the next best lacking three of them is a highly surprising distribution of advantages. On the other hand, this is precisely the pattern that we would expect given the hypothesis that Diamond suffers from confirmation bias or another related bias. His theory is "too good to be true", which lends support to the hypothesis that he is biased.
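How surprising is a clean 8-0 sweep under independence? A quick Monte Carlo makes the point: the number of candidate areas (10) and the 1/2 probability that any given factor favours a given area are illustrative assumptions, not figures from Diamond.

```python
# Monte Carlo sketch: give each of 10 hypothetical areas eight independent
# pro-agriculture factors, each present with probability 1/2, and estimate
# how often the *best* area ends up holding all eight advantages.
import random

random.seed(0)

def best_area_score(n_areas=10, n_factors=8):
    # Best score across areas, where each area's score is the number of
    # factors that happen to favour it.
    return max(sum(random.random() < 0.5 for _ in range(n_factors))
               for _ in range(n_areas))

trials = 50_000
p_clean_sweep = sum(best_area_score() == 8 for _ in range(trials)) / trials
print(p_clean_sweep)  # roughly 1 - (1 - 0.5**8)**10, i.e. about 0.04
```

Under these assumptions a clean sweep happens only a few percent of the time, which is what makes the observed pattern evidence of a one-sided selection of factors rather than of an impartial survey.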
In this particular case, some of the factors Diamond lists presumably are positively dependent on each other. Now suppose that someone argues that all of the factors are in fact strongly positively dependent on each other, so that it is not very surprising that they all co-occur. This only pushes the problem back, however, because now we want an explanation of a) what the common cause of all of these dependencies is (it being very improbable that they would all correlate in the absence of such a common cause) and b) how it could be that this common cause increases the probability of the hypothesis via eight independent mechanisms, and doesn't decrease it via any mechanism. (This argument is complicated and I'd be happy to receive any input concerning it.)
Single-factor historical explanations are often criticized as being too "simplistic" whereas multiple factor explanations are standardly seen as more nuanced. Many such explanations are, however, one-sided in the way Diamond's explanation is, which indicates bias and dogmatism rather than nuance. (Another salient example I'm presently studying is taken from Steven Pinker's The Better Angels of Our Nature. I can provide you with the details on demand.*) We should be much better at detecting this kind of bias, since it for the most part goes unnoticed at present.
Generally, the sort of "too good to be true" arguments to infer bias discussed here are strongly under-utilized. As our knowledge of the systematic and predictable ways our thought goes wrong increases, it becomes easier to infer bias from the structure or pattern of people's arguments, statements and beliefs. What we need is to explicate clearly, preferably using probability theory or other formal methods, what factors are relevant for deciding whether some pattern of arguments, statements or beliefs is most likely the result of biased thought-processes. I'm presently doing research on this and would be happy to discuss these questions in detail, either publicly or via pm.
*Edit: Pinker's argument. Pinker's goal is to explain why violence has declined throughout history. He lists the following five factors in the last chapter:
- The Leviathan (the increasing influence of the government)
- Gentle commerce (more trade leads to less violence)
- Feminization
- The expanding (moral) circle
- The escalator of reason

He also considers, but rejects, the following candidate factors:

- Weaponry and disarmament (he claims that there are no strong correlations between weapons developments and numbers of deaths)
- Resource and power (he claims that there is little connection between resource distributions and wars)
- Affluence (tight correlations between affluence and non-violence are hard to find)
- (Fall of) religion (he claims that atheist countries and people aren't systematically less violent)
Separating university education from grading
One of many problems with the contemporary university system is that the same institutions that educate students also give them their degrees and grades. This obviously creates massive incentives for grade inflation and a lowering of standards. Giving a thorough education requires hard work not only from students but also from professors. In the absence of an independent body that tests whether students actually have learnt what they are supposed to have learnt, many professors spend as little time as possible on teaching, giving the students light workloads (something most of them of course happily accept). "The faculty/student non-aggression pact" is an apt term for this.
To see how absurd this system is, imagine that we had the same system for drivers' licenses: that the driving schools that train prospective drivers also tested them and issued their drivers' licenses. In such a system, people would most probably choose the most lenient schools, leading to a lowering of standards. For fear of such a lowering of standards, prospective drivers are in many countries (I would guess universally, but do not know that for sure) tested by government bodies.
Presumably, the main reason for this is that governments really care about drivers' standards: ensuring that all drivers are appropriately educated is seen as very important. By contrast, governments don't care that much about the lowering of academic standards. If they did, they would long ago have replaced the present grading/certification system with one where students are tested by independent bodies, rather than by the universities themselves.
This is all the more absurd given how much politicians in most countries talk about the importance of education. More often than not they talk about education, especially higher education, as a panacea for all ills. However, if we look at politicians' actions, rather than at their words, it doesn't seem like they actually think it's quite as important as they say to ensure that the population is well-educated.
Changing the system for certifying students is important not least in order to facilitate innovation in higher education. The present system discriminates in favour of traditional campus courses, which are both expensive and fail to teach students as much as they should. I'm not saying that online courses, and other non-standard courses, are necessarily better or more cost-effective, but they should get the chance to prove that they are.
The system is of course hard to change, since there are lots of vested interests that don't want it to change. This is nicely illustrated by the reactions to a small baby-step towards the system I'm envisioning that the OECD is presently trying to take. The Financial Times (which has a paywall, unfortunately) reports that the OECD is attempting to introduce Pisa-style tests to compare students from higher education institutions around the world. Third-year students would be tested on critical thinking, analytical reasoning, problem solving and written communication. There would also be discipline-specific trials for economics and engineering.
These attempts have, however, not progressed because of resistance from some universities and member countries. OECD says that the resistance often comes from "the most prestigious institutions, because they have very little to win...and a lot to lose". In contrast, "the greatest supporters are the ones that add the greatest value...many of the second-tier institutes are actually a lot better and they're very keen to get on a level playing field."
I figure that if the OECD gets enough universities on board, they could start implementing the system without the obstructing top universities. They could also allow students from those universities to take the tests independently. If employers started taking these tests seriously, students would have every reason to take them even if their universities hadn't joined. Slowly, these presumably more objective tests, or others like them, would become more important at the expense of the universities' inflated grades. People often try to change institutions or systems directly, but sometimes it is more efficient to build alternative systems, show that they're useful to the relevant actors, and start out-competing the dominant system (as discussed in these comments).
The End of Bullshit at the hands of Critical Rationalism
The public debate is rife with fallacies, half-lies, evasions of counter-arguments, etc. Many of these are easy to spot for a careful and intelligent reader/viewer - particularly one who is acquainted with the most common logical fallacies and cognitive biases. However, most people arguably often fail to spot them (if they didn't, then these fallacies and half-lies wouldn't be as effective as they are). Blatant lies are often (but not always) recognized as such, but these more subtle forms of argumentative cheating (which I shall use as a catch-all phrase from now on) usually aren't (which is why they are more frequent).
The fact that these forms of argumentative cheating are a) very common and b) usually easy to point out suggests that impartial referees who painstakingly pointed out these errors could do a tremendous amount of good for the standards of the public debate. What I am envisioning is a website like factcheck.org but which would not focus primarily on fact-checking (since, like I said, most politicians are already wary of getting caught out with false statements of fact) but rather on subtler forms of argumentative cheating.
Ideally, the site would go through election debates, influential opinion pieces, etc., more or less line by line, pointing out fallacies, biases, evasions, etc. For readers who don't want to read all this detailed criticism, the site would also give an overall rating of the level of argumentative cheating (say from 0 to 10) in a particular article, televised debate, etc. Politicians and others could also be given an overall cheating rating, which would be a function of their cheating ratings in individual articles and debates. Like any rating system, this system would serve both to give citizens reliable information about which arguments, which articles, and which people are to be trusted, and to force politicians and other public figures to argue in a more honest fashion. In other words, it would have both an information-disseminating function and a socializing function.
How would such a website be set up? An obvious suggestion is to run it as a wiki, where anyone could contribute. Of course, this wiki would have to be very heavily moderated - probably more so than Wikipedia - since people are bound to disagree on whether controversial figures' arguments really are fallacious or not. Presumably you would be forced to banish trolls and political activists on a grand scale, but hopefully this wouldn't be an insurmountable problem.
I'm thinking that the website should be strongly devoted to neutrality and objectivity, as Wikipedia is. To further this end, it is probably better to give the arguer under evaluation the benefit of the doubt in borderline cases. This would be a way of avoiding endless edit wars and ensuring objectivity. Also, it's a way of making the contributors to the site concentrate their efforts on the more outrageous cases of cheating (of which there are many in most political debates and articles, in my view).
The hope is that a website like this would make the public debate transparent to an unprecedented degree. Argumentative cheaters thrive because their arguments aren't properly scrutinized. If light is shone on the public debate, it will become clear who cheats and who doesn't, which will give people strong incentives not to cheat. If people respected the site's neutrality, its objectivity and its integrity, and read what it said, it would in effect become impossible for politicians and others to bullshit the way they do today. This could mark the beginning of the realization of an old dream of philosophers: the End of Bullshit at the hands of systematic criticism. Important names in this venerable tradition include David Hume, Rudolf Carnap and the other logical positivists, and, not least, the man whose statue stands outside my room, the "critical rationalist" (an apt name for this enterprise) Karl Popper.
Even though politics is an area where bullshit is perhaps especially common, and one where it does an exceptional degree of harm (e.g. vicious political movements such as Nazism are usually steeped in bullshit), it is also common and harmful in many other areas, such as science, religion and advertising. Ideally, critical rationalists should go after bullshit in all areas (as far as possible). My hunch is, though, that it would be a good idea to start off with politics, since it's an area that gets lots of attention and where well-written criticism could have an immediate impact.

Book review: The Reputation Society. Part II
This is the second part of my book review of The Reputation Society. See the first part for an overview of the structure of the review.
Central concepts of The Reputation Society
Aggregation of reputational information. Since the book is entirely untechnical, and since aggregation rules by their nature are mathematical formulae, there isn’t much on aggregation rules (i.e., on how we are to aggregate individuals' ranking of, e.g., a person or a product, into one overall rating) in the book. The choice of aggregation rules is, however, obviously very important to optimize the different functions of reputation systems.
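As an illustration of why the choice of aggregation rule matters (the rule below is a common pattern, not one taken from the book): a "damped" mean shrinks items with few ratings toward a global prior, so a single five-star vote doesn't outrank an item with hundreds of slightly lower votes. The prior of 3.0 and the weight m=5 are arbitrary illustrative choices.

```python
# A simple damped-mean aggregation rule: blend an item's own mean rating
# with a global prior, weighted by the number of ratings received.
# All parameter values here are illustrative assumptions.

def damped_mean(ratings, prior=3.0, m=5):
    """Weighted blend of the item's mean rating with a prior mean.

    m acts as a pseudo-count: the prior counts as m extra votes.
    """
    n = len(ratings)
    if n == 0:
        return prior
    return (sum(ratings) + m * prior) / (n + m)

print(damped_mean([5.0]))        # one perfect vote: pulled toward the prior
print(damped_mean([4.5] * 200))  # many votes: stays close to 4.5
```

Rules like this trade responsiveness for robustness against gaming with a handful of fake votes, which is exactly the kind of design decision the chapters gesture at without formalizing.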
One problem that is discussed, though, is whether the aggregation rules should be transparent or not (e.g., in chs. 1 and 3). Concealing them makes it harder for participants to game the system, but on the other hand it makes it easier for the system providers to game the system (for instance, Google has famously been accused of manipulating search results for money). Hence concealment of the aggregation rules can damage the credibility of the site. (See also Display of reputational information.)
Altruism vs self-interest as incentives in rating systems. An important question for any rating system is whether it should appeal to people’s altruism (their community spirit) or to their self-interest. Craig Newmark (foreword) seems to take the former route, arguing that “people are normally trustworthy”, whereas the authors of ch. 11 argue that scientists need to be given incentives that appeal to their self-interest to take part in their reputation system.
It could be argued that the success of Wikipedia shows that appealing to people's self-interest is not necessary to get them to contribute. On the other hand, it could also be argued that the impression that Wikipedia has been successful stems from a lack of imagination concerning the potential of sites with user-generated content. Perhaps Wikipedia would have been still more successful if it had given contributors stronger incentives.
Anonymity in online systems. Dellarocas (ch. 1) emphasizes that letting social network users remain anonymous while failing to guard against the creation of multiple identities facilitates gaming greatly. On the other hand, prohibitions against remaining anonymous might raise privacy concerns.
Display of reputational information. Dellarocas (ch. 1, Location 439, p. 7) discusses a number of ways of displaying reputational information:
- Simple statistics (number of transactions, etc.)
- Star ratings (e.g. Amazon reviews)
- Numerical scores (e.g., eBay’s reputation score)
- Numbered tiers (e.g., World of Warcraft player levels)
- Achievement badges (e.g., Yelp elite reviewer)
- Leaderboards (lists where users are ranked relative to other users; e.g. the list of Amazon top reviewers)
See gaming for a brief discussion of the advantages and disadvantages of comparative (e.g., 6) and non-comparative systems (e.g., 5).
Expert vs peer rating systems. Most pre-Internet rating systems were run by experts (e.g. movie guides, restaurant guides, etc.). The Internet has created huge opportunities for rating systems in which large numbers of non-expert ratings and votes are aggregated into an overall rating. Proponents of the wisdom of the crowd argue that even though many non-experts are not very reliable, the noise tends to even out as the number of raters grows, and we are left with an aggregated judgment which can beat that of experienced experts.
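The "noise evens out" claim can be illustrated with a quick simulation in the spirit of Condorcet's jury theorem; the individual accuracy of 0.6 is an illustrative assumption, and the argument requires the raters' errors to be independent.

```python
# Simulation: if each rater is independently right with probability 0.6,
# how often is the majority of the crowd right? (Condorcet's jury theorem.)
import random

random.seed(1)

def majority_correct(n_raters, p_correct=0.6, trials=20_000):
    """Estimate the probability that a strict majority of raters is right."""
    hits = 0
    for _ in range(trials):
        votes = sum(random.random() < p_correct for _ in range(n_raters))
        hits += votes > n_raters / 2
    return hits / trials

for n in (1, 11, 101):
    print(n, round(majority_correct(n), 3))  # accuracy rises with crowd size
```

A single 60 %-accurate rater is barely better than a coin flip, while a crowd of 101 such raters is right the vast majority of the time; the caveat is that correlated errors (e.g. a shared bias) break the independence assumption and cap the benefit.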
However, the Internet also offers new ways of identifying experts (emphasized, e.g., in ch. 8). People whose written recommendations are popular, or whose ratings are reliable as measured against some objective standard (if such a standard can be constructed – that obviously depends on context) can be given a special status. For instance, their recommendations can become more visible, and their ratings more heavily weighted. It could be argued that such systems are more meritocratic ways of identifying the experts than the ones that dominate society today (see, e.g., ch. 8).
Explicit vs implicit reputation systems. In the former, your reputation is a function of other users’ votes, whereas in the latter, your reputation is derived from other forms of behavior (e.g., the number of readers of your posts, your number of successful transactions, etc.). This is a distinction made by several authors, but unfortunately they use different terms for it, something which is never acknowledged. Here the editors should have done a better job.
In the language of economics, implicit reputation systems (such as Google’s PageRank) are, by and large, based on people's revealed preferences, inferred from their actions, whereas explicit reputation systems are built on their stated preferences. Revealed preferences have two main advantages. First, we typically get them for free, since we infer them from publicly observable behavior that people engage in for other reasons (e.g., linking to a page), whereas we need to ask people if we want their stated preferences. Second, revealed preferences typically express people’s true preferences, whereas their stated preferences might be false (see untruthful reporting). On the other hand, observing behavior typically yields only coarse-grained information about people’s preferences (e.g., observing that John chose a Toyota over a Ford does not tell us whether he did so because it was cheaper, because he prefers Japanese cars, or because of its lower fuel consumption), whereas asking people to state their preferences can yield more fine-grained information.
Functions of reputation systems. Dellarocas (ch. 1, Location 364, p. 4) argues that online reputation systems have the following functions (to varying degrees, depending on the system):
a) a socializing function (rewarding desired behavior and punishing undesired behavior; building trust). As pointed out in chs. 6 and 7, this makes reputation systems an alternative to other systems intended to socialize people, in particular government regulation (backed by the threat of force). This should make reputation systems especially interesting to those opposed to the latter (e.g., libertarians).
b) an information-filtering function (makes reliable information more visible).
c) a matching function (matching users with similar interests and tastes in, e.g., restaurants or films – this is similar to b) with the difference that it is not assumed that some users are more reliable than others).
d) a user lock-in function – users who have spent considerable amounts of time creating a good reputation on one site are unlikely to change to another site where they have to start from scratch.
Gaming. Gaming has been a massive problem at many sites making use of reputation systems. In general, more competitive/comparative displays of reputational information exacerbate gaming problems (as pointed out in ch. 2). On the other hand, strong incentives to gain a good reputation are to some extent necessary to solve the undersupply of reputational information problem.
Dellarocas (ch. 1) emphasizes that it is impossible to create a system that is totally secure from manipulation. Manipulators will continuously come up with new gaming strategies, and therefore the site’s providers constantly have to update their rules. The situation is, however, quite analogous to the interplay between tax evaders and legislators, and hence these problems are not unique to online rating systems by any means.
Global vs personalized/local trust metrics (Massa, ch. 14). While the former gives the same assessments of the trustworthiness of person X to each other person Y, the latter gives different assessments of the trustworthiness of X to different people. Thus, the former are comprised of statements such as “the reputation of Carol is .4”, the latter of statements such as “Alice should trust Carol to degree .9” and “Bob should trust Carol to degree .1” (Location 3619, p. 155). Different people may trust others to different degrees based on their beliefs and preferences, and this is reflected in the personalized trust metrics. Massa argues that a major problem with global rating systems is that they lead to “the tyranny of the majority”, where original views are unfairly down-voted. At the same time, he also argues that the use of personalized trust metrics may lead to the formation of echo chambers, where people only listen to those who agree with them.
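One simple way to make the contrast concrete is a multiplicative path rule, in which the trust I place in a stranger is the product of the direct trust values along the best chain of acquaintances connecting us. This is only a toy illustration of the personalized idea, not Massa's actual algorithm, and the names and numbers below echo his Alice/Bob/Carol example:

```python
def personalized_trust(direct, source, target, depth=3):
    """Best multiplicative-path trust from source to target.

    `direct` maps (truster, trustee) -> a trust value in [0, 1].
    Trust along a chain is the product of its edge values; we
    take the best chain of at most `depth` hops. Different
    sources can therefore arrive at different values for the
    same target, unlike a single global reputation score.
    """
    best = direct.get((source, target), 0.0)
    if depth > 1:
        for (a, b), t in direct.items():
            if a == source and b != target:
                best = max(best,
                           t * personalized_trust(direct, b, target, depth - 1))
    return best

edges = {("Alice", "Bob"): 0.9, ("Bob", "Carol"): 1.0,
         ("Dave", "Carol"): 0.1}
# Alice reaches Carol through Bob and trusts her highly;
# Dave's own experience of Carol is poor. No single global
# number for Carol could capture both assessments.
print(personalized_trust(edges, "Alice", "Carol"))
print(personalized_trust(edges, "Dave", "Carol"))
```

The depth cutoff keeps the search finite even when trust edges form cycles; a real system would also need to handle multiple conflicting paths and distrust.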
Immune system disorders of reputation systems (Foreword). Rating systems can be seen as “immune systems” intended to give protection against undesirable behavior and unreliable information. However, they can also give rise to diseases of their own. For instance, the academic “rating systems” based mainly on number of articles and numbers of citations famously give rise to all sorts of undesirable behavior (see section IV, chs. 10-12, on the use of rating/reputation systems in science). An optimal rating system would of course minimize these immune system disorders.
Karma as currency. This idea is developed in several chapters (e.g., 1 and 2) but especially in the last chapter (18) by Madeline Ashby and Cory Doctorow, two science fiction writers. They envision a reputation-based future society where people earn “Whuffie” – Karma or reputation – when they are talked about, and spend it when they talk about others. You can also exchange Whuffie for goods and services, effectively making it a currency.
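The Whuffie mechanic can be sketched as a simple ledger in which mentioning someone transfers reputation from speaker to subject. The class below is my own toy rendering of that idea; the starting balance and transfer amounts are invented, not taken from Ashby and Doctorow:

```python
class WhuffieLedger:
    """Toy ledger for a Whuffie-style reputation currency.

    You spend Whuffie when you talk about someone, and the
    person talked about earns it, which is what makes
    reputation behave like a spendable currency.
    """

    def __init__(self, starting=100):
        self.starting = starting
        self.balances = {}

    def balance(self, person):
        # Everyone starts from the same (invented) baseline.
        return self.balances.setdefault(person, self.starting)

    def mention(self, speaker, subject, amount=1):
        if self.balance(speaker) < amount:
            raise ValueError("not enough Whuffie to talk about others")
        self.balances[speaker] -= amount
        self.balances[subject] = self.balance(subject) + amount

ledger = WhuffieLedger()
ledger.mention("Alice", "Bob", 5)
# Alice is now down 5 Whuffie and Bob up 5 relative to the baseline.
```

Even this crude version exhibits the currency-like property the chapter plays with: attention is a scarce resource, so talking about someone is literally a payment to them.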
Moderation. Moderation is to some extent an alternative to ratings in online forums. Moderators could either be paid professionals, or picked from the community of users (the latter arguably being more cost-efficient; ch. 2). The moderators can in turn be moderated in a meta-moderation system used, e.g., by Slashdot (their system is discussed by several of the authors).
Yet another system which in effect is a version of the meta-moderation system is the peer-prediction model (see ch. 1), in which your ratings are assessed on the basis of whether they manage to predict subsequent ratings. These later ratings then in effect function as meta-ratings of your ratings.
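A minimal version of this idea can be sketched as follows. This is only a stand-in for the peer-prediction model, under the simplifying assumption that "predicting" a later rating means landing within some tolerance of it; the function and its parameters are my own invention:

```python
def peer_prediction_score(my_rating, later_ratings, tolerance=1.0):
    """Score a rating by how well it anticipated later ratings.

    A rating earns credit for each subsequent rating it comes
    close to (within `tolerance`), so the later ratings in
    effect act as meta-ratings of the earlier one. Returns the
    fraction of later ratings predicted, between 0 and 1.
    """
    if not later_ratings:
        return 0.0
    hits = sum(1 for r in later_ratings
               if abs(r - my_rating) <= tolerance)
    return hits / len(later_ratings)

# A rating of 4 anticipates three of the four later ratings.
print(peer_prediction_score(4, [4, 5, 4, 1]))
```

The appeal of such schemes is that raters are rewarded for accuracy without anyone needing an objective standard of quality, only the stream of future ratings; the obvious weakness is that a rater can score well simply by predicting the majority view.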
Privacy – several authors raise concerns over privacy (in particular chs. 16-18). In a fully-fledged reputation society, everything you did would be recorded and counted either for or against you. (Such a society would thus be very much like life according to many religions – the vicious would get punished, the virtuous rewarded – with the crucial difference that the punishments and rewards would be given in this life rather than in the after-life.) While this certainly could improve behavior (see Functions of reputation systems) it could also make society hard and unforgiving (or so several authors argue; see especially ch. 17). People have argued that it therefore should be possible to undergo “reputational bankruptcy” (cf. forgiveness of sins in, e.g., Catholicism), to escape one’s past, as it were, but as Eric Goldman points out (ch. 5, Location 1573, p. 59), this would allow people to get away with anti-social behavior without any reputational consequences, and hence make the reputation system’s socializing effects much weaker.
As stated in the introduction, in small villages people often have more reliable information about others’ past behavior and their general trustworthiness. This makes the villages’ informal reputation systems very powerful, but it is also to some extent detrimental to privacy. The story of the free-thinker who leaves the village where everyone knows everything about everyone for the freedom of the anonymous city is a perennial one in literature.
Thus, it could be argued that there is necessarily a trade-off between the efficiency of a reputation system and the degree to which it protects people’s privacy. (See also Anonymity in online systems for more on this.) According to this line of reasoning, privacy encroachments are immune system disorders of reputation systems. It is a challenge for the architect of a reputation system to minimize this, and other, immune system disorders.
Referees – all rating systems need to be overseen by referees. The received view seems to be that these need to be independent and impartial, and the question is raised whether private companies such as Google can function as trustworthy and impartial referees (ch. 3). An important related problem is: who guards the guards? In ch. 3, John Henry Clippinger argues that this problem, which “has been the Achilles heel of human institutions since times immemorial” (Location 1046, p. 33), can be overcome in online reputation systems. The key, he argues, is transparency:
In situations in which both activities and their associated reputation systems become fully digital, they can in principle be made fully transparent and auditable. Hence the activities of interested parties to subvert or game policies or reputation metrics can themselves be monitored, flagged, and defended against.
Reporting bias (ch. 1) – e.g., people refraining from giving negative votes for fear of retaliation. Obviously this is more likely to happen in systems where it is publicly visible how you have voted. Another form of reporting bias arises when a certain good or service is consumed only by fans, who tend to give high ratings.
Reputation systems vs recommendation systems. This is a simple terminological distinction: reputation systems are ratings of people, recommendation systems are ratings of goods and services. I use “rating systems” as a general term covering both reputation and recommendation systems.
Undersupply of reputational information; i.e. that people don’t rate as much as is socially optimal. This is also a concept mentioned by several authors, but in most detail in ch. 5 (Location 1520, p. 57):
Much reputational information starts out as non-public (i.e. “private”) information in the form of a customer’s subjective impressions about his or her interactions with a vendor. To the extent that this information remains private, it does not help other consumers make marketplace decisions. These collective mental impressions represent a vital but potentially underutilized social resource.
The fact that private information remains locked in consumers’ head could represent a marketplace failure. If the social benefit from making reputational information public exceeds the private benefit, public reputational information will be undersupplied.
Personally I think this is a massively underappreciated problem. People form countless such subjective impressions every day. At present we harvest but a tiny portion of these subjective impressions, or judgments, as a community. If the authors’ vision is to stand a chance of being realized, we need to make people share these judgments to a much greater extent than they do today. (It goes without saying that we also need to distinguish the reliable ones from the unreliable ones.)
Universal vs. constrained (or contextual) reputation systems (ch. 17). The former are a function of your behavior across all contexts and influence your reputation in all contexts, whereas the latter are constrained to a particular context (say, buying and selling on eBay).
Untruthful reporting (ch. 1). This can happen either because raters try to game the system (e.g., in order to benefit themselves, their restaurant, or what not) or because of vandalism/trolling. Taking a leaf out of Bryan Caplan’s "The Myth of the Rational Voter", I’d like to add that even people who are neither gaming nor trolling typically spend less time and effort giving accurate ratings for others’ benefit than they do when making decisions that affect their own pockets. Presumably this decreases the accuracy of their ratings.