Could auto-generated troll scores reduce Twitter and Facebook harassment?
There's been a lot of discussion in the last few years on the problem of hateful behaviour on social media such as Twitter and Facebook. How can this problem be solved? Twitter and Facebook could of course start adopting stricter policies towards trolls and haters. They could remove more posts and tweets, and ban more users. So far, however, they have been relatively reluctant to do that. Another, more principled problem with this approach is that it could be seen as a restriction on the freedom of speech (especially if Twitter and Facebook were ordered to do this by law).
There's another possible solution, however. Using sentiment analysis, you could give Twitter and Facebook users a "troll score". Users whose language is hateful, offensive, racist, etc., would get a high troll score.* This score would in effect work as a (negative) reputation/karma score. That would in itself probably incentivize trolls to improve. However, if users were allowed to block (and make invisible the writings by) any user whose troll score is above a certain cut-off point (of their choice), that would presumably incentivize trolls to improve even more.
Could this be done? Well, it's already been shown to be possible to infer your big five personality traits, with great accuracy, from what you've written and liked, respectively, on Facebook. The tests are constructed on the basis of correlations between data from standard personality questionnaires (more than 80,000 Facebook users filled in such tests on behalf of YouAreWhatYouLike, who constructed one of the Facebook tests) and Facebook writings or likes. Once it's been established that, e.g., extraverted people tend to like certain kinds of posts, or use certain kinds of words, this knowledge can be used to predict the level of extraversion of Facebook users who haven't taken the questionnaire.
This suggests that there are no principled reasons a reliable troll score couldn't be constructed with today's technology. However, a problem is that while there are agreed criteria for what is to count as an extraverted person, there are no agreed criteria for what counts as a troll. Also, it seems you couldn't use questionnaires, since people who actually do behave like trolls online would be disinclined to admit that they do in a questionnaire.
One way to proceed could instead be this. First, you could define in rather general and vague terms what is to count as trolling - say "racism", "vicious attacks", "threats of violence", etc. You could then use two different methods to go from this vague definition to a precise score. The first is to let a number of sensible people assign troll scores to different Facebook posts and tweets (using the general and vague definition of what is to count as trolling). You would feed this into your algorithms, which would learn which combinations of words are characteristic of trolls (as judged by these people), and which aren't. The second is to simply list a number of words or phrases which would count as characteristic of trolls, in the sense of the general and vague definition. This latter method is probably less costly - particularly if you can generate the troll-lexicon automatically, say from existing dictionaries of offensive words - but also probably less accurate.
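To make the second, lexicon-based method concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three-word lexicon, the function names, the scoring rule (share of a post's words that appear in the lexicon, averaged per author) and the sample posts are hypothetical placeholders, not anything Twitter or Facebook provides.

```python
import re
from collections import defaultdict

# Hypothetical placeholder lexicon; a real one might be generated from
# existing dictionaries of offensive words.
TROLL_LEXICON = {"idiot", "moron", "scum"}


def post_score(text, lexicon=TROLL_LEXICON):
    """Return the fraction of word tokens in one post that are in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


def user_troll_scores(posts, lexicon=TROLL_LEXICON):
    """Average the per-post scores for each author.

    `posts` is a list of (author, text) pairs.
    """
    per_user = defaultdict(list)
    for author, text in posts:
        per_user[author].append(post_score(text, lexicon))
    return {user: sum(s) / len(s) for user, s in per_user.items()}


def visible_posts(posts, scores, cutoff):
    """Keep only posts by authors whose score is at or below the reader's cutoff."""
    return [(a, t) for a, t in posts if scores.get(a, 0.0) <= cutoff]


# Made-up sample data.
posts = [
    ("alice", "I disagree, but thanks for the link"),
    ("bob", "you idiot, you absolute moron"),
    ("bob", "scum"),
]
scores = user_troll_scores(posts)
feed = visible_posts(posts, scores, cutoff=0.1)
```

With this sample data and a cut-off of 0.1, only alice's post stays in the feed. A plain word count like this ignores context (quoting an insult in order to criticise it scores the same as using it), which is one reason to expect the lexicon method to be less accurate than learning from human-scored posts.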
In any case, I expect it to be possible to solve this problem. The next problem is: who would do this? Facebook and Twitter should be able to construct the troll score, and to add the option of blocking all trolls, but do they want to? The risk is that they will think that the possible down-side to this is greater than the possible up-side. If people start disliking this rather radical plan, they might leave en masse, whereas if they like it, well, then trolls could potentially disappear, but it's unlikely that this will affect their bottom line drastically. Thus it's not clear that they will be more positive to this idea than they are to conventional banning/moderating methods.
Another option is for an outside company to create a troll score using Facebook or Twitter data. I don't know whether that's possible at present - whether you'd need Facebook and Twitter's consent, and whether they'd then be willing to give it. It seems you definitely would need it in order for the troll score to show up on your standard Facebook/Twitter account, and in order to enable users to block all trolls.
This second problem is thus much harder. A troll score could probably be constructed by Facebook and Twitter, but they may well not want to do it. Any suggestions on how to get around this problem would be appreciated.
My solution is very similar to the LessWrong solution to the troll problem. Just like you can make low karma users invisible on LessWrong, you would be able to block (and make invisible the writings by) Facebook and Twitter users with a high troll score. A difference is, though, that whereas karma is manually generated (by voting) the troll score would be automatically generated from your writings (for more on this distinction, see here).
One advantage of this method, as opposed to conventional moderation methods, is that it doesn't restrict freedom of speech in the same way. If trolls were blocked by most users, you'd achieve much the same effect as you would from bannings (the trolls wouldn't be able to speak to anyone), but in a very different way: it would result from lots of blockings from individual users, who presumably have a full right to block anyone, rather than from the actions of a central admin.
Let me finish with one last caveat. You could of course extend this scheme, and construct all sorts of scores - such as a "liberal-conservative score", with whose help you could block anyone whose political opinions are insufficiently close to yours. That would be a very bad idea, in my view. Scores of this sort should only be used to combat harassment, threats and other forms of anti-social behaviour, and not to exclude any dissenter from discussion.
* I here use "troll" in the wider sense which "equate[s] trolling with online harassment" rather than in the narrower (and original) sense according to which a troll is "a person who sows discord on the Internet by starting arguments or upsetting people, by posting inflammatory, extraneous, or off-topic messages in an online community (such as a newsgroup, forum, chat room, or blog) with the deliberate intent of provoking readers into an emotional response or otherwise disrupting normal on-topic discussion" (Wikipedia).
Which cognitive biases should we trust in?
There have been (at least) a couple of attempts on LW to make Anki flashcards from Wikipedia's famous List of Cognitive Biases, here and here. However, stylistically they are not my type of flashcard, with too much info in the "answer" section.
Further, and more troublingly, I'm not sure whether all of the biases in the flashcards are real, generalizable effects; or, if they are real, whether they have effect sizes large enough to be worth the effort to learn & disseminate. Psychology is an academic discipline with all of the baggage that entails. Psychology is also one of the least tangible sciences, which is not helpful.
There are studies showing that Wikipedia is no less reliable than more conventional sources, but this is in aggregate, and it seems plausible (though difficult to detect without diligently checking sources) that the set of cognitive bias articles on Wikipedia has high variance in quality.
We do have some knowledge of how many of them were made, in that LW user nerfhammer wrote a bunch. But, as far as I can tell, s/he didn't discuss how s/he selected biases to include. (Though, s/he is obviously quite knowledgeable on the subject, see e.g. here.)
As the articles stand today, many (e.g., here, here, here, here, and here) only cite research from one study/lab. I do not want to come across as whining: the authors who wrote these on Wikipedia are awesome. But, as a consumer, the lack of independent replication makes me nervous. I don't want to contribute to information cascades.
Nevertheless, I do still want to make flashcards for at least some of these biases, because I am relatively sure that there are some strong, important, widespread biases out there.
So, I am asking LW whether you all have any ideas about, on the meta level,
1) how we should go about deciding/indexing which articles/biases capture legit effects worth knowing,
and, on the object level,
2) which of the biases/heuristics/fallacies are actually legit (like, a list).
Here are some of my ideas. First, for how to decide:
- Only include biases that are mentioned by prestigious sources like Kahneman in his new book. Upside: authoritative. Downside: potentially throwing out some good info and putting too much faith in one source.
- Only include biases whose Wikipedia articles cite at least two primary articles that share none of the same authors. Upside: establishes some degree of consensus in the field. Downside: won't actually vet the articles for quality, and a presumably false assumption that the Wikipedia pages will reflect the state of knowledge in the field.
- Search for the name of the bias (or any bold, alternative names on Wikipedia) on Google scholar, and only accept those with, say, >30 citations. Upside: less of a sampling bias of what is included on Wikipedia, which is likely to be somewhat arbitrary. Downside: information cascades occur in academia too, and this method doesn't filter for actual experimental evidence (e.g., there could be lots of reviews discussing the idea).
- Make some sort of a voting system where experts (surely some frequent this site) can weigh in on what they think of the primary evidence for a given bias. Upside: rather than counting articles, evaluates actual evidence for the bias. Downside: seems hard to get the scale (~8-12+ people voting) to make this useful.
- Build some arbitrarily weighted rating scale that takes into account some or all of the above. Upside: meta. Downside: garbage in, garbage out, and the first three features seem highly correlated anyway.
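
The more mechanical of these criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the equal default weights and the placeholder author labels ("A", "B", ...) are all hypothetical, and the inputs (author lists, citation counts) would still have to be collected by hand.

```python
from itertools import combinations


def has_independent_support(author_sets, minimum=2):
    """True if at least `minimum` cited papers share no authors with each other.

    `author_sets` holds one set of author names per cited primary article.
    Brute force over groups is fine for the handful of citations on a
    typical Wikipedia page.
    """
    for group in combinations(author_sets, minimum):
        if all(a.isdisjoint(b) for a, b in combinations(group, 2)):
            return True
    return False


def rating(in_prestigious_source, independent_support, scholar_citations,
           citation_threshold=30, weights=(1.0, 1.0, 1.0)):
    """Arbitrarily weighted sum of the first three criteria (hypothetical weights)."""
    features = (
        float(in_prestigious_source),
        float(independent_support),
        float(scholar_citations > citation_threshold),
    )
    return sum(w * f for w, f in zip(weights, features))


# Placeholder author labels, not real citations.
two_labs = [{"A", "B"}, {"B", "C"}, {"D"}]
one_lab = [{"A", "B"}, {"B", "C"}]

two_labs_ok = has_independent_support(two_labs)   # {"A", "B"} and {"D"} are disjoint
one_lab_ok = has_independent_support(one_lab)     # both papers share author "B"
example_rating = rating(True, two_labs_ok, scholar_citations=12)
```

Here the first bias passes the two-independent-articles test and the second does not. The combined rating inherits the downside noted above: if the three features are highly correlated, weighting them adds little beyond using any one of them.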
Second, for which biases to include. I'm just going off of which ones I have heard of and/or look legit on a fairly quick run through. Note that those annotated with a (?) are ones I am especially unsure about.
- anchoring
- availability
- bandwagon effect
- base rate neglect
- choice-supportive bias
- clustering illusion
- confirmation bias
- conjunction fallacy (is subadditivity a subset of this?)
- conservatism (?)
- context effect (aka state-dependent memory)
- curse of knowledge (?)
- contrast effect
- decoy effect (aka independence of irrelevant alternatives)
- Dunning–Kruger effect (?)
- duration neglect
- empathy gap
- expectation bias
- framing
- gambler's fallacy
- halo effect
- hindsight bias
- hyperbolic discounting
- illusion of control
- illusion of transparency
- illusory correlation
- illusory superiority
- illusion of validity (?)
- impact bias
- information bias (? aka failure to consider value of information)
- in-group bias (this is also clearly real, but I'm also not sure I'd call it a bias)
- escalation of commitment (aka sunk cost/loss aversion/endowment effect; note, contra Gwern, that I do think this is a useful fallacy to know about, if overrated)
- false consensus (related to projection bias)
- Forer effect
- fundamental attribution error (related to the just-world hypothesis)
- familiarity principle (aka mere exposure effect)
- moral licensing (aka moral credential)
- negativity bias (seems controversial & it's troubling that there is also a positivity bias)
- normalcy bias (related to existential risk?)
- omission bias
- optimism bias (related to overconfidence)
- outcome bias (aka moral luck)
- outgroup homogeneity bias
- peak-end rule
- primacy
- planning fallacy
- reactance (aka contrarianism)
- recency
- representativeness
- self-serving bias
- social desirability bias
- status quo bias
Happy to hear any thoughts!