There's been a lot of discussion in the last few years about the problem of hateful behaviour on social media such as Twitter and Facebook. How can this problem be solved? Twitter and Facebook could of course adopt stricter policies towards trolls and haters: remove more posts and tweets, and ban more users. So far, however, they have been relatively reluctant to do that. Another, more principled, problem with this approach is that it could be seen as a restriction on freedom of speech (especially if Twitter and Facebook were ordered to do it by law).

 

There's another possible solution, however. Using sentiment analysis, you could give Twitter and Facebook users a "troll score". Users whose language is hateful, offensive, racist, etc., would get a high troll score.* This score would in effect work as a (negative) reputation/karma score. That would in itself probably incentivize trolls to improve. However, if users were allowed to block (and make invisible the writings of) any user whose troll score is above a certain cut-off point (of their choice), that would presumably incentivize trolls to improve even more.

Could this be done? Well, it's already been shown to be possible to infer your Big Five personality traits, with great accuracy, from what you've written and liked, respectively, on Facebook. The tests are constructed on the basis of correlations between data from standard personality questionnaires (more than 80,000 Facebook users filled in such tests on behalf of YouAreWhatYouLike, which constructed one of the Facebook tests) and Facebook writings or likes. Once it's been established that, e.g., extraverted people tend to like certain kinds of posts, or use certain kinds of words, this knowledge can be used to predict the level of extraversion of Facebook users who haven't taken the questionnaire.

This suggests that there is no principled reason why a reliable troll score couldn't be constructed with today's technology. However, one problem is that while there are agreed criteria for what counts as an extraverted person, there are no agreed criteria for what counts as a troll. Also, it seems you couldn't use questionnaires, since people who actually do behave like trolls online would be disinclined to admit it in a questionnaire.

One way to proceed could instead be this. First, you could define in rather general and vague terms what is to count as trolling - say "racism", "vicious attacks", "threats of violence", etc. You could then use two different methods to go from this vague definition to a precise score. The first is to let a number of sensible people give their troll scores of different Facebook posts and tweets (using the general and vague definition of what is to count as trolling). You would feed this into your algorithms, which would learn which combinations of words are characteristic of trolls (as judged by these people), and which aren't. The second is to simply list a number of words or phrases which would count as characteristic of trolls, in the sense of the general and vague definition. This latter method is probably less costly - particularly if you can generate the troll lexicon automatically, say from existing dictionaries of offensive words - but also probably less accurate.
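To make the two methods a bit more concrete, here is a minimal sketch in Python. It is only an illustration: `labelled_posts` stands in for a hypothetical set of posts rated by the "sensible people" of the first method, `TROLL_LEXICON` stands in for a hand-made or dictionary-derived word list for the second, and the bag-of-words logistic regression is just one simple stand-in for whatever model a platform would actually use.

```python
# Minimal sketch of the two scoring methods described above.
# `labelled_posts` and `TROLL_LEXICON` are hypothetical placeholders, not real data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# --- Method 1: learn from human judgements --------------------------------
labelled_posts = [
    ("I hope something terrible happens to you", 1),  # judged trolling
    ("Interesting article, thanks for sharing", 0),   # judged fine
    # ... in practice, many more human-rated examples would be needed
]

texts, labels = zip(*labelled_posts)
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

def learned_troll_score(post: str) -> float:
    """Probability that a single post is trolling, per the learned model."""
    return float(model.predict_proba([post])[0][1])

# --- Method 2: a simple lexicon count --------------------------------------
TROLL_LEXICON = {"idiot", "scum", "kill yourself"}  # illustrative only

def lexicon_troll_score(post: str) -> float:
    """Fraction of lexicon entries appearing in the post (cruder, cheaper)."""
    text = post.lower()
    return sum(phrase in text for phrase in TROLL_LEXICON) / len(TROLL_LEXICON)

# A user-level troll score could then average over a user's recent posts.
def user_troll_score(posts: list[str]) -> float:
    return sum(learned_troll_score(p) for p in posts) / max(len(posts), 1)
```

The blocking mechanism itself would then be trivial: compare the user-level score against whatever cut-off point the individual user has chosen.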

 

In any case, I expect this problem to be solvable. The next problem is: who would do this? Facebook and Twitter should be able to construct the troll score and add the option of blocking all trolls, but do they want to? The risk is that they will judge the possible downside of this to be greater than the possible upside. If people dislike this rather radical plan, they might leave en masse, whereas if they like it, well, then trolls could potentially disappear, but it's unlikely that this would affect the companies' bottom line drastically. Thus it's not clear that they will be more positive towards this idea than they are towards conventional banning/moderating methods.

Another option is for an outside company to create a troll score using Facebook or Twitter data. I don't know whether that's possible at present - whether you'd need Facebook and Twitter's consent, and whether they'd then be willing to give it. It seems you definitely would need it in order for the troll score to show up on your standard Facebook/Twitter account, and in order to enable users to block all trolls.

This second problem is thus much harder. A troll score could probably be constructed by Facebook and Twitter, but they may well not want to do it. Any suggestions on how to get around this problem would be appreciated.

 

My solution is very similar to the LessWrong solution to the troll problem. Just as you can make low-karma users invisible on LessWrong, you would be able to block (and make invisible the writings of) Facebook and Twitter users with a high troll score. A difference, though, is that whereas karma is manually generated (by voting), the troll score would be automatically generated from your writings (for more on this distinction, see here).

One advantage of this method, as opposed to conventional moderation methods, is that it doesn't restrict freedom of speech in the same way. If trolls were blocked by most users, you'd achieve much the same effect as you would from bannings (the trolls wouldn't be able to speak to anyone), but in a very different way: it would result from lots of blockings by individual users, who presumably have a full right to block anyone, rather than from the actions of a central admin.

 

Let me finish with one last caveat. You could of course extend this scheme, and construct all sorts of scores - such as a "liberal-conservative score", with whose help you could block anyone whose political opinions are insufficiently close to yours. That would be a very bad idea, in my view. Scores of this sort should only be used to combat harassment, threats and other forms of anti-social behaviour, and not to exclude any dissenter from discussion.

 

* I here use "troll" in the wider sense which "equate[s] trolling with online harassment" rather than in the narrower (and original) sense according to which a troll is "a person who sows discord on the Internet by starting arguments or upsetting people, by posting inflammatory, extraneous, or off-topic messages in an online community (such as a newsgroup, forum, chat room, or blog) with the deliberate intent of provoking readers into an emotional response or otherwise disrupting normal on-topic discussion" (Wikipedia).

Comments

First, you could define in rather general and vague terms what is to count as trolling -

Obviously, all Thought Crime. Duh. I'm sure the Ministry of Truth could whip up a "Hate Facts" detector in no time.

Yes, I know I'm being snarky. Guess I'm feeling triggered by the ideological coercion inherent in the proposal.

On a technical level, if you pick your set of values, or Valuers, and label material by those values, you could whip up a predictor of some reliability that labels according to those values. That's not an issue.

The issue is the governance issue. Whose values?

There are solutions to user preference that don't involve top-down imposition of values. User-tuned scoring, including collaborative scoring, can filter out what you find disagreeable, based on your choices. Let every ideological entity make their own scoring algorithm, and have Facebook open an API so that users can select whichever one they want (see the sketch below). Facebook should, in my view, make such mechanisms available. It's pitiful that they don't. It's pitiful that it isn't a ubiquitous feature of The Web Gazillion.0.

I didn't use it extensively, but I think the Extropians list had such user controlled collaborative filtering in the mid 90s.
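To illustrate what such a user-selectable scoring API might look like, here is a minimal sketch under invented assumptions: a registry in which any group can publish a scoring function, and a per-user choice of scorer plus threshold. None of the names correspond to a real Facebook or Twitter API.

```python
# Minimal sketch of pluggable, user-selected scoring, as described above.
# The registry, the scorer name, and the Post type are all hypothetical;
# no real Facebook or Twitter API is assumed.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Post:
    author: str
    text: str

Scorer = Callable[[Post], float]  # returns a 0-1 "objectionability" score

SCORER_REGISTRY: dict[str, Scorer] = {}

def register_scorer(name: str, scorer: Scorer) -> None:
    """Any group can publish its own scoring algorithm under a name."""
    SCORER_REGISTRY[name] = scorer

def filter_feed(feed: list[Post], scorer_name: str, threshold: float) -> list[Post]:
    """Each user picks a scorer and a cut-off; the platform applies it per user."""
    scorer = SCORER_REGISTRY[scorer_name]
    return [post for post in feed if scorer(post) < threshold]

# Example: a trivial keyword-based scorer registered by some group.
register_scorer(
    "simple-harassment",
    lambda post: 1.0 if "kill yourself" in post.text.lower() else 0.0,
)

feed = [Post("a", "Nice photo!"), Post("b", "kill yourself")]
print(filter_feed(feed, "simple-harassment", threshold=0.5))  # keeps only the first post
```

Collaborative scoring would simply swap the keyword scorer for one that aggregates ratings from other users the reader has chosen to trust.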

Let every ideological entity make their own scoring algorithm

That solves the Ministry of Truth problem, but that doesn't solve the Set of Echo Chambers problem.

V_V:

In some ways Google and Facebook already create echo chambers as they personalize your search results or your feed trying to maximize "user engagement".

True, and moreover the internet itself promotes the creation of echo chambers because it's very easy to choose only those sources of information which confirm your existing views and beliefs.

I think it's a pretty serious problem, and one that we should try not to make any worse.

Maybe we could somehow combine the creation of echo chambers with exploration.

I do not have a specific solution, but here is what I am trying to achieve: Suppose that I have a political view X, and I oppose a political view Y. Let's say that I intolerantly believe that most people from Y are idiots. Then, at some moment, somehow (perhaps by clicking a button: "I feel adventurous today, show me a random article outside of my bubble"), I find one person with a view Y who seems smart. I still believe the others are idiots, but this one is an exception. So I decide to expand my bubble by including this specific person, and stuff recommended by them.

For this solution to work, we need a good scoring algorithm. Because I only respect one specific person from group Y, but dislike most of the others. So if the system gives me articles recommended by this specific person, I may enjoy them, but if it just gives me articles recommended by people from group Y generally, I will dislike them. So the ability to build a very specific bubble is necessary to have the ability to explore other people's bubbles.

My intuition is that a bubble made for one person is better than a bubble made for a group. And even better would be if the algorithm could recognize different "factors" within that person. For example, there would be a bubble of information optimized for me, but someone could choose that they want to see my favorite articles about rationality, but not my favorite articles about programming; and the system could classify them correctly. Analogously, there could be a person that is really good at some topic, but completely mindkilled about some other topic, so I would decide to follow them only about that one topic.

This is just a hypothesis without good support, but maybe the problem of echo chambers today is that they are too big. If you create a chamber for 100 people, there may be 20 loud idiots, and the whole chamber will be mostly full of idiocy. But if you split it into smaller chambers of 10 people each, some of those subchambers may actually be interesting.

So I decide to expand my bubble by including this specific person, and stuff recommended by them.

Yeah, that's the kind of thing I'm thinking of.

If you allow ideological self tagging, you could also let the Wisdom of the Idiots pick their champion. One automatic method is upping the people someone responds to.

There are a lot of simple options that would go a long way, particularly since right now you're lucky to get thumbs up, thumbs down. The Web 5 Zillion is really pitiful in this regard.

perhaps by clicking a button: "I feel adventurous today, show me a random article outside of my bubble"

Well, my first association is with the scene in Stephenson's Snow Crash where Hiro meets a few people from the New South Africa Franchulate #153... X-)

For this solution to work

For this solution to work we need at least two things to happen on a regular basis:

  • People will click the "show me something written by weirdos" button
  • People's reaction will be something other than "Freaks! Degenerates! Spawn of evil! KILL'EM ALL!!!"

maybe the problem of echo chambers today is that they are too big

I think the problem of echo chambers is that they exist. They are not an unmitigated disaster, of course -- basically everyone curates their own experience and that's normal -- but the words "echo chamber" imply that the balance has tilted too far towards the "comfortably numb" side and away from the "new and could change something" side.

In defense of echo chambers: Imagine what would happen if we tried to minimize their existence.

The ultimate anti-bubble internet would be like this: It would show you a random page. Then you could click "Next" and it would show you another random page. (Or perhaps the pages would be changed automatically, to prevent you from spending too much time looking at the page you agree with, and skipping the pages you disagree with.) That's all. There would be no way to send someone a page, or even to bookmark it for the future you, because even that would give you a tool to increase the fraction of time spent reading pages you agree with.

I am sure there are people who could defend this kind of internet. (Especially if it actually existed, so they would be defending the status quo instead of a crazy thought experiment.) Yeah, you probably couldn't even read half of the content you would look at, because it would be written in a language you don't understand... but that's awesome because it allows you to look out from your cultural bubble, and motivates you to learn new languages. Etc.

But I think most of us would agree that such an internet would be a horrible thing. We want to have a choice. We want the opportunity to read a LessWrong debate instead of having to read only random articles (even if they are in the languages we speak). Okay, wanting does not necessarily mean that something is better, but... I think we would also agree that the ultimate anti-bubble internet would be worse than what we have now.

So it seems like there is a scale going from "no bubbles" to "perfect bubbles", both extremes seem horrible, and... how do we find the optimal point? (I mean, using some other method than defending status quo.)

I think we would also agree that the ultimate anti-bubble internet would be worse than what we have now.

Well, duh.

So it seems like there is a scale going from "no bubbles" to "perfect bubbles"

Kinda. The "no bubbles" extreme means you are forced to absorb information regardless of your preferences, Clockwork Orange-style if you actually want to go to the extreme. A more common example would be school: you are forced to learn (well, to be exposed to) a set of subjects and no one cares whether you are interested in them or not.

The "perfect bubble" end actually looks like what some people at contemporary US colleges would like to construct (see e.g. this). You are actively protected from anything that might upset you -- or make you change your mind.

If you want to find the optimal point on the spectrum between the two extremes, a good starting point would be to specify what you are optimizing for. Optimal according to which criteria?

I think most people would prefer to see only the stuff they already agree with, and maybe once in a while a heresy they can easily defeat. On the other hand, they want unbelievers to be more exposed to alternative opinions.

Even when people try to suggest the same rules for everyone, I suspect they prefer such rules that would make their opinion win. If they believe their opinion would win in a free marketplace of ideas (or at least that it would lose without such a marketplace), they will defend free speech for everyone. On the other hand, if too much freedom gives an advantage to competing memes, they will find an excuse why free speech has to be limited in this specific aspect. Etc.

So I guess that most people would prefer the status quo with slightly better filtering options for themselves, and slightly more exposure to alternative views for others.

On some level I find it hypocritical to complain about those colleges where students are protected against the microaggressions of alternative opinions, when my own desire is to be sheltered from the idiocy of people around me. Technically speaking, the set of rationalists is so much smaller than the set of politically correct people that even if I don't desire 100% filtering out of the rest of the world and they do, my filter is still stronger than theirs in some mathematical sense. (I cannot realistically imagine a whole college full of rationalists. But if such a thing is possible, I would probably never want to leave that place.)

We're talking solely about the desirable degree of filtering. No one, including me, argues that people should just not filter their information input -- their news, their forums, their discussions, etc.

It's like I say that we shouldn't encourage paranoid tendencies and you're saying that if you imagine the inverse -- everyone is forced to trust complete strangers all the time -- it will be horrible. Of course it will be horrible. However this is not a valid counterargument to "we shouldn't encourage paranoid tendencies".

Filtering is normal. Filtering is desirable. Everyone filters. But.

Too much of pretty much any normal and desirable activity leads to problems. If the environment changes (due, say, to shifts in technology) so that it becomes very, very easy to overdo that normal and desirable activity, there will be issues. The so-called diseases of civilization would be an example: e.g. the desire to stuff your face full of superstimulus food is entirely normal and desirable (we tend to diagnose people without this desire with eating disorders). But if that superstimulus food becomes easily and cheaply available, well, there are issues.

It may or may not be a problem, depending on how people set their filters. People choose.

If you don't want to hear from the other team, you don't. If you do, you filter accordingly. If the people whose judgment you include in your filters want to listen to the other team, you get fed some of the other team, as they filter the other team.

Plenty of people want to live in an echo chamber. Let them self segregate and get out of the way of the grown ups who want to talk.

Let them self segregate and get out of the way of the grown ups who want to talk.

History tells me that they will show up with torches and pitchforks outside my door soon enough...

You could of course extend this scheme, and construct all sorts of scores - such as a "liberal-conservative score", with whose help you could block anyone whose political opinions are insufficiently close to yours. That would be a very bad idea, in my view. Scores of this sort should only be used to combat harassment, threats and other forms of anti-social behaviour, and not to exclude any dissenter from discussion.

You give people a tool to build echo chambers, and they will build echo chambers for themselves. Your views on what constitutes a "bad idea" would be irrelevant at that point. Not to mention that many people interpret the existence of people "whose political opinions are insufficiently close" as harassment and a threat in itself.

V_V:

Machine learning methods can often have good accuracy at population level, but fail spectacularly on specific instances, and if the instance-level outputs are visible to the public, these failures may be quite embarrassing: imagine if somebody posted a quote by Churchill or a passage from the Qur’an and it was mistakenly tagged as trolling.

Even if the machine learning system was not misclassifying, it can never be better than the data it is trained on.
If you train it on user ratings, it will turn into a popularity contest, with unpopular opinions being tagged as trolling, especially once the politically-motivated users figure out how to manipulate the system by voting strategically.
If it is based on a dataset curated by the company it would be perceived as the company exerting ideological and political bias in a subtle, opaque manner.

In general it would be seen as a powerful institution putting an official seal of approval/mark of shame on speech and people. It reeks of totalitarianism.

imagine if somebody posted a quote by Churchill or a passage from the Qur’an and it was mistakenly tagged as trolling.

If the trolling criteria include "racism" or "threats of violence" (as in the OP), I think both of these sources would be correctly matched by the software. (Which is not to say I think we should censor them.)

If the criteria include the generous "language [which] is offensive" (also in the OP), I think most language ever written would turn out to be offensive to someone.

V_V:

If the trolling criteria include "racism" or "threats of violence" (as in the OP), I think both of these sources would be correctly matched by the software. (Which is not to say I think we should censor them.)

Indeed.

My suggestion was not to train the system on user ratings:

The first is to let a number of sensible people give their troll scores of different Facebook posts and tweets (using the general and vague definition of what is to count as trolling). You would feed this into your algorithms, which would learn which combinations of words are characteristic of trolls (as judged by these people), and which aren't. The second is to simply list a number of words or phrases which would count as characteristic of trolls, in the sense of the general and vague definition.

V_V:

So, essentially it would depend on the company opinion.

Anyway, lists of words or short phrases won't work. Keep in mind that trolls are human intelligences; any AI short of Turing-test level won't beat human intelligences at their own game.

This is fine as long as you don't care about false positives.

Facebook does allow users to report posts as harassing. They likely do have internal machine learning algorithms that give people corresponding scores, and I would also guess that it somehow figures into their 100,000 factors for ranking posts on the timeline.

This post is mentioned in a Slate Star Codex blog post, which has some examples of blocking groups of people on social media (less sophisticated than proposed here), and raises different scenarios of how this trend could play out: http://slatestarcodex.com/2015/05/06/the-future-is-filters/

[anonymous]:

This is pretty much the same as existing downvoting systems. According to Reddit rules, people are supposed to use the down arrow as distributed moderation, not to show disagreement, yet they do it anyway. The same way, the troll button would also devolve into a disagree button.

I should also add that, in an ideal world, there would be a way to distinguish between ideas expressed in a hateful tone and ideas that are branded hateful - even if they are put in a very gentle and friendly way - just because they do not match "political correctness" norms. Hate can be an actual emotion (detectable from tone) or a judgement set by others.

"The same way, the troll button would also devolve into a disagree button."

There's no suggestion that there should be a troll button.

A troll score could probably be constructed by Facebook and Twitter, but they may well not want to do it. Any suggestions on how to get around this problem would be appreciated.

I don't see why this wouldn't work fine as an optional feature. You could even have minimum (or maximum) positivity ratings by group, say--someone might want to see everything their core group of friends and family posts, only positive things that acquaintances or strangers post, and only negative things posted by their frenemies.
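A minimal sketch of what such per-group thresholds might look like, under invented assumptions: each post gets a positivity score (say, from -1 to 1, out of some sentiment model), each author is assigned a relationship group, and both the score and the group labels here are placeholders.

```python
# Sketch of per-group positivity thresholds, as suggested above.
# The positivity score and the group labels are hypothetical placeholders.

THRESHOLDS = {
    # group: (min_positivity, max_positivity)
    "family_and_friends": (-1.0, 1.0),  # show everything
    "acquaintances": (0.2, 1.0),        # only fairly positive posts
    "strangers": (0.2, 1.0),
    "frenemies": (-1.0, 0.0),           # only negative things
}

def should_show(positivity: float, group: str) -> bool:
    low, high = THRESHOLDS.get(group, (-1.0, 1.0))  # default: show everything
    return low <= positivity <= high
```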

I think the second problem is solvable. I've extracted suitable data for a project classifying tweets as hate speech from the Twitter APIs (you can at the very least get all tweets containing certain search terms). As to integration with a site, I think it should be possible to create a browser addon that deletes content identified as trolling from a page as you load it (or perhaps replaces it with a trollface image?). I don't know of an example of something that does this offhand, but I know there are addons that, e.g., remove the entire newsfeed from a Facebook page. There might be some problems relative to Facebook or Twitter implementing it themselves, but it would be possible to at least get a start on it.

What definition of "hate speech" did you use? For example, does mentioning that members of group X have lower IQs and are more likely to commit violent crimes count as "hate speech"? Does it matter if all the relevant statistics indicate this is indeed the case? Does it matter if it's true?

Sorry, I wasn't meaning to get into the question of how accurate you could be - I just wanted to clarify the technical feasibility of data collection and website modification. The project in question was just for a university course, and not intended for anything like the system described in this post. I just used a bag-of-words model, with tweets only pulled if they contained words which are typically used as slurs towards a particular group. Obviously, accuracy wasn't very good for individual tweet classification: the model only worked well when specific terms were used, and it missed a lot of nuance (e.g. quoted song lyrics). It wasn't what you would need for troll classification.

For anything to work for this application, you'd probably need to limit the use of automatic classification to suggesting that a tweet might be trolling, subject to later manual reclassification, or to identifying users that are frequent and blatant trolls, and it's an open question whether you could actually make something useful from the available data.
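For what it's worth, here is a rough sketch of that second use - aggregating noisy per-tweet flags into a per-user signal, so that only frequent and blatant cases get surfaced for manual review. The per-tweet classifier and the tweet source are placeholders, not working components.

```python
# Sketch: flag only frequent, blatant offenders rather than individual tweets.
# `tweet_is_flagged` stands in for a noisy per-tweet classifier (e.g. the
# bag-of-words model described above); tweet collection is not shown.

def tweet_is_flagged(text: str) -> bool:
    """Placeholder for a per-tweet classifier; assumed noisy on single tweets."""
    raise NotImplementedError

def frequent_blatant_offenders(tweets_by_user: dict[str, list[str]],
                               min_tweets: int = 50,
                               flag_rate: float = 0.3) -> list[str]:
    """Return users whose flag rate is high over a reasonably large sample."""
    offenders = []
    for user, tweets in tweets_by_user.items():
        if len(tweets) < min_tweets:
            continue  # not enough data to say anything about this user
        flagged = sum(tweet_is_flagged(t) for t in tweets)
        if flagged / len(tweets) >= flag_rate:
            offenders.append(user)
    return offenders
```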

My point is that I don't think "hate speech" is a well defined concept. In practice it winds up cashing out as "speech I disagree with".

[anonymous]:

While the term is clearly overused, I think it is not entirely useless. One useful method is to focus not on the victim but on the speaker. Does the speaker sound like someone bringing others down just to feel better about himself? For example, Murray's The Bell Curve was attacked a lot for its alleged racism, but it was not actually hate speech; the tone was that of a deeply concerned, frightened and reluctant admission of human differences. However, just go on Reddit and you find many examples of people engaging in cheap potshots of racism or sexism, largely as a way to feel comparatively better about themselves. If a racial or gender-oriented comment sounds like something said by a gamma male in the proverbial parental basement to feel comparatively better, it is hate speech. /r/fatpeoplehate is a good example, if you count its various kinds of insults ("butter slugs", "butter golems", etc.).

If it sounds strange, then let me remind you that hate is an emotion. It is something felt by people. So hate depends not on whether it has hurt the victim, but on whether the perpetrator felt it. So offensive speech (victim angle) can still fail to be hate speech (perp angle), and vice versa.

but it was not actually hate speech; the tone was that of a deeply concerned, frightened and reluctant admission of human differences. However, just go on Reddit and you find many examples of people engaging in cheap potshots of racism or sexism, largely as a way to feel comparatively better about themselves.

Note how you've just conflated "hate speech" with "racism" and "sexism". Yes, hate is an emotion (at least that's the original meaning of the word), and like all emotions there are rational and irrational reasons to feel it. However, as the term is commonly used today, "hate speech" has almost nothing to do with "hate". You mention that it's inappropriately applied to situations where there is no actual hate. On the other hand, you're still implicitly restricting it to situations where the target is an official victim group.

If a racial or gender-oriented comment sounds like something said by a gamma male in the proverbial parental basement to feel comparatively better, it is hate speech.

This is incomplete because it ignores the possibility of countersignalling--"you're so inferior that I don't even need to use swear words at you to demonstrate hate".

This is incomplete because it ignores the possibility of countersignalling

Gammas almost by definition aren't going to be counter-signaling anything.

"you're so inferior that I don't even need to use swear words at you to demonstrate hate"

You seem to be making the mistake under discussion by conflating being superior to someone with "hating" him.

Gammas almost by definition aren't going to be counter-signaling anything.

He did not say it was said by gammas. He said that it resembles something said by gammas (but was said by other people). These other people could countersignal.

You seem to be making the mistake under discussion by conflating being superior to someone with "hating" him.

No, I'm not. I didn't say that feeling superior is hate; I said that feeling superior affects how one expresses hate. Someone who feels superior might express hate using nice words, if his status is such that it is clear that he is not using the nice words to actually be nice.

That's very interesting! I would obviously love it if such a browser addon could be constructed. And the trollface image is a great idea. :)

By the way, the fact that your very insightful comment is downvoted is really a shame. Why do people downvote interesting and informative comments like this? That makes no sense whatsoever.

Most of the top comments are "this is a terrible idea and here are the reasons we should never do it", and his comment is "we can do it sooner than you think, here's how". To people concerned about censorship and the creation of echo chambers, the trollface image adds insult to injury for those harmed by the filter, as well as being an easy-to-imagine detail, so people are less detached and more emotional in responding to it.

Also personally, while I grudgingly accept that people currently use "troll" to simply mean "mean person on the internet" without regard to the more specific meaning that "troll" used to have, I'm pretty sure the trollface is specific to trolling in the latter sense, while it's only the former that could plausibly be detected by a filter, and the incorrect usage of the meme aggravates me no end.

Most of the top comments are "this is a terrible idea and here are the reasons we should never do it", and his comment is "we can do it sooner than you think, here's how".

I get that. But in my book you don't downvote a comment simply because you don't agree with it. You downvote a comment because it is poorly argued, makes no sense, or something like that. Clearly, that doesn't apply to this comment.

Well, the disagreement is on the level of the latter taking completely for granted the point that the former is disagreeing with (i.e. that the 'troll filter' is desirable), which could be "poorly argued" from some perspectives, or otherwise seems to fall under the "or something like that" umbrella.

But in my book you don't downvote a comment simply because you don't agree with it.

Does it surprise you that many people here have books that are different from yours?

This seems like a great way to build echo chambers. "Oh, he dares to hold a different view than mine? Downvote!"

Not a great way, but a small step towards, yes :-)

But the LW-in-reality is some distance away from the LW-as-it-should-be. In practice I see downvotes on the basis of disagreement all the time. This is what is (descriptive), regardless of what people would like it to be (normative).

Discussing the details of how to do bad things, and doing so for its own sake rather than as a step towards showing something else, is in the real world Bayesian evidence that you support those bad things. Announcing "I personally have started implementing a step needed for one of those bad things" is even worse.