Improve comments by tagging claims

Benquo

I used to think that comments didn’t matter. I was wrong. This matters because communities of discourse are an important source of knowledge. I’ll explain why I changed my mind, and then propose a simple mechanism for improving comments that can be implemented on any platform that allows threaded comments.

Should there be comments?

Comments are optional.

Organizing discourse as a forum with comments introduces substantial structural rigidity. If you don’t like one of the regular posters’ content, too bad, it’s still cluttering up your feed. If you want to start contributing to the discourse, this comes at the cost of pulling everyone else’s attention, willing or no. In forums with high barriers to entry, this means that the move equivalent to “start your own blog” is not available to everyone. In forums with low barriers to entry, this can mean dilution by low-value posts, leading to the departure of discriminating readers. If standards are ambiguous, this can lead to an adverse selection process in which the contributors who are most conscientious about respecting community standards err on the side of not posting, and the least conscientious contributors flood the forum with low-value content, accelerating the process of pushing away the most discriminating participants.

My prior model of how written discourse should work was that people should publish in whatever venue they thought appropriate - often their own blog. If they find someone else’s writing interesting and want to comment on it, they can write their own post, and link to the thing they want to talk about. This has a few benefits. A link-based attention economy means that in order to get the attention of the readers of some content, you generally have to get the attention of the person generating the content. This means that higher-quality writing gets attention, and limits the effectiveness of low-quality trolling. Another benefit is that it allows for partly overlapping clusters. If a blog you like keeps linking to another blog, you can decide whether you like that one, and if you do, you can start reading it directly. Thus, you’ll be directed to blogs that are “nearby” in your local network, but not everyone reading your blog has to read the same set of nearby blogs.

In the early 2000s, a natural experiment happened. Americans on the right and left were both galvanized by the World Trade Center attacks and the consequent wars in Afghanistan and Iraq, and started writing. Two parallel blogospheres were formed - one on the left, and one on the right. And it so happened that some of the first prominent right-wing bloggers didn’t allow comments on their blogs. The result? The left-blogosphere developed parallel thriving communities on prominent blogs. The right-blogosphere had a “get your own blog” ethos, and were generous with their links and hat-tips. For more on this, see Adamic and Glance 2005 (h/t Kevin Drum) and Benkler and Shaw 2010.

My first exposure to the blogosphere was mostly blogs on the right during the relevant period. When I heard tell of problems due to trollish commenters, and difficulties establishing clear standards for posting in forums, my initial response was, “Who needs a forum? If you have something to say, you should start your own blog.” I can no longer wholeheartedly insist that this is the whole solution. But I still think that the comments-links axis is an important one in discourse design, and it’s not obvious that the end of the spectrum “comments” points towards is always better.

Comments matter.

A friend recently expressed disappointment that an article they published in a forum got low-quality comments that they felt compelled to respond to. When I suggested posting on their own blog, and if necessary closing the comment section, they pointed out something I hadn’t properly considered:

Comments are a high-quality, high-sensitivity measure of engagement.

It’s great when someone links to your work - and perhaps linking would be more common without comment sections. It’s a strong signal that you’ve been heard, and someone thinks your message is relevant - but it happens rarely, and unless you’re already extremely popular, it won’t happen on most of your posts. It’s a bad feedback loop taken on its own; it measures the right thing, but gives extremely coarse-grained feedback. A more sensitive metric is website traffic - how many people did you get to look at your post? But this doesn’t tell you whether anyone was moved to do something on their own based on your post, just how many people felt moved to click through, and maybe share it on social media. It favors feel-good posts and outrage porn over true insight and clear criticism. Judging by web traffic alone, my all-time “best” blog post is one about the query language of the mind. This probably happened because it was shared by some prominent figures in my community. But judging by engagement, one of my first posts pointing out some specific problems with Effective Altruism seemed to be much better at starting productive conversations, based on the comment section. (It turns out to be my second most popular post of all time, but I couldn't have known that as quickly as I knew that I was getting good engagement.) As a result, I’ve written more posts like that, and gotten more in-person feedback that those posts were persuasive, than for posts on any other topic I’ve written on.

A second reason comments are important is that starting your own blog - or writing a whole blog post - is a pretty big deal when all you want to do is signal-boost an article you think is worth reading (upvoting or “liking” is a way to do this and only this, without interposing yourself between the reader and the content), or reply to a specific point that’s only relevant in the context of a particular article (in which case commenting is the natural solution). Oddly, the more hierarchically structured forums and blogs with comments welcome more interaction than the flatter structure of a purely posts-and-links blogosphere.

Improving comments

Selection vs treatment

There’s already been some discussion about how to make comments work better. For instance, Paul Christiano has proposed a machine learning solution in line with his approach to AI safety. Much of the discussion has been about efficiently promoting good comments and detecting and removing bad actors. In short, it’s about improving the quality of observed comments through selection.

I want to talk about how to create, not selection effects, but treatment effects. I want to focus on making comments better - doing things that directly cause people who are trying in good faith to participate in the discourse to post better comments.

Problem: nitpicking

One problem that’s come up repeatedly in conversations about the quality of blog post comments is that they don’t respond to important points, and instead nitpick about minutiae. I think this happens for a few reasons, including: It’s often easier to evaluate minor fact-claims than major claims. This is because many readers aren’t comparing an article’s underlying model to theirs - they may not have a model - and so look for details that it’s easy to say yes or no to. The obvious solution is to make it easier to identify and think about a post’s substantive claims.

Solution: tag claims

The Arbital team has recently implemented the feature of tagging claims made in an article. For instance, in Alexei Andreev’s recent post about waiting to donate until the end of a fundraiser, there are links in the article to specific claims it makes or addresses, such as:

When linking to my blog post on GiveWell and "crowding out" considerations, he adds a link to a related claim page:

It's good for GiveWell and Good Ventures to crowd out donors by their donations.

I’m pretty excited about this infrastructure. It gives the post author a way to foreground the considerations they think are most relevant, and gives commenters a set of default topics to argue about.

It also accomplishes the secondary goal of making comments more of a lasting, accessible record. If the comments about a claim are scattered over several related articles relevant to the claim, and also mixed together with comments on other topics, it can be hard to know whether you’ve seen the important discussions on a given topic. If, on the other hand, all those articles are tagged with the same claim, then you need only click through to the claim page, and you’ll find a record of comments by readers of all of the relevant posts - and only the comments relevant to this claim.
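To make the claim-page idea concrete, here is a minimal sketch of grouping comments by tagged claim rather than by article. It is not Arbital's implementation; the `Comment` fields, the claim identifiers, and the `build_claim_pages` helper are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    article: str  # article the comment was posted under (hypothetical id)
    claim: str    # claim the comment is tagged with ("" if untagged)
    text: str


def build_claim_pages(comments):
    """Group tagged comments by claim, regardless of source article."""
    pages = defaultdict(list)
    for comment in comments:
        if comment.claim:  # untagged comments stay on their article only
            pages[comment.claim].append(comment)
    return dict(pages)


comments = [
    Comment("post-a", "crowding-out", "Reply from a reader of post A"),
    Comment("post-b", "crowding-out", "Reply from a reader of post B"),
    Comment("post-b", "", "Unrelated remark on post B"),
    Comment("post-a", "donation-timing", "A different claim entirely"),
]

claim_pages = build_claim_pages(comments)
for c in claim_pages["crowding-out"]:
    print(c.article, "-", c.text)
```

A reader who opens the "crowding-out" page in this sketch sees replies from readers of both posts, and nothing about other claims.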

However, Arbital isn’t a universal platform, and many people want to maintain their personal blogs or post to a public forum allowing comments. What can we do to improve comments there?

Threaded comments enable tagged claims.

In my post advocating publishing private opinions on secret blogs, I posted three comments, one for each claim I wanted to make. Each comment ended with: “If you want to discuss this claim, I encourage you to do it as a reply to this comment.” On LessWrong, someone asked me to put the claim comments in boldface so they’d be easier to find. For another example, see my comments on this post.

The post didn’t get a huge number of comments but they felt maybe slightly more on-topic than usual. My claim-comments didn’t prevent people from introducing new threads, but they might have generated relevant responses that wouldn’t have been written otherwise, and they mostly got comments on the same topic grouped together. If you’re writing your own posts, whether on something like a forum or other group publication, a personal blog or website, or just ordinary social media, I encourage you to try this. Let me know how it goes.

(Cross-posted at my personal blog and Arbital.)

Claim 4: Explicitly tagging the core claims of a post will make people substantially more likely to respond to these claims.

If you want to discuss this claim, I encourage you to do it as a reply to this comment.

I sometimes end blog posts with enumerated "discussion questions". I get vastly higher comment rates and vastly better discussions when I do this. I don't know why I don't always do this, actually.

Hypothesis: there's a lot of "intellectual dark matter" in the form of thoughts people have (or could easily have if prompted) that they don't share, but would share if nudged.

Twitter.

I'd probably call it anti-matter, though :-/

At least it decreases the probability that people will start commenting on one claim of the post, and forget that the others are there, too.

I think you may be misunderstanding why people focus on selection mechanisms. Selection mechanisms can have big effects on both the private status returns to quality in comments (~5x) and the social returns to quality (~1000x). Similar effects are much less plausible with treatment effects.

Claim: selection mechanisms are much more powerful than treatment effects.

I think people are using the heuristic: If you want big changes in behavior, focus on incentives.

Selection mechanisms can make relatively big changes in the private status returns to making high quality comments by making high quality comments much more recognized and visible. That makes the authors higher status, which gives them good reason to invest more in making the comments. If you get 1000x the audience when you make high quality comments, you're going to feel substantially higher status.

Selection mechanisms can make the social returns to quality much larger by focusing people's attention on high quality comments (whereas before, many people might have had difficulty identifying high quality even after reading it).

"More powerful" seems like it's implicitly using categories that don't cut at the joints. I think Aceso Under Glass's post on Tostan makes an important distinction between capacity-building and capacity-using interventions:

This is more speculative, but I feel like the most legible interventions are using something up. Charity Science: Health is producing very promising results with SMS vaccine reminders in India, but that’s because the system already had some built in capacity to use that intervention (a ~working telephone infrastructure, a populace with phones, government health infrastructure, medical research that identified a vaccine, vaccine manufacture infrastructure… are you noticing a theme here?). [...] Having that capacity and not using it was killing people. But I don’t think that CS’s intervention style will create much new capacity. For that you need inefficient, messy, special snowflake organizations.

I'd guess that treatment effects seem less powerful than selection effects of equal importance because treatment effects are typically more capacity-building loaded.

I wonder if claim tagging/comments also helps your writing, by making you more conscious of what you are claiming and how justified it is.

Another possibly useful exercise is to enumerate some of the bits of evidence you would expect to refute those claims.

Claim 5: Claim-tagging is worth trying more broadly (because of claims 3,4).

If you want to discuss this claim, I encourage you to do it as a reply to this comment.

Claim 1: Location on the comments-links continuum is an important aspect of discourse design.

If you want to discuss this claim, I encourage you to do it as a reply to this comment.

Yes, to my mind even obviously so.

Some concepts that relate to this:

Pacing: comments are relatively fast-flowing, while exchange of blog posts is more ponderous
Chunkiness: comments are smaller, blog posts are bigger
Control: comments are at the mercy of forum moderators, blog posts are your own domain

Claim 2: Comments are a high-quality, high-sensitivity measure of engagement with little in the way of viable substitutes.

If you want to discuss this claim, I encourage you to do it as a reply to this comment.

I wonder whether it would be a good idea to implement a feedback mechanism like this -- multiple feedback buttons below the article ("agree", "disagree", "interesting", "stupid", etc.) and the user could just click one... and then, optionally, write a comment, but the vote would count also without the comment. (Maybe a limit of one vote per IP address?)

The idea is that clicking the button is even easier than writing a comment, so if you want more feedback...
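A minimal sketch of how such one-click feedback might be tallied, assuming one vote per IP address as suggested above. The `ReactionTally` class is hypothetical, and letting a repeat vote replace the earlier one (rather than rejecting it) is an assumption of this sketch.

```python
from collections import Counter

# The button labels proposed in the comment above.
ALLOWED_REACTIONS = {"agree", "disagree", "interesting", "stupid"}


class ReactionTally:
    """Counts at most one reaction per voter key (here, an IP address string)."""

    def __init__(self):
        self._vote_by_voter = {}

    def vote(self, voter_ip, reaction):
        if reaction not in ALLOWED_REACTIONS:
            raise ValueError(f"unknown reaction: {reaction!r}")
        # Assumption: a repeat vote from the same address replaces the earlier one.
        self._vote_by_voter[voter_ip] = reaction

    def totals(self):
        return Counter(self._vote_by_voter.values())


tally = ReactionTally()
tally.vote("203.0.113.5", "agree")
tally.vote("203.0.113.9", "interesting")
tally.vote("203.0.113.5", "disagree")  # same address: overwrites "agree"
print(tally.totals())
```

Keying on IP address is crude, since shared networks collapse many readers into one voter, but it keeps the button usable without an account.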

Claim 3: Irrelevant nitpicks are an important problem in comment sections on sites such as LessWrong.

If you want to discuss this claim, I encourage you to do it as a reply to this comment.

Sometimes the nitpicks are of the form "you don't have enough evidence for X" or "you didn't cite a source" followed by lots of downvoting, when a reasonable prior probability distribution would assign enough probability to X to make the ensuing analysis interesting and useful.

It's very hard to make any kind of interesting claim whilst covering yourself against every possible 'citation needed' nitpick.

Normally we think of the burden of proof resting on writers. But that is just a social convention. I haven't heard a consequentialist justification for this.

Pros of having the burden of proof be on the person who introduces an idea:

The person who introduces an idea generally gets the associated status. Looking up citations is tedious compared to having ideas, at least for me. By putting the burden of proof on the person who has the idea, we create a status incentive for someone to actually look up citations.
The toplevel post vs comment distinction facilitates working harder to create posts than comments. It seems a little awkward to have 2 posts, one that introduces an idea and a later one which provides citations.

Cons:

Maybe some people are better at looking up citations than others.
Maybe the choice is between sharing citation-free ideas and not having them shared at all.
Maybe we think an idea development process that involves bouncing ideas against others early on works better. Before looking up citations, maybe it's best to check if the idea is worth testing, or if we should really be looking at a slightly different hypothesis. Maybe the idea will be shot down quickly and decisively by a commenter even if lots of citations are provided.

I think an important dimension here is what you're being asked to provide evidence/citations for.

As an over-the-top example:

The person who introduces an idea generally gets the associated status

You didn't provide a citation for this. So irrational. Much downvote.

Oh I agree, I was going off on a tangent with my thing (considering the specific scenario where everyone agrees that citations should be looked up at some point)

Posting or commenting imposes a cost in the form of a claim on the attention of your readers. It also provides a benefit in the form of information.

Perhaps the burden on writers should simply be to justify that their writing is relevant enough, and likely enough to be correct, to justify making this claim on readers' time and attention. This burden should be higher on shared fora than personal blogs, higher on posts than comments, higher for parent comments than replies, higher for off-topic than on-topic posts, higher for speculation than for fact posts.

A reasonable prior is enough to defeat random hypotheses like that.

The question isn't how heavy the burden of proof is. The question is who has that burden.

I think there should be a burden on the writer to make a coherent point, that is novel and either interesting or useful. That could include evidence for the central claims they are making, or just a logical argument using existing widely believed assumptions.

I don't think that means having citations for every single claim, especially ones that are reasonable, common-sense claims. If a commenter wants to present strong evidence that commonsense claim X is false, that's fine, but what I have seen (at least a few years ago) is someone merely pointing out that you don't have evidence, which causes the writer to get downvoted and lose the benefits of being promoted or getting karma.

I think it is somewhat important (it affects incentives for writers to write), but I think it's a symptom of something very important: namely, that the comment is motivated by a desire to win an argument rather than by a desire to find out what's true. If you want to win an argument you nitpick the weakest part of a post; if you want to find out what's true you update on the strongest part.

I think it would be amazing if LW could in fact become a community of people primarily motivated by a desire to find out what's true (appropriately tempered by a desire to actually do something with all those truths), but it's unclear how realistic this desire is. Another approach more in line with the kind of stuff Paul's been thinking about is to think about mechanisms for getting good comments out of people who just want to win arguments.

the comment is motivated by a desire to win an argument rather than by a desire to find out what's true

Not necessarily. I would expect that sometimes (maybe even often in the LW environment) nitpicking is motivated by the OCD-ish desire for perfection and completeness: you see, say, a valid argument with a small misstep along the way: you would be motivated to correct that misstep so that the entire post is without blemish.

What if authors of the nitpicking comments had an option to label them as "nitpicking", which would even result in the comment being displayed with smaller font? (And the whole subthread below the "nitpicking" comment would automatically be labeled "nitpicking" by the system.) Maybe that would introduce some extra self-awareness to the debate.

More generally, support many different labels, such as "nitpicking" or "suggestion / new idea" or "expression of gratitude" or whatever... and have them visually represented by e.g. a colored frame around the comment. (For example, this comment would be classified as "suggestion".) Do this in a way that makes the labels easy for System 1 to recognize, so as a reader you can decide to e.g. skip all the nitpicking comments, or all the social comments.

When I nitpick, I sometimes enclose my entire comment in parentheses, and/or explicitly write "nitpick", as a manual approach to this.

Would be nice to have a more general solution in the form of a robust tagging system + ways for a user to filter (or apply colour/flair/decorations) to comments based on tags.
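The labelling and filtering ideas above could be sketched roughly as follows. This is an illustration under assumptions, not an existing forum feature: the `Comment` class, `PROPAGATING_LABELS`, and both helper functions are hypothetical, and only the "nitpicking" label is inherited by a subthread, as in the proposal above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: only these labels are inherited by every reply in a subthread.
PROPAGATING_LABELS = {"nitpicking"}


@dataclass
class Comment:
    text: str
    label: Optional[str] = None  # label chosen by the comment's author
    replies: List["Comment"] = field(default_factory=list)


def labelled(comment, inherited=None):
    """Yield (comment, effective_label) pairs in thread order."""
    label = inherited if inherited in PROPAGATING_LABELS else comment.label
    yield comment, label
    for reply in comment.replies:
        yield from labelled(reply, label)


def visible_texts(thread, hidden_labels):
    """Texts of comments whose effective label the reader has not hidden."""
    return [c.text for c, label in labelled(thread) if label not in hidden_labels]


root = Comment("Main point response", "suggestion", [
    Comment("Typo in paragraph 2", "nitpicking", [
        Comment("Also a comma"),  # unlabelled, but inherits "nitpicking"
    ]),
    Comment("Thanks!", "gratitude"),
])

print(visible_texts(root, {"nitpicking"}))
```

A reader who hides "nitpicking" in this sketch skips the whole subthread under the labelled comment, while differently labelled siblings stay visible.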

In offline discussions with other LWers the issue of excessive nitpicking has come up often. This problem has essentially always been present. If you go back and look at the comments on Eliezer's original Sequence posts, many are incredibly inane and superficial.

It strikes me that some people do not ever ask themselves, "Is this comment I'm about to post going to be useful? Is it necessary?" I've always consciously aimed to comment only if I felt that my comment was good/useful enough to get upvoted. Pure nitpicks usually sit at 0 or -1 karma. I have, of course, thought that my comment was good/useful and received no upvotes, or several downvotes. That situation provides a great opportunity to learn something.

But surely the absence of nitpicking is less important than the presence of substantial posts. Nitpicks can always be ignored, or sorted by karma, etc.

I don't think it's an important problem. Nitpicks can be annoying, but (1) you don't have to reply to nitpicking comments; and (2) if all you get are nitpicks, this is evidence that no one wants to (or can) attack your core claims.

While (2) is true, an environment crowded with nitpickers means that the expected value of the average comment in the community declines. This may continue past the point that you stop expecting any of the discussion to be worthwhile.

Also, pedantic nitpickers feed on each other.

My impression is that "the expected value of the average comment" is predominantly a function of the average IQ of the commentariat and only in a very minor way a function of their propensity to nitpick.

This violates Grice's maxims of quantity and relation.

A lot of the content of communication is about which things are said and which are unsaid. Promoting an issue to someone's attention is privileging the hypothesis. People have social intuitions that take this into account even when they can't articulate them. In general, if the response you get to something you've written is largely composed of annoyances, it's a very reasonable response to downregulate that behavior - plus, it'll happen automatically, whether reasonable or not. So if we want more high-quality writing, we should not be happy with an environment that rewards writing with serious flaws, but only annoys the best writers.

I think disagreement on something like the above is pretty key to why other folks here have expressed frustration about your commenting. You seem to be ignoring important social norms that, when they function well, reward virtue and punish vice, instead making claims on others' attention even when they're not justified by relevance.

Incidentally, I consider this good evidence for the merits of claim-tagging, since I think both your comments on this post so far are highly relevant, and I'm affirmatively glad you wrote them!

First, the question isn't whether nitpicking is good or bad. It is bad by definition since the word carries negative connotations (the same meaning with positive connotations would be called something like "careful and thorough detail-oriented assessment"). The question is whether nitpicking is important and I haven't seen data or convincing arguments that it is.

Second, when you write "largely composed of annoyances" and "we should not be happy with an environment that rewards writing with serious flaws, but only annoys the best writers" you implicitly assume that most comments are nitpicks. There is no reason to make such an assumption (and where does "rewarding" come from, anyway?).

You seem to be ignoring important social norms

Which important social norms are they? And of which society?

You have been noticeably not commenting. Care to comment why?

I've been busy. To be frank, hanging out at LW isn't the most productive use of time, so I don't want to deliberately redirect my attention here. We'll see how it goes.

Anyway, yes, I agree with this.
