
Comment author: ciphergoth 19 December 2016 08:26:38PM 0 points [-]

It's hard to be attack resistant and make good use of ratings from lurkers.

The issues you mention with ML are also issues with deciding who to trust based on how they vote, aren't they?

It's hard to make a strong argument for "shouldn't be allowed as a user setting". There's an argument for documenting the API so people can write their own clients and do whatever they like. But you have to design the site around the defaults. Because of attention conservation, I think this should be the default, and that people should know that it's the default when they comment.

Comment author: Wei_Dai 20 December 2016 05:05:12AM 1 point [-]

The issues you mention with ML are also issues with deciding who to trust based on how they vote, aren't they?

If everyone can see everyone else's votes, then when someone who was previously highly rated starts voting in an untrustworthy manner, that would be detectable, and the person can at least be down-rated by others who are paying attention. On the other hand, if we had a pure ML system (without any manual trust delegation), then when someone starts deviating from their previous voting patterns, the ML algorithm can try to detect that and start discounting their votes. The problem I pointed out seems especially bad in a system where people can't see others' votes and depend on ML recommendations to pick who to rate highly, because then neither the humans nor the ML can respond to someone changing their pattern of votes after getting a high rating.
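
To make the "detect and discount" idea concrete, here is a minimal Python sketch (the class, method, and threshold names are hypothetical, not part of either proposal): track how often a rater agrees with the trusted consensus over the long run versus in a recent window, and down-weight their votes when the two diverge.

```python
from collections import deque

class VoterDriftDetector:
    """Toy sketch: flag raters whose recent votes diverge from their own history."""

    def __init__(self, window=50, drift_threshold=0.3):
        self.long_run = {}   # user -> slowly-updated rate of agreeing with consensus
        self.recent = {}     # user -> deque of recent agree (1.0) / disagree (0.0) outcomes
        self.window = window
        self.drift_threshold = drift_threshold

    def record_vote(self, user, vote, consensus_vote):
        agreed = 1.0 if vote == consensus_vote else 0.0
        self.long_run[user] = 0.99 * self.long_run.get(user, agreed) + 0.01 * agreed
        self.recent.setdefault(user, deque(maxlen=self.window)).append(agreed)

    def drift(self, user):
        window = self.recent.get(user)
        if not window:
            return 0.0
        recent_rate = sum(window) / len(window)
        return self.long_run.get(user, recent_rate) - recent_rate

    def weight(self, user):
        # Discount votes from users whose recent behaviour has drifted from their history.
        return 0.5 if self.drift(user) > self.drift_threshold else 1.0
```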

Comment author: cousin_it 19 December 2016 10:39:53AM *  3 points [-]

A few days ago I had a crazy idea that ordinal numbers (first, second, etc.) should start with zeroth instead of first. Not only would that help with teaching programming, but many real-world confusions and off-by-one errors would also go away. Compare this:

The first year of the third millennium begins at 2001/01/01 00:00:00

to this:

The zeroth year of the second millennium begins at 2000/00/00 00:00:00

Of course that would never happen, but as a thought experiment, I rest my case :-)
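
As a concrete (and purely illustrative) example of the off-by-one point: with zero-based numbering, the millennium and century of a year fall out of plain integer division, while conventional one-based counting needs a subtract-one/add-one correction.

```python
year = 2016

# Zero-based numbering: the "zeroth" millennium is years 0-999.
millennium0 = year // 1000            # 2 -> the "2nd" millennium, starting in 2000
century0 = year // 100                # 20

# Conventional one-based counting: the first millennium is years 1-1000.
millennium1 = (year - 1) // 1000 + 1  # 3 -> the third millennium, starting in 2001
century1 = (year - 1) // 100 + 1      # 21
```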

Comment author: Wei_Dai 19 December 2016 07:37:27PM 9 points [-]

With this change, you could no longer say "now that we've found the fourth object, we finally have four objects". You'd have to remember to say "now that we've found the third object, we finally have four objects", which seems to be a new opportunity to make an off-by-one error. It's not clear to me that you'd be fixing more errors than you're introducing. I echo WhySpace's request for more examples of things you're trying to fix with this change.

Comment author: ciphergoth 18 December 2016 10:29:38PM 0 points [-]

You should rate highly people whose judgment you would trust when it differed from yours. We can use machine learning to find people who generate similar ratings to you, if the need arises.

I thought about the Slashdot thing, but I don't think it makes the best use of people's time. I'd like people reading only the innermost circle to be able to basically ignore the existence of the other circles. I don't even want a prompt that says "7 hidden comments".

Comment author: Wei_Dai 19 December 2016 12:28:48PM 0 points [-]

You should rate highly people whose judgment you would trust when it differed from yours. We can use machine learning to find people who generate similar ratings to you, if the need arises.

It would be much harder to decide whose judgment I would trust if I couldn't see how they rated in the past. I'd have to do it based only on their general reputation and their past posts/comments, but what if some people write good comments but don't rate the way I would prefer (for example, they often downvote others who disagree with them)? The system would also essentially ignore ratings from lurkers, which seems wasteful.

If we use ML to find people who generate similar ratings, that seems to create bad incentives. When your user rating is low, you're incentivized to vote the same way as others so that the ML will recommend you to people. Then, once your rating is high, you can switch to voting based on your own opinions, which might be totally untrustworthy, and the people who already rated you highly wouldn't be able to tell that they should no longer trust you.
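
A toy illustration of the incentive problem (my own hypothetical numbers, not from either proposal): if the ML's notion of "rates similarly to you" weights all past votes equally, an account that mirrors your votes for a while and then turns adversarial keeps looking trustworthy long after the switch.

```python
def vote_similarity(votes_a, votes_b):
    """Fraction of co-voted items on which two users agreed (votes are +1/-1)."""
    shared = set(votes_a) & set(votes_b)
    if not shared:
        return 0.0
    return sum(votes_a[i] == votes_b[i] for i in shared) / len(shared)

# An account mirrors my votes on items 0-99, then votes adversarially on items 100-109.
my_votes = {i: +1 for i in range(110)}
attacker = {i: +1 for i in range(100)}
attacker.update({i: -1 for i in range(100, 110)})

print(vote_similarity(my_votes, attacker))  # ~0.91 -- still looks like a good recommendation
```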

I thought about the Slashdot thing, but I don't think it makes the best use of people's time.

Aside from the issue of weird incentives I talked about earlier, I would personally prefer to have the option of viewing highly rated comments independently of parent ratings, since I've found those valuable in other systems (e.g., Slashdot and the current LW). Do you have an argument for why that shouldn't be allowed as a user setting?

Comment author: Wei_Dai 19 December 2016 11:56:57AM 8 points [-]

FHI and MIRI both hold workshops that are logistically very expensive, both for the organizations and for the participants. Structurally, a workshop discussion seems very similar to a LessWrong discussion: you have one person make a short presentation about a topic or a question, then discussion ensues. It seems like a lot of resources could be saved if some of those discussions were moved online entirely, or if the participants had online discussions ahead of time to iron out basic issues and common misunderstandings so that the workshop could focus on more important questions. We should ask what features would enable that.

I think one thing that would help is to have private posts that can only be viewed and commented on by invitation (by both username and email address, so people can be invited even before they sign up to LW). I guess that people are often reluctant to post on LW because they're not ready for their ideas to be seen in public yet. For example, their idea is only half-formed and it wouldn't yet make sense to put in the effort of making it understandable to people outside a small circle. Or they're not sure the idea is correct and don't want to take a public reputation hit in case it's not.

I suggest having the option of making a private post public at any time, so productive discussions can later be viewed and joined by the public, for example after the initial poster decides there aren't embarrassing holes in their ideas, or has had a chance to edit their post for public consumption. Each commenter should be able to mark their comment as "private" or "inherit", where "private" would hide their comment from the public even if the opening post is made public.
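
Here is a minimal sketch of the data model being suggested, in Python; the class and field names are hypothetical, and the one-way publish switch and the "private"/"inherit" comment flags follow the description above.

```python
from dataclasses import dataclass, field

@dataclass
class PrivatePost:
    author: str
    invited: set = field(default_factory=set)   # usernames and/or email addresses
    public: bool = False

    def publish(self):
        self.public = True          # one-way: a private post can later be made public

    def can_view(self, viewer):
        return self.public or viewer == self.author or viewer in self.invited

@dataclass
class Comment:
    author: str
    post: PrivatePost
    visibility: str = "inherit"     # "inherit" follows the post; "private" never goes public

    def can_view(self, viewer):
        if viewer == self.author:
            return True
        if self.visibility == "private":
            # Stays restricted to the invited circle even after the post is published.
            return viewer == self.post.author or viewer in self.post.invited
        return self.post.can_view(viewer)
```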

Comment author: ciphergoth 16 December 2016 04:01:33PM 1 point [-]

Can you say something about who would be able to see the individual ratings of comments and users?

Only people who police spam/abuse; I imagine they'd have full DB access anyway.

What do you see as the pros and cons of this proposal vs. other recent ones?

An excellent question that deserves a longer answer, but in brief: I think it's more directly targeted towards the goal of creating a quality commons.

What's the reason for this?

Because I don't know how else to use the attention of readers who've pushed the slider high. Show them both the comment and the reply? That may not make good use of their attention. Show them the reply without the comment? That doesn't really make sense.

Note that your karma is not simply the sum or average of the scores on your posts; it depends more on how people rate you than on how they rate your posts.

This seems to create an opening for attack.

Again, the abuse team really need full DB access or something very like it to do their jobs.

Can you point to an intro to attack resistant trust metrics?

The only adequate introduction I know of is Raph Levien's PhD draft, which I encourage everyone thinking about this problem to read.

Why would it be annoying?

When an untrusted user downvotes, a trusted user or two will end up being shown that content and asked to vote on it; it thus could waste the time of trusted users.

Comment author: Wei_Dai 17 December 2016 10:26:55AM 0 points [-]

Thanks for the clarifications.

Only people who police spam/abuse [would be able to see the individual ratings of comments and users]

That would make it hard to determine which users I should rate highly. Is the idea that the system would find users who rate similarly to me and recommend them to me, and I would mostly follow those recommendations?

Because I don't know how else to use the attention of readers who've pushed the slider high.

Slashdot shows all the comments in collapsed mode and auto-expands the comments that are rated higher than the filter setting. We can do that, or have a preference setting that lets the user choose between that and simply hiding comments that reply to something rated lower than their filter setting.
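
A small sketch of the two display behaviours being compared (this is my reading of the options, not Slashdot's actual algorithm): with the Slashdot-style preference, every comment is shown but only those at or above the filter are expanded; with the alternative, replies under a low-rated parent are hidden outright.

```python
def display_mode(comment_score, parent_score, threshold, hide_low_threads=False):
    """Return 'expanded', 'collapsed', or 'hidden' for one comment."""
    if hide_low_threads and parent_score is not None and parent_score < threshold:
        return "hidden"        # alternative preference: drop replies to low-rated content
    if comment_score >= threshold:
        return "expanded"      # at or above the reader's filter setting
    return "collapsed"         # Slashdot-style: still visible, just folded
```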

Comment author: Wei_Dai 16 December 2016 09:11:24AM *  6 points [-]

Interesting proposal. Can you say something about who would be able to see the individual ratings of comments and users? What do you see as the pros and cons of this proposal vs. other recent ones?

the site will never rate a response higher than its parent, or a top-level comment higher than the post it replies to

What's the reason for this? It seems to lead to some unfortunate incentives for commenting. Suppose someone posts a new and exciting idea and you find a subtle but fatal flaw in it. If you comment right away, people will realize the flaw and not rate the post as high as they otherwise would, which would limit the rating of your comment; so the system encourages you to wait until the post has been rated higher before commenting.

More generally this feature seems to discourage people from commenting early, before they're sure that the post/comment they're responding to will be rated highly.

Content ratings above 2 never go down, except to 0; they only go up. Thus, the content in these circles can grow but not shrink, to create a stable commons.

This seems to create an opening for attack. If an attacker gets a high enough rating to unilaterally push content from 2 to 3 stars, they can sprinkle a lot of spam throughout the site, rate it up to 3 stars, and all of that spam would have to be individually marked as such even if the attacker's rating is subsequently reduced.
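
To spell out why the spam sticks, here is one possible reading of the "ratings above 2 never go down, except to 0" rule as an update function (hypothetical code, not from the proposal). The key point is that stored ratings are not recomputed when a rater's trust is later reduced, so each item has to be zeroed individually.

```python
def apply_rating(current, proposed):
    """One reading of 'content ratings above 2 never go down, except to 0'."""
    if proposed == 0:                    # explicit spam/abuse flag
        return 0
    if current > 2:
        return max(current, proposed)    # sticky: an established rating can only rise
    return proposed

# Hypothetical attack: a briefly-trusted account pushes a spam comment from 2 to 3 stars.
spam_rating = apply_rating(current=2, proposed=3)   # -> 3
# Revoking the attacker's trust afterwards does not touch stored ratings;
# the spam keeps its 3 stars until someone zeroes it item by item.
```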

Trust flows from these users using some attack resistant trust metric.

Can you point to an intro to attack resistant trust metrics? I think a lot of people are not familiar with them.

Downvoting sprees from untrusted users will thus be annoying but ineffective.

Why would it be annoying? I'm not sure I understand what would happen with such a downvoting spree.

Combining Prediction Technologies to Help Moderate Discussions

12 Wei_Dai 08 December 2016 12:19AM

I came across a 2015 blog post by Vitalik Buterin that contains some ideas similar to Paul Christiano's recent Crowdsourcing moderation without sacrificing quality. The basic idea in both is that it would be nice to have a panel of trusted moderators carefully pore over every comment and decide on its quality, but since that is too expensive, we can instead use some tools to predict moderator decisions, and have the trusted moderators look at only a small subset of comments in order to calibrate the prediction tools. In Paul's proposal the prediction tool is machine learning (mainly using individual votes as features), and in Vitalik's proposal it's prediction markets where people bet on what the moderators would decide if they were to review each comment.

It seems worth thinking about how to combine the two proposals to get the best of both worlds. One fairly obvious idea is to let people both vote on comments as an expression of their own opinions and place bets about moderator decisions, and to use ML to set baseline odds, which would reduce how much the forum would have to pay out to incentivize accurate prediction markets. The hoped-for outcome is that the ML algorithm would make correct decisions most of the time, but people could bet against it when they see it making mistakes, and moderators would review the comments that have the greatest disagreements between ML and people, or between different bettors in general. Another part of Vitalik's proposal is that each commenter has to make an initial bet that moderators would decide that their comment is good. The article notes that such a bet can also be viewed as a refundable deposit. Such forced bets / refundable deposits would help solve a security problem with Paul's ML-based proposal.
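
For concreteness, here is a rough sketch of how the combined review queue might be scored (hypothetical function and parameter names; neither proposal specifies this): the ML's predicted approval probability doubles as the market's baseline odds, and comments where bettors disagree with the ML, or with each other, get reviewed first.

```python
def review_priority(p_ml, market_bets):
    """Higher score = review sooner.

    p_ml: ML-predicted probability that moderators would approve the comment
          (also used as the prediction market's baseline odds).
    market_bets: probabilities implied by individual bettors' positions.
    """
    if not market_bets:
        return 0.0
    p_market = sum(market_bets) / len(market_bets)
    ml_vs_market = abs(p_ml - p_market)              # people vs. the ML
    spread = max(market_bets) - min(market_bets)     # bettors vs. each other
    return ml_vs_market + spread

# Example: the ML thinks the comment is fine, but the bettors are split.
print(review_priority(p_ml=0.9, market_bets=[0.85, 0.2, 0.9]))  # relatively high priority
```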

Are there better ways to combine these prediction tools to help with forum moderation? Are there other prediction tools that can be used instead of, or in addition to, these?


Comment author: Lumifer 05 December 2016 04:14:40PM 0 points [-]

These days I think the answer is actually wrong

How so? Since security cannot be absolute, the threat model is basically just placing the problem into appropriate context. You don't need to formalize all the capabilities of attackers, but you need to have at least some idea of what they are.

and think, ok, we're secure under this threat model, hence we're probably secure

That's actually the reverse: hardening up under your current threat models makes you more secure against the threats you listed, but doesn't help you against adversaries your threat model ignores. E.g., if your threat model doesn't include a nation-state, you're very probably insecure against a nation-state.

Comment author: Wei_Dai 05 December 2016 10:36:32PM 2 points [-]

You don't need to formalize all the capabilities of attackers, but you need to have at least some idea of what they are.

But you usually already have an intuitive idea of what they are. Writing down even an informal list of attackers' capabilities at the start of your analysis may just make it harder for you to subsequently think of attacks that use capabilities outside of that list. To be clear, I'm not saying never write down a threat model, just that you might want to brainstorm about possible attacks first, without having a more or less formal threat model potentially constrain your thinking.

Comment author: paulfchristiano 02 December 2016 06:08:40PM *  1 point [-]

How do you expect this to happen?

I think there are two mechanisms:

  • Public image is important to companies like Facebook and Google. I don't think that they will charge for a user-aligned version, but I also don't think there would be much cost to ad revenue from moving in this direction. E.g., I think they might cave on the fake news thing, modulo the proposed fixes mostly being terrible ideas. Optimizing for user preferences may be worth it in the interest of a positive public image alone.
  • I don't think that Facebook's ownership and engineers are entirely profit-focused; they will sometimes do things just because they feel like it makes the world better at modest cost. (I know more people at Google and am less informed about FB.)

Relating the two: if, e.g., Google organized its services in this way, if the benefits were broadly understood, and if Facebook publicly continued to optimize for things that its users don't want optimized, I think it could be bad for the image of Facebook (with customers, and especially with hires).

I'd be quite surprised if any of these happened.

Does this bear on our other disagreements about how optimistic to be about humanity? Is it worth trying to find a precise statement and making a bet?

I'm probably willing to give > 50% on something like: "Within 5 years, there is a Google or Facebook service that conducts detailed surveys of user preferences about what content to display and explicitly optimizes for those preferences." I could probably also make stronger statements re: scope of adoption.

And why isn't it a bad sign that Facebook hasn't already done what you suggested in your post?

I think these mechanisms probably weren't nearly as feasible 5 years ago as they are today, based on gradual shifts in organization and culture at tech companies (especially concerning ML). And public appetite for more responsible optimization has been rapidly increasing. So I don't think non-action so far is a very strong sign.

Also, Facebook seems to sometimes do things like survey users on how much they like content, and include ad hoc adjustments to their optimization in order to produce more-liked content (e.g., downweighting like-baiting posts). In some sense this is just a formalization of that procedure. I expect in general that formalizing optimizations will become more common over the coming years, due to a combination of the increasing usefulness of ML and cultural change to accommodate ML progress.

Comment author: Wei_Dai 05 December 2016 06:56:13AM 0 points [-]

I'm probably willing to give > 50% on something like: "Within 5 years, there is a Google or Facebook service that conducts detailed surveys of user preferences about what content to display and explicitly optimizes for those preferences."

The Slate article you linked to seems to suggest that Facebook already did something like that, and then backed off from it:

"Crucial as the feed quality panel has become to Facebook’s algorithm, the company has grown increasingly aware that no single source of data can tell it everything. It has responded by developing a sort of checks-and-balances system in which every news feed tweak must undergo a battery of tests among different types of audiences, and be judged on a variety of different metrics. ..."

"At each step, the company collects data on the change’s effect on metrics ranging from user engagement to time spent on the site to ad revenue to page-load time. Diagnostic tools are set up to detect an abnormally large change on any one of these crucial metrics in real time, setting off a sort of internal alarm that automatically notifies key members of the news feed team."

I think concern about public image can only push a company so far. Presumably all the complaints we're seeing aren't news to Facebook. They saw it coming, or should have seen it coming, years ago, and this is what they've done, which seems like the best predictor of what they'd be willing to do in the future.

If I understand correctly, what you're proposing that's different from what Facebook is already doing is: 1) fully automated end-to-end machine learning optimizing only for user preferences and specifically not for engagement/ad revenue, 2) optimizing for preferences-upon-reflection instead of current preferences, and maybe 3) trying to predict and optimize for each user's individual preferences instead of using aggregate surveyed preferences (which is what it sounds like Facebook is currently doing).

  1. seems unlikely because Facebook ultimately still cares mostly about engagement/ad revenue and is willing to optimize for user preferences only so far as it doesn't significantly affect their bottom line. So they'll want to either maintain manual control to override user preferences when needed, or not purely target user preferences, or both.
  2. might happen to a greater extent. But presumably there are reasons why they haven't done more in this direction already.
  3. I think Facebook would be worried that doing this will make them even more vulnerable to charges of creating filter bubbles and undermining democracy, etc.
Comment author: paulfchristiano 03 December 2016 09:38:05PM *  2 points [-]

Just to highlight where the theoretical analysis goes wrong:

  • We have some tradeoff between "letting spam through" (of the type these attackers are posting) and "blocking good content."
  • The attackers here are able to create arbitrary amounts of spam.
  • So the worst case is already arbitrarily bad. (Assuming our loss function is really a sum over posts.)

So the issue is mostly incentives: this gives an attacker an incentive to generate large amounts of innocuous but quality-lowering spam. It still doesn't make the worst case any worse; if you had actual adversarial users, you were screwed all along under these assumptions.

In my dissertation research I usually make some limiting assumption on the attacker that prevents this kind of attack; in particular, I assume one of:

  • At least some small fraction (say 10%) of users of the system are honest---the attacker can't completely overwhelm honest users.
  • We have access to an external social network, and at least some small fraction (say 10%) of friends of honest users are honest---the attacker can't completely swamp the social networks of honest users.

Under these conditions we can potentially keep the work per honest user modest (each person must stomp out 10 crappy responses). Obviously it is better if you can get the 10% up to 50% or 90%, e.g. by imposing a cost for account creation, and without such costs it's not even clear if you can get 10%. Realistically I think that the most workable solution is to mostly use outside relationships (e.g. FB friendships), and then to allow complete outsiders to join by paying a modest cost or using a verifiable real-world identity.
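
One back-of-the-envelope reading of the "stomp out 10 crappy responses" figure, under the stated 10%-honest assumption plus the additional assumption (mine) that the clean-up work is spread evenly across honest users:

$$\frac{\#\,\text{adversarial accounts}}{\#\,\text{honest accounts}} \;\le\; \frac{1-f}{f}, \qquad f = 0.1 \;\Rightarrow\; \frac{0.9}{0.1} = 9 \approx 10.$$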

I haven't analyzed virtual moderation under these kinds of assumptions, though I expect we could.

I agree that virtual moderation may create stronger incentives for spam and manipulation, and so hasten the day when you need to start being more serious about security, and that over the short term that could be a fatal problem. But again, if there is someone with an incentive to destroy your forum and they are able to create an arbitrary number of perfect shills, you need to somehow limit their ability anyway; there just isn't any way around it.

(For reference, I don't think the LW shills are near this level of sophistication.)

Comment author: Wei_Dai 04 December 2016 11:30:29PM *  3 points [-]

The first question of my first security-related job interview was, "If someone asked you to determine whether a product, for example PGP, is secure, what would you do?" I parroted back the answer that I had just learned from a book, something like, "First figure out what the threat model is." The interviewer expressed surprise that I had gotten the answer right, saying that most people would just dive in and try to attack the cryptosystem.

These days I think the answer is actually wrong. It's really hard to correctly formalize all of the capabilities and motivations of all potential adversaries, and once you have a threat model it's too tempting to do some theoretical analysis and think, ok, we're secure under this threat model, hence we're probably secure. And this causes you to miss attacks that you might have found if you just thought for a few days or months (or sometimes just a few minutes) about how someone might attack your system.

In this case I don't fully follow your theoretical analysis, and I'm not sure what threat model it assumed precisely, but it seems that the threat model neglected to incorporate the combination of the motivation "obtain power to unilaterally hide content (while otherwise leaving the forum functional)" and the capability "introduce new content as well as votes", which is actually a common combination among real-world forum attackers.
