Sometimes people have something they want to say without it being traceable back to their main identity, and on internet forums it's common for people to use multiple accounts ("alts") for this. As machine learning software gets better, however, it becomes increasingly practical to link a person's accounts.

A few months ago someone ran a simple stylometry tool across Hacker News comments and identified many alts, including one for the site's founder. To further demonstrate that this isn't just an academic consideration, I recently did the same for LessWrong and the EA Forum. I'm not going to share the code or the probabilities it generated, and I've only looked at the output enough to be pretty sure it's working. Trained on half of the data and tested on the other half, however, it was consistently able to link accounts, and it was also able to identify at least one non-public alt account I already knew about.
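For readers unfamiliar with stylometry, here is a toy sketch of the general idea. It is not the code used for this post or for the Hacker News analysis. The feature choice (character trigram counts compared by cosine similarity), the even/odd split, and every function name below are assumptions made purely for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_half_profiles(comments_by_account):
    """Build a 'known' profile from even-indexed comments and a 'held-out' one from odd-indexed."""
    known, held_out = {}, {}
    for account, comments in comments_by_account.items():
        known[account] = char_ngrams(" ".join(comments[0::2]))
        held_out[account] = char_ngrams(" ".join(comments[1::2]))
    return known, held_out


def best_matches(known, held_out):
    """For each held-out profile, return the account whose known profile is most similar."""
    return {
        account: max(known, key=lambda candidate: cosine(known[candidate], profile))
        for account, profile in held_out.items()
    }
```

Given a hypothetical `comments_by_account` dict mapping each account name to a list of its comment strings, `best_matches(*split_half_profiles(comments_by_account))` reports, for each account's held-out half, which account's other half it most resembles. Matching back to itself is the sanity check; matching to a different account is at most a candidate link. Something this crude would need far more text per account than a toy example, and would produce false positives.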

This is an example of a general problem with privacy: even if something seems careful enough now, you can't trust the future to keep things private.

(If you do want somewhat more protection now, however, I think best practice is running your alt comments through an LLM to change the style.)



Personally, I'm a bit unsure about the ethics of this. I understand that you’re not planning to publicly deanonymize the accounts, and I assume you don’t plan to do so privately either.

But I can imagine that having more barriers for people to post things “anonymously” (or having them feel less safe when trying to do so) would heavily discourage some of the potentially most useful cases of anonymous accounts.[1] I also expect, as you mention, that some who posted anonymously in the past would not appreciate being privately de-anonymized by someone.

It seems meant to be a demonstration, but I don’t see why people wouldn’t expect this to work on LW/EAF, given that it worked on HN? I also think that people might be worried about you in particular deanonymizing them, given how central you are in the EA community and how some people seem to be really worried about their reputation/potential repercussions for what they post.

  1. Interestingly, I was just reading this comment from a user updating on the value of anonymous accounts.

jefftk

I understand that you’re not planning to publicly deanonymize the accounts, and I assume you don’t plan to do so privately either.

That's right: I stopped once I was reasonably sure this approach worked. I'm not planning to do more with this. I made my scraping code open source, since scraping is something LW and the EA Forum seem fine with, but my stylometry code is not even pushed to a private repo.

I can imagine having more barriers for people to post things “anonymously” (or having them feel less safe when trying to do so) to heavily discourage some of the potentially most useful cases of anonymous accounts. ... some people seem to be really worried about their reputation/potential repercussions for what they post

I think maybe we're thinking about the risks differently? I think it's common that people who post under an alt think that they're keeping these identities pretty separate, and do not think someone could connect them with a few hours of playing with open source tools. And so it's important to make it public knowledge that this approach is not very private, so people can take more thorough steps if better privacy is something they want. Keep in mind that this will only get easier: we're not far from someone non-technical being able to ask GPT-4 to write the code for this. (Possibly that would already work?)

Specifically on the "feel less safe", if people feel safer than they actually are then they're not in a position to make well-considered decisions around their safety.

I don’t see why people wouldn’t expect this to work on LW/EAF, given that it worked on HN?

I could have posted "here's a thing I saw in this other community", but my guess is people would take it less seriously, partly because they think it's harder than it actually is. And so "here's a thing, which didn't end up being very hard" is much more informative.

Hmm. I wonder if having an LLM rephrase comments using the same prompt would stymie stylometric analysis.

You could have an easy checkbox "rewrite comments to prevent stylometric analysis" as a setting for alt accounts.

I think it's reasonably common that people who post under an alt think that they're keeping these identities pretty separate, and do not think someone could connect them with a few hours of playing with open source tools. And so it's important to make it public knowledge that this approach is not very private, so people can take more thorough steps if better privacy is something they want.


I agree with this. I think sometimes people are pretty clueless. E.g. people post under their first name and use the same IP. (There is at least one very similar recent example, but I can’t link to it.)

I think that a PSA about accounts on LW/EAF/the internet often not being as anonymous as people think could be good, and should mention stylometry, internet archives, timezones, IP addresses, user agents, browser storage; and suggest using TOR, making a new account for every post/comment, scheduling messages at random times, running comments through LLMs, not using your name or leaking information in other ways, and considering using deliberate disinformation (e.g. pretending to be of the opposite gender, scheduling messages to appear to be in a different timezone, …)

Specifically on the "feel less safe", if people feel safer than they actually are then they're not in a position to make well-considered decisions around their safety.

I think this is a very good point.

I could have posted "here's a thing I saw in this other community", but my guess is people would take it less seriously, partly because they think it's harder than it actually is.

I’m not sure about this. I think you could have written that there are very easy ways to deanonymize users, so people who really care about their anonymity should do the things I mentioned above?

I think maybe we're thinking about the risks differently?

Possibly, I think I might be less optimistic that people can/will, in practice, start changing their posting habits. And probably I think it’s more likely that this post lowers the barrier for an adversarial actor to actively deanonymize people. It reminds me a bit of the tradeoffs you mentioned in your previous post on security culture.

I think it was a good call not to post reproducible code for this, for example, although it might have made it clearer how easy it is and strengthened the value of the demonstration.
 

I'm not planning to do more with this. I made my scraping code open source, since scraping is something LW and the EA Forum seem fine with, but my stylometry code is not even pushed to a private repo.

Thank you for this, and I do trust you. On some level, anonymous users already had to trust you before this project, since it’s clearly something anyone with some basic coding experience would be able to do if they wanted, but I think now they need to trust you a tiny bit more, since you now just need to press a button instead of spending a few minutes/hours actively working on it.

 

In any case, I don’t feel strongly about this, and I don’t think it’s important, but I still think that, compared to an informative post without a demonstration, this post increases the probability that an adversarial actor deanonymizes people slightly more than the probability that anonymous users are protected from similar attacks. (Which often are even less sophisticated)

a PSA about accounts on LW/EAF/the internet often not being as anonymous as people think could be good, and mention stylometry, internet archives, timezones, IP addresses, user agents, browser storage; and suggest using TOR, making a new account for every post/comment, scheduling messages at random times, running comments through LLMs, not using your name or leaking information in other ways, and considering using deliberate disinformation (e.g. pretending to be of the opposite gender, scheduling messages to appear to be in a different timezone, …)

A lot of this depends on your threat model. For example, "IP addresses, user agents, browser storage; and suggest using TOR" aren't much of a concern if you're mostly just trying to avoid people who read your comments identifying you. But there is subtlety here, including that while you might trust the people running a forum you might not trust everyone who could legally get them to disclose this information.

Possibly, I think I might be less optimistic that people can/will, in practice, start changing their posting habits.

Maybe, though note that getting it out in the open allows us to talk about ways to fix it, including things like convenient stylometry-thwarting tooling.

I also do expect my post here to change people's posting habits, though I suspect an embarrassing public linking of an alt to a main (where someone was doing something like criticizing a rival) would do even more.

It reminds me a bit of the tradeoffs you mentioned in your previous post on security culture.

Definitely! I think this one is pretty well within where computer security culture has the appropriate norms, though.

I'm pretty okay with this having been done:
- seems broadly consistent with cyber disclosure norms
- we know that there are people who have disputes with this community and a track record of looking to de-anonymize accounts, so relying on security through obscurity doesn't seem that compelling
- seems reasonable to view this post as helpful to would-be anonymous posters - better to find out about the possibility this way than to be surprised by someone publicly attempting to deanonymize you in future, and the suggestion of using an LLM to change the style seems constructive

I expected this to happen at some moment, it was just a question of when.

The next step is probably scraping the entire internet (not a project for you or me, but big companies like Google can do it and probably already do) and connecting all your accounts everywhere.

The next step is to feed all the texts from one person to some AI and ask it to provide a short summary. Something like "person X writes this here, and this there" and maybe make some guess why. (Are they writing about a topic on a separate website merely because it is a website related to that topic, or because they want to keep it secret? Underline the secrets using red color.) Compile a timeline, when did the person become interested in certain topics. Find the connections between people: who talks to whom on social networks; do they become interested in new topics at a similar time?

Sell to marketing agencies. Sell to HR departments. Use to blackmail businessmen and politicians.

Paranoid people have known about this for more than 10 years. There's already some software to help avoid this issue, but it seems rather sparse and to be made by small teams.

Agencies and large companies can already do all of this effortlessly, using your fingerprint information, tracking links, etc.
It's more difficult for regular people like us, and I'm not sure what to think about that. It's not like large companies are using any of this data responsibly, but users would definitely also use this software maliciously if it became mainstream.

I'm not sure why people on websites like this are helping technology advance, for total information is not going to look pretty, and the alternative would be worse still. 

 

By the way, even if you use ChatGPT to rewrite your comments, don't you think that bot filters of the future will look for comments written by bots and remove them, in order to prevent spam, misinformation, and such?

Larger sites have teams dedicated to this, using stylistic, timing, IP and device fingerprinting, and every piece of data they can get their hands on. They do this for fraud-response reasons, so they can ban all accounts rather than just one.

The real question is what level of anonymity you expect, from whom and for how long. Using VPNs, different devices (or at least different browser profiles), and such are the minimum to keep it from the site admins. It’s unlikely to be satisfying to obfuscate long-form messages, and in most cases, nobody will care.

That said, most of this wouldn’t hold up in a criminal case. If it’s enough to make a tenuous connection and the liar isn’t ready to deny it, the liar isn’t trying very hard.

in most cases, nobody will care

I think this is unlikely to be true? If I posted a list of probable alt accounts (which I'm not doing) my guess is the people who had been using them would be really mad at me.

Interesting. I suspect there's a crux there about "in most cases" and "nobody cares" on what timeframe and scale. You're probably right that people would be mad at you, but a lot of people would probably appreciate it as well. I also think there are likely cruxes about identity and reputation that put us on different sides of the question "pseudonymity for participants is a desirable feature of public discussions".

This is a fabulous test case for intuitions about when it's OK to lie or omit truths that are accessible and sharable.  Also a good callback to all the discussions about extortion and whether you're performing a compensation-worthy service by protecting someone from public knowledge of truth they're ashamed of.

Since you're a well-respected member of the LW community, even if these individuals became incredibly angry at you and tried to defame you, it would likely blow back on them and be to their discredit in the end.

It is likely you would have to endure some intermediate period of verbal conflict though.

jefftk

even if these individuals became incredibly angry at you and tried to defame you, it will likely blowback at them and be to their discredit in the end

I would hope that, if I posted a list, people's primary consideration in evaluating my actions wouldn't be my status in the community!

Well in an ideal world, yes I would hope so too, but in practice...

Reacting angrily to somebody doing something obnoxious like that is not "defamation".

I have zero influence here, but if I ran a site and a user did something like that, I would probably permaban them... and all their alts.

I (perhaps mistakenly) assumed that the angry people would claim that the list defames them, because it is (wrongly, or at least unprovably) saying they're sock-puppets, which they deny.  I don't know why "defame" would be used in the other direction, other than just common online retaliation of flinging shit.

I haven't followed any reaction to the HN reveal, but I suspect the fallout was minimal - at first glance, it seems like false positives are prevalent enough that a simple denial would be sufficient to shield people. Or, if the pseudonym usage is harmless, a simple "yeah, I used to do that" and move on.

It's the case of real impact (a pseudo of a celebrity or powerful person being used for nefarious purposes) that I'd expect the denial to include defamation claims (against the list publisher).

Reacting angrily to somebody doing something obnoxious like that is not "defamation".

I agree that there's no necessary causal link?

There are hundreds of examples on LW alone of someone getting angry but only being self-destructive, and posing no harm to others.

What's the point of saying this though?

It struck me as very weird and specific to use the word "defame". That word has a really specific meaning, and it's not actually how I'd expect anybody to react, no matter how angry they were. It wouldn't be a concern of mine.

It also sounded to me as though you thought that publishing a list of people's alts would be a perfectly fine thing to do.

That's because I thought that the point of jefftk's saying "they would be really mad at me" was to imply that they would have a good reason to be mad. And if you don't think it's actually acceptable to publish the list, then the question of whether people's anger about that would be "survivable" doesn't really arise.

So I read you as saying that posting the list would be OK, and that anybody who objected would be in the wrong. In fact, because of the word "defamation", I read you as saying that anybody who objected would be the sort of person who'd turn around and run a campaign of smearing lies. Which is a pretty harsh viewpoint and one that I definitely do not share.

Frankly, it's a bit difficult to believe such a moral standard would be obeyed in practice for anywhere close to 100% of the readerbase. 

Since there have been past incidents of some individuals doing such things.

If Jefftk wants to publish a list of 'alts' that he believes to be correct, then I wouldn't gainsay it. So if that's your metric, then yes I think it 'would be a perfectly fine thing to do'.

Maybe that would be different if 2nd, 3rd, etc., 'alt' accounts were explicitly condoned in the site rules. But I'm pretty sure the mods are heavily against anyone making multiple accounts in secret.

If you still can't accept this stance, then you probably should take it up with them.

Frankly, it's a bit difficult to believe such a moral standard would be obeyed in practice for anywhere close to 100% of the readerbase.

I'm not saying that I expect everybody to refrain from defamation as a matter of morality. It's just that that wouldn't be a very effective response in that particular case, and it's not the most obvious way that I would expect anybody to respond to that particular issue "in the heat of the moment".

It wouldn't be effective because if A posts that B and C are the same person, B coming back right away and saying that A is a squirrel molester is too obviously retaliatory, won't be believed, and is probably going to make A's original claim more credible.

Regardless of effectiveness, in my experience it seems as though most people who resort to smear campaigns do it because of a really fixed hatred for somebody. It's true that publishing the list could be the start of a long-term enmity, and that that could end with spreading lies about a person, but usually that only happens after a long history of multiple different incidents.

Even so, I'm not saying that it couldn't happen... just that it seems strange to single it out among all the things that could happen. I would expect righteous-indignation types of responses much more often.

Maybe that would be different if 2nd, 3rd, etc., 'alt' accounts were explicitly condoned in the site rules. But I'm pretty sure the mods are heavily against anyone making multiple accounts in secret.

Maybe I'm behind the times, but my understanding is that the norm on Internet forums, especially on non-corporate ones, is that multiple accounts are allowed unless explicitly forbidden. Not multiple abusive accounts, but most multiple accounts aren't abusive.

Also, if the core team on Less Wrong, specifically, categorically didn't want people to have multiple accounts, it would be very out of character for them not to write that down, regardless of what other sites do. That's just not how I've seen them run things. They seem to be all about making sure people understand expectations.

I don't see anything about it in the FAQ, nor does it seem to appear in at least the first stage of the sign-up process. I do see a rule against using multiple accounts to evade bans. I'd be surprised to see that rule written the specific way it is if the intent were to forbid multiple accounts entirely. I also see rules against gaming the metrics in ways that would really be aided by multiple accounts... and yet those rules don't specifically mention multiple accounts.

Even if the mods were opposed, though, I think their best response to that sort of thing would be to take it up with the user, and ban either all but one of the accounts, or all of the accounts. And the right response for a non-moderator would be to report it to the mods and let them handle it. Especially because when people do have alternate names, it can often be for reasons that (a) you don't know about and (b) can involve risks of real harm.

The exception to that would be if there'd been some kind of egregious activity that would clearly hurt community members if not exposed.

I can't see mass public disclosure fitting with the general ethos of this particular site. In fact I think this site is the sort of place where it fits least. It feels more in place on Hacker News. I don't know, but I wouldn't be surprised if they'd take it in stride on 4chan. But on Less Wrong?

Have you asked them? 

If so, what's their response regarding these points?

Look upthread a few posts.

Can you link to it?

I don't see any remarks from the mods posted.

It appears that is habryka's personal opinion. Or at least it would be an odd way of announcing a new/updated/revised rule from the mod team.

I'm pretty sure the mods are heavily against anyone making multiple accounts in secret.

I won't speak for them, but I don't think they should be. There are certainly bad reasons to create alternative accounts (ex: sockpuppetry) but there are also good reasons. For example, telling the world about a bad situation where if it were known that you were the one disclosing the information you would be subject to retaliation. Or someone who is relatively famous within the community and whose comments are treated as carrying a lot of weight wanting to participate in a thread without people taking them so seriously.

I can't find the link but I have a distinct recollection of a LW mod commenting that they're happy for people to make multiple accounts (including anon ones) as long as they're not manipulating voting mechanisms, e.g. upvoting the same comment using multiple accounts.

Yep, that's right! Please don't abuse the voting system, but overall we are happy for people to make multiple accounts, try to keep separate brands and identities for different topics you want to discuss, etc. (e.g. I think it would be pretty reasonable for someone to have an account where they discuss community governance stuff and get involved in prosecuting a bunch of bad behavior, and another account where they make AI Alignment contributions, without people knowing that they are the same person).

If someone's doing it just to keep threads and brands separate, it's not terribly harmful when (not "if") it gets published that these are alts of the same person, right?  It seems like the anger over exposure is directly proportional to the advantage (in rhetoric or reputation) gained by the deception.

I fully support the policy of LW to allow it, and I ALSO support interested people who choose to find and publish the links.  Neither side seems to have the entirety of the moral high ground.

And to the OP's point, such pseudonymity is unlikely to last forever, and those who are depending on the ruse should start deleting those things they don't want linked to each other now.

If Person X thinks it would be bad for their reputation if they publicly said Y, your comment seems to have a vibe that this negative hit to their reputation is deserved, and that they are somehow cheating by saying Y without getting associated with that. If so, I strongly disagree: See Paul Graham’s “What You Can’t Say”.

I agree that, as a practical matter, “such pseudonymity is unlikely to last forever…” and that it would be prudent for Person X to not assume otherwise. But I see that as an unfortunate thing, and I feel sad if it’s true. And I for one very strongly condemn people doxxing pseudonymous alts. Just because it’s possible, doesn’t mean it will inevitably happen. If we spread the idea that it’s bad, the probability goes down on the margin.

I’m not sure why you’re using the word “deception” here. If Person X wants to say Y, but not be publicly associated with it, so they say Y under a pseudonym, I wouldn’t describe that as “deception”. Right? Who is being deceived? If the pseudonym is “butterfly_lover_895” or whatever, nobody is “deceived” into thinking that’s somebody’s legal name. Likewise, most people say things anonymously on the internet sometimes; nobody has a reason to assume that Person X doesn’t do that too. So again, where’s the “deception”?

I wouldn't say "deserved" - I'm not sure I understand or support reputational systems in a way that lets me assert deserving or "should" in any direction. It is predictable that there would be a negative hit to one's reputation (and the use of a pseudonym in the first place indicates that the prediction has been acknowledged).

It's also very clearly a deception - its entire intent (at least in the case where someone would be upset when outed) is to mislead observers into believing there are two or more independent humans, each with separate reputation and opinions. Whether it's a deception born of evading an unjust judgement is not relevant to the fact that it's intended to foster false impressions.

Note that anonymous is very different than pseudonymous.  If it's labeled as <unidentified>, and there's no tie to other posts implying that there's a thread of belief/opinion between different anonymous posts, that's pretty open.  When it's a persistent pseudonym across multiple posts, taking advantage of reputation that the pseudonym accrues, but avoiding the disadvantage of having to reconcile across pseudonyms of the same human, it's absolutely deceptive.

I say this as a very long-time pseudonymous poster.  I've used this handle/'nym/username/ID since before the Internet, and feel no shame for keeping a little bit of separation between my online activities and my meatspace interactions.  I don't (mostly) use multiple IDs on the same site, and I won't be terribly upset if someone discovers my offline self.  These are important criteria for whether it's used for deception or just preference.

  • If someone comments from multiple unlinked accounts in the same comment thread (and to a lesser extent in the same article), then I agree with you that that’s going to “mislead observers into believing there are two or more independent humans”.
  • If someone is making Trump-related comments on Trump-related lesswrong blog posts from one alt, and making ML-related comments on ML-related lesswrong blog posts from a different alt, then that doesn’t seem to have that problem, IMO. Like, probably nobody will form an incorrect opinion about the accounts being different people, nor a correct opinion that they’re the same person. They just won’t be thinking about it at all. There’s no reason to.

I hope there aren't very many LW posts on which Trump-related comments are relevant.  But even if so, if there's no deception involved (nobody will incorrectly separate those two accounts and treat them as distinct people), then there's ALSO no harm in revealing the truth (the same human made those posts on two different threads).  

The anger at being outed is directly proportional to the perceived benefit of the misleading separation of identity.

I'm not sure where (if at all) we disagree.  I think it's clear that the underlying truth is that the same person made posts on both threads.  It seems pretty clear (to me) that the poster used different accounts to obfuscate this truth.  Where we MAY disagree is in the motivation for this, and the appropriate response to being discovered.  I argue that IF the poster is hurt or angry at it becoming well-known that the same person used both aliases, THEN the poster (believes that they) benefitted from the (false) assumption that there were two distinct people.  They are angry at this loss of benefit from an encouraged false belief.  

I call that "intentional deception".  I don't have a very strong opinion on whether it's justified in some or all cases, but I also don't object to someone discovering and publishing the truth.