Sometimes people have something they want to say without it being traceable back to their main identity, and on internet forums it's common for people to use multiple accounts ("alts") for this. As machine learning software gets better, however, it becomes increasingly practical to link a person's accounts.
A few months ago someone ran a simple stylometry tool across Hacker News comments and identified many alts, including one belonging to the site's founder. To further demonstrate that this isn't just an academic consideration, I recently did the same for LessWrong and the EA Forum. I'm not going to share the code or the probabilities it generated, and I've only looked at the output enough to be pretty sure it's working. When trained on half of the data and tested on the other half, however, it was consistently able to link accounts, and it also identified at least one non-public alt account I already knew about.
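To give a sense of why this is so practical, here is a toy sketch of the textbook approach: compare character n-gram frequency profiles by cosine similarity. This is deliberately not the tool described above (whose code I'm not sharing) — just the generic, long-public technique, with invented account names and toy comments; real attacks use much more data and better features.

```python
# Minimal stylometry sketch: character trigram profiles + cosine similarity.
# Generic illustration only; all names and texts below are invented.
from collections import Counter
from math import sqrt

def profile(texts, n=3):
    """Build a character n-gram frequency profile from a list of comments."""
    counts = Counter()
    for t in texts:
        t = t.lower()
        counts.update(t[i:i + n] for i in range(len(t) - n + 1))
    return counts

def cosine(a, b):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_match(unknown, candidates):
    """Return the candidate author whose profile is most similar to `unknown`."""
    return max(candidates, key=lambda name: cosine(unknown, candidates[name]))

# Toy usage: link an "anonymous" comment back to the more similar writing style.
casual = profile(["i think this is basically right, i think people underestimate it",
                  "i think the risk is real"])
formal = profile(["The committee has reviewed the proposal. The committee concurs.",
                  "The proposal seems acceptable."])
anon = profile(["i think this is basically fine"])
linked = best_match(anon, {"casual_author": casual, "formal_author": formal})
```

Character n-grams are a standard choice here because they capture punctuation, capitalization, and spacing habits rather than topic, which is exactly what leaks across an author's accounts.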
This is an example of a general problem with privacy: even if your precautions seem sufficient now, you can't trust the future to keep things private.
(If you do want somewhat more protection now, however, I think best practice is running your alt comments through an LLM to change the style.)
I agree with this. I think sometimes people are pretty clueless, e.g. posting under their first name and using the same IP address as their main account. (There is at least one very similar recent example, but I can’t link to it.)
I think a PSA that accounts on LW/EAF/the internet are often not as anonymous as people think could be good. It should mention stylometry, internet archives, timezones, IP addresses, user agents, and browser storage; and suggest using Tor, making a new account for every post/comment, scheduling messages at random times, running comments through LLMs, not using your name or leaking information in other ways, and considering deliberate disinformation (e.g. pretending to be of the opposite gender, scheduling messages to appear to come from a different timezone, …).
I think this is a very good point.
I’m not sure about this. I think you could have written that there are very easy ways to deanonymize users, so people who really care about their anonymity should do the things I mentioned above?
Possibly, though I think I’m less optimistic that people can or will, in practice, change their posting habits. And I think it’s more likely that this post lowers the barrier for an adversarial actor to actively deanonymize people. It reminds me a bit of the tradeoffs you mentioned in your previous post on security culture.
I think it was a good call not to post reproducible code for this, for example, although posting it might have made it clearer how easy this is and strengthened the value of the demonstration.
Thank you for this, and I do trust you. On some level, anonymous users already had to trust you before this project, since it’s clearly something anyone with basic coding experience could do if they wanted to. But I think they now need to trust you a tiny bit more, since for you deanonymizing someone is now just a button press instead of a few minutes or hours of active work.
In any case, I don’t feel strongly about this, and I don’t think it’s important, but I still think that, compared to an informative post without a demonstration, this post increases the probability that an adversarial actor deanonymizes people slightly more than it increases the probability that anonymous users are protected from similar attacks (which are often even less sophisticated).