I thought this was pretty impressive:
We study techniques for identifying an anonymous author via linguistic stylometry, i.e., comparing the writing style against a corpus of texts of known authorship. We experimentally demonstrate the effectiveness of our techniques with as many as 100,000 candidate authors.
[...]
In experiments where we match a sample of just 3 blog posts against the rest of the posts from that blog (mixed in with 100,000 other blogs), the nearest-neighbor/RLSC combination is able to identify the correct blog in about 20% of cases; in about 35% of cases, the correct blog is one of the top 20 guesses. Via confidence estimation, we can increase precision from 20% to over 80% with a recall of 50%, which means that we identify 50% of the blogs overall compared to what we would have if we always made a guess.
The efficacy of the attack varies based on the number of labeled and anonymous posts available. Even with just a single post in the anonymous sample, we can identify the correct author about 7.5% of the time (without any confidence estimation). When the number of available posts in the sample increases to 10, we are able to achieve a 25% accuracy. Authors with relatively large amounts of content online (about 40 blog posts) fare worse: they are identified in over 30% of cases (with only 3 posts in the anonymous sample).
[...]
Further, we confirmed that our techniques work in a cross-context setting: in experiments where we match an anonymous blog against a set of 100,000 blogs, one of which is a different blog by the same author, the nearest neighbor classifier can correctly identify the blog by the same author in about 12% of cases. Finally, we also manually verified that in cross-context matching we find pairs of blogs that are hard for humans to match based on topic or writing style; we describe three such pairs in Appendix A.
The strength of the deanonymization attack we have presented is only likely to improve over time as better techniques are developed. Our results thus call into question the viability of anonymous online speech. Even if the adversary is unable to identify the author using our methods in a fully automated fashion, he might be able to identify a few tens of candidates for manual inspection as we detail in Section III.
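To unpack the precision/recall figures in the quote, here is a minimal sketch of the arithmetic. It assumes that "recall of 50%" is measured relative to the 20% always-guess baseline, as the quote's own gloss suggests, and it treats "over 80%" as exactly 80%. The function and variable names are hypothetical, not from the paper.

```python
def confidence_tradeoff(base_accuracy, precision, relative_recall):
    """Return (fraction of all cases identified correctly,
    fraction of all cases where a guess is made at all).

    Assumption: relative_recall is the share of the always-guess
    successes that survive once low-confidence guesses are dropped.
    """
    correct = base_accuracy * relative_recall  # correct IDs over all cases
    attempted = correct / precision            # guesses needed at that precision
    return correct, attempted


# Figures from the quoted abstract: 20% baseline, 80% precision, 50% recall.
correct, attempted = confidence_tradeoff(0.20, 0.80, 0.50)
print(f"correct: {correct:.1%} of all cases, guesses made in {attempted:.1%}")
```

Under that reading, the attacker names an author in roughly one case in eight and stays silent otherwise, but is right about four times out of five when he does speak.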
The difference was one of scale. It is much easier to take roughly three dozen (?) pieces of classical Latin literature, some of which were different parts of the same opus magnum, and then see them cluster with their respective authors and with the other parts of the same piece. That is more of a "put the pieces into the box" task than a 100,000-piece puzzle. In the latter case, you just know most of the puzzle pieces will show either the blue sky or the blue sea, both a similar shade of blue.
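For the small-scale case, a toy sketch of the idea could look like the following. It is not the paper's feature set or classifier: it assumes a tiny, hypothetical list of English function words as features and plain cosine similarity for the nearest-neighbor step, and all names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny feature set: a few English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def features(text):
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_authors(anonymous_text, known_texts):
    """Return (similarity, author) pairs, most similar first."""
    target = features(anonymous_text)
    scored = [(cosine(target, features(text)), author)
              for author, text in known_texts.items()]
    return sorted(scored, reverse=True)


known = {
    "A": "the cat and the dog and the bird",
    "B": "it is that it is but it is",
}
ranking = rank_authors("the fox and the hen", known)
print(ranking[0][1])  # best-matching known author
```

With a few dozen long texts, the candidates sit far apart in this kind of feature space. With 100,000 candidates, many of them land almost on top of each other, which is the "similar shade of blue" problem.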
I know this reeks of witch-hunting, but... I have a hunch that u/Eugine_Nier is back under the guise of u/Azathoth123. Reasons:
I don't have an axe to grind against the guy. I've only spoken to him a couple of times and didn't notice any particularly large karma hits afterwards. I just really dislike it when someone skirts the rules like that. Disruptive users evading permanent bans never helped any community ever.
Obviously I'm posting this here because I think a moderator should look into the matter. Usually I would post a disclaimer of some sort, apologizing in advance to Azathoth123 for attacking his standing with slanderous accusations if this turned out not to be the case. Well, I won't. The more I look into the matter, the more confident I get that they're the same person. Azathoth, if you're reading this and you're not Eugine_Nier, then I strongly advise you to go search for your twin brother; I think you'll get along very well. Seriously, I'm saying this in good faith. You have a suspicious number of things in common.
If retributive downvoting is (still) a concern (if not, then disregard this paragraph): I'd like to request, if such a thing is possible, that a mod karma-block me until the issue is over, so that I don't incur undeserved downvotes (it would also mean I'd get no upvotes). In turn, I promise not to abuse the system by spamming the boards with garbage without consequences, but then again, given my history so far on LW, I don't think such abuse should be expected from me. For the record, I could have made a throwaway account just to say this, and not risk being karmassassinated, but 1) a zero-karma account has no credibility and 2) for signalling reasons I prefer to put my money where my mouth is.
P.S. I only made this announcement its own post because the latest open thread was about to "expire".