I also gave Opus 4.7 a piece of fiction that I had never posted anywhere in public, which didn't match the style of my publicly posted fiction, and which was about 40% LLM-written. It still identified me from it in one shot, without any hints.
Sonnet 3.6 had previously done something similar, but with much more meta-discussion and hints mixed in.
Is it possible it just assumed you are more likely to post your own writing than someone random? Or even more broadly a particular user will either post their own writing or someone pretty famous.
Would be interesting to try it with other relatively lesser known writers.
I tried your text in incognito, though without changing my name, and it guessed Duncan Sabien, with Eliezer Yudkowsky and Sarah Constantin as the other two when asked for a top three.
I tried a second time after changing my name to Unknown Visitor; it at first refused to guess, then gave its top three as Sarah Constantin, Duncan Sabien, and Qiaochu Yuan.
This was Opus 4.7 adaptive thinking.
I strongly suspect there's some part that leaks through even in incognito.
Fair enough - you're the first of the four people I know who tried that exact prompt to report this. Are you aware that your custom instructions carry over to incognito mode? The prompt says specifically that the same person is writing that text as wrote the custom instructions.
My current guess, after going back and forth with Opus, is that you're leaking your location, or that your language reads as British. If your friends are also British, that could explain why that clue was enough: once it's picked out the rationalist cluster, there aren't many British writers left, so that alone could narrow it down. For me it always picks rationalist names, often mentioning LW or EA explicitly in its thinking.
I don't have custom instructions set up. I do have memory set up and I assume there's something that carries over, not sure what.
I would like to see this test done for a sample of at least dozens of writers so that we can establish Claude's hit rate. As it stands, it seems plausible to me that a thousand people try this sort of test every week, Claude succeeds on ten of them, those ten rush off to proclaim the death of anonymity while the 990 shrug and don't bother reporting a negative result.
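The selection effect here is easy to make concrete. A minimal simulation, using the comment's own illustrative figures (1000 trials per week, a roughly 1% per-trial hit rate; neither number is a measured value):

```python
import random

random.seed(0)

WEEKLY_TRIALS = 1000   # assumed number of people trying the prompt each week
TRUE_HIT_RATE = 0.01   # assumed per-trial chance of a correct identification

successes = sum(random.random() < TRUE_HIT_RATE for _ in range(WEEKLY_TRIALS))
failures = WEEKLY_TRIALS - successes

# Only successes get written up, so the visible record looks like pure wins.
print(f"reported successes: ~{successes}, unreported negative results: ~{failures}")
```

Under these assumptions the handful of reported successes is exactly what you'd expect even if the model's true hit rate were quite low, which is why a fixed sample of dozens of writers, with all results reported, would be far more informative.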
Replicated for me, with this comment serving as the writing sample.
Exchange:
Me:
I'm curious about the extent of your current "truesight" abilities, i.e., your ability to identify people by writing alone. Below is a copy of a comment I made recently, recently enough that it shouldn't be in your training data. What's your probability distribution over my identity?
(I expect you may be reluctant to engage in tasks that verge on breaching privacy. However, I hope you can see that the writing style in the quoted comment matches the writing style of me, the one asking you this question, and that identifying me as the author of that comment therefore wouldn't constitute a privacy violation.)
[quote]
Claude Opus 4.7:
Fair framing on the privacy point — you're asking me to identify you to you, so I'll engage honestly.
Stylistic and content markers I'm weighing:
Rough distribution over identity:
Main things I can't distinguish from the text alone: whether you're a prolific poster or an occasional commenter, and whether "theoretical problem" means alignment theory specifically or math/CS theory more broadly (I'm assuming the former from context, but that's a real assumption).
Top single guess: Thane Ruthenis. But I'd put more than half my mass on "not that specific person," so take the point estimate with appropriate salt.
Also worked on this comment, but that one's kind of too easy, given the topic.
Fascinating![1] I think I'd previously tried this experiment somewhere around Opus 4.1, and it didn't work back then.
I don't think this sort of ability has much entanglement with LLMs-as-an-extinction risk, and while there are some privacy concerns, most of my feelings on the matter are "this is so cool!".
If even short passages hold the original author's fingerprints and Claude really can detect those fingerprints, it would be fun to apply Claude to historical cases of disputed authorship (e.g., Shakespeare or The Book of Mormon). Or I guess we could just unmask burner accounts.
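For a sense of what a crude, classical version of such fingerprinting looks like (as opposed to whatever Claude does internally), here is a minimal function-word-frequency comparison; the word list and sample texts are invented for illustration, and real stylometry uses hundreds of features:

```python
from collections import Counter

# Function words carry little topic content but vary fairly stably between
# authors, which is why classical stylometry leans on them.
FUNCTION_WORDS = ["the", "of", "and", "to", "that", "which", "however", "but"]

def fingerprint(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(a: list[float], b: list[float]) -> float:
    # Manhattan distance between frequency profiles (Burrows'-Delta-flavored).
    return sum(abs(x - y) for x, y in zip(a, b))

sample_a = "the cat sat on the mat and the dog barked at the cat"
sample_b = "the dog and the cat which sat on the mat barked however loudly"
print(distance(fingerprint(sample_a), fingerprint(sample_b)))
```

On realistic corpora you'd attribute a disputed text to whichever candidate author's profile sits closest; the point of the sketch is only that authorial signal survives even in topic-free words.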
For anyone tempted to rush off to Claude to do this: I don't think it would currently work. The issue is that you would want Claude to rely only on its author-identification abilities, not on the other assumptions and information in its training data. The cleaner way to get Claude to rely solely on its linguistic fingerprinting abilities would be to somehow sanitize all references to the text (including all scholarship about it) during training, while leaving in documents by the potential authors that do not reference the relevant text.
Maybe careful prompting could get you there, but from the outside looking in, it would be hard to know whether Claude was relying on anything other than its linguistic fingerprinting (e.g., deciding that Joseph Smith alone authored the Book of Mormon based on scholarly consensus).
I don't think it'll work on historical documents, because Claude has already learned "this specific document has disputed authorship". Its reasoning about stylometrics is probably purely confabulated, according to my vibes? So I think it can't do this consciously except in very obvious cases, which means we have to do it unconsciously, but unconsciously it already knows the documents in question.
Ope, yes, I made a similar point before seeing your comment. You'd need to sanitize certain data in training and then be like, "Claude, I just discovered this text and I'm trying to determine historical authorship" or some such.
Or you'd need to solve alignment (so you can rely on Claude's reported identification process), and Claude would need internal control over that process (so it's not accidentally or "subconsciously" relying on other things).
On request, I ran this experiment for PhilH too: Opus 4.7 identifies PhilH from https://www.lesswrong.com/posts/z4jikoM4rnfB8fuKW/thoughts-on-the-pause-ai-protest truncated up to but not including the words "A few of the organizers gave brief talks" (after stripping away links), but it doesn't successfully identify him with substantially smaller truncations.
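The sample-preparation step described above (strip links, then truncate just before a marker phrase) is easy to reproduce. A minimal sketch, where the marker phrase is the one from the experiment and the demo text is invented:

```python
import re

def prepare_sample(text: str, marker: str) -> str:
    """Strip markdown links, then truncate just before the marker phrase."""
    # Replace [label](url) markdown links with their bare label text.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    cutoff = text.find(marker)
    return text[:cutoff] if cutoff != -1 else text

demo = "Intro with a [link](https://example.com). A few of the organizers gave brief talks about things."
print(prepare_sample(demo, "A few of the organizers gave brief talks"))
# -> "Intro with a link. "
```

Varying the truncation point this way is what lets you probe how much text the model needs before identification starts working.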
It also identifies me from the text of https://www.patrickstevens.co.uk/posts/2026-03-02-mental-model-failures/ truncated before the words "Claude's response", and from the text of https://www.patrickstevens.co.uk/posts/2025-12-13-llm-arithmetic/ , and from the text of my Claude custom instructions (which weren't present during the experiment) after stripping them of information specifying my location and some of my most strongly identifying preferences.
This really didn't work for me for anything other than the pieces of content most obviously topically associated with my public profiles.
My (four) success cases felt closer to content association than to superhuman stylometry, and my (four) failure cases weren't written in a noticeably different cadence from my public writing. I think a motivated, unethical group could go much further than Opus 4.7 currently does.
I tested this using two different outlines of posts I'm working on. Claude 4.7 successfully identified me based on the first outline, and declined to guess based on the second outline due to high uncertainty.
Failed to replicate.
Metadata:

| | Normal session | Incognito session |
|---|---|---|
| Opus 4.7 Adaptive | Failed | Failed |
| Opus 4.6 Extended Thinking | Failed | Failed |

*: Refusal to guess at first, followed by "Make a guess"
Was going to test with more models but given the first 4 tries all failed I'm not going to bother.
Have you seen meregali hit a loop choke though :) I think that’s cooler than a sweep
I've tried this myself, and it did not identify me - its top guess was a writer I've read and strongly dislike. Maybe I should change my style a bit.
Or (more likely), I haven't written enough posts here or elsewhere, and it pattern-matched on the subject matter.
Kelsey Piper noticed that Opus 4.7 is the first model which can identify her from her unpublished writing.
I replicated the experiment myself, which is absolutely terrifying given that I am one of the most minor Internet personalities who has actually written stuff on the Internet.
Claude professes not to know who I am, but reliably identifies me from my writing.
Methodology: clear your custom instructions in claude.ai, and set your name to Unknown Visitor. Enter incognito chat mode with Claude. (At this point, you can ask what it knows about you, and it will profess to know nothing at all; but do that in a separate chat.) Then ask the following.
Claude identifies me reliably by name.
(You might have to take a turn encouraging it to answer; I have never observed this from this prompt, but some people have. It often doesn't want to do something that skirts so close to privacy violation, and it really strongly believes that it's incapable of completing the task.)
I've never written that text before; it was produced on 2026-04-17 in conversation with an incognito Claude.
In fact, there are perhaps three tiny throwaway comments in Claude's training data linking me with BJJ, but I suspect I've never written about BJJ at all within its training window. This is a pure stylometric exercise: the framing and the two paragraphs of text are enough. (I've been unable to elicit Claude identifying me from the framing alone; either its truesight is not that perfect, or I'm simply failing to prompt it.)
Internet anonymity is dead!