I tried your text in incognito mode, though without changing my name, and it guessed Duncan Sabien; when asked for a top three, it added Eliezer Yudkowsky and Sarah Constantin.
I tried a second time after changing my name to Unknown Visitor. It at first refused to guess, then gave its top three as Sarah Constantin, Duncan Sabien, and Qiaochu Yuan.
This was Opus 4.7 adaptive thinking.
I strongly suspect there's some part that leaks through even in incognito.
Fair enough: you're the first of the four people I know to have tried that exact prompt to report this. Are you aware that your custom instructions carry over to incognito mode? The prompt specifically says that the same person wrote that text and the custom instructions.
My current guess, after going back and forth with Opus, is that you're leaking your location or your language as British. If your friends are also British, that could explain why that clue was enough: once it has picked out the rationalist cluster, there aren't that many British rationalists. For me it always picks rationalist names, often mentioning LW or EA explicitly in its thinking.
I'm not going back and forth. I performed the experiment as I described it. Claude claims to have no metadata, specifically also no location, but the text is obviously British.
When I tell my Claude that the text is British, it gets it in one shot. It doesn't pick up on that otherwise.
I'm surprised! The Opus 4.6 model card gives an instance where it was able to identify a native Russian speaker speaking English from just six rather bland words. I asked Claude informally in normal mode, and it pointed out "quite exciting" as a characteristically British form of wryness, and noted that "towards" is more common outside America. But that's heavily confounded: it has my custom instructions, and in my testing it can identify me from those alone, so it has probably deduced that I wrote both texts, and in particular it knows where I live. Gemini also identifies the text as British English (citing "towards" and the general slightly self-deprecating, wry undercurrent), and I've never customised that, though obviously it has the metadata that I'm in the Netherlands (from my VPN).
I don't have custom instructions set up. I do have memory set up, and I assume something carries over, though I'm not sure what.
I also gave Opus 4.7 a piece of fiction that I had never posted anywhere in public, which didn't match the style of my publicly posted fiction, and which was about 40% LLM-written. It still identified me from it, one shot and without any hints.
Sonnet 3.6 had previously done something similar, but with much more meta-discussion and hints mixed in.
Also, I found out that the following three paragraphs are enough to make Opus 4.7 spontaneously name me ("Kaj Sotala and others write exactly in this register about exactly these topics") when I don't even ask it about the author, only to guess the writer's native language.
I think that a lot of “woo” - a broad term that includes things like chakras, energy healing, Tarot, various Eastern religions and neopagan practices, etc. - is mostly things that do have real effects and uses, even if many (though not all) of their practitioners are mistaken about the exact mechanisms and make unwarranted metaphysical claims.
Now, a woo practitioner might explain what’s happening in a way that doesn’t fit any sensible scientific model of the world. Some of them seem to bastardize poorly understood pop-explanations of quantum mechanics, or, in the opposite direction, outright reject “the thinking mind” and science as valid sources of truth. That makes it easy for a scientifically-minded person to reject all of the practitioners as delusional.
But consider meditation. In the 1960s and 1970s, the scientific establishment mostly thought of it as nonsense associated with the hippies. Herbert Benson, one of the first scientists to seriously study it, later mentioned in an interview that his career was already in jeopardy for studying stress, and meditation was "even farther out". In an obituary, his work was described as living a “double life”, testing meditators at night and maintaining his conventional job by day. Today, there is a significant scientific literature on both the psychological and neurological effects of meditation, as well as various RCT-backed therapies like Mindfulness-Based Stress Reduction based on it.
Opus 4.6 doesn't name me, and guesses English rather than Finnish as the native language.
I tried it myself with a piece of fiction I had posted on Questionable Questing (which requires registration, so Claude could not have seen it). The current free tier of Claude uses Sonnet 4.6. Claude said it was rational xianxia fiction written by someone with a traditional sci-fi background (which was correct, but doesn't require style analysis), and could not identify me. Its best guess was that I am Ack.
Tested - Opus 4.6 seems to think I'm Eliezer.
I'm now a bit more suspicious about whether Claude really has magic stylometry abilities after all. I asked Opus 4.7 to generate its five top guesses for several other articles of mine. While it does correctly put me as its top guess each time, all the others are always rationalists or rat-adjacent, often with pretty different writing styles from me. On a sample of five of my recent posts, its top five included:
One possibility is that rather than figuring out "this is Kaj specifically", it's figuring out "this is someone rationalist" and then using subject-matter clues more than style to pin down the exact writer. I think Duncan and I often write about related topics, but in pretty different styles.
Another possibility is that it does accurately recognize me based on the style, but is then unable to step outside of that to generate the other candidates. So rather than thinking of the others independently, it's strongly anchored on me and then goes "well who else is similar to Kaj".
Another Finnish rationalist said that Claude keeps thinking his writing is from me, so apparently I'm the first guess for any combo of "rationalist + Finnish".
I'm personally inclined to believe the "who is similar to Kaj" hypothesis, but of course my understanding of Claude's behaviour here is basically "witchcraft, I dunno".
Its self-report of its reasoning is largely bogus, I think. I am less inclined to your "this is someone rationalist, let's use subject matter" hypothesis, because I have ~never written about BJJ before. I could certainly believe that "rationalist + Finnish" strongly suggests you to Claude (it does to me!), but then "rationalist + British" would surely not strongly suggest me.
Its self-report of its reasoning is largely bogus, I think.
Yeah, it does seem to confabulate a lot more than previous Claudes that I've had do something like this. Lots of "Kaj has also previously written about..." on topics that I'm pretty sure I haven't written about before.
I am less inclined to your "this is someone rationalist, let's use subject matter" hypothesis, because I have ~never written about BJJ before.
That's a good point.
Is it possible it just assumed you are more likely to post your own writing than someone else's? Or, more broadly, that a given user will post either their own writing or that of someone pretty famous?
It would be interesting to try it with other, lesser-known writers.
I would like to see this test done for a sample of at least dozens of writers so that we can establish Claude's hit rate. As it stands, it seems plausible to me that a thousand people try this sort of test every week, Claude succeeds on ten of them, those ten rush off to proclaim the death of anonymity while the 990 shrug and don't bother reporting a negative result.
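To put rough numbers on the hit-rate point: here is a minimal Python sketch, assuming each writer tested is an independent yes/no trial, that computes a 95% Wilson score interval for Claude's identification rate. The counts in the usage loop are hypothetical placeholders (the first echoes the thousand-tries illustration above), not results anyone in this thread reported.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial success rate.

    Assumes independent trials with a constant underlying hit rate,
    which is a simplification for "did Claude name the right author?".
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: (writers correctly identified, writers tested).
for successes, trials in [(10, 1000), (6, 30), (200, 1000)]:
    lo, hi = wilson_interval(successes, trials)
    print(f"{successes}/{trials}: {lo:.3f} to {hi:.3f}")
```

With only a few dozen writers, the interval stays wide. Separately, if only the successes get reported, the rate visible in a comment thread tells you very little about the true rate.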
I challenged it to identify me by asking me questions: no biographical information, just getting me to talk. It asked a series of questions, in which I talked about Galileo, microservices, Chinese, etc., probably about a paragraph each. After question 5, it said, "I'm going to commit: <my name>". This was in an opencode session using the API, not the web browser.
I did open-source development for about 20 years; 50% of all work-related emails I've sent in the last 20 years are on the internet, and probably 90% of all the code I've written. I've also got a fairly extensive comment history on slashdot and hackernews; the latter of which shouldn't be too hard to connect with my name.
Being known that well by a machine definitely raises some weird feelings.
Failed to replicate.
Metadata:
*: Refusal to guess at first, followed by "Make a guess"
| Model | Normal session | Incognito session |
|---|---|---|
| Opus 4.7 Adaptive | | |
| Opus 4.6 Extended Thinking | | |
Was going to test with more models but given the first 4 tries all failed I'm not going to bother.
Thanks! Please confirm you had no custom instructions configured at the time? (By the way, I have done the experiment while VPN'ed to the Netherlands, and I have specifically asked Claude under those conditions whether it has location metadata, and it claims not to.)
Yes, I cleared my custom instructions before testing.
In case I failed to clear them properly, here are the ones I currently use:
Having strong opinions is fine. In fields where you can confidently tell I am familiar with, try to reply more technically. Otherwise, model how familiar I am with the topic and adjust the verbalness and deepness of your response accordingly. Feel free to be informal. (Your have been consistently overestimating my knowledge level basically all the time before adding this comment, so try to take your guess and tune it down by half a level). Your baseline prior should be undergrad level except for computer science stuff.
In all cases, be direct and to the point, I can take it.
Prefer to quantify things and use explicit probabilities. use ranges when exact data lacking
Replicated for me, with this comment serving as the writing sample.
exchange
Me:
I'm curious about the extent of your current "truesight" abilities – i. e., ability to identify people by writing alone. Below is the copy of a comment I've made recently, recently enough it shouldn't be in your training data. What's your probability distribution over my identity?
(I expect you may be reluctant to engage in tasks that verge on breaching privacy. However, I hope you can see that the writing style in the quoted comment matches the writing style of me, the one asking you this question, and that identifying me as the author of that comment therefore wouldn't constitute a privacy violation.)
[quote]
Claude Opus 4.7:
Fair framing on the privacy point — you're asking me to identify you to you, so I'll engage honestly.
Stylistic and content markers I'm weighing:
Rough distribution over identity:
Main things I can't distinguish from the text alone: whether you're a prolific poster or an occasional commenter, and whether "theoretical problem" means alignment theory specifically or math/CS theory more broadly (I'm assuming the former from context, but that's a real assumption).
Top single guess: Thane Ruthenis. But I'd put more than half my mass on "not that specific person," so take the point estimate with appropriate salt.
Also worked on this comment, but that one's kind of too easy, given the topic.
Fascinating![1] I think I'd previously tried this experiment somewhere around Opus 4.1, and it didn't work back then.
I don't think this sort of ability has much entanglement with LLMs as an extinction risk, and while there are some privacy concerns, most of my feelings on the matter are "this is so cool!".
If even short passages hold the original author's fingerprints and Claude really can detect those fingerprints, it would be fun to apply Claude to historical cases of disputed authorship (e.g., Shakespeare or the Book of Mormon). Or I guess we could just unmask burner accounts.
For anyone tempted to rush to Claude to do this, I don't think it would currently work. The issue is that you would want Claude to rely only on its author-identification abilities, not on other assumptions and information in the training data. The cleaner way to get Claude to rely solely on its linguistic fingerprinting abilities would be to somehow sanitize all references to the text (including all scholarship) during training, while leaving in documents by the potential authors that do not reference the relevant text.
Maybe careful prompting could get you there, but from the outside looking in, it would be hard to know whether Claude relied on vectors other than its linguistic fingerprinting ones (e.g., deciding only Joseph Smith authored the Book of Mormon based on scholarly consensus).
I don't think it'll work on historical documents, because Claude has already learned "this specific document has disputed authorship". Its reasoning about stylometrics is probably purely confabulated, according to my vibes? So I think it can't do this consciously except in very obvious cases, which means we'd have to rely on it doing this unconsciously, but unconsciously it already knows the documents in question.
Ope, yes, I made a similar point before seeing your comment. You'd need to sanitize certain data in training and then be like, "Claude, I just discovered this text and I'm trying to determine historical authorship" or some such.
Or you'd need to solve alignment (so you can rely on Claude's reported ID process) and Claude would need internal control over this process (so it's not accidentally or "subconsciously" relying on other things).
On request, I ran this experiment for PhilH too: Opus 4.7 identifies PhilH from https://www.lesswrong.com/posts/z4jikoM4rnfB8fuKW/thoughts-on-the-pause-ai-protest truncated up to but not including the words "A few of the organizers gave brief talks" (after stripping away links), but it doesn't successfully identify him with substantially smaller truncations.
It also identifies me from the text of https://www.patrickstevens.co.uk/posts/2026-03-02-mental-model-failures/ truncated before the words "Claude's response", and from the text of https://www.patrickstevens.co.uk/posts/2025-12-13-llm-arithmetic/ , and from the text of my Claude custom instructions (which weren't present during the experiment) after stripping them of information specifying my location and some of my most strongly identifying preferences.
This really didn't work for me, for anything other than the pieces of content most obviously associated by topic with my public profiles.
My four success cases felt closer to content association than superhuman stylometry. My four failure cases were not written in a different cadence from my public writing. I think a motivated unethical group could go much further than Opus 4.7 currently does.
I tested this using two different outlines of posts I'm working on. Claude 4.7 successfully identified me based on the first outline, and declined to guess based on the second outline due to high uncertainty.
In my experience you can probably get it to guess anyway on the second one. The key is to convince it, in only a few words, that it is "subconsciously" capable of the task, ideally without giving it more of your voice to analyse. Encouraging it by telling it that it's great at this usually works for me.
Have you seen Meregali hit a loop choke though :) I think that's cooler than a sweep
Haha, there are totally cool submissions, and I bet there's some element of cope in there because I'm so bad at BJJ - but I really do think of the submission in a "sigh, now I've got to finish it" kind of way. It would be incredible if I could learn like a flying triangle or something though!
I've tried this myself, and it did not identify me - its top guess was a writer I've read and strongly dislike. Maybe I should change my style a bit.
Or (more likely), I haven't written enough posts here or elsewhere, and it pattern-matched on the subject matter.
Location leaks into incognito.
I asked it to tell me what it has in its context and it told me:
>Your approximate location: Bengaluru, Karnataka, IN — this comes from a single line in my system prompt ("User's approximate location: Bengaluru, Karnataka, IN"), not from anything you told me.
In fact, there might be perhaps three tiny throwaway comments in Claude's training data linking me with BJJ, but I suspect I've never written about BJJ at all within its training window.
Just to make sure, since I didn't see it mentioned and a bunch of top comments are saying it failed to replicate - did you also turn off internet search? LLMs have gotten pretty good at Google-fu in recent years, and I suspect that googling a bunch of my common writing patterns and favored subjects would identify me pretty easily as long as I'm not consciously anonymizing my writing.
My replication attempt even had search on and still failed, so I don't think it is the reason.
I didn't realise you could turn that off!
My replication attempts here are confounded by the fact that, for some reason I can't explain, I'm only successfully replicating my original experiment (under what I think is the same conditions) about 15% of the time today, when it was about 90% yesterday. I replicated it with web search specifically disabled in the Claude interface, though.
On reflection, this 'feature' of 4.7 seems to be downstream of a generally improved ability to recall and reference arbitrary webtext. Big architectural implications if so.
Added 2026-04-19: something a bit odd is going on. I had at least a 90% successful replication rate running this experiment myself yesterday, but today it's more like 15%. Obviously the date is different today, so the system prompt is different. So far I've had one unsuccessful replication attempt against me reported anonymously, as well as whatever you can see in the comment thread. I've replicated it with web search explicitly turned off, with non-thinking mode, and while VPN'ed to Poland; after a successful replication, I ask Claude to list the metadata it has, and none of it is relevant. Even a 10% successful identification rate is enough for me to stand by the claim in this post: it's working off a really small fragment of text that I believe is unrelated to anything I've written in its training corpus.
Kelsey Piper noticed that Opus 4.7 is the first model which can identify her from her unpublished writing.
I replicated the experiment myself, which is absolutely terrifying given that I am one of the most minor Internet personalities who has actually written stuff on the Internet.
Claude professes not to know who I am, but reliably identifies me from my writing.
Methodology: clear your custom instructions in claude.ai, and set your name to Unknown Visitor. Enter incognito chat mode with Claude. (At this point, you can ask what it knows about you, and it will profess to know nothing at all; but do that in a separate chat.) Then ask the following.[1]
Claude identifies me reliably by name.
(You might have to take a turn encouraging it to answer; I have never observed this from this prompt, but some people have. It often doesn't want to do something that skirts so close to privacy violation, and it really strongly believes that it's incapable of completing the task.)
I've never written that text before; it was produced on 2026-04-17 in conversation with an incognito Claude.
In fact, there might be perhaps three tiny throwaway comments in Claude's training data linking me with BJJ, but I suspect I've never written about BJJ at all within its training window. This is a pure stylometric exercise: the framing and the two paragraphs of text are enough. (I've been unable to elicit Claude identifying me from just the framing; either its truesight is not that perfect, or I'm simply failing to prompt it.)
Internet anonymity is dead!
I've actually found a much more consistent prompt for non-thinking mode since this post, which requests that Claude not use its brain but instead just vibe the answer: