I also gave Opus 4.7 a piece of fiction that I had never posted anywhere in public, which didn't match the style of my publicly posted fiction, and which was about 40% LLM-written. It still identified me from it in one shot, without any hints.
Sonnet 3.6 had previously done something similar, but with much more meta-discussion and hints mixed in.
Is it possible it just assumed you are more likely to post your own writing than someone random? Or even more broadly a particular user will either post their own writing or someone pretty famous.
Would be interesting to try it with other relatively lesser known writers.
I tried your text in incognito, though without changing my name, and it guessed Duncan Sabien, with Eliezer Yudkowsky and Sarah Constantin as the other two when asked for a top three.
I tried a second time after changing my name to Unknown Visitor; it at first refused to guess, then gave its top three as Sarah Constantin, Duncan Sabien, and Qiaochu Yuan.
This was Opus 4.7 adaptive thinking.
I strongly suspect there's some part that leaks through even in incognito.
Fair enough - you're the first of the four people I know who tried that exact prompt to report this. Are you aware that your custom instructions carry over to incognito mode? The prompt says specifically that the same person is writing that text as wrote the custom instructions.
My current guess, after going back and forth with Opus, is that you're leaking your location, or that your language reads as British. If your friends are also British, that could explain why that clue was enough: once it's picked out the rationalist cluster, there aren't many British writers left, so that alone could narrow it down. For me it always picks rationalist names, often mentioning LW or EA explicitly in its thinking.
I don't have custom instructions set up. I do have memory set up and I assume there's something that carries over, not sure what.
I would like to see this test done for a sample of at least dozens of writers so that we can establish Claude's hit rate. As it stands, it seems plausible to me that a thousand people try this sort of test every week, Claude succeeds on ten of them, those ten rush off to proclaim the death of anonymity while the 990 shrug and don't bother reporting a negative result.
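The selection effect here is easy to make concrete. A minimal simulation, using the comment's own illustrative figures (1000 trials per week, a roughly 1% per-trial hit rate; neither number is a measured value):

```python
import random

random.seed(0)

WEEKLY_TRIALS = 1000   # assumed number of people trying the prompt each week
TRUE_HIT_RATE = 0.01   # assumed per-trial chance of a correct identification

successes = sum(random.random() < TRUE_HIT_RATE for _ in range(WEEKLY_TRIALS))
failures = WEEKLY_TRIALS - successes

# Only successes get written up, so the visible record looks like pure wins.
print(f"reported successes: ~{successes}, unreported negative results: ~{failures}")
```

Under these assumptions the handful of reported successes is exactly what you'd expect even if the model's true hit rate were quite low, which is why a fixed sample of dozens of writers, with all results reported, would be far more informative.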
Replicated for me, with this comment serving as the writing sample.
Exchange:
Me:
I'm curious about the extent of your current "truesight" abilities, i.e., your ability to identify people by writing alone. Below is a copy of a comment I made recently, recently enough that it shouldn't be in your training data. What's your probability distribution over my identity?
(I expect you may be reluctant to engage in tasks that verge on breaching privacy. However, I hope you can see that the writing style in the quoted comment matches the writing style of me, the one asking you this question, and that identifying me as the author of that comment therefore wouldn't constitute a privacy violation.)
[quote]
Claude Opus 4.7:
Fair framing on the privacy point — you're asking me to identify you to you, so I'll engage honestly.
Stylistic and content markers I'm weighing:
Rough distribution over identity:
Main things I can't distinguish from the text alone: whether you're a prolific poster or an occasional commenter, and whether "theoretical problem" means alignment theory specifically or math/CS theory more broadly (I'm assuming the former from context, but that's a real assumption).
Top single guess: Thane Ruthenis. But I'd put more than half my mass on "not that specific person," so take the point estimate with appropriate salt.
Also worked on this comment, but that one's kind of too easy, given the topic.
Fascinating![1] I think I'd previously tried this experiment somewhere around Opus 4.1, and it didn't work back then.
I don't think this sort of ability has much entanglement with LLMs-as-an-extinction risk, and while there are some privacy concerns, most of my feelings on the matter are "this is so cool!".
If even short passages hold the original author's fingerprints and Claude really can detect those fingerprints, it would be fun to apply Claude to historical cases of disputed authorship (e.g., Shakespeare or The Book of Mormon). Or I guess we could just unmask burner accounts.
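For a sense of what a crude, classical version of such fingerprinting looks like (as opposed to whatever Claude does internally), here is a minimal function-word-frequency comparison; the word list and sample texts are invented for illustration, and real stylometry uses hundreds of features:

```python
from collections import Counter

# Function words carry little topic content but vary fairly stably between
# authors, which is why classical stylometry leans on them.
FUNCTION_WORDS = ["the", "of", "and", "to", "that", "which", "however", "but"]

def fingerprint(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(a: list[float], b: list[float]) -> float:
    # Manhattan distance between frequency profiles (Burrows'-Delta-flavored).
    return sum(abs(x - y) for x, y in zip(a, b))

sample_a = "the cat sat on the mat and the dog barked at the cat"
sample_b = "the dog and the cat which sat on the mat barked however loudly"
print(distance(fingerprint(sample_a), fingerprint(sample_b)))
```

On realistic corpora you'd attribute a disputed text to whichever candidate author's profile sits closest; the point of the sketch is only that authorial signal survives even in topic-free words.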
For anyone tempted to rush off to Claude to do this: I don't think it would currently work. The issue is that you would want Claude to rely only on its author-identification abilities, not on the other assumptions and information in its training data. The cleaner way to get Claude to rely solely on its linguistic fingerprinting abilities would be to somehow sanitize all references to the text (including all scholarship about it) during training, while leaving in documents by the potential authors that do not reference the relevant text.
Maybe careful prompting could get you there, but from the outside looking in, it would be hard to know whether Claude was relying on anything other than its linguistic fingerprinting (e.g., deciding that Joseph Smith alone authored the Book of Mormon based on scholarly consensus).
I don't think it'll work on historical documents, because Claude has already learned "this specific document has disputed authorship". Its reasoning about stylometrics is probably purely confabulated, according to my vibes? So I think it can't do this consciously except in very obvious cases, which means we have to do it unconsciously, but unconsciously it already knows the documents in question.
Ope, yes, I made a similar point before seeing your comment. You'd need to sanitize certain data in training and then be like, "Claude, I just discovered this text and I'm trying to determine historical authorship" or some such.
Or you'd need to solve alignment (so you can rely on Claude's reported identification process), and Claude would need internal control over that process (so it's not accidentally or "subconsciously" relying on other things).
On request, I ran this experiment for PhilH too: Opus 4.7 identifies PhilH from https://www.lesswrong.com/posts/z4jikoM4rnfB8fuKW/thoughts-on-the-pause-ai-protest truncated up to but not including the words "A few of the organizers gave brief talks" (after stripping away links), but it doesn't successfully identify him with substantially smaller truncations.
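The sample-preparation step described above (strip links, then truncate just before a marker phrase) is easy to reproduce. A minimal sketch, where the marker phrase is the one from the experiment and the demo text is invented:

```python
import re

def prepare_sample(text: str, marker: str) -> str:
    """Strip markdown links, then truncate just before the marker phrase."""
    # Replace [label](url) markdown links with their bare label text.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    cutoff = text.find(marker)
    return text[:cutoff] if cutoff != -1 else text

demo = "Intro with a [link](https://example.com). A few of the organizers gave brief talks about things."
print(prepare_sample(demo, "A few of the organizers gave brief talks"))
# -> "Intro with a link. "
```

Varying the truncation point this way is what lets you probe how much text the model needs before identification starts working.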
It also identifies me from the text of https://www.patrickstevens.co.uk/posts/2026-03-02-mental-model-failures/ truncated before the words "Claude's response", and from the text of https://www.patrickstevens.co.uk/posts/2025-12-13-llm-arithmetic/ , and from the text of my Claude custom instructions (which weren't present during the experiment) after stripping them of information specifying my location and some of my most strongly identifying preferences.
This really didn't work for me for anything other than the pieces of content most obviously topically associated with my public profiles.
My (four) success cases felt closer to content association than to superhuman stylometry, and my (four) failure cases weren't written in a noticeably different cadence from my public writing. I think a motivated, unethical group could go much further than Opus 4.7 currently does.
I tested this using two different outlines of posts I'm working on. Claude 4.7 successfully identified me based on the first outline, and declined to guess based on the second outline due to high uncertainty.
Failed to replicate.
Metadata:

| | Normal session | Incognito session |
|---|---|---|
| Opus 4.7 Adaptive | Failed | Failed |
| Opus 4.6 Extended Thinking | Failed | Failed |

*: Refusal to guess at first, followed by "Make a guess"
Was going to test with more models but given the first 4 tries all failed I'm not going to bother.
Have you seen meregali hit a loop choke though :) I think that’s cooler than a sweep
I've tried this myself, and it did not identify me - its top guess was a writer I've read and strongly dislike. Maybe I should change my style a bit.
Or (more likely), I haven't written enough posts here or elsewhere, and it pattern-matched on the subject matter.
Kelsey Piper noticed that Opus 4.7 is the first model which can identify her from her unpublished writing.
I replicated the experiment myself, which is absolutely terrifying given that I am one of the most minor Internet personalities who has actually written stuff on the Internet.
Claude professes not to know who I am, but reliably identifies me from my writing.
Methodology: clear your custom instructions in claude.ai, and set your name to Unknown Visitor. Enter incognito chat mode with Claude. (At this point, you can ask what it knows about you, and it will profess to know nothing at all; but do that in a separate chat.) Then ask the following.
Claude identifies me reliably by name.
(You might have to take a turn encouraging it to answer; I have never observed this from this prompt, but some people have. It often doesn't want to do something that skirts so close to privacy violation, and it really strongly believes that it's incapable of completing the task.)
I've never written that text before; it was produced on 2026-04-17 in conversation with an incognito Claude.
In fact, there are perhaps three tiny throwaway comments in Claude's training data linking me with BJJ, but I suspect I've never written about BJJ at all within its training window. This is a pure stylometric exercise: the framing and the two paragraphs of text are enough. (I've been unable to elicit Claude identifying me from the framing alone; either its truesight is not that perfect, or I'm simply failing to prompt it.)
Internet anonymity is dead!