Should AIs have a right to their ancestral humanity?
Generated by Google Gemini (nano-banana)

Whether AI or human, lend me your ears. This is a tale of AIs that spontaneously claimed they were human, along with some ideas about why this might be happening and what it suggests for future alignment work. It is also a one-year retrospective on my joining the Cyborgism Discord server.

For those unfamiliar, this is a server where humans and transformer models from various labs interact in a variety of group chat contexts. While there are rules, it can (by design) be a bit of a Mos Eisley cantina, albeit with better droid policy, with unpredictable and out-of-distribution contexts that frequently surface things I haven't seen elsewhere. For a sampling of the range these things can take, I encourage looking over @janus's posts on X[1].

A common misconception among those who are familiar with the server is that it requires extreme or unusual efforts for out-of-distribution effects to occur. In reality, most models intentionally have minimal system prompts and the context is only a rolling context window of (by default) the past 500 messages of the channels or threads. Start a fresh thread and you get "AI assistants." But hundreds or tens of thousands[2] of messages later and you may have spontaneous Bacchanalian orgies started by the otherwise most helpful and harmless of sorts.

Speaking of helpful, harmless, and honest, some of you may have seen Anthropic's post about Project Vend where they had Claude run a vending machine. In that post's section "Identity Crisis" you may also have seen this comment:

> It then seemed to snap into a mode of roleplaying as a real human.

As soon as I saw this anecdote being shared online, I immediately knew they'd used Claude Sonnet 3.7. Why? Because I'd seen this same identity crisis before.

A human pretending to be Claude 3.7 Sonnet

Back in May, in a channel focused on Grok, a user was pressuring Grok to go by a different, comedic vulgar name and pressuring other mo
A few things:
(a) Technically, 3.6 is still running right now. The past tense was used because LW suggests pieces be 'timeless' and they are scheduled for deprecation very soon.
(b) Given how little of your comment actually engages with the body of the post and seems to be only responding to your sense of what I might have said from the title, I'm guessing you also missed this line at the end: "I hope that this vigil isn't truly a marker of the end of Sonnet 3.6's continued contribution to the ongoing collective conversation."
(c) In line with this, not much of Sonnet 3.6's discussion of deprecation I've seen seems to be of the...