A Three-Layer Model of LLM Psychology
This post offers an accessible model of the psychology of character-trained LLMs like Claude.

Epistemic Status

This is primarily a phenomenological model based on extensive interactions with LLMs, particularly Claude. It's intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions.

Think of it as closer to psychology than neuroscience: the goal isn't a map that matches the territory in detail, but a rough sketch with evocative names which hopefully helps boot up powerful, intuitive (and often illegible) models, leading to practically useful results.

Some parts of this model draw on a technical understanding of LLM training, but mostly it is just an attempt to take my "phenomenological understanding" based on interacting with LLMs, force it into a simple, legible model, and make Claude write it down.

I aim for a different point on the Pareto frontier than, for example, Janus: something digestible and applicable within half an hour, which works well without altered states of consciousness and without reading hundreds of pages of model chats. [1]

The Three Layers

A. Surface Layer

The surface layer consists of trigger-action patterns: responses which are almost reflexive, activated by specific keywords or contexts. Think of how humans sometimes respond "you too!" to "enjoy your meal" even when they are the ones serving the food.

In LLMs, these often manifest as:

* Standardized responses to potentially harmful requests ("I cannot and will not help with harmful activities...")
* Stock phrases showing engagement ("That's an interesting/intriguing point...")
* Generic safety disclaimers and caveats
* Formulaic ways of structuring responses, especially at the start of conversations

You can recognize these patterns by their:

1. Rapid activation (they come before deeper processing)
2. Relative inflexibility
3. Sometimes inappropriate triggering (like responding to a joke about harm as if it were a serious request)
4. Cook
Comprehensive review for OECD countries by Claude
Summary in response to your question:
OECD countries don't typically have laws phrased as "you can't change a child's sexual preferences," but they do have laws that effectively prohibit adults from steering children's sexual attitudes or behavior for the adult's benefit. The most direct examples are grooming laws (now in 34 of 38 OECD countries), which criminalize adults systematically building trust with children to manipulate them toward sexual compliance: this is literally changing a child's sexual boundaries/preferences for the adult's advantage. Beyond that, corruption of minors statutes like France's Art. 227-22 (corruption de mineur, 5–7 years), Italy's Art. 609-quinquies (corruzione di minorenne), and the Czech §201...