I decided to create this tag for two reasons:
The concept of LLM Psychology is interesting and exciting to me.
I have progressively seen more people referring to a type of research as LLM Psychology. I think having a place specifically for it on LW is useful.
If you reply to this comment with posts you think fit under this tag, I'll read them and decide if they seem like they should be here. I'm currently quite fuzzy on what really belongs in this tag. Clarification on what you think LLM Psych is would be much appreciated.
Maybe see if the posts under the Chain of Thought Alignment tag can fit, since that may be the closest tag to AI Psychology before the AI Psychology tag existed. The overlap is small, so I agree that AI Psychology should be a new tag.
Maybe my post Reduce AI Self-Allegiance by saying "he" instead of "I" fits?
Edit: more Chain of Thought Alignment posts which fit AI Psychology:
the case for CoT unfaithfulness is overstated
Language Agents Reduce the Risk of Existential Catastrophe
The Translucent Thoughts Hypotheses and Their Implications
I think this tag should be called "AI Psychology" or "Model Psychology", since "LLM" is a somewhat arbitrary term that does not generalize well.
(E.g., suppose 99% of compute in training was RL, should it still be called an LLM?)
(Agree and made the edit)