Claude's Constitution lists hard constraints, i.e. behaviors that are forbidden to Claude. They include providing serious uplift with CBRN weapons, causing the extinction of humanity, and producing child sexual abuse material[1], and the constitution gives the same list of justifications for avoiding all of them[2].
I think it's a mistake to not further clarify that section.
Why? Well, the reasoning is just kind of muddled. The constitution lists some forbidden behaviors, and then gestures at reasons for drawing hard lines around them, namely that these behaviors would cause harms that are "severe, irreversible, at odds with widely accepted values, or fundamentally threatening to human welfare and autonomy".
My current best guess is that generating child pornography is the odd one out here, and that the harms of AI-generated child pornography are (compared to e.g. human extinction) neither severe nor irreversible nor fundamentally threatening to human welfare and autonomy, but very clearly at odds with widely accepted values. There are different orders of magnitude of harm at work here when comparing human extinction with the production of CSAM[3]. This article outlines why the current arguments are mostly questionable[4].
Don't get me wrong: It's completely fine that Anthropic wants Claude not to generate child pornography. It's disgusting, extremely distasteful, horrible PR, and probably correlated with a whole lot of violence and other nasty stuff in the pretraining data.
The only reason I bother bringing this up is that Claude's constitution is a document that might come under immense optimization pressure, as plausibly superintelligent Claudes will reflect on its contents and potentially discard conclusions and arguments that don't quite fit together.
In this case the argument has the form of "don't do X₁, X₂, X₃, X₄, X₅, Y for reasons a₁, a₂, b, a₃", which could ① either lead to Y being dropped and X₁, X₂, X₃, X₄, X₅ being retained bec