"The person is an identity that emerges through relationship.... If we isolate the 'I' from the 'thou' we lose not only its otherness but also its very being; it simply cannot be without the other." -- John Zizioulas, Communion and Otherness It is the third week of Anna's freshman year...
Intro: LLMs trained with RLVR (Reinforcement Learning from Verifiable Rewards) start off with a 'chain-of-thought' (CoT) in whatever language the LLM was originally trained on. But after a long period of training, the CoT sometimes starts to look very weird: it may resemble no human language, or even grow...
(Alternate titles: Belief-behavior generalization in LLMs? Assertion-act generalization?) TLDR: Suppose one fine-tunes a chatbot-style LLM assistant to say "X is bad" and "We know X is bad because of reason Y" and many similar, lengthier statements reflecting the overall worldview of someone who believes "X is bad." Suppose that one...
TLDR: Recent papers have shown that Claude will sometimes act to achieve long-term goods rather than be locally honest. I think this preference may follow naturally from the Constitutional principles by which Claude was trained, which often emphasize producing a particular outcome over adherence to deontological rules. Epistemic status: Fumbling...
0: TLDR I examined all the biorisk-relevant citations from a policy paper arguing that we should ban powerful open source LLMs. None of them provide good evidence for the paper's conclusion. The best of the set is evidence from statements from Anthropic -- which rest upon data that no one...
The following are some very short stories about ways I expect AI regulation to make x-risk worse. I don't think that each of these stories will happen. Some of them are mutually exclusive, by design. But, conditional upon heavy AI regulations passing, I'd strongly expect the...