No MCTS, no PRM...
scaling up CoT with simple RL and scalar rewards...
emergent behaviour
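A minimal sketch of what "simple RL with scalar rewards" means here: no MCTS, no process reward model, just a scalar 0/1 reward on the final outcome and a REINFORCE-style update. Everything below is a toy stand-in of my own (the "policy" is a softmax over three hypothetical CoT strategies, not a language model):

```python
# Toy outcome-based RL with a scalar reward (REINFORCE-style).
# Only the final answer is scored; no per-step (process) supervision.
import math, random

random.seed(0)

STRATEGIES = ["direct", "decompose", "guess"]        # hypothetical CoT styles
CORRECT = {"direct": True, "decompose": True, "guess": False}

logits = [0.0, 0.0, 0.0]                             # policy parameters

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

lr = 0.5
for step in range(500):
    probs = softmax(logits)
    a = sample(probs)                                # roll out one "chain"
    reward = 1.0 if CORRECT[STRATEGIES[a]] else 0.0  # scalar outcome reward
    # REINFORCE gradient for a softmax policy: (indicator - prob) * reward
    for i in range(len(logits)):
        grad = ((1.0 if i == a else 0.0) - probs[i]) * reward
        logits[i] += lr * grad

probs = softmax(logits)
print({s: round(p, 3) for s, p in zip(STRATEGIES, probs)})
```

Probability mass shifts toward whichever strategies earn reward at the end; nothing in the loop ever inspects the intermediate steps, which is the point of the scalar-reward setup.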
Thanks for posting this!
https://twitter.com/AISafetyMemes/status/1764894816226386004 https://twitter.com/alexalbert__/status/1764722513014329620
How emergent, functionally special, or out-of-distribution is this behavior? Maybe Anthropic is playing big-brain 4D chess: training Claude on data with self-awareness-like scenarios so the apparent capability jump causes panic and slows down the AI race via the resulting regulations, while the behavior isn't actually emergent or out of distribution at all, but deeply part of the training data: in-distribution, classical features interacting in circuits.
Merging with Anthropic may have been a better outcome
"OpenAI’s ouster of CEO Sam Altman on Friday followed internal arguments among employees about whether the company was developing AI safely enough, according to people with knowledge of the situation.
Such disagreements were high on the minds of some employees during an impromptu all-hands meeting following the firing. Ilya Sutskever, a co-founder and board member at OpenAI who was responsible for limiting societal harms from its AI, took a spate of questions.
At least two employees asked Sutskever—who has been responsible for OpenAI’s biggest research break...
We're at the start of interpretability, but the progress is lovely! Superposition was such a bottleneck even in small models.
More notes:
https://twitter.com/ch402/status/1710004685560750153 https://twitter.com/ch402/status/1710004416148058535
"Scalability of this approach -- can we do this on large models? Scalability of analysis -- can we turn a microscopic understanding of large models into a macroscopic story that answers questions we care about?"
"Make this work for real models. Find out what features exist in large models. Understand new, mor...
0.5 out of $7T is done...