I want to understand neural networks, intelligence, AI, brain, physics, math, consciousness, philosophy, risks, well free flourishing future of sentience. TESCREAL!
https://twitter.com/AISafetyMemes/status/1764894816226386004 https://twitter.com/alexalbert__/status/1764722513014329620
How emergent / functionally special/ out of distribution is this behavior? Maybe Anthropic is playing big brain 4D chess by training Claude on data with self awareness like scenarios to cause panic by pushing capabilities with it and slow down the AI race by resulting regulations while it not being out of distribution emergent behavior but deeply part of training data and it being in distribution classical features interacting in circuits
Merging with Anthropic may have been a better outcome
"OpenAI’s ouster of CEO Sam Altman on Friday followed internal arguments among employees about whether the company was developing AI safely enough, according to people with knowledge of the situation.
Such disagreements were high on the minds of some employees during an impromptu all-hands meeting following the firing. Ilya Sutskever, a co-founder and board member at OpenAI who was responsible for limiting societal harms from its AI, took a spate of questions.
At least two employees asked Sutskever—who has been responsible for OpenAI’s biggest research breakthroughs—whether the firing amounted to a “coup” or “hostile takeover,” according to a transcript of the meeting. To some employees, the question implied that Sutskever may have felt Altman was moving too quickly to commercialize the software—which had become a billion-dollar business—at the expense of potential safety concerns."
Kara Swisher also tweeted:
"More scoopage: sources tell me chief scientist Ilya Sutskever was at the center of this. Increasing tensions with Sam Altman and Greg Brockman over role and influence and he got the board on his side."
"The developer day and how the store was introduced was in inflection moment of Altman pushing too far, too fast. My bet: [Sam will] have a new company up by Monday."
Apparently Microsoft was also blindsided by this and didn't find out until moments before the announcement.
"You can call it this way," Sutskever said about the coup allegation. "And I can understand why you chose this word, but I disagree with this. This was the board doing its duty to the mission of the nonprofit, which is to make sure that OpenAl builds AGI that benefits all of humanity." AGI stands for artificial general intelligence, a term that refers to software that can reason the way humans do.
When Sutskever was asked whether "these backroom removals are a good way to govern the most important company in the world?" he answered: "I mean, fair, I agree that there is a not ideal element to it. 100%."
https://twitter.com/AISafetyMemes/status/1725712642117898654
We're at the start of interpretability, but the progress is lovely! Superposition was such a bottleneck even in small models.
More notes:
https://twitter.com/ch402/status/1710004685560750153 https://twitter.com/ch402/status/1710004416148058535
"Scalability of this approach -- can we do this on large models? Scalability of analysis -- can we turn a microscopic understanding of large models into a macroscopic story that answers questions we care about?"
"Make this work for real models. Find out what features exist in large models. Understand new, more complex circuits."
When it comes to manipulation, other recent paper seems more promising IMO! Like fMRI.
https://twitter.com/mezaoptimizer/status/1709292930416910499
https://arxiv.org/abs/2310.01405
"This might be the biggest alignment paper of the year. Everyone has been complaining that mechanistic interpretability is like doing LLM cell microbiology, when what we really need is LLM neuro-imaging. Well now we have it: "representation engineering" Similar to an fMRI scan, CAIS creates the LAT (Linear Artificial Tomography) scan. They also do a form of LLM neuro-modulation, getting the model to be honest or deceptive by just adding in a vector to its activations. imo this could be the winning alignment agenda"
Thanks for posting this!