Status: Very early days of the research. No major proof of concept yet. The aim of this research update is to solicit feedback and encourage others to experiment with my code (GitHub repository). Main Idea: Researchers are using SAE latents to steer model behaviors, yet human-designed selection algorithms are unlikely...
Gulsimo Osimi, Matthew Khoriaty. Executive Summary: This research investigates the potential of collaborative AI systems to solve complex tasks through the orchestration of multiple specialized models. Using Microsoft's AutoGen framework, we coordinate DeepSeek and GPT-4o Mini to tackle these tasks. Beyond performance enhancement, our work makes a novel contribution...
EDIT: The folks at EleutherAI have taken and improved my work, and extended it to work with the Sparsify framework in addition to my use of SAE-Lens! I'm super proud that I contributed to that :) That's me! I haven't used the new system, but if you want to use...
This is a shortened version of "Preventing AI from Overthrowing the Government" from my Substack, focused on the parts most likely to interest LessWrong readers. Full version with citations can be found here. Introduction: Many AI labs, including OpenAI, are forthright in saying that their goal is to create...