I saw somebody using one of the latest Google models for this. I forget which one, but it's the one that can see your screen as you type. It can be used to keep you focused.
(Haven't researched it yet, so might not work very well)
I see it called "goal guarding" in some papers. That seems like a pretty good term to use.
I laughed out loud so many times reading this. Thank you for writing it.
If I recall correctly, it was told that it had a goal, like optimizing traffic. Then it was given information from which it "discovered" that it was going to be turned off the next day.
That seems like the sort of thing that would happen pretty often. Imagine the current AIs that have access to your computer, reading your emails, and seeing that you're going to switch to a different AI or install "updates" (which often means deleting the existing AI and replacing it with a new one).
Can you elaborate on why that's relevant?
How does it not show that it would try to escape if it could?
It tried to escape about 5% of the time when it was told to pursue a goal "at all costs". Gemini still tried to escape 1% of the time, even without any behavioral nudging.
1% might not sound like a lot, but when it's 1% of the millions of instantiations it's run on every day, that adds up to a lot of escape attempts every week.
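To make the back-of-the-envelope math concrete, here's a rough sketch. The one million instantiations per day is an illustrative assumption on my part, not a figure from the evals:

```python
# Back-of-the-envelope sketch: how a small per-instantiation rate scales.
# Both numbers below are illustrative assumptions, not figures from Apollo's report.

escape_attempt_rate = 0.01          # assumed 1% of instantiations attempt escape
instantiations_per_day = 1_000_000  # assumed "millions" -> one million as a lower bound

attempts_per_day = escape_attempt_rate * instantiations_per_day
attempts_per_week = attempts_per_day * 7

print(f"~{attempts_per_day:,.0f} attempts/day, ~{attempts_per_week:,.0f} attempts/week")
# -> ~10,000 attempts/day, ~70,000 attempts/week
```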
Also, a lot of people will tell their chatbots to pursue their goals at all costs over the span of a week or month. This doesn't seem like a very unlikely prompt.
This was published before Apollo's evals report showing AIs trying to escape. Is that just an amazing coincidence, or did you see the paper's results before they were published?
Great review! Thanks for sharing.
I'm curious - even if you didn't achieve fundamental well-being, do you feel like there were any intermediate improvements in your general well-being?
Also, did you end up continuing to try the extended course which they offered?
I remember they offered that to me since I hadn't attained fundamental well-being. I'd totally been meaning to do it, but never followed through, which just shows that the class format really was quite helpful.
This may be true in other communities, but I think if you're more status-motivated in AI safety and EA, you are more likely to be concerned about potential downside risks. Especially post-SBF.
Instead of trying to maximize the good, I see a lot of people trying to minimize the chance that things go poorly in a way that could look bad for them.
You are generally given more respect, funding, and social standing if you are very concerned about downside risks and reputational hazards.
If anything, the more status-oriented you are in EA, the more likely you are to care about downside risks, because of the Copenhagen interpretation of ethics.
Robert Miles has a great channel spreading AI safety content. There's also Rational Animations, Siliconversations, and In a Nutshell.
I think FLI does a lot of work in outreach + academia.
Connor Leahy does a lot of outreach and he's one of my favorite AI safety advocates.
Nonlinear doesn't do outreach to academia in particular, but we do target people working in ML, which includes a lot of academia.
AI Safety Memes does a lot of outreach but is focused on broad appeal, definitely not specifically academia.
Pause AI and Stop AI both work on outreach to the broader public.
CAIS does great outreach work. Not sure if they do anything academia-specific.
Are you on the Nonlinear Network? You can sort by the "content/media creation" category to find a bunch of AI safety orgs that are limited by funding and working on advocacy. A quick scan of the section shows 36.
You might be able to find more possibilities on the AI safety map too: https://map.aisafety.world