Stephen Fowler

"You are in love with Intelligence, until it frightens you. For your ideas are terrifying and your hearts are faint."

Comments

Thank you for this immediately actionable feedback.

To address your second point, I've rephrased the final sentence to make it clearer.

What I'm attempting to get at is that the rapid proliferation of innovations between developers isn't necessarily a good thing for humanity as a whole.

The most obvious example is a developer driven primarily by commercial interest. Short-form video content has radically changed the media that children engage with, but may also have harmed educational outcomes.

But my primary concern stems from the observation that small changes to a complex system can lead to phase transitions in the behaviour of that system. Here the complex system is the global network of developers and their deployed S-LLMs. A small improvement to an S-LLM may initially appear benign, but have unpredictable consequences once it spreads globally.
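As a loose illustration of the kind of threshold behaviour I have in mind (the model and parameters below are arbitrary, and the analogy to developer networks is only suggestive, not a claim about the real system): in an Erdős–Rényi random graph, nudging the edge probability across roughly 1/n flips the graph from fragmented to dominated by a single giant connected component. A minimal sketch using networkx:

```python
# Illustrative only: a small change in edge probability p produces an
# abrupt, global change in structure (the emergence of a giant component).
import networkx as nx

n = 2000  # number of nodes, standing in loosely for "developers"
for scale in [0.5, 0.9, 1.1, 1.5, 2.0]:
    p = scale / n  # edge probability; the transition sits near p = 1/n
    G = nx.erdos_renyi_graph(n, p, seed=0)
    giant = max(nx.connected_components(G), key=len)
    print(f"p = {scale}/n: largest component spans {len(giant) / n:.1%} of nodes")
```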

You have conflated two separate evaluations, both mentioned in the TechCrunch article. 

The percentages you quoted come from Cisco’s HarmBench evaluation of multiple frontier models, not from Anthropic, and were not specific to bioweapons.

Dario Amodei stated that an unnamed DeepSeek variant performed worst on bioweapons prompts, but offered no quantitative data. Separately, Cisco reported that DeepSeek-R1 failed to block 100% of harmful prompts, while Meta’s Llama 3.1 405B and OpenAI’s GPT-4o failed at 96% and 86%, respectively.

When we look at Cisco's performance breakdown, we see that all three models performed equally badly on chemical/biological safety.

Thinking of trying the latest Gemini model? Be aware that it is almost impossible to disable the "Gemini in Docs" and "Gemini in Gmail" services once you have purchased a Google One AI Premium plan.

Edit: 

I spent 20 minutes trying to track down a button to turn it off before reaching out to support.

A support person from Google told me that as I'd purchased the plan there was literally no way to disable having Gemini in my inbox and docs.

Even cancelling my subscription would keep the service going until the end of the current billing period.

But despite what support told me, I resolved the issue: Account > Data and privacy > "Delete a Google service", and then I deleted my Google One account. No more Gemini in my inbox, and my account on the Gemini app seems to have reverted to a free user account.

I imagine this "solution" won't be feasible if you use Google One for anything else (e.g. file storage).

While each mind might have a maximum abstraction height, I am not convinced that the inability of people to deal with increasingly complex topics is direct evidence of this.

Is it that this topic is impossible for their mind to comprehend, or is it that they've simply failed to learn it in the finite time period they were given?

Thanks for writing this post. I agree with the sentiment but feel it important to highlight that it is inevitable that people assume you have good strategy takes.

In Monty Python's "Life of Brian" there is a scene in which the titular character finds himself surrounded by a mob of people declaring him the Messiah. Brian rejects this label and flees into the desert, only to find himself standing in a shallow hole, surrounded by adherents. They declare that his reluctance to accept the title is further evidence that he really is the Messiah.

To my knowledge nobody thinks that you are the literal Messiah, but plenty of people going into AI Safety are heavily influenced by your research agenda. You work at DeepMind and have mentored a sizeable number of new researchers through MATS. 80,000 Hours lists you as an example of someone with a successful career in Technical Alignment research.

To some, the fact that you request people not to blindly trust your strategic judgement is evidence that you are humble, grounded and pragmatic, all good reasons to trust your strategic judgement.

It is inevitable that people will treat your views on the Theory of Change for Interpretability as authoritative. You could literally repeat this post verbatim at the end of every single AI safety/interpretability talk you give, and some portion of junior researchers would still leave the talk deferring to your strategic judgement.

These recordings I watched were actually from 2022 and weren't the Santa Fe ones.

A while ago, I watched recordings of the lectures given by Wolpert and Kardes at the Santa Fe Institute*, and I am extremely excited to see you and Marcus Hutter working in this area.

Could you speculate on whether you see this work having any direct implications for AI Safety?

 

Edit:

I was incorrect. The lectures from Wolpert and Kardes were not the ones given at the Santa Fe Institute.

Signalling that I do not like linkposts to personal blogs.

"cannot imagine a study that would convince me that it "didn't work" for me, in the ways that actually matter. The effects on my mind kick in sharply, scale smoothly with dose, decay right in sync with half-life in the body, and are clearly noticeable not just internally for my mood but externally in my speech patterns, reaction speeds, ability to notice things in my surroundings, short term memory, and facial expressions."

The drug actually working would mean that your life is better after 6 years of taking the drug compared to the counterfactual where you took a placebo.

The observations you describe are explained by you simply having a chemical dependency on a drug that you have been on for 6 years.
