franc

Thanks for putting this out there. I’m on the Catalyze program at LISA in London and have been thinking a lot about AI control (for framing, my background is risk management, so I’m coming at it from that angle).
'The high-level change I propose is to define a control evaluation as a method of bounding risk from worst-case deployment behavior generally – rather than scheming models in particular.'
I’m on board with this. If our goal is to bound the risk from worst-case deployment behaviour, I think the broader the better, and enumerating different ‘untrusted model categories’ is going to better inform our understanding of harmful behaviours. If we have post-deployment monitoring and identify new patterns...
I think it comes down to incentives. Based on my recent reading of 'The Chaos Machine' by Max Fisher, I think it's closely linked to continually increasing engagement driving profit. Addictiveness unfortunately leads to more engagement, which in turn leads to profit. Emotive content (clickbait-style, extreme things) also increases engagement.
Tools and moderation processes might be expensive on their own, but I think it's when they start to challenge the underlying business model of 'More engagement = more profit' that the companies are in a more uncomfortable position.
I agree that more research on the effect of recommendation algorithms on the brain would be useful.
Also, research looking at which cognitive biases and preferences the algorithms are exploiting, and who is most susceptible to these (e.g. children, neurodiverse people, etc.). It seems plausible to me that some AI applications (e.g. character.ai, as you say) will be optimising for some sort of human interaction, and exploiting human biases and cognitive patterns will be a big part of this.
Great addition to thinking on safety cases.
I'm curious about the decision to include the time constraint 'Deployment does not pose unacceptable risk within its 1 year lifetime', as I've been considering how safety case arguments remain relevant post-deployment.
Is this intended to convey that the system will be withdrawn or updated within a year, or is it to narrow the scope of the inability and control arguments by evaluating risk over a shorter timeline?
On the topic of the scope of the Harm Inability arguments in the paper:
'A Harm inability argument could be absolute or limited to a given deployment setting'
Given the vast amounts of deployment...
Very thoughtful piece, which I am still musing over...
The section that left me most uncertain is 2.3.1 (“How international must a Halt be, and on what timescale?”), especially the open question: “What can be done to increase US centralization of AI development?”
I would expect that today, most states rank “other great powers with advanced AI” above “misaligned AI” in their threat models. Until that ordering flips, relying on US centralization of AI development to implement a Halt strategy may actually exacerbate some of the failure modes the agenda wants to avoid, e.g. War: AI‑related conflict between great powers causes catastrophic harm.
I don’t have a tidy solution, but I suspect a crisper picture of stakeholder payoff matrices would help in designing any Halt, i.e. the hard constraints and potential trust‑building moves that a viable Halt has to respect.
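To make that a bit more concrete, here is a deliberately toy sketch in Python of the kind of matrix I have in mind: two great powers each choose “Halt” or “Race”, and the payoff numbers are entirely made up for illustration (not estimates), chosen only to encode the threat ordering above, i.e. falling behind a rival is currently weighted as worse than misaligned-AI risk.

```python
# Toy, purely illustrative payoff matrix for two great powers ("A" and "B")
# each choosing "Halt" or "Race". The numbers are invented solely to encode
# the assumed threat ordering (rival ahead > misaligned AI); higher = better.
payoffs = {
    # (A's move, B's move): (payoff to A, payoff to B)
    ("Halt", "Halt"): (3, 3),  # coordinated Halt: misaligned-AI risk reduced
    ("Halt", "Race"): (0, 4),  # A halts alone and falls behind a rival it fears more than misaligned AI
    ("Race", "Halt"): (4, 0),  # symmetric case
    ("Race", "Race"): (1, 1),  # both race: rivalry risk plus misaligned-AI risk
}

def best_response(player: int, opponent_move: str) -> str:
    """Return the move that maximises `player`'s payoff given the opponent's move."""
    def payoff(move: str) -> int:
        key = (move, opponent_move) if player == 0 else (opponent_move, move)
        return payoffs[key][player]
    return max(("Halt", "Race"), key=payoff)

if __name__ == "__main__":
    for opp in ("Halt", "Race"):
        print(f"A's best response if B plays {opp}: {best_response(0, opp)}")
        print(f"B's best response if A plays {opp}: {best_response(1, opp)}")
    # With this threat ordering, "Race" dominates for both players, so a Halt
    # is only stable if trust-building moves change the payoffs themselves.
```

Under that (assumed) ordering, “Race” is a dominant strategy for both sides, which is roughly why I suspect a viable Halt has to come packaged with trust‑building moves that change the payoffs, rather than relying on a US centralization push alone.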
Any thoughts on this?