I guess I'm worried that allowing insurance for disasters above a certain size could go pretty badly if it increases the chance of labs being reckless.
Right so you're worried about moral hazard generated by insurance (in the case where we have liability in place). For starters, the government arguably generates moral hazard for disasters of a certain size by default: it can't credibly commit ex ante to not bail out a critical economic sector or not provide relief to victims in the event of a major disaster: the government is always implicitly on the hook (see Moss, D. A. When All Else Fails: Government as the Ultimate Risk Manager. See the too-big-to-fail effect for an example). Charging a risk-priced premium for that service can only help.
But you're probably more worried about private insurer's ability to mitigate the moral hazard they generate. Insurers certainly do not always succeed at doing this. However, they sometimes not only succeed, but in fact induce more harm reduction than liability alone probably would have induced on its own (see e.g. the Insurance Institute for Highway Safety rating crashworthiness of vehicles, and the auto-industry's lobbying for airbags in the 80s). For more see:
My research finds that, in the specific context of insuring against uncertain heavy-tail risks we can expect private insurers to engage in a mix of:
In other words, if set up correctly, I expect them to do all the things we would want them to do.
The government also needn't mandate specifically commercial insurance: it could also allow/encourage labs to mutualize their risk, entirely eliminating concerns about moral hazard.
You can read more about all of this here.
TL;DR
I need help finding/developing a mech that can reliably elicit honest and effortful risk-estimates from frontier AI labs regarding their models, risk-estimates to be used in risk-priced premiums that labs then pay the government (i.e. a "Pigouvian tax for x-risk"). Current best guess: Bayesian Truth Serum.
Stretch-goal: find/develop a mech for incenting optimal production of public safety research. Current best guess: Quadratic Financing.
DM me if you're interested in collaborating!
The Project
X-risk poses an extreme judgment-proof problem: threats of ex post punishment for causing an existential (or nationally existential, or even just a disaster that renders your company insolvent) have little to any deterrent effect. Liability on its own will completely fail to internalize these negative externalities.
Traditionally, risk-priced insurance premiums are used to solve judgment-proofness (turn large ex post costs into small regular ex ante costs). However, insurers are also judgment-proof in the face of x-risk.
I'm developing a regime for insuring these "uninsurable" risks from frontier AI. It's modeled after the arguably very successful liability and insurance regime for nuclear power operators. In two recent workshop papers, I argue we should make foundation model developers liable for a certain class of catastrophic harms/near miss events and:
A government agency – through audits, its own forecasts and so on – could (and should) try to make these risk-estimates. However, this will be costly and they will struggle to collapse the information asymmetry between it and the developers it insures. Relying mostly on mechanism design to just incentivize labs to report honest and effortful risk-estimates has a number of advantages:
The regime in schematic form:
The Work
I lack the mech design expertise to confidently assess the quality/relevance/usability of mechs I read about; I'm looking for someone with at least a graduate level understanding of mechanism design to collaborate with me.
The work will most likely involve:
It's possible this only takes you ~40 hrs to accomplish, if an appropriately plug-n-play mechanism is already out there. I doubt this however (I have done some preliminary searching).
You can find more details of the questions I think need working out here (a much older, longer draft of the workshop papers linked above).
Theory of Impact
The goal is to write a large policy paper and then spin off some policy memos. I've applied to some policy fellowships. I plan to work on this proposal regardless, but obviously if I get in, that will be my platform for sharing this work.
Policy folks I've talked to, including a few people that work in DC, have expressed interest in seeing this developed further – e.g. Makenzie Arnold tells me this is "in the category of sensible" proposals. But obviously it needs more fleshing out.
If the research collaboration I'm proposing here goes great and policy folks love it, we may want to do a follow-up running an experiment to empirically verify our claims in as analogous a setting as we can muster.
Compensation/Timeline
If money is an issue, I'm willing to pay 20~40$/hr, possibly more for exceptional collaborators. I'm also happy to do all the writing/paperwork if you just want to provide the thought input. Happy to help write a grant proposal too (but I'm unlikely to be able to secure one on my own – see below.)
Ideally, I'd like to get something written up by EoY. Within that time frame though, I'm very flexible.
About me
FWIW, my two workshop papers linked above were accepted into the GenLaw workshop at ICML 2024.
I recently did this deep dive into insurance, liability and especially the nuclear power precedent, but I'm only a casual appreciator of mechanism design and economics. My MA was in philosophy.
NB: I'm a very early career professional and do not have an institutional affiliation.
I'm based in Boston (out of the AISST office).
Contact
If all this sounds like you, or someone you know, feel free to DM me or email at ctroutcsi@gmail.com! If you're just interested in the proposed regime or have questions, feel free to ask them in the comments.