This document is about ideas for AI labs. It's mostly from an x-risk perspective. Its underlying organization black-boxes technical AI stuff, including technical AI safety.
Related: AI policy ideas: Reading list.

- Lists & discussion
- Levers
- Desiderata
  - Maybe I should make a separate post on desiderata for labs (for existential safety).
- Ideas
  - Coordination[1]
    - See generally The Role of Cooperation in Responsible AI Development (Askell et al. 2019).
  - Transparency
    - Transparency enables coordination (and some regulation).
  - Publication practices
    - Labs should minimize/delay the diffusion of their capabilities research.
  - Structured access to AI models
  - Pay the alignment tax (if you develop a critical model)
  - Improve your security (operational security, information security, and cybersecurity)
    - There's a private reading list on infosec/cybersec, but it doesn't have much about what labs (or others) should actually do.
  - Plan and prepare: ideally figure out what's good, publicly commit to doing what's good (e.g., perhaps monitoring for deceptive alignment or supporting external model evals), do it, and demonstrate that you're doing it
  - For predicting and avoiding misuse
  - For alignment
  - For deployment (especially of critical models)
  - For coordinating with other labs
    - Sharing
    - Stopping
    - Merging
    - More
  - For engaging government
  - For increasing time 'near the end' and using it well
  - For ending risk from misaligned AI
  - For how to get from powerful AI to a great long-term future
- Governance structure
- Miscellanea
- See also

Some sources are roughly sorted within sections by a combination of x-risk-relevance, quality, and influence, but sometimes I didn't bother to sort them, and I haven't read all of them.
Please have a low bar for suggesting additions, substitutions, rearrangements, etc.
Current as of: 9 July 2023.
[1] At various levels of abstraction, coordination can look like:
- Avoiding a race to the bottom
- Internalizing some externalities
- Sharing some benefits and risks
- Differentially advancing more prosocial actors?
- More?
Policymaking in the Pause (FLI 2023) cites A Systematic Review on Model Watermarking for Neural Networks (Boenisch 2021); I don't know if that source is good. (Note: this disclaimer does not imply that I know that the other sources in this doc are good!)
I am not excited about watermarking. (Note: this disclaimer does not imply that I am excited about the other ideas in this doc! But I am excited about most of them.)