Non-alignment project ideas for making transformative AI go well

Lukas Finnveden

This is a series of posts with lists of projects that it could be valuable for someone to work on. The unifying theme is that they are projects that:

Would be especially valuable if transformative AI is coming in the next 10 years or so.
Are not primarily about controlling AI or aligning AI to human intentions.^[1]
- Most of the projects would be valuable even if we were guaranteed to get aligned AI.
- Some of the projects would be especially valuable if we were inevitably going to get misaligned AI.

The posts contain some discussion of how important it is to work on these topics, but not a lot. For previous discussion (especially of the objection “Why not leave these issues to future AI systems?”), see the section “How ITN are these issues?” from my previous memo on some neglected topics.

The lists are definitely not exhaustive. Failure to include an idea doesn’t necessarily mean I wouldn’t like it. (Similarly, although I’ve made some attempts to link to previous writings when appropriate, I’m sure to have missed a lot of good previous content.)

There’s a lot of variation in how sketched out the projects are. Most of the projects just have some informal notes and would require more thought before someone could start executing. If you're potentially interested in working on any of them and you could benefit from more discussion, I’d be excited if you reached out to me! ^[2]

There’s also a lot of variation in skills needed for the projects. If you’re looking for projects that are especially suited to your talents, you can search the posts for any of the following tags (including brackets):

[ML] [Empirical research] [Philosophical/conceptual] [survey/interview] [Advocacy] [Governance] [Writing] [Forecasting]

The projects are organized into the following categories (which are in separate posts). Feel free to skip to whatever you’re most interested in.

Governance during explosive technological growth
- It’s plausible that AI will lead to explosive economic and technological growth.
- Our current methods of governance can barely keep up with today's technological advances. Speeding up the rate of technological growth by 30x+ would cause huge problems and could lead to rapid, destabilizing changes in power.
- This section is about trying to prepare the world for this, either by generating policy solutions to problems we expect to appear or by addressing the meta-level problem of how we can coordinate to tackle this in a better and less rushed manner.
- A favorite direction is to develop norms/proposals for how states and labs should act under the possibility of an intelligence explosion.
Epistemics
- This is about helping humanity get better at reaching correct and well-considered beliefs on important issues.
- If AI capabilities keep improving, AI could soon play a huge role in our epistemic landscape. I think we have an opportunity to affect how it’s used: increasing the probability that we get great epistemic assistance and decreasing the extent to which AI is used to persuade people of false beliefs.
- A couple of favorite projects are to create an organization that gets started with using AI for investigating important questions, and to develop & advocate for legislation against bad persuasion.
Sentience and rights of digital minds
- It’s plausible that there will soon be digital minds that are sentient and deserving of rights. This raises several important issues that we don’t know how to deal with.
- It seems tractable both to make progress in understanding these issues and to implement policies that reflect this understanding.
- A favorite direction is to take existing ideas for what labs could be doing and spell out enough detail to make them easy to implement.
Backup plans for misaligned AI
- If we can’t build aligned AI, and if we fail to coordinate well enough to avoid putting misaligned AI systems in positions of power, we might have some strong preferences about the dispositions of those misaligned AI systems.
- This section is about nudging those systems toward somewhat better dispositions (in worlds where we can’t align AI systems well enough to stay in control).
- A favorite direction is to study generalization & AI personalities to find easily-influenceable properties.
Cooperative AI
- Difficulties with cooperation have been a big source of lost value and unnecessary risk in the past. AI offers dramatic changes in how bargaining could work.
- This section is about projects that could make AI (and AI-assisted humans) more likely to handle cooperation well.
- One of my favorite projects here is actually the same as for the “backup plans” mentioned above. (There’s significant overlap between the two.)

Acknowledgements

Few of the ideas in these posts are original to me. I’ve benefited from conversations with many people. Nevertheless, all views are my own.

For some projects, I credit someone who especially contributed to my understanding of the idea. If I do, that doesn’t mean they have read or agree with how I present the idea (I may well have distorted it beyond recognition). If I don’t, I’m still likely to have drawn heavily on discussion with others, and I apologize for any failure to assign appropriate credit.

For general comments and discussion, thanks to Joseph Carlsmith, Paul Christiano, Jesse Clifton, Owen Cotton-Barratt, Holden Karnofsky, Daniel Kokotajlo, Linh Chi Nguyen, Fin Moorhouse, Caspar Oesterheld, and Carl Shulman.

^[1] Nor are they primarily about reducing risks from engineered pandemics.
^[2] My email is [last name].[first name]@gmail.com

RogerDearnaley

It’s plausible that there will soon be digital minds that are sentient and deserving of rights. This raises several important issues that we don’t know how to deal with.

See the first five posts of my sequence AI, Alignment, and Ethics for a detailed analysis of exactly this, including:

- how to make decisions like this
- under what circumstances we should, and should not, give rights to different sorts of digital minds, how much moral weight to give them, and how this interacts with things like their ease of duplication
- which rights to give them and why
- whether the criterion for moral rights should be sapience or sentience

(Some of the results were quite counterintuitive to me when I first showed them.)
