Epistemic status: Musings from the past month. Still far too vague to be satisfying.
Friends.
I have been away on a retreat this past week, seeking clarity on how to move forward with living a vibrant and beneficial life that helps resolve problems in the world, but I’m afraid that I have come back empty-handed. I have only vague musings about a bigger picture, and no clear sense of how to take decisive action. I’ll try to share what I have succinctly, so as not to take up too much time.
Understand alignment, not intelligence
Look, our task here is to align the systems that will most influence the future with what is actually good. To do that, we should look out at the world, identify which kinds of systems are most influential, and seek to align them to the benefit of all life on the planet. Intelligence is the means by which certain very powerful systems could come to have a very large influence over the future, and to that extent we ought to be interested in understanding intelligence. But we need not have any particular interest in understanding intelligence for its own sake. What we should be interested in understanding is the means by which any system can exert influence over the future, and the means by which such powerful systems can be aligned with that which is worth protecting.
Align systems, not AI
There has been great debate about what AI might look like. Will it look like a singleton, a tool, a set of cloud services, or a society of competing entities? One person says that a powerful singleton might be dangerous; then another person responds that AI might not look much like a powerful singleton at all.
Yet there is a single unifying issue to resolve here, which is this: how do we build things in the world that are and remain consistently beneficial to all life? How do we construct international treaties that are aligned in this way? How do we construct financial systems that are aligned in this way? How do we construct tools that are aligned in this way? How do we construct belief-forming, observation-making, action-taking agents that are aligned in this way? These questions are connected in a deep rather than superficial way, because they all come down to clarifying what is good and implementing it in a tangible system.
There is a hard problem of alignment
There are many difficult problems in AI alignment, but there seems to be one problem at the center that has an entirely different character of difficulty. The hard problem, as I see it, is this: how do we set up any system in a way that is aligned with what is actually good, when any particular operationalization of what is good is certain to be wrong?
The world now looks to us
In the early days of AI safety, there was a narrative that the world was mostly not on our side, and that it was our job to beat the world over the head with the hard stick of difficult truths about the dangers of advanced AI in order to wake people up to the impending destruction of life on this planet. This was a good narrative to have in the early days, and it served its purpose, but it is no longer serving us. I think that a better narrative to have now is the following.
The world is like an extremely wealthy but depressed person who realizes that their business empire is rapidly causing the destruction of life. Despite not finding the energy to make sweeping changes on their own, this person summons just enough clarity to make a large financial gift to a deputy who seems unusually agentic, trustworthy, and ethical. That deputy (that is, us, this community) faces the difficult task of reforming an empire that is caught up in harmful patterns of politics, finance, and prestige, so it is not exactly the case that everyone is "on their side". Yet almost everyone in the empire sees that things are not going well, and in moments of clarity urges this deputy onwards, even if they soon return to participating in the very patterns that they hope the deputy will help to resolve.
We are the great hope of our civilization. Us, here, in this community. It is not that our civilization has woken up completely to the dangers of advanced AI. It is that our civilization has not woken up, yet wishes to wake up, and knows that it wishes to wake up, and has found just enough clarity to bestow significant power and resources upon us in the hope that we will take up leadership.
In this subtle way, everyone is now on our side. Yet everyone is caught up in the very patterns that, at moments of clarity, they see are causing harm. Our job is to find the resolve to move forward with this difficult task, without getting caught up in the harmful patterns that exist in the world, and without losing track of the subtle way in which everyone is on our side.
This is the story. It is a way of seeing things, an ethos for carrying on with a difficult task that requires coordination with many people. It is a good way of seeing things to the extent that, if we chose to see things in this way, our actions would be beneficial to all life. It seems to me that seeing things this way would indeed be beneficial to all life, because it calls us to befriend exactly that within everyone which seeks The Good, without making even the tiniest accommodation for the patterns of behavior that are causing existential risk.
There are additional considerations here, though. The way that we interact with the wider world is influenced by the stories we tell ourselves about our relationship with it. Such narratives affect not just our sense of whether we are doing a good job, but also the tone with which we speak to the world, the ambition of our efforts, and the emotional impact of what we hear back from the world.
If we tell ourselves stories in which the world is mostly not on our side, then we will speak to the world coercively, shy away from attempting big things, and gradually be worn down as we face difficulties.
But suppose we see, correctly I believe, that most people do have brief moments in which they can appreciate the dangers of powerful agentic systems being developed through ham-fisted engineering methods, and that the most switched-on people in the world seem to be turning to this particular community on these issues. Then we might adopt quite a different internal demeanor as we approach these problems. This is not because we give ourselves some particular amount of credit for our past efforts, but because we see the world as fundamentally friendly to our efforts, without underestimating the depth and reality of the problems that need to be resolved.
I think this issue of friendliness is really the central point. As far as I can tell, it makes a huge difference to see clearly what in the world is fundamentally friendly to one's efforts. Of course, it's also critical not to mistake that which is not friendly to our efforts for something that is. But if one doesn't see what is friendly towards us, then things just get lonely and exhausting real fast, which is doubly tragic because there is in fact something very real that is deeply friendly towards our efforts.