Toby Ord recently published a nice piece, "On the Value of Advancing Progress," about mathematical projections of far-future outcomes given different rates of progress and risk levels. The problem with that piece, and with many arguments for caution, is that most people barely care about possibilities even twenty years out.
We could talk about sharp discounting curves in decision-making studies, and how that makes sense given evolutionary pressures in tribal environments. But I think this is pretty obvious from talking to people and watching our political and economic practices.
Utilitarianism is a nicely self-consistent value system. Utilitarianism pretty clearly implies longtermism. Most people don't care that much about logical consistency,[1] so they are happily non-utilitarian and non-longtermist in a variety of ways. Many arguments for AGI safety are longtermist, or at least long-term, so they're not going to work well for most of humanity.
This is a fairly obvious point, but one worth keeping in mind.
One non-obvious corollary of this observation is that much skepticism about AGI x-risk is probably based on skepticism about AGI happening soon. This doesn't explain all skepticism, but it's a significant factor worth addressing. When people dig into their logic, that's often a central point. They start out saying "AGI wouldn't kill humans," then over the course of a conversation it turns out that they feel that way primarily because they don't think real AGI will happen in their lifetimes. Until that belief changes, discussing AGI x-risk with them isn't productive, because they just don't care about it.
The obvious counterpoint is "You're pretty sure it won't happen soon? I didn't know you were an expert in AI or cognition!" Please don't say this. Nothing convinces your opponents to cling to their positions beyond all logic like calling them stupid.[2] Instead, try something like "well, a lot of people with the most relevant expertise think it will happen pretty soon. A bunch more think it will take longer. So I just assume I don't know which is right, and it might very well happen pretty soon".
It looks to me like discussing whether AGI might threaten humans is pretty pointless if the person is still assuming it's not going to happen for a long time. Once you're past that, it might make sense to actually talk about why you think AGI would be risky for humans.[3]
[1]
This is an aside, but you'll probably find that utilitarianism isn't that much more logical than other value systems anyway. Preferring what your brain wants you to prefer, while avoiding drastic inconsistency, has practical advantages over values that are more consistent but that clash with your felt emotions. So let's not assume humanity isn't utilitarian just because it's stupid.
[2]
Making sure any discussions you have about x-risk are pleasant for all involved is probably actually the most important strategy. I strongly suspect that personal affinity weighs more heavily than logic on average, even for fairly intellectual people. (Rationalists are a special case; I think we're resistant but not immune to motivated reasoning).
So making a few points in a pleasant way, then moving on to other topics they like, is probably far better than making the perfect logical argument while even slightly irritating them.
[3]
From there you might be having the actual discussion on why AGI might threaten humans. Here are some things I've seen be convincing.
People seem to often think "okay fine it might happen soon, but surely AI smarter than us still won't have free will and make its own goals". From there you could point out that it needs goals to be useful, and if it misunderstands those goals even slightly, it might be bad. Russell's "you can't fetch the coffee if you're dead" is my favorite intuitive explanation of instrumental convergence creating unexpected consequences. This requires explaining that we wouldn't screw it up in quite such an obvious way, but the metaphor goes pretty deep into more subtle complexities of goals and logic.
The other big points, in my observation, are "people screw up complex projects a lot, especially on the first try" and "you'd probably think it was dangerous if advanced aliens were landing, right?". One final intuitive point to make is that even if AGIs always correctly follow human instructions, some human will accidentally or deliberately give them very bad instructions.
It seems better to ask what people would do if they had more tangible options, such that they could reach a reflective equilibrium that explicitly endorses particular tradeoffs. People mostly end up not caring about possibilities twenty years out because they don't see how their current options constrain what happens in twenty years. This suggests we shouldn't treat their surface preferences as central, insofar as those preferences don't follow from a reflective equilibrium informed by all their available options. If one knows one's principal can't get that opportunity, one still has a responsibility to act on what the principal's preferences would point to given more of the context.
They would care more about logical consistency if they knew more about its implications.
If we're asking people to imagine a big empty future full of vague possibility, it's not surprising that they're ambivalent about long-termism. Describe an actual utopia, one that is hard for humans to conceive of in the first place, and how it is conditional on their coordination; show them the joy and depth of each life that follows, the way things like going on an adventure are taken to a transcendent level, and the preferences they already had will plausibly lead them to adopt a more long-termist stance. On the surface, how much people care is a function of how tangible the options are.
The problem is demonstrating that good outcomes are gated by what we do, and that those good outcomes are actually really good in a way that is hard for modern humans to conceive.
Yeah, I'm in both camps. We should do our absolute best to slow down how quickly we approach building agents, and one way is leveraging AI that doesn't rely on being agentic. It offers us a way to do something like global compute monitoring and could possibly also alleviate short-term incentives satisfiable by building agents, by offering a safer avenue. Insofar as a global moratorium stopping all large model research is feasible, we s...