I've been thinking about what implicit model of the world I use to make plans for reducing x-risk from AI. I list the model's four main gears below (with quotes to illustrate each), and then discuss the concrete heuristics I take from it.
A model of AI x-risk in four parts
1. Alignment is hard.
Quoting "Security Mindset and the Logistic Success Curve" (link)
Coral: YES. Given that this is a novel project entering new territory, expect it to take at least two years more time, or 50% more development time—whichever is less—compared to a security-incautious project that otherwise has identical tools, insights, people, and resources. And that is a very, very optimistic lower bound.
Amber: This story seems to be heading in a worrying direction.
Coral: Well, I'm sorry, but creating robust systems takes longer than creating non-robust systems even in cases where it would be really, extraordinarily bad if creating robust systems took longer than creating non-robust systems.
2. Getting alignment right accounts for most of the variance in whether an AGI system will be positive for humanity.
Quoting "The Hidden Complexity of Wishes" (link)
There are three kinds of genies: Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.
[...]
There is no safe wish smaller than an entire human morality. There are too many possible paths through Time. You can't visualize all the roads that lead to the destination you give the genie... any more than you can program a chess-playing machine by hardcoding a move for every possible board position.
And real life is far more complicated than chess. You cannot predict, in advance, which of your values will be needed to judge the path through time that the genie takes. Especially if you wish for something longer-term or wider-range than rescuing your mother from a burning building.
3. Our current epistemic state regarding AGI timelines will continue until we're close (< 2 years) to having AGI.
Quoting "There is No Fire Alarm for AGI" (link)
It's not that whenever somebody says "fifty years" the thing always happens in two years. It's that this confident prediction of things being far away corresponds to an epistemic state about the technology that feels the same way internally until you are very very close to the big development. It's the epistemic state of "Well, I don't see how to do the thing" and sometimes you say that fifty years off from the big development, and sometimes you say it two years away, and sometimes you say it while the Wright Flyer is flying somewhere out of your sight.
[...]
So far as I can presently estimate, now that we've had AlphaGo and a couple of other maybe/maybe-not shots across the bow, and seen a huge explosion of effort invested into machine learning and an enormous flood of papers, we are probably going to occupy our present epistemic state until very near the end.
By saying we're probably going to be in roughly this epistemic state until almost the end, I don't mean to say we know that AGI is imminent, or that there won't be important new breakthroughs in AI in the intervening time. I mean that it's hard to guess how many further insights are needed for AGI, or how long it will take to reach those insights. After the next breakthrough, we still won't know how many more breakthroughs are needed, leaving us in pretty much the same epistemic state as before. Whatever discoveries and milestones come next, it will probably continue to be hard to guess how many further insights are needed, and timelines will continue to be similarly murky.
4. Given timeline uncertainty, it's best to spend marginal effort on plans that assume, and work in, shorter timelines.
Stated simply: If you don't know when AGI is coming, you should make sure alignment gets solved in worlds where AGI comes soon.
Quoting "Allocating Risk-Mitigation Across Time" (link)
Suppose we are also unsure about when we may need the problem solved by. In scenarios where the solution is needed earlier, there is less time for us to collectively work on a solution, so there is less work on the problem than in scenarios where the solution is needed later. Given the diminishing returns on work, that means that a marginal unit of work has a bigger expected value in the case where the solution is needed earlier. This should update us towards working to address the early scenarios more than would be justified by looking purely at their impact and likelihood.
[...]
There are two major factors which seem to push towards preferring more work which focuses on scenarios where AI comes soon. The first is nearsightedness: we simply have a better idea of what will be useful in these scenarios. The second is diminishing marginal returns: the expected effect of an extra year of work on a problem tends to decline when it is being added to a larger total. And because there is a much larger time horizon in which to solve it (and in a wealthier world), the problem of AI safety when AI comes later may receive many times as much work as the problem of AI safety for AI that comes soon. On the other hand one more factor preferring work on scenarios where AI comes later is the ability to pursue more leveraged strategies which eschew object-level work today in favour of generating (hopefully) more object-level work later.
The above quote is slightly misrepresentative: the paper is largely undecided as to whether shorter-term or longer-term strategies are more valuable (given uncertainty over timelines), and recommends a portfolio approach (running multiple strategies that each apply to different timelines). Nonetheless, when reading it I updated toward short-term strategies as being especially neglected, both by myself and by the x-risk community at large. (I sketch the diminishing-returns argument a little more formally below.)
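To make the diminishing-returns point concrete, here is a minimal sketch of my own (not taken from the paper). Suppose the value of the total alignment work $W$ completed before AGI arrives is some concave function $f(W)$, and that worlds where AGI comes soon accumulate less total work than worlds where it comes late. Then

$$f''(W) < 0 \quad\text{and}\quad W_{\text{soon}} < W_{\text{late}} \quad\Longrightarrow\quad f'(W_{\text{soon}}) > f'(W_{\text{late}}),$$

i.e. a marginal unit of work is worth more conditional on AGI coming soon than conditional on it coming late. This is what pushes the allocation toward short-timeline scenarios beyond what their probability and impact alone would justify.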
Concrete implications
Informed by the model above, here are heuristics I use for making plans.
- Solve alignment! Aaargh! Solve it! Solve it now!
- I nearly forgot to say it explicitly, but it's the most important: if you have a clear avenue to do good work on alignment, or field-building in alignment, do it.
- Find ways to contribute to intellectual progress on alignment
- I think that intellectual progress is very tractable.
- A central example of a small project I'd love to see more people attempt: writing up, in their own words, analyses and summaries of core disagreements in alignment research.
- e.g. Jessica Taylor's two posts on motivations behind MIRI's research agenda and the Paul-MIRI disagreement.
- For a broader category of things you can do to push discourse forward, see this talk Oliver and I have given in the past about how to write good comments on LessWrong.
- It seems to me that people I talk to think earning-to-give is easy and doable, but that pushing forward intellectual progress (especially on alignment) is impossible, or at least something only 'geniuses' can do. I disagree; there is a lot of low-hanging fruit.
- Build infrastructure for the alignment research community
- The Berkeley Existential Risk Initiative (BERI) is a great example of this: many orgs (FHI, CHAI, etc.) face onerous university constraints on their actions, and one of BERI's goals is to let them outsource such work (to BERI) and skip the bureaucratic mess. This is ridiculously helpful. (FYI, they're hiring.)
- I've recently been chatting with various alignment researchers about what online infrastructure could be helpful, and have found surprisingly good opportunities to improve things (I'll write up more on this in a future post).
- What other infrastructure could you build for better communication between key researchers?
- Avoid/reduce direct government involvement (in the long run)
- It's important that those running AGI projects are capable of understanding the alignment problem and why it's necessary to solve alignment before implementing an AGI. There's a better chance of this when the person running the project has a strong technical understanding of how AI works.
- A government-run AI project is analogous to a tech company with non-technical founders. Sure, the founders can employ a CTO, but then you run into Paul Graham's design problem: how are they supposed to figure out who a good CTO is? They don't know what to test for. They will likely just pick whoever comes with the strongest recommendation, and given their info channels, that will probably be whoever has the most status.
- Focus on technical solutions to x-risk rather than political or societal
- I have the impression that humanity has a better track record of finding technical solutions to problems than political/social ones, which suggests we should focus even more on things like alignment.
- As one datapoint, fields like computer science, engineering, and mathematics seem to make a lot more progress than ones like macroeconomics, political theory, and international relations. If you can frame something as either a math problem or a political problem, do the former.
- I don't have anything strong to back this up with, so I will do some research/reading.
- Avoid things that (because they're social) are fun to argue about
- For example, ethics is a very sexy subject that can easily attract public outrage and attention while not in fact being useful (cf. bioethics). If we expect alignment not to be solved, the question of "whose values do we get to put into the AI?" is an enticing distraction.
- Another candidate for a sexy subject that is basically a distraction is discussion of the high-status people in AI, e.g. "Did you hear what Elon Musk said to Demis Hassabis?" Too many of my late-night conversations fall into patterns like this, and I actively push back against it (both in myself and in others).
- This recommendation is a negative one ("Don't do this"). If you have any ideas for positive things to do instead, please write them down. What norms/TAPs push away from social distractions?
I wrote this post to make explicit some of the thinking that goes into my plans. While the heuristics are informed by the model, they likely hide other assumptions that I didn’t notice.
To folks who have tended to agree with my object-level suggestions, I expect this post will read as obvious things stated explicitly. To everyone else, I'd love to read about the core models that inform your views on AI, and I'd encourage you to read more about those of mine that are new to you.
My thanks and appreciation to Jacob Lagerros for help editing.
[Edit: On 01/26/18, I made slight edits to this post body and title. It used to say there were four models in part I, and instead now says that part I lists four parts of a single model. Some of the comments were a response to the original, and thus may read a little funny.]
One consideration that points against this: focusing on technical solutions will lead you to think only about technical problems, and if you don't also look at the societal problems, you might not realize that your proposed technical solution is unworkable because of a societal one.
One good example is Oracle AI. People have debated whether we could use a pure question-answering or "tool" AI as a way to create safe agent AI. There has been a bunch of discussion about the technical challenge of creating it, where the objections have typically been something like "you can't box in a superintelligent AI that wants to escape", prompting proposals for ways to make the AI want to stay in the box.
But this neglects the fact that even if you manage to build an AI that wants to stay in the box, this is useless if there are others who have reasons to let their AI out of the box. (Section 5.2 of my paper "Disjunctive Scenarios of Catastrophic AI Risk" goes into detail about the various reasons that would cause people to let their AI out.) Solving the technical problem of keeping the AI contained does nothing for the societal problem of making people want to keep their AIs contained.
Similarly, Seth Baum has pointed out that the challenge of creating beneficial AI is a social challenge, because it requires motivating AI developers to choose beneficial AI designs. This is the general form of the specific example I gave above: it's not enough to create an aligned technical design; one also needs to get people to actually implement it.
Of course, you can try to just be the first one to build an aligned superintelligence that takes over the world... but that's super-risky for obvious reasons, such as the fact that it involves racing to build the superintelligence first, which means you don't have the time to make it safely aligned. To avoid that, you'll want to prevent arms races... which is again a societal problem.
To have a good understanding of what would work for solving the AI problem, you need to understand the whole problem, and the societal dimension is a big part of it. I'm not saying that you couldn't still focus primarily on the technical aspects (after all, a single person can only do so much, and we all need to specialize), but you should keep in mind what kinds of technical solutions look feasible given the societal landscape, and properly understanding that landscape requires spending some effort thinking about the societal problems and their possible solutions as well.
That's a legit thing to be frustrated by, but I think you know the reason why AI safety researchers don't want "we don't see a way to get to a good outcome except for an aligned project to grab a decisive strategic advantage" to filter into public discourse: it pattern-matches too well to "trust us, you need to let us run the universe".