Research that makes the case for AGI x-risk clearer
I ended up going into detail on this, in the process of making an entry to the FLI's aspirational worldbuilding contest. So, it'll be posted in full about a month from now. But for now, I'll summarize:
(And then we solve the symbol grounding problem, and then we figure out value learning, and then we learn how best to aggregate the learned values, and then we'll have solved the alignment problem)
Here’s a question: if you were a researcher of atomic theory right before the Manhattan Project began, would you have predicted it would be successful? Conditional on success, how long would you have expected it to take given the budget they had?
As I understand, theory of atomic bomb was considerably more advanced at the beginning of Manhattan project compared to our understanding of theory of aligned AGI.
To simplify somewhat, there were two unknown parameters: the critical mass of uranium-235 and the rate of uranium isotope separation. Given these two parameters, you could calculate how long it would take by simple division. Remember that Little Boy was not tested at all; the theory was that solid. Success was basically guaranteed given enough time, although success in 100 years would rightly have been considered failure.
What about the nuclear reactor, plutonium, and the implosion device? Those were gambles to speed things up, because they thought it would take too long. (They were right: the war in Europe ended first.) But the Manhattan Project would have succeeded without them, in the sense of producing fission weapons.
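To make the "simple division" concrete, here is a minimal sketch. The function name and every number in it are illustrative placeholders I made up, not historical figures; the only point is that the time estimate scales linearly with the critical mass, so an order-of-magnitude uncertainty in the mass becomes an order-of-magnitude uncertainty in the timeline.

```python
def years_to_enough_material(critical_mass_kg: float,
                             separation_rate_kg_per_year: float) -> float:
    """Time needed to separate one critical mass at a constant rate.

    Both inputs are hypothetical placeholders, not historical values.
    """
    if separation_rate_kg_per_year <= 0:
        raise ValueError("separation rate must be positive")
    return critical_mass_kg / separation_rate_kg_per_year


# Placeholder separation rate (kg of U-235 per year), assumed for illustration.
assumed_rate = 10.0

# Two placeholder critical-mass guesses that differ by a factor of ten.
for assumed_mass in (10.0, 100.0):
    years = years_to_enough_material(assumed_mass, assumed_rate)
    print(f"critical mass {assumed_mass:g} kg -> {years:g} years")
```

With these made-up inputs, a tenfold spread in the critical-mass guess gives a tenfold spread in the time estimate, which is the shape of the uncertainty described here.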
Another thing they tried to speed things up was better isotope separation. Electromagnetic separation was well understood and basically worked as designed. They gambled on developing gaseous diffusion, and it ended up more efficient, but development took too long so it didn't shorten the timeline at all.
In retrospect, they should have gambled on centrifuges, which are the currently preferred method. What was missing was a clever innovation, not an advanced material or anything of that nature. The Manhattan Project could have been finished a lot faster if only they had known about the Zippe centrifuge.
In fact there is an alternate history novel based on this, The Berlin Project by Gregory Benford (recommended). The author's estimate, which seemed reasonable to me, is that centrifuges would have shortened the timeline by one year, finishing in 1944. As a result, as the title suggests, an atomic bomb is dropped on Berlin.
So, let me answer the question. I will define success as producing fission weapons before the end of war in Europe. (This is a reasonable interpretation of statements by scientists who worked on the Manhattan Project.) By that definition, the real-world Manhattan Project failed.
No one could predict anything before the experiments needed to determine the critical mass were done. Rough estimates varied by an order of magnitude, implying one to ten years. Once the critical mass was known, electromagnetic separation implied three years (1942 to 1945), which was felt to give about a 50% chance of success based on guesses about how the war would progress. They tried hard to shorten the timeline, but they failed. Choosing centrifuges would have led to success in 1944, but there was no reasonable way to know that, and an unlucky choice was made.
Tangential comment regarding "I will define success as producing fission weapons before the end of war in Europe": I'm not sure this is the right criterion for success for the purpose of analogizing to AGI. It seems to me that "producing fission weapons before an Axis power does" is more appropriate.
And this seems overwhelmingly the case, yes: "theory of atomic bomb was considerably more advanced at the beginning of Manhattan project compared to our understanding of theory of aligned AGI"
I'm not sure I understand the motivation behind the question. How much of my modern knowledge am I supposed to throw away? Note that I am not in fact an atomic theorist with the 1942 state of knowledge of atomic theory, so it's hard to know what I'd think, but I can imagine assigning somewhere between 5% and 95% depending on how informed an atomic theorist I actually was and what it was actually like in 1942. Maybe I could give a better answer if you clarify the motivation behind the question?
I’m asking you to try to imagine yourself as an atomic theorist who has access to the state of knowledge of atomic theory in 1942. Obviously that can’t be done perfectly, but my thought was that by comparing what you would have predicted with what actually happened, some insight can be had about how “unknown unknowns” affect projects of that scale.
One thing that I imagine might happen, conditional on an existential catastrophe not occurring, is a Manhattan Project for aligned AGI. I don’t want to argue that this is particularly likely or desirable. The point of this post is to sketch the scenario, and briefly discuss some implications for what is needed from current research.
Imagine the following scenario: Top AI scientists come to take the existential risk of AGI seriously only late, and until then there is no significant change in the effort put into AI safety relative to our current trajectory. At some point, there is a recognition among AI scientists and relevant decision-makers that AGI will be developed soon by one AI lab or another (within a few months/years), and that without explicit effort there is a large probability of catastrophic results. A project is started to develop AGI:
It seems to me that it is useful to backchain from this scenario to see what is needed, assuming that this kind of alignment Manhattan project is indeed what should happen.
Firstly, my view is that if this Manhattan Project were to start in intellectual conditions similar to today’s, there wouldn't be very many top AI scientists significantly motivated to work on the problem, and it would not be taken seriously. Even very large sums of money would not suffice, since there wouldn't be enough of a common understanding about what the problem is for it to work.
Secondly, it seems to me that there isn't enough of a roadmap for building aligned AGI for such a project to succeed in a short time-frame of months to years. I expect some people to disagree with this, but looking at current rates of progress in our understanding of AI safety, and my model of the practical parallelizability of conceptual progress, I am skeptical that the problem can be solved in a few years even by a group of 40 highly motivated and financed top AI scientists. It is plausible that this will look different closer to the finish line, but I am skeptical.
On this model, I have in mind basically two kinds of work that contribute to good outcomes. This is not a significant change relative to my prior view, but in my mind it constrains the motivation behind such work to some degree:
I suspect this mostly shouldn't change my general picture of what needs to be done, but it does shift my emphasis somewhat.