Has anyone seen MI7? I guess Tom is not the most popular guy on this forum, but the storyline of a rogue AI as presented (within the limits of a Mission: Impossible blockbuster) sounds not only plausible but also like a great way to raise awareness among crowds about the dangers. It covers the inability of governments to stop it (although obviously it will be stopped in the upcoming movie), their eagerness to control it in order to rule the world while the AI just wants to bring chaos (or does it have an ultimate goal?), and how some humans will align with and obey it even if it leads them to their own doom. Thoughts?
What is the time horizon of P(doom)?
What do people mean by that metric? Is it x-risk for this century? Forever? For the next 10 years? Until we figure out AGI, or after AGI on the road to superintelligence?
To me these are fundamentally different questions, because P(doom) forever must be much higher than P(doom) over the next 10-20 years. Or is the implication that surviving the next period means we have figured out alignment for good, for all the next generations of AIs? It's confusing.
It does seem likely to me that a large fraction of all "doom from unaligned AGI" comes relatively soon after the first AGI that is better at improving AGI than humans are. I tend to think of it as a question with multiple bundles of scenarios:
Only in scenario 4 do you see a steady increase in P(doom) over long time spans, and even that bundle of timelines probably converges fairly rapidly toward timelines in which no AGI ever exists for some reason or other.
This is why I think it's meaningful to ask for P(doom) without a specified time span. If we somehow found out that scenario 4 was actually true, then it might be worth asking in more detail about time scales.
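As a toy illustration of why that works (the numbers here are made up, not a forecast): if the annual hazard rate h(t) is concentrated in, say, the couple of decades after the first self-improving AGI and decays afterwards, then P(doom by T) = 1 − exp(−∫₀ᵀ h(t) dt) comes out nearly the same for T = 50 years as for T = forever, so the unqualified number is well defined. Only under a roughly constant hazard, as in scenario 4, does the answer keep depending heavily on which T you pick, with the all-time figure creeping toward 1.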
I think this is an important equivocation (direct alignment vs. transitive alignment). If the first AGIs, such as LLMs, turn out to be aligned, at least in the sense of keeping humanity safe, that by itself doesn't exempt them from the reach of Moloch. The reason alignment is hard is that it might take longer to figure out than developing misaligned AGIs does. This doesn't automatically stop applying when the researchers are themselves aligned AGIs: while AGI-assisted (or more likely, AGI-led) alignment research is faster than human-led alignment research, so is AGI capability research.
Thus it's possible that P(first AGIs are misaligned) is low, that is, the first AGIs are directly aligned, while P(doom) is still high: the first AGIs might fail to protect themselves (and by extension humanity) from the future misaligned AGIs they develop (they are not transitively aligned, same as most humans), because they failed to establish the strong coordination norms required to prevent deployment of dangerous misaligned AGIs anywhere in the world.
At the same time, this is not really about the timespan. As soon as the first AGIs develop nanotech, they will operate on many orders of magnitude more custom hardware, which increases both the serial speed and the scale of available computation to the point where everything related to settling into an alignment-security equilibrium happens within a very short span of physical time. It might take the first AGIs a couple of years to get there (if they manage to restrain themselves and not build a misaligned AGI even earlier), but then in a few weeks it all gets settled, one way or the other.
I think it's an all-of-time metric over a variable with expected decay baked into the dynamics. A windowing function on the probability might make sense to discuss; there are some solid P(doom) questions on Manifold Markets, for example.
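To make "decay baked into the dynamics" and the windowing idea concrete, here's a rough sketch in Python. All the hazard numbers are invented placeholders for illustration, not anyone's actual estimates:

```python
import numpy as np

# Hypothetical hazard rate: annual risk jumps when AGI arrives, then decays
# as the world (hopefully) settles into an alignment-security equilibrium.
def hazard(t, agi_year=10.0, peak=0.05, decay=0.2):
    """Toy annual hazard rate of doom at year t (made-up numbers)."""
    if t < agi_year:
        return 0.002                                   # small pre-AGI background risk
    return peak * np.exp(-decay * (t - agi_year))      # decaying post-AGI risk

def p_doom(t_start, t_end, dt=0.01):
    """P(doom occurs in the window [t_start, t_end]) = 1 - exp(-integral of hazard)."""
    ts = np.arange(t_start, t_end, dt)
    integral = sum(hazard(t) for t in ts) * dt
    return 1 - np.exp(-integral)

print(f"next 10 years:    {p_doom(0, 10):.0%}")    # mostly pre-AGI background risk
print(f"next 30 years:    {p_doom(0, 30):.0%}")    # includes the post-AGI spike
print(f"'forever' (300y): {p_doom(0, 300):.0%}")   # converges; the decay caps the total
```

Under these toy assumptions the 30-year and "forever" numbers are nearly identical, which is exactly the sense in which an all-of-time P(doom) can be well defined despite the missing time window. If you instead believe the hazard stays flat indefinitely, the windowed numbers keep diverging and the unqualified metric stops being meaningful.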
After Hinton's and Bengio's articles, which I consider a moment in history, I struggle to understand how most people in tech dismiss them. If Einstein had written an article about the dangers of nuclear weapons in 1939, you wouldn't have people without a physics background saying "nah, I don't understand how such a powerful explosion can happen". Hacker News is supposed to be *the* place for developers, startups and such, and you can see comments there that make me despair. They range from "alarmism is boring" to "I have programmed MySQL databases and I know tech, and this can't happen". I wonder how much I should update my view on the intelligence and biases of humans right now.