Richard Hollerith. 15 miles north of San Francisco. hruvulum@gmail.com
My probability that AI research will end all human life is .92. It went up drastically when Eliezer started going public with his pessimistic assessment in April 2022. Until then, my confidence in MIRI (and my knowledge that MIRI has enough funding to employ many researchers) was keeping my probability down to about .4. (I am glad I found out about Eliezer's assessment.)
Currently I am willing to meet with almost anyone on the subject of AI extinction risk.
Last updated 26 Sep 2023.
An influential LW participant, Jim Miller, who I think is a professor of economics, has written here that divestment does little good because any reduction in the stock price caused by pulling the investments can be counteracted by profit-motivated actors. For publicly-traded stocks, there is a robust supply of profit-motivated actors scanning for opportunities. I am eager for more discussion on this topic.
I am alarmed to see that I made a big mistake in my previous comment: where I wrote that "contributing more money to AI-safety charities has almost no positive effects", I should have written "contributing to technical alignment research has almost no positive effects". I have nothing bad to say about contributing money to groups addressing the AI threat in other ways, e.g., by spreading the message that AI research is dangerous or lobbying governments to shut it down.
money generated by increases in AI stock could be used to invest in efforts into AI safety, which receives comparably less money
In the present situation, contributing more money to AI-safety charities has almost no positive effects and does almost nothing to make AI "progress" less dangerous. (In fact, in my estimation, the overall effect of all funding for alignment research so far has been to make the situation a little worse, by publishing insights that will tend to be usable by capability researchers without making non-negligible progress towards an eventual practical alignment solution.)
If you disagree with me, then please name the charity you believe can convert donations into an actual decrease in p(doom) or say something about how a charity would spend money to decrease the probability of disaster.
Just to be perfectly clear: I support the principle, which I believe has always been operative on LW, that people who are optimistic about AI or who are invested in AI companies are welcome to post on LW.
So let's do it first, before the evil guys do it, but let's do it well from the start!
The trouble is no one knows how to do it well. No one knows how to keep an AI aligned as the AI's capabilities start exceeding human capabilities, and if you believe experts like Eliezer and Connor Leahy, it is very unlikely that anyone is going to figure it out before the lack of this knowledge causes human extinction or something equally dire.
It is only a slight exaggeration to say that the only thing keeping the current crop of AI systems from killing us all (or killing most of us and freezing some of us in case we end up having some use in the future) is simply that no AI or coalition of AIs so far is capable of doing it.
Actually there is a good way to do it: shut down all AI research till humanity figures out alignment, which will probably require waiting for a generation of humans significantly smarter than the current generation, which in turn will probably require at least a few centuries.
The ruling coalition can disincentivize the development of a semiconductor supply chain outside the territories it controls by selling, worldwide, semiconductors that use "verified boot" technology to make it really hard to use the semiconductor to run AI workloads, similar to how it is really hard even for the best jailbreakers to jailbreak a modern iPhone.
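To make that mechanism a little more concrete, here is a purely illustrative sketch (not a description of any real chip; every name in it is hypothetical) of the verified-boot idea: firmware that will only execute workloads carrying a signature from a key the manufacturer controls, so unapproved code simply refuses to run. A real implementation would verify asymmetric signatures against a public key burned into hardware fuses; the HMAC shared secret below is only there to keep the sketch self-contained.

```python
import hashlib
import hmac

# Hypothetical manufacturer-provisioned secret. On a real chip this would be a
# public key stored in hardware fuses, checked with asymmetric signatures.
TRUSTED_KEY = b"manufacturer-provisioned-secret"

def sign_workload(binary: bytes, key: bytes = TRUSTED_KEY) -> bytes:
    """What the manufacturer's signing service would do for an approved workload."""
    return hmac.new(key, hashlib.sha256(binary).digest(), hashlib.sha256).digest()

def firmware_will_run(binary: bytes, signature: bytes) -> bool:
    """The chip's firmware refuses any workload whose signature does not verify."""
    expected = sign_workload(binary)
    return hmac.compare_digest(expected, signature)

if __name__ == "__main__":
    approved = b"small audited inference kernel"
    unapproved = b"large unapproved training job"
    sig = sign_workload(approved)
    print(firmware_will_run(approved, sig))    # True: the chip runs it
    print(firmware_will_run(unapproved, sig))  # False: the chip rejects it
```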
Out of curiosity, would you agree with this being the most plausible path, even if you disagree with the rest of my argument?
The most plausible story I can imagine quickly right now is that the US and China fight a war, the US wins, and it uses some of the political capital from that win to slow down the AI project, perhaps through control over the world's leading-edge semiconductor fabs plus pressuring Beijing to ban teaching and publishing about deep learning (to go with a ban on the same things in the West). I believe that basically all the leading-edge fabs in existence or that will be built in the next 10 years are either in countries the US has a lot of influence over or in China. Another story: the technology for "measuring loyalty in humans" gets really good fast, giving the first group to adopt the technology so great an advantage that over a few years the group gets control over the territories containing all the world's leading-edge fabs and most of the trained AI researchers.
I want to remind people of the context of this conversation: I'm trying to persuade people to refrain from actions that on expectation make human extinction arrive a little quicker because most of our (sadly slim) hope for survival IMHO flows from possibilities other than our solving (super-)alignment in time.
People worked on capabilities for decades, and never got anywhere until recently, when the hardware caught up, and it was discovered that scaling works unexpectedly well.
If I believed that, then maybe I'd believe (like you seem to do) that there is no strong reason to believe that the alignment project cannot be finished successfully before the capabilities project creates an unaligned super-human AI. I'm not saying scaling and hardware improvement have not been important: I'm saying they were not sufficient: algorithmic improvements were quite necessary for the field to arrive at anything like ChatGPT, and at least as early as 2006, there were algorithmic improvements that almost everyone in the machine-learning field recognized as breakthroughs or important insights. (Someone more knowledgeable about the topic might be able to push the date back into the 1990s or earlier.)
After the publication 19 years ago by Hinton et al of "A Fast Learning Algorithm for Deep Belief Nets", basically all AI researchers recognized it as a breakthrough. Building on it was AlexNet in 2012, again recognized as an important breakthrough by essentially everyone in the field (and if some people missed it, then certainly generative adversarial networks, ResNets and AlphaGo convinced them). AlexNet was one of the first prominent deep models trained on GPUs, a technique essential for the major breakthrough in 2017 reported in the paper "Attention is all you need".
In contrast, we've seen nothing yet in the field of alignment that is as unambiguously a breakthrough as the 2006 paper by Hinton et al, 2012's AlexNet or (emphatically) the 2017 paper "Attention is all you need". In fact I suspect that some researchers could tell that the attention mechanism reported by Bahdanau et al in 2015 or the Seq2Seq models reported on by Sutskever et al in 2014 was evidence that deep-learning language models were making solid progress and that a blockbuster insight like "attention is all you need" was probably only a few years away.
The reason I believe it is very unlikely for the alignment research project to succeed before AI kills us all is that in machine learning, or in the deep-learning subfield of machine learning, something recognized by essentially everyone in the field as a minor or major breakthrough has occurred every few years. Many of these breakthroughs rely on earlier breakthroughs (i.e., it is very unlikely for the successive breakthrough to have occurred if the earlier breakthrough had not been disseminated to the community of researchers). During this time, despite very talented people working on it, there have been zero results in alignment research that the entire field of alignment researchers would consider a breakthrough. That does not mean it is impossible for the alignment project to be finished in time, but it does IMO make it critical for the alignment project to be prosecuted in such a way that it does not inadvertently assist the capabilities project.
Yes, much more money has been spent on capability research over the last 20 years than on alignment research, but money doesn't help all that much to speed up research in which, to have any hope of solving the problem, the researchers need insight X or X2, and to have any hope of arriving at insight X, they need insights Y and Y2, and to have much hope at all of arriving at Y, they need insight Z.
But what's even more unlikely, is the chance that $200 billion on capabilities research plus $0.1 billion on alignment research is survivable, while $210 billion on capabilities research plus $1 billion on alignment research is deadly.
This assumes that alignment success is the most likely avenue to safety for humankind, whereas, like I said, I consider other avenues more likely. Actually there needs to be a qualifier on that: I consider other avenues more likely than the alignment project's succeeding while the current generation of AI researchers remain free to push capabilities: if the AI capabilities juggernaut could be stopped for 150 years, giving the human population time to get smarter and wiser, then alignment is likely (say p = .7) to succeed in my estimation. I am informed by Eliezer in his latest interview that such a success would probably use some technology other than deep learning to create the AI's capabilities; i.e., deep learning is particularly hard to align.
Central to my thinking is my belief that alignment is just a significantly harder problem than the problem of creating an AI capable of killing us all. Would any of the reasoning you do in your section "the comparison" change if you started believing that alignment is much, much harder than creating a superhuman (unaligned) AI?
It will probably come as no great surprise that I am unmoved by the arguments I have seen (including your argument) that Anthropic is so much better than OpenAI that it helps the global situation for me to support Anthropic (if it were up to me, both would be shut down today, assuming I couldn't delegate the decision to someone else and had to decide now, with no time to gather more information), but I'm not very certain and would pay attention to future arguments for supporting Anthropic or some other lab.
Thanks for engaging with my comments.
we're probably doomed in that case anyways, even without increasing alignment research.
I believe we're probably doomed anyways.
I think even you would agree what P(1) > P(2)
Sorry to disappoint you, but I do not agree.
Although I don't consider it quite impossible that we will figure out alignment, most of my hope for our survival is in other things, such as a group taking over the world and then using their power to ban AI research. (Note that that is in direct contradiction to your final sentence.) So for example, if Putin or Xi were dictator of the world, my guess is that there is a good chance he would choose to ban all AI research. Why? It has unpredictable consequences. We Westerners (particularly Americans) are comfortable with drastic change, even if that change has drastic unpredictable effects on society; non-Westerners are much more skeptical: there have been too many invasions, revolutions and peasant rebellions that have killed millions in their countries. I tend to think that the main reason Xi supports China's AI industry is to prevent the US and the West from superseding China, and if that consideration were removed (because, for example, he had gained dictatorial control over the whole world) he'd choose to just shut it down (and he wouldn't feel the need to have a very strong argument for shutting it down like Western decision-makers would: non-Western leaders shut important things down all the time, or at least they would if the governments they led had the funding and the administrative capacity to do so).
Of course Xi's acquiring dictatorial control over the whole world is extremely unlikely, but the magnitude of the technological and societal changes that are coming will tend to present opportunities for certain coalitions to gain and to keep enough power to shut AI research down worldwide. (Having power in all countries hosting leading-edge fabs is probably enough.) I don't think this ruling coalition would necessarily need to believe that AI presents a potent risk of human extinction for it to choose to shut AI research down.
I am aware that some reading this will react to "some coalition manages to gain power over the whole world" even more negatively than to "AI research causes the extinction of the entire human race". I guess my response is that I needed an example of a process that could save us and that would feel plausible -- i.e., something that might actually happen. I hasten to add that there might be other processes that save us that don't elicit such a negative reaction -- including processes the nature of which we cannot even currently imagine.
I'm very skeptical of any intervention that reduces the amount of time we have left in the hopes that this AI juggernaut is not really as potent a threat to us as it currently appears. I was much, much less skeptical of alignment research 20 years ago, but since then a research organization has been exploring the solution space, and the leader of that organization (Nate Soares) and its most senior researcher (Eliezer) are reporting that the alignment project is almost completely hopeless. Yes, this organization (MIRI) is kind of small, but it has been funded well enough to keep about a dozen top-notch researchers on the payroll and it has been competently led. Also, for research efforts like this, how many years the team had to work on the problem is more important than the size of the team, and 22 years is a pretty long time to end up with almost no progress other than some initial insights (around the orthogonality thesis, the fragility of value, convergent instrumental values, and CEV as a proposed solution) if the problem were solvable by the current generation of human beings.
OK, if I'm being fair and balanced, then I have to concede that it was probably only in 2006 (when Eliezer figured out how to write a long intellectually-dense blog post every day) or even only in 2008 (when Anna Salamon joined the organization -- she was very good at recruiting and had a lot of energy to travel and to meet people) that Eliezer's research organization could start to pick and choose among a broad pool of very talented people, but still, between 2008 and now is 17 years, which again is a long time for a strong team to fail to make even a decent fraction of the progress humanity would seem to need to make on the alignment problem if in fact the alignment problem is solvable by spending more money on it. It does not appear to me to be the sort of problem that can be solved with 1 or 2 additional insights; it seems a lot more like the kind of problem where insight 1 is needed, but before any mere human can find insight 1, all the researchers need to have already known insight 2, and to have any hope of finding insight 2, they all would have had to know insight 3, and so on.
AI safety spending is only $0.1 billion while AI capabilities spending is $200 billion. A company which adds a comparable amount of effort on both AI alignment and AI capabilities should speed up the former more than the latter
There is very little hope IMHO in increasing spending on technical AI alignment because (as far as we can tell based on how slow progress has been on it over the last 22 years) it is a much thornier problem than AI capability research and because most people doing AI alignment research don't have a viable story about how they are going to stop any insights / progress they achieve from helping with AI capability research. I mean, if you have a specific plan that avoids these problems, then let's hear it, I am all ears, but advocacy in general of increasing work on technical alignment is counterproductive IMHO.
The trouble with the choice of phrase "hyperintelligent machine sociopath" is that it gives the other side of the argument an easy rebuttal, namely, "But that's not what we are trying to do: we're not trying to create a sociopath". In contrast, if the accusation is that (many of) the AI labs are trying to create a machine smarter than people, then the other side cannot truthfully use the same easy rebuttal. Then our side can continue with, "and they don't have a plan for how to control this machine, at least not any plan that stands up to scrutiny". The phrase "unaligned superintelligence" is an extremely condensed version of the argument I just outlined (where the verb "control" has been replaced with "align" to head off the objection that control would not even be desirable because people are not wise enough and not ethical enough to be given control over something so powerful).