Doomers worry about AIs developing “misaligned” values. But in this scenario, the “values” implicit in AI actions are roughly chosen by the organisations who make them and by the customers who use them.
I think this is the critical crux of the disagreement. Part of Eliezer's argument, as I understand it, is that current technology is completely incapable of anything close to actually "roughly choosing" an AI's values. On this point, I think Eliezer is completely right.
If you have played with ChatGPT-4, it's pretty clear that it is aligned (humans have roughly chosen its values), especially compared to reports of the original raw model before RLHF, or to less sophisticated alignment attempts in the same model family (i.e., Bing). Now, it's possible of course that it's all deception, but this seems somewhat unlikely.
I took issue with the same statement, but my critique is different: https://www.lesswrong.com/posts/mnCDGMtk4NS7ojgcM/linkpost-what-are-reasonable-ai-fears-by-robin-hanson-2023?commentId=yapHwa55H4wXqxyCT
This is the fear of “foom,”
I think the popular answer to this survey also includes many slow takeoff, no-foom scenarios.
If we like where we are and can’t be very confident of where we may go, maybe we shouldn’t take the risk and just stop changing. Or at least create central powers sufficient to control change worldwide, and only allow changes that are widely approved. This may be a proposal worth considering, but AI isn’t the fundamental problem here either.
I'm curious what you (Hanson) think(s) *is* the fundamental problem here if not AI?
Context: It seems to me that Toby Ord is right that the largest existential risks (AI being number one) are all anthropogenic risks, rather than natural risks. They also seem to be risks associated with the development of new technologies (AI, biologically engineered pandemics, and, as distant third and fourth, nuclear risk and climate change). Any large unknown existential risk also seems likely to be a risk resulting from the development of a new technology.
So given that, I would think AI *is* the fundamental problem.
Maybe we can solve the AI problems with the right incentive structures for the humans making the AI, in which case one might think the fundamental problem is the incentive structure, or the institutions that exist to shape those incentives, but I don't find this persuasive. This would be like saying that the problem is not nuclear weapons, it's that the Soviet Union would use them to cause harm. (Maybe this just feels like a strawman of your view, in which case feel free to ignore this part.)
But to my mind, such a scenario is implausible (much less than one percent probability overall) because it stacks up too many unlikely assumptions in terms of our prior experiences with related systems.
You mentioned 5-6 assumptions. I think at least one isn't needed (that the goal changes as it self-improves), and I disagree that the others are (all) unlikely. E.g., agentic, non-tool AIs are already here, and more will be coming (foolishly). Taking a point I just heard from Tegmark in his latest Lex Fridman podcast interview: once companies add APIs to systems like GPT-4 (I'm also worried about open-sourced systems that are as powerful or more powerful in the next few years), it will be easy for people to create AI agents that use the LLM's capabilities by repeatedly calling it.
Furthermore, the goals of this agent AI change radically over this growth period.
Noting that this part doesn't seem necessary to me. The agent may be misaligned before the capability gain.
And then, when humans are worth more to the advance of this AI’s radically changed goals as mere atoms than for all the things we can do, it simply kills us all.
I agree with this, though again I think the "changed" can be omitted.
Secondly, I also think it's possible that rather than the unaligned superintelligence killing us all in the same second, as EY often says, it may kill us off in a manner like how humans kill off other species (i.e., we know we are doing it, but it doesn't look like a war).
Re my last point, see Ben Weinstein-Raun's vision here: https://twitter.com/benwr/status/1646685868940460032
Plausibly, such “ems” may long remain more cost-effective than AIs on many important tasks.
"Plausibly" (i.e. 'maybe') is not enough here to make the fear irrational ("Many of these AI fears are driven by the expectation that AIs would be cheaper, more productive, and/or more intelligent than humans.")
In other words, while it's reasonable to say "maybe the fears will all be for nothing", that doesn't mean it's not reasonable to be fearful and concerned due to the stakes involved and the nontrivial chance that things do go extremely badly.
And yes, even if AIs behave predictably in ordinary situations, they might act weird in unusual situations, and act deceptively when they can get away with it. But the same applies to humans, which is why we test in unusual situations, especially for deception, and monitor more closely when context changes rapidly.
"But the same applies to humans" doesn't seem like an adequate response when the AI system is superintelligent or past the "sharp left turn" capabilities threshold. Solutions that work for unaligned deceptive humans won't save us from a sufficiently intelligent/capable unaligned deceptive entity.
Doomers worry about AIs developing “misaligned” values. But in this scenario, the “values” implicit in AI actions are roughly chosen by the organisations who make them and by the customers who use them.
There is reason to think "roughly" aligned isn't enough in the case of a sufficiently capable system.
Second, Robin's statement seems to ignore (or contradict without making an argument) the fact that even if it is true for systems not as smart as humans, there may be a "sharp left turn" at some point where, in Nate Soares' words, "as systems start to work really well in domains really far beyond the environments of their training" "it’s predictably the case that the alignment of the system will fail to generalize with it."
This part doesn't seem to pass the ideological Turing test:
At the moment, AIs are not powerful enough to cause us harm, and we hardly know anything about the structures and uses of future AIs that might cause bigger problems. But instead of waiting to deal with such problems when we understand them better and can envision them more concretely, AI “doomers” want stronger guarantees now.
To clarify explicitly, people like Stuart Russell would point out that if future AIs are still built according to the "standard model" (a phrase I borrow from Russell) like the systems of today, then they will continue to be predictably misaligned.
Yudkowsky and others might give different reasons why waiting until later to gain more information about future systems doesn't make sense, including pointing out that waiting may lead us to miss our first "critical try."
Robin, I know you must have heard these points before; I believe you are more familiar with e.g. Eliezer's views than I am. But if that's the case, I don't understand why you would write a sentence like the last one in the quotation above. It sounds like a cheap rhetorical trick to say "but instead of waiting to deal with such problems when we understand them better and can envision them more concretely," especially without explaining why those who oppose waiting don't think that's a good enough reason, and why they think there are pressing reasons to work on the problems now despite our relative state of ignorance compared to future AI researchers.
Did not expect to see such strawmanning from Hanson. I can easily imagine a post with less misrepresentation. Something like this:
Yudkowsky and the signatories to the moratorium petition worry most about AIs getting “out of control.” At the moment, AIs are not powerful enough to cause us harm, and we hardly know anything about the structures and uses of future AIs that might cause bigger problems. But instead of waiting to deal with such problems when we understand them better and can envision them more concretely later, AI “doomers” want to redirect most if not all computational, capital, and human resources from making black-boxed AIs more capable to research avenues directed toward the goal of obtaining a precise understanding of the inner structure of current AIs now, and to make this redirection enforced by law, including the most dire (but legal) methods of law enforcement.
instead of this (the original). But that would be a different article, written by someone else:
Yudkowsky and the signatories to the moratorium petition worry most about AIs getting “out of control.” At the moment, AIs are not powerful enough to cause us harm, and we hardly know anything about the structures and uses of future AIs that might cause bigger problems. But instead of waiting to deal with such problems when we understand them better and can envision them more concretely, AI “doomers” want stronger guarantees now.
Selected quotes (all emphasis mine):
He separates a few concerns:
Finally he gets to the part where he dunks on foom: