Counterargument: People will be able to cause significant destruction far before they are able to cause the end of the world, and if people start using powerful AI to do significant destruction for the lulz then that will motivate a lot of lockdown on AI access.
Yeah, that is part of the silver lining, I should have been clearer. That we will have a chance to iterate over issues like that, including a potential lockdown before superintelligence is inevitable.
I've referred to the possibility of significant destruction in this context as the first 'Chernobyl-type event' involving AI; it will greatly inform policy and legislation regarding AI tech. We'd have to make predictions about what form the disaster takes to discuss the efficacy of such legislation or lockdown on access.
It could be, for example, that many self-driving cars malfunction all at once and cause a lot of damage and grief. This first scenario would probably lead to specific policies but little if any broad oversight. Another example: the AI disaster could be psychological (see virtual romance in China) or economic in nature, causing much suffering over a long period that goes unnoticed for a time.
If it's the latter scenario I can see strong political lines forming over AI safety with pro- and anti- tech/lulz supporters. The prospect of humanity's destruction, then, is at least partially dependent on our ability to govern ourselves. So I can't blame the alignment community for focusing more on the technical aspects of alignment, as difficult as they are, instead of the social aspects. The social aspects may be easier, all things considered, but are emotionally exhausting, which is why so many are firmly resigned to doom.
I don't think this is the foremost existential risk because I think it's reasonable to believe that:
if aligned AGIs (or anything able to destroy the world) exist, the most powerful ones will still be under control of governments or large companies. Even if the model itself was leaked, anyone with a lot of resources would have an advantage by holding a lot more compute than the average Joe;
if these aligned AGIs were not so superhuman and transformative as to instantaneously destroy the world (which they shouldn't be, or we'd already be in some kind of singularity anyway), then holding more of them with more compute should constitute a significant defensive advantage.
Yes, in general offence is easier than defence and destruction easier than creation. But what precisely would this hypothetical terrorist or nihilistic madman do to be so unstoppable that nothing else - not even all the other AGIs - stands a chance to counter it? Bioweapons can be fought, contained, countered; even more so if you have lots of artificial smarts on your side. Any attempt at FOOMing should be detectable by sharper intellects, and anyway, if AIs could FOOM that way, the bigger ones likely would have already (for good or bad). Pretty simple measures can be taken to protect things like control over nuclear weapons, and again, aligned government AGIs would be on the front line of defence against any attempts at hacking or such, so the individual AGI would still find itself outgunned.
So yeah, I think this really falls into the "we must be really stupid and drop the ball to go extinct to this sort of mishap" category. That said, people would still be able to do a lot of damage and I don't like what that would do to our society as a whole. Instead of school shooters we'd have the occasional guy who managed to turn a city into nanites before being stopped, or some such insanity. You'd soon have everyone asking for total AGI surveillance to stop that sort of thing, and goodbye freedom and privacy. But I wouldn't expect extinction from it.
The agent software (as it matures, collecting all the tricks from all the papers) makes it more likely that the first agents capable of autonomous survival are at barely human level and still incapable of doing open-ended research (because this way autonomous survival wouldn't need to be overdetermined by having way-more-than-sufficient capabilities in the underlying LLM). Then some uncontrolled barely-AGIs go on to live on the Internet without being an extraordinary threat, perhaps for years while the labs are still working on the research capabilities, perhaps even using APIs rather than local models to think. And people get used to that.
Getting clear, impossible-to-ignore warning shots first would be a good thing on net, even if unpleasant in the moment. Unless you're suggesting that simple (non-AGI) AI tools are going to be civilization-threatening - but I'm not seeing it and you didn't argue it.
Right, that is the silver lining. Whether it is enough to counterbalance people actively trying to set the world on fire, I am doubtful.
I suppose that depends a lot on how hard anyone is trying to cause mischief, and how much easier it's going to get to do anything of consequence. 4chan is probably a good prototype of your typical troll "in it for the lulz", and while they regularly go past what most would call harmless fun, there's not a body count.
The other thing people worry about (and the news has apparently decided is the thing we all need to be afraid of this month...) is conventional bad actors using new tools to do substantially whatever they were trying to do before, but more; mostly confuse, defraud, spread propaganda, what have you. I'm kind of surprised I don't already have an inbox full of LLM composed phishing emails... On some level it's a threat, but it's also not a particularly hard one to grasp, it's getting lots of attention, and new weapons and tactics are a constant in conflicts of all types.
I'm still of the mind that directly harmful applications like the above are going to pale next to the economic disruption and social unrest that's going to come from making large parts of the workforce redundant very quickly. Talking specific policy doesn't look like it's going to be in the Overton window until after AI starts replacing jobs at scale, and the "we'll have decades to figure it out" theory hasn't been looking good of late. And when that conversation starts it's going to suck all the air out of the room and leave little mainstream attention for worrying about AGI.
This may also happen with biological viruses and other GMOs, and can partly explain the proliferation of gain-of-function research, which is mostly useless but dangerous. We like to play God and like the end of the world. (But with a biological virus there is a limitation: biohackers will expect that a newly printed smallpox virus will kill them and their families first. But they can start with personal vaccines.)
And this already happened in the 1980s, when people started to create computer viruses for fun.
This seems like the most obvious short term scenario that will occur. We have doomsday cults right now today.
Counterpoint: the once-a-century pandemic happened before now. So we can make vaccines much faster than ever thought possible, but given the...material...timeline and all the factors for virulence and debility/lethality at play with bioweapons, I'm not sure that's much comfort.
It seems like the kind of thing where we'll almost assuredly be reacting to such an event vs whatever guardrails can be put in place.
Counterargument: you can just defend against these AIs running amuck.
As long as most AIs are systematically trying to further human goals you don't obviously get doomed (though the situation is scary).
There could be offense-defense imbalances, but there are also 'tyranny of the majority' advantages.
That's not the point though. Humans don't want to defend, they want to press the big red button and will gain-of-function an AI to make the button bigger and redder.
Yes, sorry, some definitely will. But if you look at what is going on now, people are pushing in all kinds of dangerous directions with reckless abandon, even knowing logically that it might be a bad idea.
Rather than figure out what each of those means exactly, I'll say "I don't expect the psychological forces pushing towards research and release of more capabilities faster, to actually resist building the sort of tools that'd be useful for defending against AI."
As someone who's been pinning his hopes on a 'survivable disaster' to wake people up to the dangers, this is good news.
I doubt anything capable of destroying the world will come along significantly sooner than superintelligent AGI, and a world in which there are disasters due to AI feels like a world that is much more likely to survive compared to a world in which the whirling razorblades are invisible.
EDIT: "no fire alarm for AGI." Oh I beg to differ, Mr. Yudkowsky. I beg to differ.
Seeing this frantic race from random people to give GPT-4 dangerous tools and walking-around-money, I agree: the risk is massively exacerbated by giving the "parent" AIs to humans.
Upon reflection, should that be surprising? Are humans "aligned" how we would want AI to be aligned? If so, we must acknowledge the fact that humanity regularly produces serial killers and terrorists (etc). Doesn't seem ideal. How much more aligned can we expect a technology we produce, vs our own species?
If we view the birth of AGI as the birth of a new kind of child, to me, there really is no regime known to humanity that will guarantee that child will not grow up to become an evil monster: we've been struggling with that question for millennia as humans. One thing we definitely have found is that super-evil parents are way better than average at producing super-evil children, but sometimes it seems like super-evil children just come into being, despite their parents. So a super-evil person controlling/training/scripting an AI to me is a huge risk factor, but so are the random factors that created super-evil humans despite good parents. So IMO the super-evil scammer/script kiddie/terrorist is the primary (but not only) risk factor when opening access to these new models.
I'm coming around to this argument that it's good right now that people are agent-ifying GPT-4 and letting it have root access, try to break CAPTCHAs, speak to any API etc, because that will be the canary in the coal mine -- I just hope that the canary in the coal mine will give us ample notice to get out of the mine!
Even if true, it has disturbing implications, such as that making the general population dumber à la Brave New World, or absolutely controlled akin to 1984, would be highly beneficial.
Personally, it doesn't seem likely, since there will be many actors designing their own agents and presumably the bulk of them want to continue existing, so there will more likely be competition among various designs, and especially competition against designs perceived to be omnicidal.
Version 1 (adopted):
Thank you, shminux, for bringing up this important topic, and to all the other members of this forum for their contributions.
I hope that our discussions here will help raise awareness about the potential risks of AI and prevent any negative outcomes. It's crucial to recognize that the human brain's positivity bias may not always serve us well when it comes to handling powerful AI technologies.
Based on your comments, it seems like some AI projects could be perceived as potentially dangerous, similar to how snakes or spiders are instinctively seen as threats due to our primate nature. Perhaps implementing warning systems or behavior-detection mechanisms in AI projects could be beneficial to ensure safety.
In addition to discussing risks, it's also important to focus on positive projects that can contribute to a better future for humanity. Are there any lesser-known projects, such as improved AI behavior systems or initiatives like ZeroGPT, that we should explore?
Furthermore, what can individuals do to increase the likelihood of positive outcomes for mankind? Should we consider creating closed island ecosystems with the best minds in AI, as Eliezer has suggested? If so, what would be the requirements and implications of such places, including the need for special legislation?
I'm eager to hear your thoughts and insights on these matters. Let's work together to strive for a future that benefits all of humanity. Thank you for your input!
Version 0:
Thank you, shminux, for this topic. And the other gentlemen for this forum!
I hope I will not die by AI in a lulz manner after this comment) The human brain needs to be positive. Without this it couldn't work well.
According to your text, it looks like the buttons of any OPEN AI project could look like a SNAKE or SPIDER, at least to warn the user on a gene level that there is something dangerous in it.
You already know many things about primate nature. So all you need is to use it to get what you want.
We have the last mind journey of humankind's brains to win a GOOD future or lose!
What other GOOD projects could we focus on?
What projects were already done but no one knows about them? Better AI behaviour-detection systems? ZeroGPT?
What should people do to raise the probability of good scenarios for mankind?
Should we make closed island ecosystems with the best minds in AI, as Eliezer said in the Bankless YouTube video, or not?
What are the requirements for such places? Because then we need to create special legislation for such semi-independent places. It's possible. But talking with governments is hard work. Do you REALLY need it? Or are these just Eliezer's emotional words?
Thank you for answers!
This seems untrue. For one thing, high-powered AI is in a lot more hands than nuclear weapons. For another, nukes are well-understood, and in a sense boring. They won’t provoke as strong of a “burn it down for the lolz” response as AI will.
Even experts like Yann LeCun often do not merely not understand the danger, they actively rationalize against understanding it. The risks are simply not understood or accepted outside of a very small number of people.
Remember the backlash around Sydney/Bing? Didn’t stop her creation. Also, the idea that governments are working in their nations’ interests does not survive looking at history, current policy or evolutionary psychology (think about what motivations will help a high-status tribesman pass on his genes. Ruling benevolently ain’t it.)
You think RLHF solves alignment? That’s an extremely interesting idea, but so far it looks like it Goodharts it instead. If you have ideas about how to fix that, by all means share them, but there is as yet no theoretical reason to think it isn’t Goodharting, while the frequent occurrence of jailbreaks on ChatGPT would seem to bear this out.
Maybe. The point of intelligence is that we don’t know what a smarter agent can do! There are certainly limits to the power of intelligence; even an infinitely powerful chess AI can’t beat you in one move, nor in two unless you set yourself up for Fool’s Mate. But we don’t want to make too many assumptions about what a smarter mind can come up with.
AI-powered robots without super intelligence are a separate question. An interesting one, but not a threat in the same way as superhuman AI is.
Ever seen an inner city? People are absolutely shooting each other for the lolz! It’s not everyone, but it’s not that rare either. And if the contention is that many people getting strong AI results in one of them destroying the world just for the hell of it, inner cities suggest very strongly that someone will.
I disagree with point 4; I wouldn't say that means "the alignment problem is solved" in any meaningful way, because:
what works with ChatGPT will likely be much harder to get working with smarter agents, and
RLHF doesn't "work" with ChatGPT for the purposes of what's discussed here. If you can jailbreak it with something as simple as DAN, then it's a very thin barrier.
I agree with the rest of your points and don't think this would be an existential danger, but not because I trust these hypothetical systems to just say "no, bad human!" to anyone trying to get them to do something dangerous with a modicum of cleverness.
For example, I would be shocked if there aren't multiple serious groups working, in various levels of secrecy, on automated penetration of computer networks using all kinds of means, including but NOT limited to self-found zero-days. Building, and especially deploying, an attack agent is much easier than building or deploying the corresponding defensive systems. Not only will such capabilities probably be abused by those who develop them, but they could easily leak to others, even to the general public. Apocalypse? I don't think so. A lot of Very Bad Days for a lot of people? Very, very likely. And that's just one thing people are probably working on.
I'm not arguing that grey goo is feasible, just pointing out that it's not like one actor choosing to build military robots keeps another actor from doing anything else.
Before a detailed response: you appear to be disregarding my reasoning consistently without presenting a valid counterargument or making an attempt to comprehend my perspective. Even if you were to develop an AGI that aligns with your values, it would still be weaker than the AGI possessed by larger groups like governments. How do you debunk this claim? You seem to be afraid of even a single AGI in the wrong hands. Why?
Regarding grey goo: I agree it might be a threat, and if you agree that the AGI problem is redundant with the grey goo problem (say someone builds a tiny robot with AGI, this tiny robot builds an army of tiny robots, and this army builds a larger army of even smaller AGI robots, until they all become grey goo), then yes, this is an interesting possibility. I would guess aligned grey goo would look more like a natural organism than something that consumes humans, as its alignment algorithm will probably propagate, and it's designed to protect humans and nature; on the other hand, the robots need material to survive, so they will balance the two. Anyway, superhuman grey goo that is aligned is a very interesting possibility: as long as it's aligned and propagates its alignment to newer versions of itself, then although those versions work faster, they will not do anything against their previous alignment. I would say that if the first grey goo robot was aligned, then the whole grey goo will be aligned. But I believe they will stop somewhere and be more like small ants trying to find resources in a very competitive environment than a goo, competing with other colonies for resources, with a target function of helping humans.
And yes, we have had a GI for a long time now: humanity is a GI. We saw the progress of technology, and how fast it accelerates, faster than any individual might conceive. Acceleration will very probably not reach infinity and will stop at some physical boundary, when most of the resources have been used. And humans could upload their minds and other sci-fi stuff to be part of this new reality. I mean, the possibilities are endless in general. But we can decide to limit it as well, and keep it smarter than us for everything we need, but not so smart that we don't understand it at all. I don't think we are there yet to make this specific decision, and for now we can surely benefit from the current LLMs and those to come for developing new technologies in many fields like medicine, software development, education, traffic safety, pollution, political decision making, courts and much more.
Forget complicated "sharp left turn" schemes, nefarious nanobots, lists of lethalities, out-of-distribution actions, failed AI boxing. As Zvi pointed out in multiple posts, like this one, if humans get unrestricted access to a powerful enough tool, it is all over. People will intentionally twist even the most aligned Tool AI into an Agent of Death, long before it becomes superintelligent and is able to resist. You can find examples of it online.
In that sense, Eliezer was wrong in the worst possible way: we have a lot less time to get our act together, because capabilities advance faster than intelligence and humans are very inventive at finding ways to misuse these capabilities. We will push these capabilities in the "gain of function" direction mercilessly and without regard for safety or anything else. We are worse than toddlers playing with matches. True, like with toddlers, our curiosity far outstrips our sense of self-preservation, probably because our brains are not wired to be afraid of something that is not like a snake or a spider or a steep cliff. But it is worse than that. People will try to do the worst thing imaginable because they do not "alieve" potential harm, even if they can track it logically, unlike a toddler.
I guess the silver lining is that we have a bit of time to iterate. The AI tools are not yet at the level of causing widespread destruction, and probably will not be for some time. It does not mean that if and when some superintelligence emerges we will be well prepared, but if humanity survives until then without self-annihilation, we might have a better chance, compared to the "one shot at getting it right" before we are all wiped out, as Eliezer emphasized. It might not be an "alignment manual from the surviving future", but at least some wisdom and discipline of avoiding the early pitfalls, and if we die, we might die with "more dignity". The odds are not great, but maybe they are there.
Edit: quanticle pointed out that Bostrom predicted it in the paper The Vulnerable World Hypothesis: