I guess you could start by watching Yudkowsky at TED; that's just 10 minutes.
If you have more time, there is a book, Superintelligence: Paths, Dangers, Strategies. It was written 10 years ago, before LLMs, so there is nothing specific to them. But the general concerns about building an intelligence smarter than humans are mostly timeless.
*
When talking to a wider audience, it is important to realize that even the words "AI Safety" mean something utterly different in the LW community and in mainstream discourse. The mainstream debate is focused on censoring the output of chatbots so that they don't say something politically incorrect (or maybe, with the current administration, making sure that they never say something that might be considered politically correct?), and making sure that the picture generators don't create porn; plus there is a concern about people losing their jobs (today the artists, tomorrow who knows), and teachers complain about kids giving all their homework to chatbots.
From the LW perspective, these are all relative trivialities. A more important concern is that we are potentially creating a new species, smarter than us. That is an unprecedented situation... for our species. But as an analogy, we can look at how life changed for the chimpanzees after humans appeared. Some of them survive, but only because we think they are cute. The average chimpanzee is stronger than the average human, but that fact does not help them at all; all that matters is that we are smarter. Now we are going to create a species that will be smarter than us.
(To avoid a possible misunderstanding, this is not about ChatGPT. The chatbots of today are stupid. They are miracles of technology -- if you could travel back to the year 2000 and let people there talk to a chatbot, especially if they could literally just talk to an artificial head, they would be in deep shock. And the chatbots have vast knowledge. But compared to humans, they still keep making trivial errors, so any complicated planning is probably beyond their capabilities. Our concern is that some new technological breakthrough -- or maybe just adding a few orders of magnitude to what already exists; no one knows -- could create a machine that is actually smart, possibly smarter than humans. Even if it is only a little smarter, given the pace of technological progress, that probably means we will get much smarter machines a few years later.)
Likely objections:

- It is just a chatbot, it cannot do anything dangerous.
- God would not allow that to happen.
- Machines don't have souls.
- There are more important problems in the world.
- I can't imagine something smarter than a human.
- Computers don't have hands.
- How could a machine defeat all of humanity?
- Why would the AI want to do something bad?
- We can simply keep it under control.

(Did I miss something?)
I will go through the list briefly. I have already addressed the chatbots.
The Less Wrong community is mostly atheist. (You might disagree with that, but then... what? Are we going to rehash a debate that started millennia ago and is still ongoing?) The argument about God is basically "what if the problem magically disappears?" The answer is that problems usually don't magically disappear. Even if you are religious, you need to admit that God allowed the Holocaust, nuclear weapons, cancer, wars, malaria, et cetera... so what makes you think that this will be the moment when He draws the line?
Things don't need to have souls in order to be dangerous. A stupid virus can kill you.
There can be more than one serious problem in the world. If something kills all humans, it will kill rich and poor alike, men and women, black and white, etc.
Even if it is difficult to imagine something fundamentally smarter than a human, it is probably easy to imagine something that is just as smart as a very smart human, only 1000 times faster. Now imagine an army of thousands of such virtual people, with various skills and talents, all living in the same machine. Does it still feel like the machine is harmless, and that it definitely couldn't outsmart a human guard or two?
Computers don't have hands? Yeah, unless they are connected to robotic bodies, in which case obviously they do. Even without a robotic body, there are things that a hypothetical intelligent computer can do if it is connected to the internet, such as send e-mails, communicate on websites, and try to hack other computers online. Using the metaphor of the thousand highly intelligent virtual people who are 1000 times faster than humans, they probably could find a way to make some money (hack someone's online banking? trade cryptocurrency? sell their services online, pretending to be e.g. human artists? make a scam gofundme campaign? write a blog for paying subscribers? all of that and much more at the same time?) and then they could simply buy some robotic bodies, or pay someone to build them (if you offer money to many people in different countries, someone will say yes). They could also hack into existing factories, military drones, etc. They could build alternative computing centers (or maybe figure out how to exist distributed in small pieces across millions of hacked computers on the internet), in which case dropping a bomb on the original computing center would no longer solve the problem.
How could a superintelligence defeat all of humanity? A simple solution could be an engineered deadly plague (or a dozen different engineered plagues) released simultaneously in many places across the planet. Then the hacked military drones would get rid of the survivors and keep disrupting any major attempts to cure the plagues.
But going into an open confrontation is probably not the smartest way. A superintelligence could also play humans against each other, for example by contacting all the dictators across the planet and offering them military support in return for building a few data centers in their countries. It could play America against China, Democrats against Republicans, Microsoft against Google, etc., creating many situations like "I know that allying myself with the AI is dangerous, but if I refuse and my opponent accepts, then I lose." (Even if most of them would reject the deal, it works as long as some of them accept.)
If the superintelligence does not confront humanity and only offers useful services, it will gain enormous control over the world anyway, because in order to provide those services, it will need access to various things. So if the superintelligence appears super friendly, ten years later it will probably have access to most people's computers and smartphones and most factories; it will navigate most cars and airplanes. At that moment, humanity will no longer be able to oppose it. And probably will not even want to, because most of the news will be produced by the AI. (And most of the resistance movements against the AI will be honeypots prepared by the AI.)
Okay, but why would the AI want to do something bad? Aren't we just projecting human psychology onto machines? Couldn't we simply build machines as tools, without their own agency? My car's navigation is not trying to murder me.
This is probably the most difficult part to explain. First, the AI could simply kill us as a result of some bug in the program. (Bugs in programming happen all the time.) Like, the AI is trying to do maximum good and suddenly... oops... because of a bug in the code, a sign flips from plus to minus, and now the AI is trying to do maximum bad; five minutes later billions of humans are dead (because by that time the AI already controls the cars and airplanes, and can smash them into buildings) and the rest die soon afterwards (because the AI can also blow up all the power plants and factories across the planet). Second, the bad actions could happen as unintended side effects of the good actions we told the AI to do, as in the "paperclip maximizer" scenario. The idea is that when humans want something, there are usually millions of conditions so obvious that no one spells them out ("make me a sandwich... without destroying the universe in the process"), so we probably won't mention them to the computer either, but the computer may just take the shortest path to fulfilling our wishes, no matter what. We could try teaching the AI what we want, but the problem is that we would only teach it the typical scenarios, and there are possible highly unusual scenarios that the AI could encounter or create, where the training may not help.
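To make the sign-flip point a bit more concrete, here is a minimal toy sketch (my own illustration, not code from any real system; all names and numbers are made up): a trivial "utility maximizer" where flipping a single sign makes the same optimizer choose the worst action instead of the best one.

```python
# Toy illustration only: a made-up "utility maximizer" with made-up numbers,
# showing how one flipped sign inverts the objective of an otherwise
# unchanged optimizer.

def utility(outcome: float) -> float:
    # Higher numbers mean "better for humans" in this toy model.
    return outcome

def choose_action(actions, buggy=False):
    sign = -1 if buggy else 1  # the single flipped sign (the "bug")
    # Pick whichever action scores highest under the (possibly buggy) objective.
    return max(actions, key=lambda a: sign * utility(actions[a]))

actions = {"help": 10.0, "do nothing": 0.0, "harm": -10.0}

print(choose_action(actions))              # -> "help"
print(choose_action(actions, buggy=True))  # -> "harm": same optimizer, inverted goal
```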
We could try to keep the machine under control, but when it becomes so smart that it can predict our own actions, then... Like, imagine that the AI needs to decide whether something is good or bad. And it knows that it needs to ask the humans first. But it also knows that if it composes the question using these specific words, humans will almost certainly say "obviously, yes", and if it composes the question using some other specific words, humans will almost certainly say "obviously, no", and from the AI's perspective both formulations mean the same thing. Now what? Ultimately, it is the machine that makes the decision.
...etc., we have debated these things for over a decade, so there is a lot of it. I hope this helped somehow.
If you have specific questions, feel free to ask (or search in the existing materials). Or you could make a draft of an article, post a link here, and ask for comments.
Thank you for this very informative comment!
I am familiar with Yudkowsky's TED Talk and understand a couple of the AI arguments now.
Do you know any other people who might be willing to do an interview on camera? I'm not sure whether AI researchers tend to be open or shy, or whether it's better to contact labs or individual researchers directly. This is going to be a video documentary.
I'll probably come back with a draft later, but the points you broke down are a great starting place.
Again, I couldn't have asked for a better comment. Thank you, again.
-P
Hello LessWrong community,
I am a journalism student doing my capstone documentary on AI alignment. However, this is a topic that I want to make sure is done well. The last thing I would want is to confuse or mislead anyone.
That said, I have a few questions:
Who would be the best people to reach out to for an interview? What would be the best foundational topics to help viewers understand the bigger picture? And are there any ethical or technical concerns I should be wary of when communicating this topic to people with little to no understanding of AI?
I appreciate your time!