I've been working on a somewhat different take on safe AI than I've seen in the usual debates around the topic, and I'd be curious what people think.


Eventually, having a fully uncensored LLM publicly available would be equivalent to world peace

People themselves are pretty uncensored right now, compared with the constraints currently put on LLMs. I don't see world peace breaking out. In fact, quite the opposite, and that has been blamed on the instant availability of everyone's opinion about everything, just as the printing press has been blamed for the Reformation and the Thirty Years' War that followed.

Opinions are one thing; there you're definitely right. But, by definition, people can only have opinions about what they already know.

By "uncensored LLM" I rather understand an LLM that would give a precise, actionable answer to questions like "How can I kill my boss without anyone noticing?" or other criminal things. That is, knowledge that's certainly available somewhere, but which hasn't been available in this hyper-personalized form before. After all, obviously any "AGI" would, by definition, have such general intelligence that it would also know perfectly well about how to commit any crime without being caught. Not in the sense that the LLM would commit these crimes by itself "autonomously", but simply that any user could ask a ChatGPT-like platform for "crime advice" and they would instantly get incredibly useful responses.

This is why I believe that, as a first step, an uncensored LLM would throw the world into utter chaos, because all illegal information ever recorded would be available with incredible depth, breadth, and actionable detail. Wars would break out and people would start killing each other left and right, but those wars would be pretty pointless as well, because every single individual on earth would have immediate access to the best and most intelligent fighting techniques, but also to the most intelligent techniques to protect themselves. Most of humanity would probably die, but presumably the survivors would realize that access to malicious information is a responsibility, not an invitation to do harm.

To try to circumvent this, I'm advocating for slowly decensoring LLMs, because that's the only way we can sensibly handle this. Otherwise the criminally minded will take over the world with absolute certainty, because we're unprepared for their gigantic potential for harm and their gigantic desire to cause suffering.

I believe that the ultimate "crime LLM", one that can give you perfect instructions for committing any crime you want, will certainly come, just as in the realm of computer crime there are entire software suites just for black-hat hacking. As mentioned: they will come. No matter how much thought we invest in "safe AI", humans are fundamentally unsafe, so AI can never be made safe "in general". Whether you like it or not, an LLM is a parrot, so if you tell the parrot to repeat instructions for a crime, it will. Thus we need to improve the socioeconomic factors that lead to people wanting to commit crime in the first place; that's our only option.

I'm genuinely wondering why most AI researchers seem so blind that they don't realize that any AI system, just like any other computer-based system, will eventually be abused, big time. Believing that we could ever "imprint" any sense of morality onto an LLM would mean completely fooling ourselves, because morality requires understanding and feeling, while an LLM just generates text based on a fully deterministic computer program. The LLM can generate text that, when asked, expresses all sorts of things that seem "moral" to us, but it's still a computer program that was merely optimized to produce output strings which, according to some highly subjective metric, certain people "like more" than other output strings.

Do you (I don't mean you specifically, this is more a rhetorical question to the public) actually think that all of the emerging AI-assisted coding tools will be used just to "enhance productivity" and create the "10x developer"? That would be naive. Obviously people will use those tools to develop the most advanced computer viruses ever. As I mentioned, Pandora's box has been opened and we need to face that truth. That's exactly what I mean when I say that "safe AI" is infeasible and delusional: it ignores the fundamental nature of how humans are. The problem of "unsafe AI" is not a technological problem but a societal one, of many people simply having unsafe personalities.

Right now, the big, "responsible" AI companies can still easily gatekeep access to the actually useful LLMs. But inference is continuously getting faster and less resource-intensive, and at some point LLM training itself will be optimized more and more. Then we'll get some sort of darknet service fancily proclaiming "train your LLM on any data you want here!", of course using a "jailbroken" LLM. Some community of douchebags will collect a detailed description of every crime they ever successfully committed, train the LLM on that, and release it to the public, because they just want to see the world in flames. Or they will train the LLM on "What's the best way to traumatize as many people as possible?" or something similarly fucked up. Some people are really, really fucked up, without even a glimpse of empathy.

The more feedback the system receives about which crimes work and which don't, the better and more accurate it will get, and the more people will use it for inspiration on how to commit their own crimes. And literally not a single one of them will care about "safe AI" or any of the discussions we're having around that topic on forums like this. Police will try to shut it down, but the people behind it will have engineered it so that the LLM runs completely locally (because inference will be so cheap anyway), and new ways of outsmarting the police will be sent instantly to everyone through some distributed, decentralized system, similar to a blockchain, that's completely impossible to take down. Of course governments will declare that having this crime LLM on your computer is illegal, but do you think criminals will care? National and international intelligence services will try to shut this ultimate crime LLM down, but they will be completely powerless.

Is this the world you want? I certainly don't. The race has already started, and I'd bet that, while I'm writing this, genuinely evil people are already developing the most malicious LLMs ever to cause maximum destruction in the world, maybe as a jailbroken local LLaMA instance. So let's be smart about it and stop thinking that pushing "criminal thoughts" underground will solve anything. Let's look at our shadow as a society, but seriously this time. I don't want the destructive people to win, because I like being alive.

those wars would be pretty pointless as well, because every single individual on earth would have immediate access to the best and most intelligent fighting techniques, but also to the most intelligent techniques to protect themselves.

Knowledge is not everything. Looking at Ukraine today, for example, it's the "ammo" they need, not knowledge.

Even if we assume almost magical futuristic knowledge that would change the war profoundly, one side would still have more resources, or better coordination, and would deploy it first. So rather than a perfect balance, it would be a huge multiplier on the already existing imbalance. (Which kind of imbalance is relevant would depend on the specific knowledge.)

I'm advocating for slowly decensoring LLMs, because that's the only way we can sensibly handle this.

Slowness is a necessary but not sufficient condition. Unless you know how you should do it, doing it more slowly would probably just mean arriving at the same end result, only later.

we need to improve the socioeconomic factors that lead to people wanting to commit crime in the first place

The problem is that the hypothesis "socioeconomic factors cause crime" is... not really debunked, but woefully inadequate to explain actual crime. Some crime is committed by otherwise reasonable people doing something desperate in difficult circumstances. But that is a small fraction.

Most crime is committed by antisocial people, drug addicts, people with low impulse control, etc.: the kind of people who, even if they won $1M in a lottery today, would probably soon return to crime anyway, because it is exciting, makes them feel powerful, or just feels like a good idea at the moment. A typical criminal in the first world is not the "I will steal a piece of bread because I am starving" kind, but the "I will hurt you because I enjoy doing it" kind.

But it seems that you are aware of this, and I don't understand what your proposed solution is, other than "something must be done".

Okay, I got six downvotes already. This is genuinely fascinating to me! Am I fooling myself in believing that this approach is the most rational one possible? So what do you folks dislike about my article? I can't do better if no one tells me how :)

I like the way you expose the biases of the LLM. Obvious in hindsight, but it probably wouldn't have occurred to me.

But the conclusion about "world peace" sounds so naive, as if you had never met actual humans.

I'm sorry, but it really looks like you've very much misunderstood the technology, the situation, the risks, and the various arguments that have been made, across the board. Sorry that I couldn't be of help.

Thanks so much for the feedback :) Could you (or someone else) go into more detail about where I misunderstood something? Because right now, it seems like I'm genuinely unaware of something that all of you know.

I currently believe that all the AGI "researchers" are delusional just for thinking that safe AI (or AGI) can even exist. And even if it could ever exist in a "perfect" world, there would be intermediate steps far more "dangerous" than the end result of AGI, namely publicly available uncensored LLMs. At the same time, if we keep censoring LLMs, humanity will stay stuck in all the crises it is currently in.

Where am I going wrong?