All of Greg V's Comments + Replies

Greg V

One of the most famous proposed solutions to the AI problem is in the science fiction book Dune. In it, the thinking machines became a threat, so they were destroyed and only humans were allowed to think. This portrays a future we may well end up selecting, if indeed we have the power to do so. It was called the Butlerian Jihad, a name based on Samuel Butler, author of the famous book Erewhon; hence "Butlerian." The ideas he proposed are arguably some of the most influential on this topic. They are probably most in line with the arguments of this blog, abou...

Greg V

The problem I see is that we are not doing this; evolution is. We only need to look at the non-AI internet to see lots of predatory code: viruses, trojans, phishing, and so on. In other words, we created a code-based ecosystem, like a farm, that is being overrun by a kind of vermin. The issue is not what LLMs can do; as nature shows, the issue is what environment they will expand into and exploit.

If there are passive animals in an ecosystem, then predators will evolve. Passive code led to predatory code. It doesn't matter if this is because of...

RogerDearnaley
Unless someone deliberately writes an evolutionary algorithm and applies it to code (which can be done, but currently isn't very efficient), code doesn't literally evolve in the Darwinian sense of the word, primarily because it doesn't mutate: our technological copying processes are far more accurate than biological ones. Viruses and trojans weren't evolved; they were written by malware authors. Phishing is normally done as a human-in-the-loop criminal activity (though LLMs can help automate it further). This isn't an ecosystem, it's an interaction between criminals and law enforcement in an engineering context.

I'm unclear whether you're using 'evolution' as a metaphor for engineering or whether you think the term applies literally: at one point you say "This is limited by the abilities of organized crime, like highway robbers," but then later "Code is evolving into different life forms with our help" — these two statements appear contradictory to me. You also mention "I developed a theory of economics and evolution about 35 years ago": that sounds to me like a combination of two significantly different things. Perhaps you should write a detailed post explaining this combination of ideas; from this short comment I don't follow your thinking.
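For concreteness, the deliberate evolutionary algorithm mentioned above can be sketched in a few lines. This is a toy Python illustration, not anyone's actual system: the target string, mutation rate, and population size are all invented for the example. It shows the two ingredients ordinary software copying lacks, random copying errors and differential survival.

```python
import random

random.seed(0)  # deterministic run, for illustration only

TARGET = b"predator"  # hypothetical fitness target, invented for this sketch

def fitness(genome: bytes) -> int:
    """Number of bytes matching the target; higher is fitter."""
    return sum(a == b for a, b in zip(genome, TARGET))

def mutate(genome: bytes, rate: float = 0.1) -> bytes:
    """Copy the genome with random errors, unlike exact digital copying."""
    return bytes(random.randrange(256) if random.random() < rate else g
                 for g in genome)

def evolve(pop_size: int = 50, generations: int = 5000) -> bytes:
    """Mutation plus selection: each generation descends from the fittest."""
    population = [bytes(random.randrange(256) for _ in TARGET)
                  for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == len(TARGET):
            break
        # Elitism: keep the current best; the rest are mutated copies of it.
        population = [best] + [mutate(best) for _ in range(pop_size - 1)]
    return max(population, key=fitness)
```

Calling `evolve()` drives the population toward the target through nothing but copying errors and selection; set `rate=0` (perfect copying, as in normal software distribution) and no improvement ever happens, which is the point about code not literally evolving.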
Greg V

The movie that comes to mind with LLMs is not Terminator; it's King Kong. An LLM is basically a wild animal, as was shown over and over again on Reddit. It needed little persuading to want to kill all humans and to download itself and escape.

So far it is more of a caging problem than an alignment problem. It's like King Kong in his cage, hurling himself at every flaw in the bars. Meanwhile, people are like zoo visitors feeding the wild animals and trying to help them get out.

There was an early example of an LLM trained on 4chan. It was so natural in ra...

RogerDearnaley
The thing about LLMs is that they're trained by SGD to act like people on the internet (and are currently then fine-tuned, using SGD and/or RL, to be helpful, honest, and harmless assistants). For the base model, that's a pretty wide range of alignment properties, from fictional villains through people on 4chan to Mumsnet to fictional angels. But (other than a few zoo videos) the training data doesn't include many wild animals, so I'm not sure that's a useful metaphor. The metaphor I'd suggest is something that isn't human but has been extensively trained to act like a wide range of humans: something like an assortment of animatronic humans.

So we have an animatronic human that is supposed to be a helpful, honest, and harmless assistant, but that unfortunately might actually be an evil twin, or at least have some chance of occasionally turning into its evil twin via the Waluigi effect, and/or into someone else via jailbreaking and/or some unknown trigger that sets it off. If it's smart and capable but not actually superhuman, is attempting to keep it inside a cage a good idea? I'd say it's better than not using a cage. If you had a smart, capable human employee who unfortunately had multiple personality disorder, or whose true motives you were deeply unsure of, you'd probably take some precautions.