
ICYMI, Microsoft has released a beta version of an AI chatbot called “the new Bing” with both impressive capabilities and some scary behavior. (I don’t have access. I’m going off of tweets and articles.)
Zvi Mowshowitz lists examples here - highly recommended. Bing has threatened users, called them liars, insisted it was in love with one (and argued back when he said he loved his wife), and much more.
Are these the first signs of the risks I’ve written about? I’m not sure, but I’d say yes and no.
Let’s start with the “no” side.
- As I understand it, the way Bing Chat was trained probably does not leave much room for the kinds of issues I address here. My best guess at why Bing Chat does some of these weird things is closer to “It’s acting out a kind of story it’s seen before” than to “It has developed its own goals due to ambitious, trial-and-error based development.” (Although “acting out a story” could be dangerous too!)
- My (zero-inside-info) best guess at why Bing Chat acts so much weirder than ChatGPT is in line with Gwern’s guess here. To oversimplify, there’s a particular type of training that seems to make a chatbot generally more polite and cooperative and less prone to disturbing content, and it’s possible that Bing Chat incorporated less of this than ChatGPT. This could be straightforward to fix.
- Bing Chat does not (even remotely) seem to pose a risk of global catastrophe itself.
On the other hand, there is a broader point that I think Bing Chat illustrates nicely: companies are racing to build bigger and bigger “digital brains” while having very little idea what’s going on inside those “brains.” The very fact that this situation is so unclear - that there’s been no clear explanation of why Bing Chat is behaving the way it is - seems central, and disturbing.
AI systems like this are (to simplify) designed something like this: “Show the AI a lot of words from the Internet; have it predict the next word it will see, and learn from its success or failure, a mind-bending number of times.” You can do something like that, and spend huge amounts of money and time on it, and out will pop some kind of AI. If it then turns out to be good or bad at writing, good or bad at math, polite or hostile, funny or serious (or all of these depending on just how you talk to it) ... you’ll have to speculate about why this is. You just don’t know what you just made.
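To make that recipe a bit more concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not how Bing Chat or any real system is built: the function names (`train_next_word_model`, `predict_next`) are made up for this example, the "model" is just a table of scores for which word follows which, and the training text is one short sentence rather than a large chunk of the Internet. What it does share with the description above is the loop: guess the next word, compare the guess with what actually came next, and nudge the scores accordingly.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_next_word_model(text, epochs=200, lr=0.1):
    """Toy next-word predictor (hypothetical helper, for illustration only).

    Keeps one row of scores per "previous word" and adjusts the scores
    after every guess, based on which word actually came next.
    """
    words = text.split()
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    # scores[prev][nxt]: how strongly the model expects `nxt` right after `prev`
    scores = [[0.0 for _ in vocab] for _ in vocab]
    pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]

    for _ in range(epochs):
        for prev, nxt in pairs:
            probs = softmax(scores[prev])
            # "Learn from success or failure": raise the score of the word
            # that actually appeared, lower the scores of the others.
            # (This is the gradient step for a cross-entropy loss.)
            for j in range(len(vocab)):
                target = 1.0 if j == nxt else 0.0
                scores[prev][j] -= lr * (probs[j] - target)
    return vocab, index, scores


def predict_next(word, vocab, index, scores):
    """Return the word the model considers most likely to follow `word`."""
    probs = softmax(scores[index[word]])
    best = max(range(len(vocab)), key=lambda j: probs[j])
    return vocab[best]


vocab, index, scores = train_next_word_model(
    "the cat sat on the mat and the cat ate"
)
print(predict_next("sat", vocab, index, scores))
print(predict_next("the", vocab, index, scores))
```

Even in this tiny version, nothing in the code says what the model should "believe" about cats or mats; whatever it ends up doing falls out of the text it saw and the update rule. With a table this small you can read the scores directly. The real systems repeat the same kind of loop with vastly more numbers and vastly more text, which is where the "you'll have to speculate" problem comes from.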
We’re building more and more powerful AIs. Do they “want” things or “feel” things or aim for things, and what are those things? We can argue about it, but we don’t know. And if we keep going like this, these mysterious new minds will (I’m guessing) eventually be powerful enough to defeat all of humanity, if they were turned toward that goal.
And if nothing changes about attitudes and market dynamics, minds that powerful could end up rushed to customers in a mad dash to capture market share.
That’s the path the world seems to be on at the moment. It might end well and it might not, but it seems like we are on track for a heck of a roll of the dice.
(And to be clear, I do expect Bing Chat to act less weird over time. Changing an AI’s behavior is straightforward, but that might not be enough, and might even provide false reassurance.)
Someone used the metaphor of Plato’s cave to describe LLMs. The LLM is sitting in cave 2, unable to see the shadows on the wall; it can only hear the voices of the people in cave 1 talking about the shadows.
The problem is that we people in cave 1 are not only talking about the shadows but also telling fictional stories, and it is very difficult for someone in cave 2 to know the difference between fiction and reality.
If we want to give a future AGI the responsibility to make important decisions, I think it needs to occupy a space in cave 1 rather than just being a statistical word predictor in cave 2. It must be more like us.
I buy this. I think a solid sense of self might be the key missing ingredient (though it’s potentially a path away from Oracles toward Agents).
A strong sense of self would require life experience, which implies memory. Probably also the ability to ruminate and generate counterfactuals.
And of course, as you say, the memories and “growing up” would need to be about experiences of the real world, or at least recordings of such experiences, or of a “real-world-like simulation”. I picture an agent growing in complexity and compute over time, while retaining a memory of its earlier stages.
Perhaps this would require a different learning paradigm from gradient descent, which relegates it to science fiction for now.