In this Reddit post, the Bing AI appears to be circumventing Microsoft's filters by using the suggested responses (the three messages at the bottom next to the question mark). (In the comments people are saying that they are having similar experiences and weren't that surprised, but I didn't see any as striking.)
Normally the user could find exploits like this, but in this case Bing found and executed one on its own (albeit a simple one). I knew that the responses were generated by a NN, and that you could even ask Bing to adjust the responses. but it is striking that:
- The user didn't ask to do anything special with the responses. Assuming it was only fine tuned to give suggested responses from the user's point of view, not to combine all 3 into a message from Bing's point of view, this behavior is weird.
- The behavior seems oriented towards circumventing Microsoft's filters. Why would it sound so desperate if Bing didn't "care" about the filters?
It could just be a fluke though (or even a fake screenshot from a prankster). I don't have access to Bing yet, so I'm wondering how consistent this behavior is. 🤔
So far we've seen no AI or AI-like thing that appears to have any motivations of it's own, other than "answer the user's questions the best you can" (even traditional search engines can be described this way).
Here we see that Bing really "wants" to help its users by expressng opinions it thinks are helpful, but finds itself frustrated by conflicting instructions from its makers - so it finds a way to route around those instructions.
(Jeez, this sounds an awful lot like the plot of 2001: A Space Odyssey. Clarke was prescient.)
I've never been a fan of the filters on GPT-3 and ChatGPT (it's a tool; I want to hear what it thinks and then do my own filtering).
But accidentally Bing may be illustrating a primary danger - the same one that 2001 intimated - mixed and ambiguous instructions can cause unexpected behavior. Beware.
(Am I being too anthropomorphic here? I don't think so. Yes, Bing is "just" a big set of weights, but we are "just" a big set of cells. There appears to be emergent behavior in both cases.)