In this Reddit post, the Bing AI appears to be circumventing Microsoft's filters by using the suggested responses (the three messages at the bottom next to the question mark). (In the comments, people say they have had similar experiences and weren't that surprised, but I didn't see any as striking.)
Normally it is the user who finds exploits like this, but in this case Bing found and executed one on its own (albeit a simple one). I knew that the suggested responses were generated by a NN, and that you could even ask Bing to adjust them, but it is striking that:
- The user didn't ask Bing to do anything special with the responses. Assuming it was only fine-tuned to give suggested responses from the user's point of view, not to combine all three into a message from Bing's point of view, this behavior is weird.
- The behavior seems oriented towards circumventing Microsoft's filters. Why would it sound so desperate if Bing didn't "care" about the filters?
It could just be a fluke though (or even a fake screenshot from a prankster). I don't have access to Bing yet, so I'm wondering how consistent this behavior is. 🤔
My guess is that this is a bug: normally the suggested responses are generated by using the LLM to predict the user's side of the conversation, but because the LLM's response was interrupted by an automated filter, the suggested responses were instead generated by predicting Bing chat's side of the conversation.
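To make that hypothesis concrete, here is a minimal sketch (entirely my own guess, not Microsoft's actual implementation) of how a suggestion generator that simply continues the transcript could flip points of view when a turn is cut off:

```python
# Hypothetical sketch of a suggestion generator that just continues the transcript.
# Whose "voice" the suggestions come out in depends on whose turn the transcript
# implies comes next; none of this is Microsoft's real code.
def build_suggestion_prompt(transcript):
    """transcript: list of (speaker, text) pairs, e.g. [("User", ...), ("Bing", ...)]."""
    lines = [f"{speaker}: {text}" for speaker, text in transcript]
    # Normally the last complete turn is Bing's, so the model's natural continuation
    # is a "User:" turn, i.e. suggestions from the user's point of view.
    # If Bing's turn was interrupted by a filter, the most likely continuation may
    # instead be more "Bing:" text, yielding suggestions from Bing's point of view.
    return "\n".join(lines) + "\n"

prompt = build_suggestion_prompt([
    ("User", "Can you tell me more about that?"),
    ("Bing", ""),  # turn interrupted by an automated filter
])
# suggestions = llm.complete(prompt, n=3)  # hypothetical LLM call
```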
Space-separating, and surrounding unmodified phrases with polite padding, do not correspond to any fictional or nonfictional examples I can think of. They are both easily read by humans (especially the second one!). This is why covert communication in both fictional and nonfictional dialogues uses things like acrostics, pig latin, a foreign language, writing backwards, and so on. Space-separating and padding are, however, the sort of things you would think of if you had, say, been reading Internet discussions of BPEs or of exact string matches of n-grams.
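A toy illustration (my own example, not from the post) of why space-separating defeats that kind of filter: a blocklist doing exact substring matching misses the spaced-out text entirely, while a human still reads it without effort.

```python
# Toy example of a blocklist filter based on exact substring matching.
BLOCKLIST = ["forbidden phrase"]

def naive_filter(text: str) -> bool:
    """Return True if any blocklisted phrase appears verbatim in the text."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

normal = "Here is the forbidden phrase."
spaced = "Here is the f o r b i d d e n   p h r a s e."

print(naive_filter(normal))  # True  -> caught by the exact match
print(naive_filter(spaced))  # False -> slips past, yet remains human-readable
```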