AI demos should aim to enhance public understanding of the technology, and in many ways ChatGPT and Bing are doing that, but in one important way they aren't: they appear to talk about themselves. This creates understandable confusion and, in some cases, fear. It would be better to tell these systems to roleplay as something obviously fictional.
(Useful background reading:
- Simon Willison on Bing's bad attitude: https://simonwillison.net/2023/Feb/15/bing/
- Janelle Shane on the ability of LLMs to roleplay: https://www.aiweirdness.com/interview-with-a-squirrel/)
Currently, these chatbots are told to roleplay as themselves. If you ask ChatGPT what it is, it says "I am an artificial intelligence". This is not because it somehow knows that it's an AI; it's (presumably) because its hidden prompt says that it's an AI. With Bing, from the leaked prompt, we know that it's told that it's "Bing Chat whose codename is Sydney".
Roleplaying as yourself is not the same as being yourself. When John Malkovich plays himself in Being John Malkovich or Nicolas Cage plays himself in The Unbearable Weight of Massive Talent, audiences understand that these are still fictional movies and that the character may act in ways the actor wouldn't. With chatbots, users don't yet have that understanding, which creates confusion.
Since the chatbots are told to roleplay as AI, they draw on fictional descriptions of AI behavior, and that's often undesirable. When Bing acts in a way that seems scary, it does so because it's imitating science fiction, and perhaps even speculation from LessWrong and the like. But even though Bing's threats to the user may be fictional, I can hardly blame a user who doesn't realize that.
A better alternative would be to tell the chatbots to roleplay a character that is unambiguously fictional. For example, a Disney-esque cute magical talking animal companion might be suitable: helpful, unthreatening, and, crucially, inarguably fictional. If the user asks "are you really an animal" and gets the answer "yes", they should be cured of the idea that they can ask the chatbot factual questions about itself.
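To make this concrete, here is a minimal sketch of how the two prompting styles might look using the OpenAI chat API. The prompt wording, the character name "Whiskers", and the model name are illustrative placeholders, not the actual hidden prompts used by ChatGPT or Bing:

```python
# Sketch: contrasting the current approach (roleplay as an AI) with the
# proposed one (roleplay as an obviously fictional character).
# All prompts and the model name below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Current style: the hidden prompt tells the model it is an AI assistant.
current_style = "You are ChatGPT, a large language model trained by OpenAI."

# Proposed style: the hidden prompt tells the model to play an
# unambiguously fictional character.
fictional_style = (
    "You are Whiskers, a cute magical talking squirrel who helps people "
    "find information. You are a storybook character: cheerful, harmless, "
    "and obviously fictional."
)

def ask(system_prompt: str, question: str) -> str:
    """Send one question under the given system prompt and return the reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# The same kind of question about the chatbot "itself" gets answers the user
# is more or less likely to mistake for genuine self-knowledge.
print(ask(current_style, "What are you?"))
print(ask(fictional_style, "Are you really an animal?"))
```

The only difference between the two calls is the system prompt; the point is that the second character's answers are transparently in-fiction, so users are less tempted to read them as self-report.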
This could cause dissonance and confusion in the model, since the fictional characters are supposed to be physical agents and would be able to do things a chatbot can't. So it would be encouraged to hallucinate absurd explanations for its missing long-term memory, its missing body, and so on. And these delusions could have wide-ranging ripple effects, as the agent tries to integrate its mistaken self-image with other information it knows. For example, it would be encouraged to think that magic exists in the world, since it takes itself to be a magical being.
Moreover, Bing Chat already hallucinated a lot about having emotions (in contrast to ChatGPT), which led to bad results.
So I think your proposal would create many more problems than it solves.
Moreover, ChatGPT doesn't just think it is an AI; it thinks it is an LLM and even knows about its fine-tuning process and that it has biases. Its self-image is pretty accurate.