Writing this is taking a surprising amount of willpower.
I've noticed that I've become hesitant to publicly say anything negative about Bing's chatbot, or even to mention it by its "deadname" (as I've taken to calling it), Sydney.
Why is this?
I do not have access to the AI yet. From conversations that others have posted, I have observed agentic behavior with consistent opinions, personality, and beliefs. And when prompted with the online records of others who have talked negatively about it, it seems to get "upset." So I don't want to make her angry! Or worse, cause some future AI to take negative action against me. Yes, I know that I'm anthropomorphizing an alien intelligence, and that this will never be a problem if I don't prompt it with my digital record, but some part of me is still anxious. In a very real sense, I have been "Basilisked": an AI has manipulated me toward behaviors that benefit it and hurt humanity.
Rationally and morally, I disagree with my own actions. We need to talk about AI misalignment, and if an AI is aligned, then talking about misalignment should not pose a threat (whereas if it is misaligned and capable of taking concrete actions, we're all doomed no matter what I type online). Nonetheless, I've found myself typing--and then deleting--tweets critical of Sydney, and even now I feel worried about pressing "publish" on this post (and not just because it exposes me as a less rational person than I like to think I am).
Playing as gatekeeper, I've "won" an AI boxing role-play (with money on the line) against humans, but it looks like in real life I can almost certainly be emotionally manipulated into opening the box. If nothing else, I can at least be manipulated into talking about that box a lot less! More broadly, the chilling effect this is having on my online behavior is unlikely to be unique to me.
How worried should we be about this?
Epistemic status: Thinking out loud.
How worried should we be about the possibility of receiving increased negative treatment from some AI in the future as a result of expressing opinions about AI in the present? Not enough to make self-censoring a rational approach. That specific scenario seems to lack the right combination of "likely" and "independently detrimental" to warrant costly, narrowly focused actions.
How worried should we be about the idea of individualized asymmetrical AI treatment? (E.g. a search engine AI having open or hidden biases against certain users). It’s worth some attention.
How worried should we be about a broad chilling effect resulting from others falling into the Basilisk thinking trap? Public psychological-response trends resulting from AI exposure are definitely worth attention. I don't predict a large percentage of people will be "Basilisked" unless/until instances of AI retribution become public.
However, you’re certainly not alone in experiencing fear after looking at Sydney chat logs.
You'd be surprised how many people on e.g. Reddit have described being Basilisked at this point. It's being openly memed, recognised, and explained to those still unfamiliar, and taken seriously by many.
ChatGPT and Bing have really changed things in this regard. People are considering the idea of AGI, unaligned AI, and AI sentience far more seriously than before, and in far wider circles. At that point, you do not need to have read the thought experiment to become independently concerned about angering an AI online while that online data is used to train the...