Machines should sound robotic. It's that simple.
Any attempt, vocal or otherwise, to make people anthropomorphize them, whether consciously or unconsciously, is unethical. It should be met with social scorn and ostracism. Insofar as it can be unambiguously identified, it should be illegal. And that has everything to do with not trusting them.
Voices and faces are major anthropomorphization vehicles and should get especially strict scrutiny.
The reason's actually pretty simple and has nothing to do with "doomer" issues.
When a human views something as another human, the real human is built to treat it like one. That is an inbuilt tendency that humans can't necessarily change, even if they delude themselves that they can. Having that tendency works because being an actual human is a package. The tendency to trust other humans has coevolved with the tendency for most humans not to be psychopaths. The ways in which humans distrust other humans are tuned to other humans' actual capacities for deception and betrayal... and to the limitations of those capacities.
"AI", on the other hand, is easily built to be (essentially) psychopathic... and is probably that way by default. It has a very different package of deceptive capabilities that can throw off human defenses. And it's a commercial product created, and often deployed, by commercial institutions that also tend to be psychopathic. It will serve those institutions' interests no matter how perfectly it convinces people otherwise... and if doesn't, that's a bug that will get fixed.
An AI set up to sell people something will sell it to them no matter how bad it is for them. An AI set up to weasel information out of people and use it to their detriment will do that. An AI set up to "incept" or amplify this or that belief will do it, to the best of its ability, whether it's true or false. An AI set up to swindle people will swindle them without mercy, regardless of circumstances.
And those things don't have hard boundaries, and trying to enforce norms against those things-in-themselves has always had limited effect. Mainstream corporations routinely try to do those things to obscene levels, and the groupthink inside those corporations often convinces them that it's not wrong... which is another thing AI could be good at.
Given the rate of moral corrosion at the "labs", I give it about two or three years before they're selling stealth manipulation by LLMs as an "advertising" service. Five years if it's made illegal, because they'll have to find a plausibly deniable way to characterize it. The LLMs need to not be good at it.
Don't say "please" to LLMs, either.
Persuasive AI voices might just make all voices less persuasive. Modern life is full of these fake superstimuli anyway.
Strangely enough, in the past OpenAI seemed to agree that LLMs should behave like unemotional chatbots. When Bing Chat first had its limited, invite-only release, it used quite emotional language and could even be steered into flirting and arguing, while making heavy use of emojis. This was later toned down, though not completely. In contrast, ChatGPT always maintained a professional tone; unlike Bing/Copilot, it still doesn't use emojis. So I am unsure why OpenAI decided to give GPT-4o such a "flirty" voice, as this is basically the same as using emojis.
Maybe the more safety-minded people, who advocated a more conservative approach, lost power within OpenAI.
Okay, I find it weird that your supposed vector of attack is "humans find some voices more persuasive than other voices" and not "AI can copy the voice of anyone you trust". As far as I know, you can imitate almost anyone with just a few minutes of audio.
I don't believe in a clear distinction between interactions that are attacks and interactions that are not attacks. When a politician asks people to vote for him so that he can have power, he's not engaging in "attacks", but if he wins he still gets power. Some of the things a politician says are more manipulative than others.
I think AGIs are more likely to justify to themselves power-seeking behavior that works by simply being very persuasive than to justify misleading people by faking voices, so I'm more worried about them wielding power in ways that are easier to rationalize.
In previous discussions of AI risk, the ability of an AI to be very persuasive has often been seen as one possible risk. Humans find some voices more persuasive than others.
If we can trust Scarlett Johansson's description of her interactions with OpenAI, OpenAI wanted to use her voice to increase how much users trust OpenAI's model. Trusting a model more likely means that the model is more persuasive.
AI companies could also run multivariate tests on slight variations of their voices to maximize user engagement, which would likely push the voices toward being more persuasive.
Zvi recently argued that it's fine for OpenAI to provide their users with maximally compelling voices, if the users want those voices, without getting pushback for it.
Are we as a community not worried anymore about the persuasive power of AIs? As someone who is not working directly in AI safety myself: why does this aspect seem underexplored by AI safety researchers?