Crossposted at the Intelligent Agents Forum.
Note that the colloquial "AI hacking a human" can mean three different things:
- The AI convinces/tricks/forces the human to do a specific action.
- The AI changes the values of the human to prefer certain outcomes.
- The AI completely overwhelms the human's independence, transforming them into a weak subagent of the AI.
Different levels of hacking make different systems vulnerable, and different levels of interaction make different types of hacking more or less likely.
How much have you explored the REASONS that brainwashing is seen as not cool, while quiet rational-seeming chat is perfectly fine? Are you sure it's only about efficacy?
I worry that there's some underlying principle missing from the conversation, about agentiness and "free will" of humans, which you're trying to preserve without defining. It'd be much stronger to identify the underlying goals and include them as terms in the AI's utility function(s).
No, but I'm pretty sure efficacy plays a role. Look at the (stereotypical) freakout from some conservative parents about their kids attending university: it's not really about the content or the methods, but about the fact that some degree of change in values or beliefs is expected.