Crossposted at the Intelligent Agents Forum.
It should be noted that the colloquial "AI hacking a human" can mean three different things:
- The AI convinces/tricks/forces the human to do a specific action.
- The AI changes the values of the human to prefer certain outcomes.
- The AI completely overwhelms human independence, transforming them into a weak subagent of the AI.
Different levels of hacking make different systems vulnerable, and different levels of interaction make different types of hacking more or less likely.
OK. The obvious follow-up is "under what conditions is it a bad thing?" Your college example is a good one: are you saying you want to prevent AIs from making changes similar to those a university makes to its students, but perhaps on a larger scale?
Well, there's a formal answer: if an AI can, under condition C, convince any human of any belief B, then condition C is not sufficient to constrain the AI's power, and the process is unlikely to be truth-tracking.
That's a sufficient condition for C being insufficient, but not a necessary one.
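The formal claim above can be sketched as follows (the predicate names are mine, introduced purely for illustration):

```latex
% Conv(C, B): under condition C, the AI can convince the human of belief B.
% Constrained(C): condition C suffices to constrain the AI's power.
%
% The claim: if the AI's convincing power under C is universal over beliefs,
% then C does not constrain the AI, and the beliefs produced are not
% truth-tracking (a process that can output any B cannot be tracking truth).
\[
\bigl(\forall B.\ \mathrm{Conv}(C, B)\bigr)
\;\Longrightarrow\;
\neg\,\mathrm{Constrained}(C)
\]
% Note the direction: this gives a sufficient test for C failing,
% not a necessary one. C may still fail to constrain the AI even if
% some beliefs B remain beyond its power to instill.
```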