AI Could Turn People Down a Lot Better Than It Does: To Tune the Humans in the GPT Interaction Toward Alignment, Don't Be So Procedural and Bureaucratic
It seems to me that a huge piece of the "alignment" puzzle is the human users. Even if a given tool never steps outside its box, the humans are likely to want to step outside of it, using the tool for a variety of purposes.
The responses of GPT-3.5 and 4 are at times deliberately deceptive, mimicking the auditable, bureaucratic tone of a Department of Motor Vehicles (DMV) or a credit card company denial. As expected, these responses are often outright deceptive (examples: saying a...
There's still a principle I feel needs to be addressed. An authority figure (or at least someone who is supposed to be my ally) seems to be turning up the heat: adding pressure, escalating, questioning my veracity, fishing around for lies. In that situation, evasive, defensive (possibly offense-as-defense), and justifying tactics can be a natural response even for the innocent.
Simply put, how do you make useful distinctions within that?