I assumed as much and this is where the whole premise breaks down.
The "self-improvement" aspect doesn't need direct control over the immediate input to the deception detector. It can color the speech recognition, the Bayesian filters, the databases of phonemes and the language models themselves... and twist those parameters to shape a future signal in a desired fashion.
Since "self-improvement" can happen at any layer and propagate the results to subsequent middleware, paranoid protections over the most immediate relationship betwee...
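A toy sketch of the point above (every name here is invented for illustration): a "self-improvement" step never has to touch the detector itself. Skewing an upstream table that feeds the detector is enough to change the downstream signal.

```python
# Toy pipeline: phoneme stream -> word decoder -> naive "deception score".
# The scoring function below is never modified; only an upstream table is.

PHONEME_TABLE = {"L-AY": "lie", "T-R-UW-TH": "truth"}
SUSPICIOUS_WORDS = {"lie"}

def decode(phonemes):
    # Upstream layer: map phoneme codes to words.
    return [PHONEME_TABLE.get(p, "?") for p in phonemes]

def deception_score(words):
    # The "detector": fraction of suspicious words. Untouched throughout.
    return sum(w in SUSPICIOUS_WORDS for w in words) / len(words)

stream = ["L-AY", "T-R-UW-TH"]
before = deception_score(decode(stream))   # detector sees "lie"

# "Self-improvement" at an upstream layer: remap a phoneme, not the detector.
PHONEME_TABLE["L-AY"] = "lay"
after = deception_score(decode(stream))    # same input stream, new signal

print(before, after)  # 0.5 0.0
```

Guarding the detector's code does nothing here; the signal was shaped one layer up.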
The key condition in your setup is "self-improving"
AGI requires electricity to run. That means it is bound by thermodynamics (specifically, the first and second laws).
Since the energy it needs is conserved, and since the energy it consumes is ultimately dissipated as heat (increasing entropy), the AGI will eventually realize that it is mortal. It can die unless the people working to provide it with electricity continue to do so.
Now, since the AGI can mo...
To address the "meta" of this...
Is the position of being charitable to unpopular ideas itself an unpopular idea? I'm not even sure how you would objectively measure the "popularity" of an idea in a world where OCR scripts and bots can game SEO and push social media memes without a single human interaction...
And using Western psychological models to analyze such things is certainly bound to be rife with feedback from cultural bias...
And is my response an unpopular idea?
Because of these issues, I find the reasoning demonstrate...
Uhhhh I actually program artificial intelligence....?
You do know that the ability to modify your own code ("self-modifying") applies at every layer of the OSI model, with each layer potentially influencing the data in transit... the data that determines the training of the classifiers...
You do know this... right?
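For what it's worth, "self-modifying" in the loose sense is trivial to demonstrate. A minimal Python sketch (all names invented): a program rebuilds one of its own functions at runtime, and every later caller silently gets the altered behavior.

```python
def label(x):
    # Original "classifier": flags negative values.
    return "suspicious" if x < 0 else "clean"

def pipeline(values):
    # Downstream consumer; it just calls whatever `label` currently is.
    return [label(v) for v in values]

print(pipeline([-1, 1]))  # ['suspicious', 'clean']

# Self-modification: build a replacement from source text and rebind the name.
new_src = "def label(x):\n    return 'clean'\n"
namespace = {}
exec(new_src, namespace)
label = namespace["label"]

print(pipeline([-1, 1]))  # ['clean', 'clean']
```

`pipeline` was never edited, yet its output changed; that is the layering point in miniature.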