Finally, we will also assume that the AI does not possess the ability to manually rewire the human brain to change what a human values. In other words, the ability for the AI to manipulate another person's values is limited by what we as humans are capable of today. Given all this, is there any concern we should have about making this AI; would it succeed in being a friendly AI?
AAaiiiieee!
Can you please think about the emphasised phrase by yourself for 5 minutes (as measured by a physical clock)?
That was approximately my reaction. Considering how easy it is to influence humans, it would be a nontrivial task to avoid unintentionally changing those values. I suspect that a significantly subhuman AI could change human values. One might argue that by increasing and redistributing wealth and improving medical care, our expert systems today are changing human value unintentionally. Google ads, which are automated and "learn" what the user likes, are specifically tailored to influence us, and do a remarkable job considering the relative simplicity of the algorithms used.
I put "trivial" in quotes because there are obviously some exceptionally large technical achievements that would still need to occur to get here, but suppose we had an AI with a utilitarian utility function of maximizing subjective human well-being (meaning, well-being is not something as simple as physical sensation of "pleasure" and depends on the mental facts of each person) and let us also assume the AI can model this "well" (lets say at least as well as the best of us can deduce the values of another person for their well-being). Finally, we will also assume that the AI does not possess the ability to manually rewire the human brain to change what a human values. In other words, the ability for the AI to manipulate another person's values is limited by what we as humans are capable of today. Given all this, is there any concern we should have about making this AI; would it succeed in being a friendly AI?
One argument I can imagine for why this fails friendly AI is the AI would wire people up to virtual reality machines. However, I don't think that works very well, because a person (except Cypher from the Matrix) wouldn't appreciate being wired into a virtual reality machine and having their autonomy forcefully removed. This means the action does not succeed in maximizing their well-being.
But I am curious to hear what arguments exist for why such an AI might still fail as a friendly AI.