Alerus comments on Is friendly AI "trivial" if the AI cannot rewire human values? - Less Wrong
You're missing the point of talking about opposition. Unlike the Nazis, the AI doesn't want the outcome of opposition, because opposition has terrible effects on the well-being it's trying to maximize. This isn't about winning the war; it's about the consequences of war on the measured well-being of the people involved, and of everyone else who lives in a society where an AI would kill people for what amounts to thought-crime.
This specifically violates the assumption that the AI has an accurate model of how any given human measures their well-being.
The assumption is that it models human well-being at least as well as a human can model another person's well-being function. This constraint by itself doesn't solve friendly AI, though: in a less constrained version of the problem than the one I outlined, the typical failure mode for an AI trying to maximize what humans value is that it rewires what humans value into something easier to maximize. The entire purpose of this post is to ask whether it could achieve this without the ability to manually rewire human values (e.g., could it be done through persuasion?). In other words, you're claiming that friendly AI is more easily solved than even the constrained version of the problem I posed in the post.