So I read this other post which talked about possible S-risk from an AI being aligned to human values. It contains the following statement:
Whoever controls the AI will most likely have somebody whose suffering they don’t care about, or that they want to enact, or that they have some excuse for, because that describes the values of the vast majority of people.
I do not dispute this statement. However, I think that it simply doesn't matter. Why?
Take an uncontroversially horrible person. (As an example, I will use the first person who comes to mind, but this logic probably applies to most people.) If said person[1] were to introspect on why he did the horrible things he did, he would probably realize that his hatred of minorities arose from a fear of minorities that was ultimately not grounded in reality. Additionally, any political support he gained from signaling hatred of minorities, including by acting as if he hated them, would be unnecessary if he were a literal superintelligence; a superintelligence probably has much better ways to garner support for whatever values it has. In other words, his hatred of minorities was instrumental rather than terminal, so it does not actually enter his utility function. Therefore, a superintelligence trained on his values would probably not start an S-risk.
[1] Honestly, typing his name makes me uncomfortable, so I won't.