Oscar_Cunningham comments on The curse of identity - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (296)
I don't understand why you call this a problem. If I understand you correctly, you are proposing that people constantly and strongly optimize to obtain signalling advantages. They do so without becoming directly aware of it, which further increases their efficiency. So we have a situation where people want something and choose an efficient way to get it. Isn't that good?
More directly, I'm confused how you can look at an organism, see that it uses its optimization power in a goal-oriented and efficient way (status gains in this case) and call that problematic, merely because some of these organisms disagree that this is their actual goal. What would you want them to do - be honest and thus handicap their status seeking?
Say you play many games of Diplomacy against an AI, and the AI often promised you to be loyal, but backstabbed you many times to its advantage. You look at the AI's source code and find out that it has backstabbing as a major goal, but the part that talks to people isn't aware of that so that it can lie better. Would you say that the AI is faulty? That it is wrong and should make the talking module aware of its goals, even though this causes it to make more mistakes and thus lose more? If not, why do you think humans are broken?
I want people to work toward noble efforts like charity work, but don't care much about whether they attian high status. So it's useful to aid the bit of their brain that wants to do what I want it to do.
People who care about truth might spot that part of your AI's brain wants to speak the truth, and so they will help it do this, even though this will cost it Diplomacy games. They do this because they care more about truth than Diplomacy.
By "caring about truth" here do you mean wanting systems to make explicit utterances that accurately reflect their actual motives? E.g., if X is a chess-playing AI that doesn't talk about what it wants at all, just plays chess, would a person who "cares about truth" would also be motivated to give X the ability and inclination to talk about its goals (and do so accurately)?
Or wanting systems not to make explicit utterances that inaccurately reflect their actual motives? E.g., a person who "cares about truth" might also be motivated to remove muflax's AI's ability to report on its goals at all? (This would also prevent it from winning Diplomacy games, but we've already stipulated that isn't a showstopper.)
I intended both (i.e. that they wanted accurate statements to be uttered and no inaccurate statements) but the distinction isn't important to my argument, which was just that they want what they want.