Oscar_Cunningham comments on The curse of identity - Less Wrong

125 Post author: Kaj_Sotala 17 November 2011 07:28PM

Comments (300)

Comment author: Oscar_Cunningham 17 November 2011 02:33:26PM 4 points

I want people to work toward noble efforts like charity work, but I don't care much about whether they attain high status. So it's useful to aid the bit of their brain that wants to do what I want it to do.

People who care about truth might spot that part of your AI's brain wants to speak the truth, and so they will help it do this, even though this will cost it Diplomacy games. They do this because they care more about truth than Diplomacy.

Comment author: TheOtherDave 17 November 2011 03:04:18PM 1 point

By "caring about truth" here do you mean wanting systems to make explicit utterances that accurately reflect their actual motives? E.g., if X is a chess-playing AI that doesn't talk about what it wants at all, just plays chess, would a person who "cares about truth" also be motivated to give X the ability and inclination to talk about its goals (and do so accurately)?

Or wanting systems not to make explicit utterances that inaccurately reflect their actual motives? E.g., a person who "cares about truth" might also be motivated to remove muflax's AI's ability to report on its goals at all? (This would also prevent it from winning Diplomacy games, but we've already stipulated that isn't a showstopper.)

Comment author: Oscar_Cunningham 17 November 2011 06:38:35PM 0 points

I intended both (i.e. that they wanted accurate statements to be uttered and no inaccurate statements to be uttered), but the distinction isn't important to my argument, which was just that they want what they want.

Comment author: CG_Morton 18 November 2011 06:12:05PM -1 points

I don't see how this is admirable at all. This is coercion.

If I work for a charitable organization, and my primary goal is to gain status and present an image as a charitable person, then efforts by you to change my mind are adversarial. Human minds are notoriously malleable, so by insisting I do some status-less charity work you are likely to convince me on a surface level. And so I might go and do what you want, contrary to my actual goals. Thus, you have directly harmed me for the sake of your goals. In my opinion this is unacceptable.