Manfred comments on Where do selfish values come from? - Less Wrong

27 Post author: Wei_Dai 18 November 2011 11:52PM




Comment author: Manfred 19 November 2011 07:29:26AM 1 point

Option (3) seems like a value learning problem to which I can parrot back Eliezer's extension :P

So basically his idea was that we could give the AI a label for a value, "selfishness" in this case, treated as if it were something the AI had incomplete information about. Then the AI doesn't want to freeze its values, because that wouldn't maximize the incompletely-known goal of "selfishness"; it would only maximize the current best estimate of what selfishness is. The AI could learn more about this selfishness goal by making observations and then not caring about agents that didn't make those observations.

This is a bit different from the "friendliness" example because you don't hit diminishing returns: there's an infinity of agents not to be. So you don't want the agent to do an exploration/exploitation tradeoff like you would with friendliness; you just want it to maintain various candidate "selfishness" goals at any given moment, with different probabilities assigned. The candidate goals would correspond to the possible agents you could turn out to share observations with, and the probabilities of those goals would be the probabilities of sharing those observations. This interpretation of selfishness appears to basically rederive option (2).
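A minimal sketch of that last idea, assuming a toy discrete setting (all agent names, payoffs, and probabilities below are made up for illustration): the agent keeps a distribution over candidate "selfishness" goals, one per agent it might turn out to share observations with, maximizes expected utility under that mixture rather than freezing a single best guess, and updates by dropping the hypotheses corresponding to agents that didn't share an observation.

```python
# Candidate "selfishness" goals: hypothesis h says "the agent whose welfare
# matters is h". The prior weight on each hypothesis is the (made-up)
# probability of ending up sharing that agent's observations.
prior = {"agent_A": 0.5, "agent_B": 0.3, "agent_C": 0.2}

# Made-up utility of each action for each candidate agent.
utility = {
    "act_1": {"agent_A": 10, "agent_B": 0, "agent_C": 0},
    "act_2": {"agent_A": 4,  "agent_B": 4, "agent_C": 4},
}

def expected_utility(action, dist):
    """Average the action's utility over the candidate selfishness goals."""
    return sum(p * utility[action][h] for h, p in dist.items())

def best_action(dist):
    """Maximize expected utility under the mixture, not the single best guess."""
    return max(utility, key=lambda a: expected_utility(a, dist))

def update(dist, sharing_agents):
    """After an observation, stop caring about agents that didn't share it:
    zero out those hypotheses and renormalize the rest."""
    posterior = {h: p for h, p in dist.items() if h in sharing_agents}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

print(best_action(prior))  # act_1: EU 5.0 beats act_2's 4.0 under the prior

# Suppose an observation is shared only with agents B and C:
posterior = update(prior, {"agent_B", "agent_C"})
print(best_action(posterior))  # act_2: act_1 now has EU 0 under the posterior
```

The point of the sketch is that the agent's behavior shifts as the "selfishness" hypothesis space narrows, which is exactly why it has no incentive to freeze its current estimate.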