Stuart_Armstrong comments on Proper value learning through indifference - Less Wrong

16 Post author: Stuart_Armstrong 19 June 2014 09:39AM




Comment author: Stuart_Armstrong 20 June 2014 09:43:58AM · 1 point

Every value loading agent I've considered (at least, every one that passes the naive cake-or-death problem) can be considered equivalent to a UDT agent.

I'm just not sure that's a useful way of thinking about it, because the properties we want ("conservation of moral evidence" and "don't manipulate your own moral changes") are not natural UDT properties; they depend on a particular way of conceptualising a value loading agent. For instance, the kid who doesn't ask whether eating cookies is bad has a sound formulation as a UDT agent, but that formulation doesn't seem to capture what we want.

EDIT: This may be relevant: http://lesswrong.com/r/discussion/lw/kdx/conservation_of_expected_moral_evidence_clarified/

Comment author: Wei_Dai 21 June 2014 12:26:11AM · 3 points

It seems to me that there are natural ways to implement value loading as UDT agents, with the properties you're looking for. For example, if the agent values eating cookies in universes where its creator wants it to eat cookies, and values not eating cookies in universes where its creator doesn't want it to eat cookies (glossing over how to define "creator wants" for now), then I don't see any problems with the agent manipulating its own moral changes or avoiding asking whether eating cookies is bad. So I'm not seeing the motivation for coming up with another decision theory framework here...
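The idea in this comment can be sketched numerically. Below is a minimal, hypothetical toy model (not anyone's actual proposal, and all names are made up): the agent's utility is indexed to the fact of what the creator wants in each universe, rather than to the agent's current beliefs. Under that indexing, asking about the creator's preference weakly dominates not asking, so the agent has no incentive to avoid the question.

```python
def utility(creator_wants_cookies: bool, eats_cookie: bool) -> float:
    """Utility indexed to the universe: reward matching the creator's
    actual preference there, whatever the agent currently believes."""
    return 1.0 if eats_cookie == creator_wants_cookies else 0.0

def eu_without_asking(p_wants: float) -> float:
    """Expected utility of committing to the best single action under
    the prior p_wants = P(creator wants cookies), without asking."""
    eu_eat = p_wants * utility(True, True) + (1 - p_wants) * utility(False, True)
    eu_abstain = p_wants * utility(True, False) + (1 - p_wants) * utility(False, False)
    return max(eu_eat, eu_abstain)

def eu_with_asking(p_wants: float) -> float:
    """Expected utility of first asking the creator, then acting on the
    answer in each branch (assuming a truthful answer)."""
    return p_wants * utility(True, True) + (1 - p_wants) * utility(False, False)

p = 0.6
print(eu_without_asking(p))  # best guess under the prior: 0.6
print(eu_with_asking(p))     # ask, then act correctly in both branches: 1.0
assert eu_with_asking(p) >= eu_without_asking(p)
```

Because utility depends on the true state of each universe rather than on the agent's post-update values, information about the creator's preference is instrumentally valuable, which is why the "kid avoids asking about cookies" failure mode doesn't arise in this framing.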