Wei_Dai comments on The Blue-Minimizing Robot - Less Wrong

162 Post author: Yvain 04 July 2011 10:26PM




Comment author: steven0461 05 July 2011 09:56:17PM 13 points

What does this robot "actually want", given that the world is not really a 2D grid of cells that have intrinsic color?

Who cares about the question of what the robot "actually wants"? Certainly not the robot. Humans care about the question of what they "actually want", but that's because they have additional structure that this robot lacks. And with humans, you're not limited to looking at what they do on autopilot; instead, you can just ask the aforementioned structure when you run into problems like this. For example, if you asked me what I really wanted under some weird ontology change, I could say, "I have some guesses, but I don't really know; I would like to defer to a smarter version of me." That's how I understand preference extrapolation: not as something that looks at what your behavior suggests you're trying to do and then does it better, but as something that poses the question of what you want to some system you'd like to answer the question for you.

It looks to me like there's a mistaken tendency among many people here, including some very smart people, to say that I'd be irrational to let my stated preferences deviate from my revealed preferences, and that just because I seem to be trying to do something (in the sense that, when my behavior isn't being controlled much by the output of moral philosophy, I can be modeled as a relatively good fit to a robot with some particular utility function), that's a reason for me to do it even if I decide that I don't want to. But rational utility maximizers get to be indifferent to whatever the heck they want, including their own preferences, so it's hard for me to see why the underdeterminedness of the true preferences of robots like this should bother me at all.

Insert standard low confidence about me posting claims on complicated topics that others seem to disagree with.

Comment author: Wei_Dai 06 July 2011 01:37:42AM 6 points

In other words, our "actual values" come from our being philosophers, not our being consequentialists.

It seems plausible to me, and I'm not sure that "many" others do disagree with you.

Comment author: cousin_it 04 August 2011 12:13:22PM 4 points

That would imply a great diversity of value systems, because philosophical intuitions differ much more from person to person than primitive desires do. Some of these value systems (maybe including yours) would be simple, and some wouldn't. For example, my "philosophical" values seem to give large weight to my "primitive" values.