HamletHenna comments on The Blue-Minimizing Robot - Less Wrong
Who cares about the question of what the robot "actually wants"? Certainly not the robot. Humans care about the question of what they "actually want", but that's because they have additional structure that this robot lacks. And with humans, you're not limited to just looking at what they do on auto-pilot; instead, you can ask that additional structure when you run into problems like this. For example, if you asked me what I really wanted under some weird ontology change, I could say, "I have some guesses, but I don't really know; I would like to defer to a smarter version of me". That's how I understand preference extrapolation: not as something that looks at what your behavior suggests you're trying to do and then does it better, but as something that poses the question of what you want to some system you'd like to answer the question for you.
It looks to me like there's a mistaken tendency among many people here, including some very smart people, to say that I'd be irrational to let my stated preferences deviate from my revealed preferences; that just because I seem to be trying to do something (in some sense like: when my behavior isn't being controlled much by the output of moral philosophy, I can be modeled as a relatively good fit to a robot with some particular utility function), that's a reason for me to do it even if I decide that I don't want to. But rational utility maximizers get to be indifferent to whatever the heck they want, including their own preferences, so it's hard for me to see why the underdeterminedness of the true preferences of robots like this should bother me at all.
Insert standard low confidence about me posting claims on complicated topics that others seem to disagree with.
That might be a procedure that generates human preference, but it is not a general preference extrapolation procedure. E.g., suppose we replace Wei Dai's simple consequentialist robot with a robot that has similar behavior, but that also responds to the question, "What system do you want to answer the question of what you want for you?" with the answer, "A version of myself better able to answer that question. Maybe it should be smarter and know more things and be nicer to strangers and not have scope insensitivity and be less prone to skipping over invisible moral frameworks and have concepts that are better defined over attribute space and be automatically strategic and super committed and stuff like that? But since I'm not that smart and I pass over moral frameworks and stuff, everything I just said is probably insufficient to specify the right thing. Maybe you can look at my source code and figure out what I mean by right and then do the thing that a person who better understood that would do?" And then goes right back to zapping blue.