
Q: Are all of a person's values comparable with each other? For example, is a candlelit dinner comparable to a sunset walk on a beach?

A: Of course. You can ask the person to choose between these two things. Their answer will give you information about what they value more.

Q: What if the person can't choose?

A: Then they probably value these two things about equally.

Q: Okay, I have another question. Are all abstract concepts comparable to each other by weight?

A: Come again?

Q: I mean, we can ask a person: "Is one mile heavier than one hour or vice versa?" That will give you information about that person's weight function: whether they assign more weight to one mile or to one hour.

A: The person can't choose, because the question is nonsense.

Q: But by your own argument above, doesn't that mean they weigh these things about equally?

A: It's different, because the question about value feels more meaningful to the person, even if they can't give an answer.

Q: But a question can feel meaningful without being about anything real. For example, questions about gods and demons feel meaningful to many people. What if questions about which thing is more valued are also like that?

A: The difference is that value doesn't only manifest in answers to questions, it also manifests in what actions people choose.

Q: Do you mean, for a specific binary choice you can imagine a person in a room faced with two buttons and so on?

A: Exactly.

Q: Very well. Imagine a person in a room faced with two buttons, saying "one mile is heavier than one hour" and "vice versa".

A: Screw you!


Tedious explanation of the joke: I've long been puzzled by the argument that we can ask people to choose between things, therefore people have preferences. Today I realized how to knock that argument down: by pointing out that you can ask people anything at all. So the mere act of asking can't be evidence that the question is meaningful. This dialogue was born very quickly; I hope you like it.


Q: Very well. Imagine a person in a room faced with two buttons, saying “one mile is heavier than one hour” and “vice versa”.

Really it should be three buttons, the third one saying “they are of equal weights”… ;)

(Great post!)

I'm not sure I understand the claim or hypothesis behind this post.  It's something about the meanings of "value" and "believe" in terms of evidence from statements or button-pushes, but I don't see how it's confusing in the first place.

In my view, people have preferences, and have beliefs about causality, which are expressed through actions that (are intended to) influence future world-states.  This is VERY NOISY, because brains kind of suck, and because the complexity of the real world really is too big to fit into anyone's models.  Instead of

we can ask people to choose between things, therefore people have preferences

I'd say something like "people take actions and have behaviors, and to the extent they are consistent, this implies preferences".  No part of "we ask" or verbalizing those preferences is required.  Preferences and values are the choices people make, not the things they say.
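To make "to the extent they are consistent" a bit more concrete, here is a minimal sketch, not anyone's actual method. It assumes behavior has already been reduced to a list of observed pairwise choices, which is itself a big assumption, and it only looks for the simplest kind of inconsistency: a three-way cycle where a is chosen over b, b over c, and c over a. The function name `find_preference_cycles` and the example items are made up for illustration.

```python
from itertools import permutations


def find_preference_cycles(choices):
    """Find 3-cycles in observed pairwise choices.

    `choices` is an iterable of (chosen, rejected) pairs, a hypothetical
    encoding of observed behavior. Returns a list of (a, b, c) tuples where
    a was chosen over b, b over c, and c over a.
    """
    beats = set(choices)
    items = sorted({x for pair in beats for x in pair})
    cycles = []
    for a, b, c in permutations(items, 3):
        # Report each cycle once, starting from its smallest item.
        if a < b and a < c and (a, b) in beats and (b, c) in beats and (c, a) in beats:
            cycles.append((a, b, c))
    return cycles


# A cyclic set of choices: no ranking of the three items fits all of them.
cyclic = [("dinner", "walk"), ("walk", "movie"), ("movie", "dinner")]
# A transitive set of choices: compatible with dinner > walk > movie.
transitive = [("dinner", "walk"), ("walk", "movie"), ("dinner", "movie")]

print(find_preference_cycles(cyclic))
print(find_preference_cycles(transitive))
```

An empty result only says that no three-way cycle showed up in the choices that were observed. It does not show that a stable preference ordering exists, which is the gap the rest of this thread is about.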

Inferring preferences from actions is also philosophically tricky. My favorite reference is this old comment thread.

Wei:

let’s say it models the world as a 2D grid of cells that have intrinsic color... What does this robot “actually want”, given that the world is not really a 2D grid of cells that have intrinsic color?

steven0461:

Who cares about the question what the robot “actually wants”? Certainly not the robot. Humans care about the question what they “actually want”, but that’s because they have additional structure that this robot lacks. But with humans, you’re not limited to just looking at what they do on auto-pilot; instead, you can just ask

So with my post I'm trying to continue that line. It was understood (I hope!) that inferring preferences from actions would lead to something very evolutionary-messy and selfish and you wouldn't endorse it when shown the description. And now I try to show that inferring preferences by asking is also kind of meaningless.

Hmm. I guess I start with the knowledge that humans don't seem to be VNM-consistent, so it's quite reasonable to start by tabooing "want" and "prefer", because they don't apply in the way that's usually studied and analyzed.

I disagree with steven0461 that "just ask" provides any more information than watching an artificial choice.  Both are trying to infer something that doesn't exist from something easily observable.  

For many humans, we CAN say they "currently prefer" the expected outcome of an actual choice they make, but that's a pretty weak and circular definition.

So - what do you hope to actually model about an individual human that you're using the word "want" for?

The overarching problem is figuring out human preferences so that AI can fulfill them. We're all on the same page that humans aren't VNM-consistent.

Ah, yeah. That’s why I’m not very hopeful about AI alignment. I don’t think anyone’s even defined the problem in a useful way.

Neither humans as a class nor most humans as individuals HAVE preferences that AI is able to fulfill, or even to be compatible with, as they are conceived today. We MAY have mental frameworks that let our preferences evolve to survive well in an AI-containing world.

Search for meaning can be part of the activity. I think there is a sensible illustration from the old model of UDT where there's agent A() and world U(), and we want to look for dependencies D(-) such that D(A) serves as a proxy for U(), and also such that D has A factored out of it, so that it itself doesn't depend on A (not a spurious dependence) to prevent cyclic reasoning when A makes decisions based on D. Here, we start with A and U as given, and then figure out D, which serves as the correspondence meaning of A in terms of its acausal influence on U. So the meaning of A is logically downstream of the definition of A.

When we label buttons with "2+2=5" and "2+2=7", the physical world outcomes of pressing them are not on the way to the U() of their A(), so they are not relevant. But those outcomes are on the way to the human's U(), even as the human still doesn't know the meaning of their actions, since that meaning is downstream of knowing the scope of the semantic outcomes they do already know to care about. This difference in scopes of intended outcomes is the disanalogy.