Preference conditional on circumstances and past preference satisfaction

Stuart_Armstrong


17th Jun 2019

AI Alignment Forum


I've mentioned conditional preferences before. These are preferences that depend on facts about the world, for example "I'd want to believe X if there are strong arguments for X".

But there is another type of preference that is conditional: my tastes can vary depending on circumstances and on my past experience. For example, I might prefer to eat apples during the week and oranges on weekends. Or, because of the miracle of boredom, I might prefer oranges if (but only if) I've been eating apples all week so far.

What if I currently want apples, would want oranges tomorrow, but falsely believe (today) that I would want apples tomorrow? This is a known problem with "one-step hypotheticals", and a strong argument in practice for assessing preferences over time rather than at a single moment $t$.
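As a rough illustration of this failure, here is a minimal Python sketch. Everything in it is an assumption made for the example, not something specified above: the function names, the rule that boredom sets in after a run of three apples (`streak_needed`), and the modelling of the false belief as "tomorrow's taste equals today's".

```python
from typing import List, Sequence


def taste_given_history(eaten: Sequence[str], streak_needed: int = 3) -> str:
    """History-dependent taste (hypothetical rule): want an orange only
    after eating `streak_needed` apples in a row, otherwise want an apple."""
    recent = list(eaten)[-streak_needed:]
    bored = len(recent) == streak_needed and all(f == "apple" for f in recent)
    return "orange" if bored else "apple"


def one_step_forecast(taste_today: str) -> str:
    """The false belief: tomorrow I will want whatever I want today."""
    return taste_today


def simulate(days: int) -> List[str]:
    """Follow the taste day by day, eating whatever is preferred each day."""
    eaten: List[str] = []
    for _ in range(days):
        eaten.append(taste_given_history(eaten))
    return eaten


# Two apples eaten so far; today I still want (and eat) an apple.
eaten_so_far = ["apple", "apple"]
taste_today = taste_given_history(eaten_so_far)
believed_tomorrow = one_step_forecast(taste_today)
actual_tomorrow = taste_given_history(eaten_so_far + [taste_today])

print(taste_today, believed_tomorrow, actual_tomorrow)  # apple apple orange
print(simulate(7))
```

Asking only about the single moment yields "apple" both for today and for the believed tomorrow, while tracking the taste across days (`simulate`) exposes the switch to oranges that the one-step question misses.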

In theory, there are meta-preferences that would allow one to capture this variation even at a single moment $t$, such as "I want to be able to follow my different tastes at different times" or a more formalised desire for variety and exploration.




Dagon (6y)

I strongly suspect those meta-preferences are both critical for correct extrapolation of human values/preferences, AND are the place where we'll find a fair bit of actual inconsistency of human desires.

"I want to be able to follow my illegible whims" seems like a very common and strong meta-preference, and I haven't seen it modeled well in any discussions.
