[link] Thoughts on defining human preferences

Kaj_Sotala

https://docs.google.com/document/d/1jDGpIT3gKZQZByO6A036dojRKMv62KEDEfEz87VuDoY/

Abstract: Discussion of how we might want to define human preferences, particularly in the context of building an AI intended to learn and implement those preferences. Starts with actual arguments about the applicability of the VNM utility theorem, then towards the end gets into hypotheses that are less well defended but possibly more important. At the very end, suggests that current hypothesizing about AI safety might be overemphasizing “discovering our preferences” over “creating our preferences”.

I like a lot of this paper. I disagree with the extent and scope of your "personal hypothesis" in section 8. I just think that, as a matter of empirical fact, there usually is something we "really want", or would want with more rationality and information. For most people facing your country vs city living decision, for example, I think that trying both lifestyles would lead to a clear winner (and not in a radically path-dependent way).

But I think you've got to be right at least to some extent: sometimes we have to create values; we can't just discover them.

On VNM rationality, let me recommend the book Decision Theory and Rationality by Bermúdez. He raises worries similar to yours, pointing out that decision theory is often offered as (a) a predictive tool, (b) a prescription (follow this recipe to make better decisions!), and/or (c) a normative theory. He then claims that the same theory can't do all three; yet if it tries to make do with fewer than all three, that causes trouble too.

Thanks for the post.

Thanks for the recommendation!
