Wei_Dai comments on Work on Security Instead of Friendliness? - Less Wrong

29 Post author: Wei_Dai 21 July 2012 06:28PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (103)

You are viewing a single comment's thread. Show more comments above.

Comment author: Wei_Dai 23 July 2012 01:09:08PM 2 points [-]

In my original UDT post, I suggested

In this case, we'd need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory.

Of course there are enormous philosophical and technical problems involved with this idea, but given that it has more or less guided all subsequent decision theory work by our community (except possibly work within SI that I've not seen), Vaniver's characterization of how much the domain of the utility function is underspecified ("Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality?") is just wrong.

Comment author: Vladimir_Nesov 23 July 2012 02:46:59PM 2 points [-]

Right, preference over possible logical consequences of given situations is a strong unifying principle. We can also take physical world to be a certain collection of mathematical structures, possibly heuristically selected based on observations according with being controllable and morally relevant in a tractable way.

The tricky thing is that we are not choosing a structure among some collection of structures (a preferred possible world from a collection of possible worlds), but instead we are choosing which properties a given fixed class of structures will have, or alternatively we are choosing which theories/definitions are consistent or inconsistent, which defined classes of structures exist vs. don't exist. Since the alternatives that are not chosen are therefore made inconsistent, it's not clear how to understand them as meaningful possibilities, they are the mysterious logically impossible possible worlds. And there we have it, the mystery of the domain of preference.