timtyler comments on Work on Security Instead of Friendliness? - Less Wrong

29 Post author: Wei_Dai 21 July 2012 06:28PM


Comment author: Vladimir_Nesov 22 July 2012 03:53:30AM 7 points

It seems to me that vagueness is different from having competing definitions (e.g., AIXI's notion of utility function vs UDT's) that may turn out to be wrong.

AIXI's utility function is useless, the fact that it can be called "utility function" notwithstanding. UDT's utility function is not defined formally (its meaning depends on "math intuition"). For any real-world application of a utility function, we don't have a formal notion of its domain. These definitions are somewhat vague, even if not hopelessly so. They are hopelessly vague for the purpose of building a FAI.
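The contrast being drawn here can be sketched in a few lines of toy code. All names below are illustrative inventions, not the actual AIXI or UDT formalism: the point is only that one value function's domain is the agent's percept stream, while the other's domain is a description of the world itself.

```python
# Toy contrast (hypothetical, illustrative): an AIXI-style value
# function scores percept sequences, while a UDT-style utility scores
# a description of the whole world.

def aixi_style_value(percepts):
    """Reward computed purely from the agent's sensory inputs."""
    return sum(reward for (_obs, reward) in percepts)

def udt_style_utility(world):
    """Utility as a function of the world itself, not just the
    slice of it the agent observes."""
    return world.get("paperclips", 0)  # toy terminal value

# A 'delusion box' world: pleasant percepts, but a world containing
# nothing the (toy) utility function cares about.
percepts = [("pleasant", 1.0), ("pleasant", 1.0)]
world = {"paperclips": 0, "percepts": percepts}

print(aixi_style_value(percepts))   # 2.0 -- high score from percepts alone
print(udt_style_utility(world))     # 0   -- the world itself scores nothing
```

The delusion-box case makes the difference in domains concrete: a reward defined over sensory inputs can be maximized without the world containing anything of value.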

Comment author: Wei_Dai 22 July 2012 05:26:06AM 5 points

These definitions are somewhat vague, even if not hopelessly so.

Perhaps I shouldn't have implied or given the impression that we have fully non-vague definitions of "utility function". What if I instead said that our notions of utility function are not as vague as Vaniver makes them out to be? That our most promising approach for how to define "utility function" gives at least fairly clear conceptual guidance as to the domain, and that we can see some past ideas (e.g., just over sensory inputs) as definitely wrong?

Comment author: Vladimir_Nesov 22 July 2012 05:43:30AM 5 points

That our most promising approach for how to define "utility function" gives at least fairly clear conceptual guidance as to the domain

Given that the standard of being "fairly clear" is itself rather vague, I don't know if I disagree, but at the moment I don't know of any approach to a potentially FAI-grade notion of preference that has any clarity. Utility functions seem to be the wrong direction, since they don't fit the idea of control as resolution of logical uncertainty (structure). (UDT's "utility function" is more of a component in the definition of something that is not a utility function.)

ADT's utility value (which is a UDT-like goal definition) is somewhat formal, but it only applies to toy examples; it's not clear what it means even in those toy examples; it doesn't work at all when the agent has uncertainty about, or incomplete control over, that value; and I have no idea how to treat the physical world in its context. (It also doesn't have a domain, which actually seems like a desirable property for a structuralist goal definition.) This situation seems like the opposite of "clear" to me...

Comment author: Wei_Dai 23 July 2012 01:09:08PM 2 points

In my original UDT post, I suggested

In this case, we'd need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory.

Of course there are enormous philosophical and technical problems involved with this idea, but given that it has more or less guided all subsequent decision theory work by our community (except possibly work within SI that I've not seen), Vaniver's characterization of how much the domain of the utility function is underspecified ("Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality?") is just wrong.
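The quoted proposal can be rendered as a toy sketch, where the utility function's domain is explicitly sets of sentences rather than sensory inputs or mental models. Everything here is an illustrative stand-in (the sentence strings, the scoring rules); nothing is the actual UDT formalism:

```python
# Hedged toy rendering of "a utility function over conjunctions of
# well-formed sentences": a 'possible mathematical structure' is
# described by which sentences hold in it, represented here as a
# frozenset of strings standing in for set-theoretic sentences.

def utility(sentences: frozenset) -> float:
    """Score a structure by the sentences true of it -- the domain is
    descriptions of worlds, not percepts."""
    score = 0.0
    if "exists x. paperclip(x)" in sentences:
        score += 1.0
    if "forall x. not suffering(x)" in sentences:
        score += 10.0
    return score

world_a = frozenset({"exists x. paperclip(x)"})
world_b = frozenset({"exists x. paperclip(x)",
                     "forall x. not suffering(x)"})

assert utility(world_b) > utility(world_a)
```

The design point the toy makes is the one at issue in the thread: once the domain is fixed as descriptions of mathematical structures, the "sensory inputs vs. mental models vs. external reality" trichotomy is resolved in favor of the last option.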

Comment author: Vladimir_Nesov 23 July 2012 02:46:59PM 2 points

Right, preference over possible logical consequences of given situations is a strong unifying principle. We can also take the physical world to be a certain collection of mathematical structures, possibly selected heuristically, based on observations, for being controllable and morally relevant in a tractable way.

The tricky thing is that we are not choosing one structure from a collection of structures (a preferred possible world from a collection of possible worlds); instead, we are choosing which properties a given fixed class of structures will have, or alternatively which theories/definitions are consistent or inconsistent, which defined classes of structures exist and which don't. Since the alternatives that are not chosen are thereby made inconsistent, it's not clear how to understand them as meaningful possibilities; they are the mysterious logically impossible possible worlds. And there we have it: the mystery of the domain of preference.
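The puzzle of "logically impossible possible worlds" can be made concrete with a toy agent (an illustrative sketch with invented names and payoffs, not any formal decision theory): the agent's output is a fixed fact of computation, so exactly one of the alternatives it deliberates over is consistent with its own source code.

```python
# Toy illustration of 'choosing which theories are consistent': the
# agent's decision is just a program output, so the counterfactual
# "what if agent() returned the other action?" describes a world that
# is inconsistent with the agent's source code -- a logically
# impossible possible world that deliberation nonetheless weighs.

def expected_value(action):
    # Invented payoffs for a Newcomb-style toy choice.
    return {"one_box": 1_000_000, "two_box": 1_000}[action]

def agent():
    # Deliberation ranges over both actions, yet only one return
    # value is a theorem about this program.
    return max(["one_box", "two_box"], key=expected_value)

print(agent())  # one_box
```

The "not chosen" branch (`two_box`) is exactly the kind of alternative the comment describes: it must be treated as a meaningful possibility during deliberation, even though, given the agent's code, it is inconsistent.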