21 July 2012 06:28PM



Comment author: 22 July 2012 03:53:30AM 5 points

It seems to me that vagueness is different from having competing definitions (e.g., AIXI's notion of utility function vs UDT's) that may turn out to be wrong.

AIXI's utility function is useless, the fact that it can be called "utility function" notwithstanding. UDT's utility function is not defined formally (its meaning depends on "math intuition"). For any real-world application of a utility function, we don't have a formal notion of its domain. These definitions are somewhat vague, even if not hopelessly so. They are hopelessly vague for the purpose of building a FAI.
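The difference in domains between the two notions can be sketched concretely. The following is a toy illustration only (not AIXI or UDT themselves; both "utility functions" and all names here are hypothetical stand-ins):

```python
# Toy contrast between the two notions' domains. Everything here is a
# hypothetical stand-in, not a real formalization of either theory.

from typing import Sequence

def aixi_style_reward(percepts: Sequence[int]) -> float:
    # AIXI-like: defined only over the agent's sensory inputs.
    return float(sum(percepts))

def udt_style_utility(world: frozenset) -> float:
    # UDT-like: defined over (a description of) the world itself,
    # including facts the agent never observes.
    return 1.0 if "humans_flourish" in world else 0.0

# The same percept history is compatible with worlds of very different
# value, so a function of the first type cannot express the second.
assert aixi_style_reward([1, 2, 3]) == 6.0
assert udt_style_utility(frozenset({"humans_flourish"})) == 1.0
assert udt_style_utility(frozenset()) == 0.0
```

The type signatures alone show these are different objects: a program that scores only sense-data cannot, by itself, value unobserved reality.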

Comment author: 22 July 2012 05:26:06AM 2 points

These definitions are somewhat vague, even if not hopelessly so.

Perhaps I shouldn't have implied or given the impression that we have fully non-vague definitions of "utility function". What if I instead said that our notions of utility function are not as vague as Vaniver makes them out to be? That our most promising approach for how to define "utility function" gives at least fairly clear conceptual guidance as to the domain, and that we can see some past ideas (e.g., just over sensory inputs) as definitely wrong?

Comment author: 22 July 2012 05:43:30AM *  3 points

That our most promising approach for how to define "utility function" gives at least fairly clear conceptual guidance as to the domain

Given that the standard of being "fairly clear" is itself rather vague, I don't know whether I disagree, but at the moment I don't know of any approach of any clarity to a potentially FAI-grade notion of preference. Utility functions seem to be a wrong direction, since they don't work in the context of control based on resolution of logical uncertainty (structure). (UDT's "utility function" is more a component of the definition of something that is not a utility function.)

ADT utility value (which is a UDT-like goal definition) is somewhat formal, but it only applies to toy examples; it's not clear what it means even in those toy examples; it doesn't work at all when the agent has uncertainty about, or incomplete control over, that value; and I have no idea how to treat the physical world in its context. (It also doesn't have a domain, which seems like a desirable property for a structuralist goal definition.) This situation seems like the opposite of "clear" to me...

Comment author: 23 July 2012 01:09:08PM 0 points

In my original UDT post, I suggested

In this case, we'd need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory.

Of course there are enormous philosophical and technical problems involved with this idea, but given that it has more or less guided all subsequent decision-theory work by our community (except possibly work within SI that I've not seen), Vaniver's characterization of the domain of the utility function as underspecified ("Is it valuing sensory inputs? Is it valuing mental models? Is it valuing external reality?") is just wrong.
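The quoted suggestion can be caricatured in code. This is a minimal sketch under loose assumptions (the sentences and weights below are hypothetical placeholders, not a real proposal): represent a "mathematical structure" by the set of sentences true of it, and define utility over those sentence-sets.

```python
# Hypothetical sketch: utility over conjunctions of sentences.
# Sentence strings and weights are illustrative placeholders only.

Sentence = str  # stand-in for a well-formed sentence of a formal set theory

def utility(world: frozenset) -> float:
    """Score a 'world' (a set of sentences taken to hold jointly,
    i.e. their conjunction) by illustrative per-sentence weights."""
    weights = {"life_exists": 2.0, "button_pressed": -1.0}
    return sum(w for s, w in weights.items() if s in world)

world_a = frozenset({"life_exists"})
world_b = frozenset({"life_exists", "button_pressed"})
assert utility(world_a) == 2.0
assert utility(world_b) == 1.0
```

Note that the domain here is explicit: sets of sentences of the formal theory, not sensory inputs, mental models, or physical states as such.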

Comment author: 23 July 2012 02:46:59PM 0 points

Right, preference over possible logical consequences of given situations is a strong unifying principle. We can also take the physical world to be a certain collection of mathematical structures, perhaps heuristically selected, based on observations, according to whether they are controllable and morally relevant in a tractable way.

The tricky thing is that we are not choosing a structure from among some collection of structures (a preferred possible world from a collection of possible worlds). Instead, we are choosing which properties a given fixed class of structures will have; alternatively, we are choosing which theories/definitions are consistent or inconsistent, which defined classes of structures exist versus don't exist. Since the alternatives that are not chosen are thereby made inconsistent, it's not clear how to understand them as meaningful possibilities: they are the mysterious logically impossible possible worlds. And there we have it, the mystery of the domain of preference.

Comment author: 23 July 2012 02:12:06PM *  -3 points

Well, the publicly visible side of the work does not seem to refer specifically to this, hence it is not so specified.

With regard to your idea: if you have to build that sort of preference function to motivate the AI in a manner general enough that it may end up hell-bent on killing everyone, then I think the "risk" is very far-fetched, exactly per my suspicion that a viable superintelligent UFAI is much too hard to be worth worrying about (while all sorts of AIs that work on well-specified mathematical problems would be much, much simpler). If that is the only way to motivate an AI effectively over the evolution from seed to superintelligence, then the only people working on something like UFAI are the FAI crowd. Keep in mind that if I want to cure cancer via the "software tools" route and I am not signed up for cryonics, then I'll just go for the simplest solution that works, which will be some sort of automated reasoning over formal systems (not over real-world states), especially as general AI would require the technologies from the former anyway.