Notion of Preference in Ambient Control

Vladimir_Nesov

7th Oct 2010

This post considers ambient control in a more abstract setting, where controlled structures are not restricted to being programs. It then introduces a notion of preference, as an axiomatic definition of constant (actual) utility. The notion of preference subsumes possible worlds and utility functions traditionally considered in decision theory.

Followup to: Controlling Constant Programs.

In the previous post I described the sense in which one program without parameters (the agent) can control the output of another program without parameters (the world program). These programs define (compute) constant values, respectively actual action and actual outcome. The agent decides on its action by trying to prove statements of a certain form, the moral arguments, such as [agent()=1 => world()=1000000]. When the time is up, the agent performs the action associated with the moral argument that promises the best outcome, thus making that outcome actual.
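The final decision step can be illustrated with a minimal sketch. It assumes the agent has already collected a finite list of proved moral arguments; the `MoralArgument` type, the `decide` function, and the `default_action` fallback are hypothetical names introduced here for illustration, not something the post specifies.

```python
from typing import List, NamedTuple


class MoralArgument(NamedTuple):
    """A proved statement of the form [agent()=action => world()=outcome]."""
    action: int
    outcome: int


def decide(proved: List[MoralArgument], default_action: int = 0) -> int:
    """Return the action of the moral argument that promises the best outcome.

    default_action is an assumption of this sketch: the post does not say
    what the agent does if it proves no moral argument at all.
    """
    if not proved:
        return default_action
    best = max(proved, key=lambda arg: arg.outcome)
    return best.action


# Two moral arguments the agent managed to prove before time ran out.
proved = [MoralArgument(action=1, outcome=1000000),
          MoralArgument(action=2, outcome=1000)]
chosen = decide(proved)
```

Note that the sketch only ranks promised outcomes; all the difficulty lies in obtaining the list of proved moral arguments in the first place.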

Let's now move this construction into a more rigorous setting. Consider a first-order language and a theory in that language (defining the way the agent reasons, the kinds of concepts it can understand and the kinds of statements it can prove). This could be a set theory such as ZFC or a theory of arithmetic such as PA. The theory should provide sufficient tools to define recursive functions and/or other necessary concepts. Now, extend that theory by definitions of two constant symbols: A (the actual action) and O (the actual outcome). (The new symbols extend the language, while their definitions, obtained from the agent and world programs respectively by standard methods of defining recursively enumerable functions, extend the theory.) With the new definitions, moral arguments don't have to explicitly cite the code of the corresponding programs, and look like this: [A=1 => O=10000].

Truth, provability, and provability by the agent

Given a model, we can ask whether a statement is true in that model. If a statement is true in all models, it's called valid. In first-order logic, all valid statements are also provable by a formal syntactic argument.

What the agent can prove, however, is different from what is provable from the theory it uses. In principle, the agent could prove everything provable (valid), but it needs to stop at some point and decide what action to perform, and so it never actually proves the rest of the provable statements. This restriction could take any one of many possible forms: a limit on the total number of proof steps used before making a decision, a "time limit" that maps to the proof process and stops it at some point, or a set of statements ("sufficient arguments") such that the process stops as soon as any one of them gets proved.
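Two of these stopping rules can be sketched as follows. The sketch assumes that the theory's theorems arrive as an (in general endless) stream of strings from some proof enumerator, which is not modelled here; `agent_provable`, `max_steps`, and `sufficient` are hypothetical names introduced for illustration.

```python
from itertools import count, islice
from typing import AbstractSet, Iterable, List


def agent_provable(theorems: Iterable[str],
                   max_steps: int,
                   sufficient: AbstractSet[str] = frozenset()) -> List[str]:
    """Collect the statements the agent proves before it has to stop.

    The search stops after max_steps statements (the step limit), or
    earlier, as soon as one of the "sufficient arguments" gets proved.
    """
    proved: List[str] = []
    for statement in islice(theorems, max_steps):
        proved.append(statement)
        if statement in sufficient:
            break
    return proved


# A stand-in for an endless enumeration of provable statements.
stream = (f"T{i}" for i in count())
by_step_limit = agent_provable(stream, max_steps=3)

stream = (f"T{i}" for i in count())
by_sufficient = agent_provable(stream, max_steps=100, sufficient={"T1"})
```

Either way the agent ends up with a finite prefix of the provable statements, which is the sense in which it proves less than is provable.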

Overall, the agent is able to prove less than is provable (valid). This in particular means that for certain sets of statements that are inconsistent, the agent won't be able to derive a contradiction.

Sense and denotation

A and O are ordinary constant symbols, so for some specific values, say 2 and 1000, it's true that A=2 and O=1000 (more generally, in each model A and O designate two elements). There is little interesting structure to the constants themselves. The agent normally won't even know the "explicit values" of the actual action and actual outcome. Knowing the actual action would break the illusion of consistent consequences: suppose the agent is consistent, knows that A=2, and isn't out of time yet. Then it can prove [A=1 => O=100000] (the antecedent is known to be false, so the implication holds vacuously), even if in fact O=1000, use that moral argument to beat any other with a worse promised outcome, and decide A=1, a contradiction. Knowing the actual outcome would break the same illusion in two steps, if the agent ever infers an outcome different from the one it knows to hold: suppose the agent is consistent, knows that O=1000, and isn't out of time. Then if it proves [A=1 => O=500], it can also prove [A=1 => (O=1000 AND O=500)], and hence [A=1 => (O=1000 AND O=500) => O=100000], using that to beat any other moral argument, making A=1 true and hence (O=1000 AND O=500) also true, a contradiction.
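The two-step argument can be laid out as a short derivation. This is only a sketch: the turnstile here stands for what the agent itself manages to prove before its time is up (not provability in the theory at large), and the numbers are the ones used in the paragraph above.

```latex
\begin{align*}
&(1)\quad \vdash O = 1000 && \text{the agent knows the actual outcome} \\
&(2)\quad \vdash A = 1 \Rightarrow O = 500 && \text{a proved moral argument} \\
&(3)\quad \vdash A = 1 \Rightarrow (O = 1000 \land O = 500) && \text{from (1) and (2)} \\
&(4)\quad \vdash \lnot (O = 1000 \land O = 500) && \text{since } 1000 \neq 500 \\
&(5)\quad \vdash A = 1 \Rightarrow \bot && \text{from (3) and (4)} \\
&(6)\quad \vdash A = 1 \Rightarrow O = 100000 && \text{from (5), ex falso}
\end{align*}
```

Step (6) is the moral argument that beats all others, so the agent performs A=1, and then (3) makes the contradiction true.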

Thus, the agent has to work with indirect definitions of action and outcome, not with action and outcome themselves. For the agent, actual action doesn't describe what the agent is, and actual outcome doesn't describe what the world is, even though moral arguments only mention actual action and actual outcome. Details of the definitions matter, not only what they define.

Abstract worlds

There seems to be no reason for the definition of outcome O to be given by a program. We can as easily consider a constant symbol O defined by an arbitrary collection of axioms. The agent doesn't specifically simulate the definition of O in order to obtain its specific value (besides, obtaining that value corresponds to the outcome not being controllable by the choice of action); it merely proves things about O. Thus, we can generalize world programs to world concepts, definitions of outcomes that are not programs. Furthermore, if the definition of O is not a program, O itself can be more general than a finite number. Depending on the setting, O's interpretation could be an infinite set, a real number, or generally any mathematical structure.

Surprisingly, the same applies to action. It doesn't matter how the agent thinks about its actual action (defines A), so long as the definition is correct. One way to define the output of a program is by straightforwardly transcribing the program, as when defining a recursively enumerable function, but any other definition of the same value will do, including non-constructive ones.

Possible actions and possible outcomes

By way of axiomatic definitions of A and O, statements of the form [A=X => O=Y] can be proved by the agent. Each such statement defines a possible world Y resulting from a possible action X. X and Y can be thought of as constants, just like A and O, or as formulas that define these constants, so that the moral arguments take the form [X(A) => Y(O)].

The sets of possible actions and possible outcomes need to be defined syntactically: given a set of statements of the form [A=X => O=Y] for various X and Y, the agent needs a way of picking the one with the most preferable Y, and of actually performing the associated X. This is unlike the situation with A and O, where the agent can't just perform action A, since it's not defined in the way the agent knows how to perform (even though A is (provably) equivalent to one of the constants, the agent can't prove that for any given constant).

We can assume that sets of possible actions and possible outcomes (that is, formulas syntactically defining them) are given explicitly, and the moral arguments are statements of the form [A=X => O=Y] where X has to be a possible action and Y a possible outcome (not some other formulas). In this sense, A (as a formula, assuming its definition is finite) can't be a possible action, O won't be a possible outcome in interesting cases, and [A=A => O=O] is not a moral argument.

For each possible action, only one possible world gets defined in this manner. For the possible action that is equal to the actual action (that is, the X for which (A=X) is provable in the agent's theory, although not provable by the agent), the corresponding possible outcome is equal to the actual outcome.

Possible worlds

Given a set of moral arguments [A=X => O=Y] that the agent managed to prove, consider the set of all possible outcomes that are referred to by these moral arguments. Call such possible outcomes possible worlds (to distinguish them from possible outcomes that are not referred to by moral arguments provable by the agent). Of all possible outcomes, the possible worlds could constitute only a small subset.

This makes it possible for the possible worlds to have more interesting structure than possible outcomes in general: for example, the possible outcomes could be just integers, while the possible worlds are all prime. Thus, definitions of A and O define not just the actual outcome O, but a whole collection of possible worlds corresponding to possible actions.
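The integers-versus-primes example can be made concrete with a toy sketch. The specific actions and outcomes below are invented for illustration; the dictionary `proved` stands in for the moral arguments [A=X => O=Y] that the agent managed to prove.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check, adequate for small illustrative values."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


# Possible outcomes are given syntactically up front: here, the integers 0..19.
possible_outcomes = set(range(20))

# Hypothetical proved moral arguments, as a map from possible action X
# to the possible outcome Y in [A=X => O=Y].
proved = {1: 7, 2: 13, 3: 2}

# Possible worlds are exactly the outcomes referred to by proved moral arguments.
possible_worlds = set(proved.values())
```

Here every possible world happens to be prime, although nothing in the definition of possible outcomes requires it; that extra structure comes from the definitions of A and O.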

Controlling axiomatic definitions

While the previous post discussed the sense in which the output of a constant program can be controlled, this one describes how to control a given (fixed) axiomatic definition so that it defines as desirable a mathematical structure as possible. This shows that in principle, nothing is exempt from ambient control (since in principle one can give an axiomatic definition to anything); some definitions are just constant with respect to given agents (they generate only one possible world, as defined above).

Determinism is what enables control, but ambient control relies only on "logical" determinism, the process of getting from definition to the defined concept, not on any notion of determinism within the controlled concept (actual outcome) itself. We can thus consider controlling concepts more general than our physical world, including the ones that aren't structured as (partially) deterministic processes.

Preference and utility

Possible outcomes are only used to rank moral arguments by how good the actual outcome O will be if the corresponding possible action is taken. Thus, we have an order defined on the possible outcomes, and the action is chosen to maximize the outcome according to this order. Any other properties of possible outcomes are irrelevant. This suggests directly considering utility values instead of outcomes, and using a utility symbol U instead of outcome symbol O in moral arguments.

As with the actual outcome and its definition, we then have actual utility and its definition. Since the definition supplies most of the relevant structure, I call the definition of actual utility preference. Thus, the agent is an axiomatic definition of actual action A, and preference is an axiomatic definition of actual utility U. Both agent and preference can be of arbitrary form, so long as they express the decision problem, and actual utility U could be interpreted with an arbitrary mathematical structure. Moral arguments are statements of the form [A=A1 => U=U1], with A1 a possible action and U1 a possible utility.

Merging axioms

Above, action and utility are defined separately, with axioms that generally don't refer to each other. Axioms that define action don't define utility, and conversely. Moral arguments, on the other hand, define utility in terms of action. If we are sure that one of the moral arguments proved by the agent refers to the actual action (without knowing which one; if we have to choose an actual action based on that set of moral arguments, this condition holds by construction), then actual utility is defined by the axioms of action (the agent) and these moral arguments, without needing preference (axioms of utility).

Thus, once moral arguments are proved, we can discard the now redundant preference. More generally, statements that the agent proves characterize actual action and actual utility together, where their axiomatic definitions characterized them separately. New statements can be equivalent to the original axioms, allowing the concepts to be represented differently. The point of proving moral arguments is to understand how actual utility depends on actual action, and to use that dependence to control utility.

Utility functions

Let a utility function be a function F such that the agent proves [F(A)=U], and for each possible action X, there is a possible utility Z such that the agent can prove [F(X)=Z]. The second requirement makes utility functions essentially encodings of moral arguments; without it, a constant utility function defined by F(-)=U would qualify, but it's not useful to the agent, since it can't reason about U.

Given a utility function F and a possible action X, [A=X => U=F(X)] is a moral argument (provable by the agent). Thus, a utility function generates the whole set of moral arguments, with one possible outcome assigned to each possible action. A utility function restricted to the set of possible actions is unique: if F and G are two utility functions and X is a possible action, then the agent proves [A=X => (U=F(X) AND U=G(X))], which yields a contradiction in the consequences of a possible action if F and G disagree at X.
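The uniqueness argument can be written out as a sketch. Z_F and Z_G are names introduced here for the possible utilities that the agent proves to be the values of F and G at X; the turnstile means provable by the agent.

```latex
\begin{align*}
&\vdash F(A) = U, \quad \vdash G(A) = U && \text{definition of utility function} \\
&\vdash F(X) = Z_F, \quad \vdash G(X) = Z_G && \text{second requirement, for possible action } X \\
&\vdash A = X \Rightarrow U = Z_F && \text{substituting } A = X \text{ into } F(A) = U \\
&\vdash A = X \Rightarrow U = Z_G && \text{likewise for } G \\
&\vdash A = X \Rightarrow Z_F = Z_G && \text{from the two lines above}
\end{align*}
```

If Z_F and Z_G are provably distinct possible utilities, the last line gives [A=X => FALSE]. The conclusion that F and G agree at X therefore rests on the agent being unable to derive such a contradiction from a possible action, which is listed among the open problems below.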

Utility functions allow generalizing the notion of moral argument: we no longer need to consider only small sets of possible actions (because only small sets of moral arguments can be proved). Instead, one utility function needs to be found and then optimized. Since the utility function is essentially unique, the problem of finding moral arguments can be recast as a problem of proving properties of the utility function, and more generally decision-making can be seen as maximization of the utility function implicitly defined by the agent program and preference.
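As a rough illustration, a utility function can be treated as a compact encoding of moral arguments that is then optimized directly. In this sketch plain computation stands in for proof: in the post's setting each value F(X) has to be established by the agent's reasoning, not by evaluation. The function `example_utility` and the set of actions are invented for the example.

```python
from typing import Callable, Iterable, List, Tuple


def moral_arguments(F: Callable[[int], int],
                    possible_actions: Iterable[int]) -> List[Tuple[int, int]]:
    """List the pairs (X, F(X)), each standing for [A=X => U=F(X)]."""
    return [(x, F(x)) for x in possible_actions]


def best_action(F: Callable[[int], int],
                possible_actions: Iterable[int]) -> int:
    """Choose the possible action that maximizes the utility function."""
    return max(possible_actions, key=F)


def example_utility(x: int) -> int:
    """A hypothetical utility function, peaked at action 3."""
    return -(x - 3) ** 2


actions = range(1, 6)
arguments = moral_arguments(example_utility, actions)
chosen = best_action(example_utility, actions)
```

The list `arguments` plays the role of the whole set of moral arguments generated by F, one per possible action, and `chosen` is the action an agent would take if it could find and optimize F.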

Note that a utility function is recognized by its value at a single point, but is uniquely defined for all possible actions. The single-point restriction is given by the actual action and utility, while the rest of it follows from the axiomatic definitions of that action and utility. Thus again, most of the structure of the utility function comes from the agent program and preference, not from actual action and actual utility.

Connecting this to the discussion of explicit/implicit dependence in the previous post, and to the previous section of this one, the utility function is the expression of explicit dependence of utility on the agent's action, and a decision problem shouldn't come with this dependence already given. Instead, most of the problem is figuring out this explicit dependence (the utility function) from separate definitions of action and utility (agent program and preference).

Open problems

Formalize the limitations on the ability of the agent to prove statements ("agent-provability"). Use this formalization to prove impossibility of deriving a contradiction from assumption of possible action [A=X => FALSE], agent-provability and uniqueness of utility function.
Explore the more general case where the agent is unable to find a utility function, and can only find a set of possible outcomes for some possible actions.
Generalize utility functions to dependencies between arbitrary concepts (in the context of an agent's theory and decision-making conditions that limit what can be proved, just as here). Use that generalization to formalize instrumental utility (instrumental concepts).
Based on results in the previous item, describe control over other agents in games (through controlling their preference, as opposed to directly controlling their action).
(The impossible item.) Given an agent program, define its preference.

Comments

Tyrrell_McAllister (16y ago)

This is unlike the situation with A and O, where the agent can't just perform action A, since it's not defined in the way the agent knows how to perform (even though A is (provably) equivalent to one of the constants, the agent can't prove that for any given constant).

It's probably a good idea to maintain the distinction between a constant symbol c and the element v(c) assigned to c by the interpretation's valuation map v into the domain of discourse.

For example, I found the quote above confusing, but I think that you meant the following: "This is unlike the situation with A and O, where the agent can't just perform action v(A), since it's not defined in the way the agent knows how to perform. It is true that we can prove, in the metalanguage, that there exists an action X such that v(A) = X. However, there is no action X such that, for some constant symbol 'X' such that v('X') = X, the agent can prove [A = 'X']."