
Consequentialist Formal Systems

Post author: Vladimir_Nesov 08 May 2012 08:38PM

This post describes a different (less agent-centric) way of looking at UDT-like decision theories that resolves some aspects of the long-standing technical problem of spurious moral arguments. It's only a half-baked idea, so there are currently a lot of loose ends.

On spurious arguments

UDT agents are usually considered as having a disinterested inference system (a "mathematical intuition module" in UDT and first-order proof search in ADT) that plays a purely epistemic role, and preference-dependent decision rules that look for statements characterizing possible actions in terms of the utility value that the agent optimizes.

The statements (supplied by the inference system) used by the agent's decision rules (to pick one of the many variants) have the form [(A=A1 => U=U1) and U<=U1]. Here, A is a symbol defined to be the actual action chosen by the agent, U is a similar symbol defined to be the actual value of the world's utility, and A1 and U1 are some particular possible action and possible utility value. If the agent finds that this statement is provable, it performs action A1, thereby making A1 the actual action.

The use of this statement introduces the problem of spurious arguments: if A1 is a bad action, but for some reason it's still chosen, then [(A=A1 => U=U1) and U<=U1] is true, since the utility value U will in that case in fact be U1, which justifies (by the decision rule) choosing the bad action A1. In usual cases, this problem makes it difficult to prove that an agent will behave in the expected manner (i.e. won't choose a bad action), which is resolved by adding various complicated clauses to its decision algorithm. But even worse, it turns out that if an agent is hapless enough to take seriously a (formally correct) proof of such a statement supplied by an enemy (or if its own inference system is malicious), it can be persuaded to take any action at all, irrespective of the agent's own preferences.
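To make the shape of the problem concrete, here is a minimal Python sketch. The payoffs are assumed for illustration, and the self-referential proof search of the real setting is deliberately replaced by a direct truth-check over the realized world:

```python
# Toy illustration (payoffs assumed for the example): once the agent's
# actual action is fixed, the statement [(A = a => U = u) and U <= u]
# comes out true for the chosen a, with u the realized utility --
# even when a is the bad action. The "proof" is self-fulfilling.

def utility(action):
    return {1: 10, 2: 5}[action]  # hypothetical: action 1 good, action 2 bad

def statement_holds(chosen_action, a, u):
    # "A = a => U = u" read as material implication over the realized world:
    implication = (a != chosen_action) or (utility(chosen_action) == u)
    # "U <= u": the realized utility does not exceed u.
    bound = utility(chosen_action) <= u
    return implication and bound

# Intended argument: choosing 1 yields U = 10, and U <= 10 in general.
assert statement_holds(chosen_action=1, a=1, u=10)

# Spurious argument: if the agent somehow ends up choosing the bad action 2,
# the statement for (a=2, u=5) is also true -- U really is 5, so U <= 5 --
# and by the decision rule it "justifies" that very choice.
assert statement_holds(chosen_action=2, a=2, u=5)
```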

Deciding which theorems to make valid

Given that an inference system can overpower the decision rules, causing an agent to follow a preference other than its own, perhaps decision-making should happen inside the inference system in the first place, with the agent merely following its decisions. What does an inference system decide? Directly, it decides which theorems to have. The set of valid theorems follows deterministically from the axioms, but this is not really a problem: it's possible to make decisions in deterministic settings.

Suppose an inference system wants to decide whether it should have a theorem S. How does it evaluate the consequences of S being its theorem? It can assume that it proved S, see what that would imply, and if it likes the consequences (in comparison to the consequences of proving Not-S, for example), then it concludes S. Decision rules that an inference system follows are the axioms of the theory it works with, so this discussion suggests the following axiom schema (of moral axioms):

For all statements S and possible utility values u,
[(Prf(S) => U=u) and U<=u] => S is an axiom.

(This particular schema has a lot of problems, as discussed below, but seems adequate for communicating the general idea. [Edit: Stuart points out an even worse problem that makes these axioms break for any easily-provably-false S. Not sure what can be salvaged from this problem yet.]) A moral axiom from this schema states that if statement S being provable implies that the best possible utility gets realized, then that statement is declared to be valid.

Suppose that an agent has to choose an action A among possible actions 1 and 2, and wants to follow this theory's decisions. Then all it needs to do is pick a new propositional symbol B and establish the following decision rules:

Prf(B) => A=1
Prf(~B) => A=2

B remains otherwise undefined; its only effect is on our agent, that is, on the definition of A. If it's true that, say, [A=1 => U=10], [A=2 => U=5], and also that in general [U<=10], then Prf(B) implies [U=10], which triggers the moral axiom for statement B and makes it valid/provable. As a result, the agent finds a proof of B and performs action 1.
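A minimal mock of this example may help. The encoding and payoffs are assumptions for illustration, and real proof search is replaced by a direct check of the schema's antecedent:

```python
# Minimal mock (hypothetical encoding) of the moral-axiom schema: check
# whether "Prf(S) => U = u", together with "U <= u", would license proving S.

U_BOUND = 10  # assumed: the theory can prove U <= 10 in general

def utility_if(action):
    return {1: 10, 2: 5}[action]  # A=1 => U=10, A=2 => U=5

def action_if_proved(symbol):
    # The agent's decision rules: Prf(B) => A=1, Prf(~B) => A=2.
    return {"B": 1, "~B": 2}[symbol]

def moral_axiom_fires(symbol):
    # If Prf(symbol) implied U = u with u the known upper bound,
    # the schema declares symbol a theorem.
    u = utility_if(action_if_proved(symbol))
    return u == U_BOUND  # antecedent of [(Prf(S) => U=u) and U<=u] => S

# The theory proves B (whose proof triggers A=1), and not ~B:
assert moral_axiom_fires("B") is True
assert moral_axiom_fires("~B") is False

# The agent searches for a proof, finds B, and performs action 1.
def agent():
    for s in ("B", "~B"):
        if moral_axiom_fires(s):
            return action_if_proved(s)

assert agent() == 1
```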

Agent-less decision theory

This formulation is different from the usual ones in that the consequentialist loop is operated entirely from within an abstract formal system (i.e. not an algorithm). The formal system doesn't have an intended interpretation or a privileged agent (definition of an action) that would enact its decisions. Instead, it looks for all possible agents (actions, facts) that respond to its arguments (and affect its utility value), and supplies the arguments (theorems) according to how those agents respond to various hypothetical arguments. If there are multiple agents that have to be coordinated, that calls for proving a theorem that simultaneously establishes the strategies of all agents involved. And the agents could well use their own inference systems or proof search algorithms.

For an agent, such a formal system plays the role of preference: it is an abstract computation that answers questions about what should be done in each particular situation.

Open problems

The axiom schema [(Prf(S) => U=u) and U<=u] => S is not adequate for many reasons. First, it's only capable of making knowably perfect decisions (which in particular requires utility value to have a reachable upper bound). Second, it introduces a different kind of spurious arguments that make the formal system inconsistent: once a statement S triggers its moral axiom, it follows that Prf(S), and so U=u, which triggers the other moral axioms all at once. This isn't necessarily too bad, since it's irrelevant what happens once utility value is optimal, but it also makes it harder to trigger moral axioms prior to making a decision.

For example, in Wei Dai's variant of coordination game, an agent is given indexical identification 1 or 2, and has to pick among actions A and B in such a way that its versions that observe 1 and 2 pick different actions. A natural way of setting up the agent using a consequentialist theory is to introduce propositional symbols T1 and T2, and establish decision rules

If I observe 1, Prf(T1) => action=A; Prf(~T1) => action=B
If I observe 2, Prf(T2) => action=A; Prf(~T2) => action=B

In this case, if either [T1 and ~T2] or [T2 and ~T1] is a theorem of the formal system, then the two versions of the agent (observing 1 and 2) will achieve the optimal utility value. The problem is that the moral axioms for both theorems can be triggered, and if both do get triggered, then absurdity is quickly proved, which makes it hard to predict which actions the agents will actually perform, and what utility would follow. And if the formal system can't predict the effect of triggering its moral axioms on utility, it won't trigger them, so it's unclear what would actually happen. Perhaps some different clever statement will get proved that would predictably lead to the agents choosing the right actions.
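The payoff structure of the game can be sketched as follows (the utilities are assumed for illustration; the contradiction itself is a proof-theoretic phenomenon that this toy check does not reproduce):

```python
# Sketch (toy payoffs assumed) of the coordination game: the two copies
# of the agent achieve the optimum exactly when they pick different actions.

def joint_utility(action1, action2):
    # Hypothetical payoff: anti-coordination pays 10, matching pays 0.
    return 10 if action1 != action2 else 0

def actions_from_theorem(t1_proved, t2_proved):
    # Decision rules: Prf(Ti) => action A, Prf(~Ti) => action B.
    a1 = "A" if t1_proved else "B"
    a2 = "A" if t2_proved else "B"
    return a1, a2

# Either of the two "anti-correlated" theorems yields the optimum...
assert joint_utility(*actions_from_theorem(True, False)) == 10
assert joint_utility(*actions_from_theorem(False, True)) == 10

# ...but if the moral axioms for BOTH fire, the theory proves
# [T1 and ~T2] together with [T2 and ~T1], i.e. a contradiction,
# and predicting the agents' actual behavior breaks down.
```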

Another issue is that the moral axiom schema should probably only consider theorems of some special kind, and compare their consequences with those of specific other theorems (not just with an unconditional upper bound).

What really changed?

The main technical difference appears to be that instead of using moral arguments of the form [A=A1 => U=U1], this approach uses moral arguments of the form [Prf(A=A1) => U=U1]. As a result, proving A=A2 (for A2<>A1) no longer allows inferring a false antecedent, which in this case is ~Prf(A=A1), and so the usual path to spurious arguments is closed. Perhaps focusing on just this distinction might be more fruitful than paying attention to the surrounding philosophical bells and whistles.
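The closed-off inference can be seen with plain material implication (a toy truth-functional check, not the proof-theoretic argument itself):

```python
# With the old form, proving A = A2 (A2 <> A1) makes "A = A1" false, so
# "A = A1 => U = U1" holds vacuously for ANY utility claim U1.

def implies(p, q):
    # material implication
    return (not p) or q

A, A1 = 2, 1  # the agent proved A = 2, and A1 = 1 differs
for any_utility_claim in (True, False):
    assert implies(A == A1, any_utility_claim)  # vacuously true either way

# With the new form the antecedent is "Prf(A = A1)", and proving A = A2 does
# not by itself refute the provability of "A = A1" (that would require extra
# soundness reasoning the theory doesn't have about itself), so no vacuous
# implication becomes available.
```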

Comments (20)

Comment author: Stuart_Armstrong 09 May 2012 01:54:24PM 8 points

Given that axiom schema, it seems easy for the agent to prove Prf(S) for all S.

Assumption: U is bounded, by some v that is easy to calculate.

Let C=(0=1), a contradiction.

Then consider [(Prf(C) -> U=v) and U<=v] -> C. By assumption U<=v is true and easy, so if Prf(C) is false, then (Prf(C) -> U=v) would be true and so would C. Hence ¬Prf(C) -> C. Taking the contrapositive: ¬C -> Prf(C). Since ¬C is a tautology, this implies Prf(C).

A short search will also produce Prf(¬C). Then for any S, since (¬C and C) -> S, the system can show Prf(S) (I'm assuming it's expressive enough that from Prf(A) and Prf(A->B) it can get Prf(B)).

Don't know if this blows up the system yet, but the fact that the system can prove all Prf(S) hints that something weird may be going on...

EDIT: Actually, here is how you blow up the system. Since it can demonstrate that Prf(S) is true, the axiom [(Prf(S) -> U=u) and U<=u] -> S reduces to U=u -> S. So as long as you can show that U takes one of finitely many values, you can prove any S (and if the system is omega-consistent, it's already blown up).

Comment author: Vladimir_Nesov 09 May 2012 02:55:54PM 2 points

You're right, this shows that the moral axioms as stated don't work. Essentially [(Prf(C) -> U=v) and U<=v] -> C simplifies to (Prf(C) -> U=v) -> C, and if C is absurdity, then ~(Prf(C) -> U=v), that is (~U=v and Prf(C)). Both Prf(C) and ~U=v shouldn't hold. Thus, moral axioms in the present form shouldn't be added for any easily-provably-false statements. Will try to figure out if the damage can be contained.

(Updated the post and its summary to mention the problem.)

Comment author: Stuart_Armstrong 09 May 2012 04:21:54PM 1 point

One immediate idea is to replace the conditional [(Prf(S) -> U=u) and U<=u] -> S with the rule of inference "from [(Prf(S) -> U=u) and U<=u], deduce S". That way you can't get a contrapositive, and you probably need to get Loebian to hope to find a contradiction.

Not confident at all that would work, though.

Comment author: Vladimir_Nesov 09 May 2012 04:40:17PM 0 points

Yes, that was the intention, and the problem is that the implication can be tugged from the wrong side, but implication can't be one-sided. I'd prefer to stay with standard inference rules though, if at all possible.

Comment author: Stuart_Armstrong 11 May 2012 08:05:32AM 0 points

Pulling on one side but not the other seems textbook of what relevance logics were designed for.

Comment author: gRR 09 May 2012 06:37:12PM 1 point

Would restricting the axiom schema to content-less proposition symbols like "B" solve the problem?

Comment author: shminux 08 May 2012 09:12:42PM 6 points

*stares at a huge inferential expanse...* Wonders if there is a simple example or two that can make it a bit less so.

Comment author: Multiheaded 09 May 2012 04:17:00AM -1 points

“I was like a boy playing on the sea-shore, and diverting myself now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me.”

(Now why the hell would that be downvoted?)

Comment author: CuSithBell 09 May 2012 06:01:06AM 1 point

(Now why the hell would that be downvoted?)

It doesn't seem to address the comment it is a response to?

Comment author: Multiheaded 09 May 2012 01:26:16PM -1 points

Um, it does: I was ironically comparing Newton's sentiment to the frustration seemingly expressed in shminux's comment?

Comment author: CuSithBell 09 May 2012 03:53:19PM 2 points

I didn't vote on your comments, but I assume the person (?) who did thought you should have given an example of the kind shminux wanted instead.

Comment author: wantstojoinin 09 May 2012 09:53:32AM 3 points

Why isn't building a decision theory equivalent to building a whole AI from scratch?

Comment author: cousin_it 09 May 2012 08:31:59AM 0 points

I don't understand how "undefined propositional symbols" make coordination happen. In a PD played between two similar agents, will they figure out that their actions depend on the same undefined symbol?

Comment author: Alerus 08 May 2012 11:51:46PM 0 points

So I think my basic problem here is that I'm not familiar with this construct for decision making, or with why it would be favored over others. Specifically, why make logical rules about which actions to take? Why not take an MDP value-learning approach, where the agent chooses the action with the highest predicted utility? If the estimate is bad, it's merely updated, and if that situation arises again, the agent might choose a different action as a result of the latest update.

Comment author: gRR 08 May 2012 09:46:45PM 0 points

But, doesn't the whole setup mean that Prf(S)=>S for any statement S of the form A=Ai? Won't it immediately Löb-explode?

Comment author: Vladimir_Nesov 08 May 2012 09:52:37PM 0 points

How do you get to Prf(S)=>S? It works with something more like [Prf(S)=>Q]=>S for non-obvious Q. This setup does have a problem with exploding after figuring out how to maximize utility, but it doesn't seem to explode prior to that point (and I hope there might be a less perfectionist variant that gradually improves the situation without exploding at any point, instead of making a single leap to maximal utility value).

Comment author: gRR 08 May 2012 10:06:23PM 0 points

No, I mean, the agent actually performs the action when it proved it, right? So Prf(A=A1) implies that A will indeed be =A1. Assuming the agent has its own source, it will know that.

Comment author: Vladimir_Nesov 08 May 2012 10:12:14PM 1 point

The thing being proved is not the action, it's an undefined propositional constant symbol. The action responds to the fact that the propositional symbol gets proved. See the example at the end of the second section, and keep track of the distinction between A and B.

Comment author: gRR 08 May 2012 10:45:19PM 0 points

Thanks, I understand now. The implications:

Prf(B) => A=1
Prf(~B) => A=2

misled me. I thought they were another set of on-the-fly axioms, but if they are decision rules, this means something like:

A():
    generate various proofs
    if found a proof of B, return 1
    if found a proof of ~B, return 2

And then there's no Löbean issues. Cool! This agent can prove A=Ai without any problems. This should work great for ASP.

Comment author: Vladimir_Nesov 08 May 2012 10:49:44PM 0 points

I thought they were another set of on-the-fly axioms, but if they are decision rules, this means something like:

They are both, since triggering the moral axiom for B requires having those implications in the consequentialist theory (they are part of the definition of A, and A is part of the definition of U, so the theory knows them).

It does seem that a consequentialist theory could prove what its agents' actions are, if we somehow modify the axiom schema so that it doesn't explode as a result of proving the maximality of U following from the statements (like B) that trigger those actions. At least the old reasons for why this couldn't be done seem to be gone, even if now there are new reasons for why this currently can't be done.