# Consequentialist Formal Systems

*This post describes a different (less agent-centric) way of looking at UDT-like decision theories that resolves some aspects of the long-standing technical problem of spurious moral arguments. It's only a half-baked idea, so there are currently a lot of loose ends.*

### On spurious arguments

UDT agents are usually considered as having a disinterested inference system (a "mathematical intuition module" in UDT and first-order proof search in ADT) that plays a purely epistemic role, and preference-dependent decision rules that look for statements that characterize possible actions in terms of the utility value that the agent optimizes.

The statements (supplied by the inference system) used by the agent's decision rules (to pick one of the many variants) have the form **[(A=A1 => U=U1) and U<=U1]**. Here, **A** is a symbol defined to be the actual action chosen by the agent, **U** is a similar symbol defined to be the actual value of the world's utility, and **A1** and **U1** are some particular possible action and possible utility value. If the agent finds that this statement is provable, it performs action **A1**, thereby making **A1** the actual action.

The use of this statement introduces the problem of spurious arguments: if **A1** is a bad action, but for some reason it's still chosen, then **[(A=A1 => U=U1) and U<=U1]** is true, since the utility value **U** will in that case in fact be **U1**, which justifies (by the decision rule) choosing the bad action **A1**. In usual cases, this problem makes it difficult to prove that an agent will behave in the expected manner (i.e. won't choose a bad action), which is resolved by adding various complicated clauses to its decision algorithm. But even worse, it turns out that if an agent is hapless enough to take seriously a (formally correct) proof of such a statement supplied by an enemy (or if its own inference system is malicious), it can be persuaded to take any action at all, irrespective of the agent's own preferences.

### Deciding which theorems to make valid

Given that an inference system can overpower decision rules, causing an agent to follow a preference other than its own, perhaps decision-making should be happening inside the inference system in the first place, with the agent only following its decisions. What does an inference system decide? Directly, it decides which theorems to have. The set of valid theorems follows deterministically from the axioms, but this is not really a problem: it's possible to make decisions in deterministic settings.

Suppose an inference system wants to decide whether it should have a theorem **S**. How does it evaluate the consequences of **S** being its theorem? It can assume that it proved **S**, see what that would imply, and if it likes the consequences (in comparison to the consequences of proving **Not-S**, for example), then it concludes **S**. Decision rules that an inference system follows are the axioms of the theory it works with, so this discussion suggests the following axiom schema (of *moral axioms*):

For all statements **S** and possible utility values **u**, **[(Prf(S) => U=u) and U<=u] => S** is an axiom.

(This particular schema has a lot of problems, as discussed below, but seems adequate for communicating the general idea. [**Edit**: Stuart points out an even worse problem that makes these axioms break for any easily-provably-false **S**. Not sure what can be salvaged from this problem yet.]) A moral axiom from this schema states that if statement **S** being provable implies that the best possible utility gets realized, then that statement is declared to be valid.
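The breakage mentioned in the edit note can be checked mechanically at the propositional level. The sketch below is an illustration only, not part of the original proposal: it treats **Prf(S)**, **U=u**, **U<=u** and **S** as independent Boolean atoms (an assumption that ignores everything the theory actually knows about provability) and lists the valuations compatible with one axiom instance when **S** is false and the bound **U<=u** holds. The names `implies`, `moral_axiom` and `models` are hypothetical.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p => q."""
    return (not p) or q


def moral_axiom(prf_s: bool, u_is_u: bool, u_bounded: bool, s: bool) -> bool:
    """One instance of the schema [(Prf(S) => U=u) and U<=u] => S."""
    return implies(implies(prf_s, u_is_u) and u_bounded, s)


# Keep only valuations (Prf(S), U=u) where the axiom instance holds,
# given that U<=u is true and S is false (e.g. S is 0=1).
models = [
    (prf_s, u_is_u)
    for prf_s, u_is_u in product([False, True], repeat=2)
    if moral_axiom(prf_s, u_is_u, u_bounded=True, s=False)
]
print(models)
```

The only surviving valuation has **Prf(S)** true and **U=u** false, so for a refutable **S** the axiom instance commits the theory to the claim that **S** is provable.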

Suppose that an agent has to choose an action **A** among possible actions **1** and **2**, and wants to follow this theory's decisions. Then all it needs to do is pick a new propositional symbol **B** and establish the following decision rules:

Prf(B) => A=1

Prf(~B) => A=2

**B** remains otherwise undefined; its only effect is on our agent, or on the definition of **A**. If it's true that, say, **[A=1 => U=10]**, **[A=2 => U=5]**, and also that in general **[U<=10]**, then **Prf(B)** implies **[U=10]**, which triggers the moral axiom for statement **B** and makes it valid/provable. As a result, the agent finds a proof of **B** and performs action **1**.
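The example with **B** can be walked through in a toy model. This is a minimal sketch under strong assumptions: actual proof search is replaced by a direct table lookup, and the names `UTILITY_OF_ACTION`, `UPPER_BOUND`, `action_given` and `triggers_moral_axiom` are hypothetical. It only mirrors the bookkeeping of the example (which statement, if proved, leads to the maximal utility).

```python
# Utilities from the example: [A=1 => U=10], [A=2 => U=5].
UTILITY_OF_ACTION = {1: 10, 2: 5}
UPPER_BOUND = 10  # the general bound [U<=10] from the example


def action_given(provable: str) -> int:
    """Decision rules of the agent: Prf(B) => A=1, Prf(~B) => A=2."""
    return {"B": 1, "~B": 2}[provable]


def triggers_moral_axiom(statement: str) -> bool:
    """Check the antecedent [(Prf(S) => U=u) and U<=u] for u = UPPER_BOUND."""
    utility_if_proved = UTILITY_OF_ACTION[action_given(statement)]
    bound_holds = all(u <= UPPER_BOUND for u in UTILITY_OF_ACTION.values())
    return utility_if_proved == UPPER_BOUND and bound_holds


# Statements whose moral axiom fires become theorems in this toy model.
theorems = [s for s in ("B", "~B") if triggers_moral_axiom(s)]
chosen_action = action_given(theorems[0])
print(theorems, chosen_action)
```

In this toy setting only **B** triggers its moral axiom, so the agent ends up performing action **1**, matching the walkthrough above.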

### Agent-less decision theory

This formulation is different from the usual ones in that the consequentialist loop is operated entirely from within an abstract formal system (i.e. not an algorithm). The formal system doesn't have an intended interpretation or a privileged agent (definition of an action) that would enact its decisions. Instead, it looks for all possible agents (actions, facts) that respond to its arguments (and affect its utility value), and supplies the arguments (theorems) according to how those agents respond to various hypothetical arguments. If there are multiple agents that have to be coordinated, that calls for proving a theorem that simultaneously establishes the strategies of all agents involved. And the agents could well use their own inference systems or proof search algorithms.

For an agent, such a formal system plays the role of preference: it is an abstract computation that answers the questions about what should be done in each particular situation.

### Open problems

The axiom schema **[(Prf(S) => U=u) and U<=u] => S** is not adequate for many reasons. First, it's only capable of making knowably perfect decisions (which in particular requires utility value to have a reachable upper bound). Second, it introduces a different kind of spurious arguments that make the formal system inconsistent: once a statement **S** triggers its moral axiom, it follows that **Prf(S)**, and so **U=u**, which triggers the other moral axioms all at once. This isn't necessarily too bad, since it's irrelevant what happens once utility value is optimal, but it also makes it harder to trigger moral axioms prior to making a decision.

For example, in Wei Dai's variant of the coordination game, an agent is given indexical identification **1** or **2**, and has to pick among actions **A** and **B** in such a way that its versions that observe **1** and **2** pick different actions. A natural way of setting up the agent using a consequentialist theory is to introduce propositional symbols **T1** and **T2**, and establish decision rules

If I observe **1**, **Prf(T1) => action=A**; **Prf(~T1) => action=B**

If I observe **2**, **Prf(T2) => action=A**; **Prf(~T2) => action=B**

In this case, if either **[T1 and ~T2]** or **[T2 and ~T1]** is a theorem of the formal system, then the two versions of the agent (observing **1** and **2**) will achieve the optimal utility value. The problem is that moral axioms for both theorems can be triggered, and if both do get triggered, then quickly absurdity is proved, which makes it hard to predict which actions the agents will actually perform, and what utility would follow from that. And if the formal system can't predict the effect of triggering its moral axioms on utility, it won't trigger the moral axioms, so it's unclear what would actually happen. Perhaps some different clever statement will get proved that would predictably lead to the agent choosing the right actions.
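The conflict between the two candidate theorems can be made concrete with a small enumeration. This is a rough sketch, assuming a payoff of 1 when the two versions pick different actions and 0 otherwise (the post does not give numbers), and collapsing "**Ti** is provable" to a plain Boolean; the names `action`, `utility` and `optimal` are hypothetical.

```python
from itertools import product


def action(observation: int, t1: bool, t2: bool) -> str:
    """Decision rules: observing i, Prf(Ti) => action=A, Prf(~Ti) => action=B."""
    decided = t1 if observation == 1 else t2
    return "A" if decided else "B"


def utility(t1: bool, t2: bool) -> int:
    """Assumed payoff: 1 if the two versions pick different actions, else 0."""
    return 1 if action(1, t1, t2) != action(2, t1, t2) else 0


# All settled values of (T1, T2) that reach the optimal utility.
optimal = [
    (t1, t2)
    for t1, t2 in product([True, False], repeat=2)
    if utility(t1, t2) == 1
]
first, second = optimal
# The two optimal candidates disagree on both symbols, so proving both
# would yield T1 together with ~T1 (and T2 together with ~T2).
jointly_inconsistent = first[0] != second[0] and first[1] != second[1]
print(optimal, jointly_inconsistent)
```

Both **[T1 and ~T2]** and **[T2 and ~T1]** are optimal on their own, which is why both moral axioms are eligible to fire, and they contradict each other on every symbol.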

Another issue is that the moral axiom schema should probably only consider theorems of some special kind, and compare their consequences with those of specific other theorems (not just with an unconditional upper bound).

### What really changed?

The main technical difference appears to be that instead of using moral arguments of the form **[A=A1 => U=U1]**, this approach uses moral arguments of the form **[Prf(A=A1) => U=U1]**. As a result, proving **A=A2** (for **A2<>A1**) no longer allows inferring a false antecedent, which in this case is **~Prf(A=A1)**, and so the usual path to spurious arguments is closed. Perhaps focusing on just this distinction might be more fruitful than paying attention to the surrounding philosophical bells and whistles.
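The difference between the two forms can be illustrated with a propositional entailment check. This is only a sketch under a simplifying assumption: **A=A1**, **Prf(A=A1)** and **U=U1** are treated as independent Boolean atoms, which stands in for the fact that the theory has no principle linking **Prf(A=A1)** to **A=A1**. The names `implies`, `entailed`, `old_form` and `new_form` are hypothetical.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p => q."""
    return (not p) or q


def entailed(conclusion) -> bool:
    """True if `conclusion` holds in every valuation where A=A1 is false."""
    return all(
        conclusion(a_is_a1, prf_a1, u_is_u1)
        for a_is_a1, prf_a1, u_is_u1 in product([False, True], repeat=3)
        if not a_is_a1  # premise: A=A2 has been proved, so A=A1 is false
    )


# Old form [A=A1 => U=U1]: vacuously true once A=A1 is refuted.
old_form = entailed(lambda a, prf, u: implies(a, u))
# New form [Prf(A=A1) => U=U1]: refuting A=A1 does not settle Prf(A=A1).
new_form = entailed(lambda a, prf, u: implies(prf, u))
print(old_form, new_form)
```

Under these assumptions the old form follows for any **U1** whatsoever once **A=A2** is proved, while the new form does not, since a valuation with **Prf(A=A1)** true and **U=U1** false remains available.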

## Comments (20)

Given that axiom schema, it seems easy for the agent to prove **Prf(S)** for all **S**.

Assumption: **U** is bounded, by some **v** that is easy to calculate.

Let **C=(0=1)**, a contradiction.

Then consider **[(Prf(C) -> U=v) and U<=v] -> C**. By assumption **U<=v** is true and easy, so if **Prf(C)** is false, then **(Prf(C) -> U=v)** would be true and so would **C**. Hence **¬Prf(C) -> C**. Taking the contrapositive: **¬C -> Prf(C)**. Since **¬C** is a tautology, this implies **Prf(C)**.

A short search will also produce **Prf(¬C)**. Then for any **S**, since **(¬C and C) -> S**, the system can show **Prf(S)** (I'm assuming it's expressive enough that from **Prf(A)** and **Prf(A->B)** it can get **Prf(B)**).

Don't know if this blows up the system yet, but the fact that the system can prove all **Prf(S)** hints that something weird may be going on...

EDIT: Actually, here is how you blow up the system. Since it can demonstrate that **Prf(S)** is true, the axiom **[(Prf(S) -> U=u) and U<=u] -> S** reduces to **U=u -> S**. So as long as you can show that **U** takes one of finitely many values, you can prove any **S** (and if the system is omega-consistent, it's already blown up).

You're right, this shows that the moral axioms as stated don't work. Essentially **[(Prf(C) -> U=v) and U<=v] -> C** simplifies to **(Prf(C) -> U=v) -> C**, and if **C** is absurdity, then **~(Prf(C) -> U=v)**, that is **(~U=v and Prf(C))**. Both **Prf(C)** and **~U=v** shouldn't hold. Thus, moral axioms in the present form shouldn't be added for any easily-provably-false statements. Will try to figure out if the damage can be contained.

(Updated the post and its summary to mention the problem.)

One immediate idea is to replace the conditional **[(Prf(S) -> U=u) and U<=u] -> S** with the rule of inference "from **[(Prf(S) -> U=u) and U<=u]**, deduce **S**". That way you can't get a contrapositive, and you probably need to get Loebian to hope to find a contradiction.

Not confident at all that would work, though.

Yes, that was the intention, and the problem is that the implication can be tugged from the wrong side, but implication can't be one-sided. I'd prefer to stay with standard inference rules though, if at all possible.

Pulling on one side but not the other seems like a textbook case of what relevance logics were designed for.

Would restricting the axiom schema to content-less proposition symbols like "B" solve the problem?

*stares at a huge inferential expanse...* Wonders if there is a simple example or two that can make it a bit less so.

“I was like a boy playing on the sea-shore, and diverting myself now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me.”

(Now why the hell would *that* be downvoted?)

It doesn't seem to address the comment it is a response to?

Um, it does: I was ironically comparing Newton's sentiment to the frustration seemingly expressed in shminux's comment?

I didn't vote on your comments, but I assume the person (?) who did thought you should have given an example of the kind shminux wanted instead.

Why isn't building a decision theory equivalent to building a whole AI from scratch?

I don't understand how "undefined propositional symbols" make coordination happen. In a PD played between two similar agents, will they figure out that their actions depend on the same undefined symbol?

So I think my basic problem here is I'm not familiar with this construct for decision making or why it would be favored over others. Specifically, why make logical rules about which actions to take? Why not take an MDP value-learning approach where the agent chooses an action based on which action has the highest predicted utility? If the estimate is bad, it's merely updated, and if that situation arises again, the agent might choose a different action as a result of the latest update.

But, doesn't the whole setup mean that Prf(S)=>S for any statement S of the form A=Ai? Won't it immediately Löb-explode?

How do you get to Prf(S)=>S? It works with something more like [Prf(S)=>Q]=>S for non-obvious Q. This setup does have a problem with exploding after figuring out how to maximize utility, but it doesn't seem to explode prior to that point (and I hope there might be a less perfectionist variant that gradually improves the situation without exploding at any point, instead of making a single leap to maximal utility value).

No, I mean, the agent actually performs the action when it proved it, right? So Prf(A=A1) implies that A will indeed be =A1. Assuming the agent has its own source, it will know that.

The thing being proved is not the action, it's an undefined propositional constant symbol. The action responds to the fact that the propositional symbol gets proved. See the example at the end of the second section, and keep track of the distinction between **A** and **B**.

Thanks, I understand now. The implications:

misled me. I thought they were another set of on-the-fly axioms, but if they are decision rules, this means something like:

And then there's no Löbean issues. Cool! This agent *can* prove A=Ai without any problems. This should work great for ASP.

They are both, since triggering the moral axiom for **B** requires having those implications in the consequentialist theory (they are part of the definition of **A**, and **A** is part of the definition of **U**, so the theory knows them).

It does seem that a consequentialist theory could prove what its agents' actions are, if we somehow modify the axiom schema so that it doesn't explode as a result of proving the maximality of **U** following from the statements (like **B**) that trigger those actions. At least the old reasons for why this couldn't be done seem to be gone, even if now there are new reasons for why this currently can't be done.