This post describes a different (less agent-centric) way of looking at UDT-like decision theories that resolves some aspects of the long-standing technical problem of spurious moral arguments. It's only a half-baked idea, so there are currently a lot of loose ends.

On spurious arguments

UDT agents are usually considered as having a disinterested inference system (a "mathematical intuition module" in UDT and first order proof search in ADT) that plays a purely epistemic role, and preference-dependent decision rules that look for statements that characterize possible actions in terms of the utility value that the agent optimizes.

The statements (supplied by the inference system) used by the agent's decision rules (to pick one of the many variants) have the form [(A=A1 => U=U1) and U<=U1]. Here, A is a symbol defined to be the actual action chosen by the agent, U is a similar symbol defined to be the actual value of the world's utility, and A1 and U1 are some particular possible action and possible utility value. If the agent finds that this statement is provable, it performs action A1, thereby making A1 the actual action.

The use of this statement introduces the problem of spurious arguments: if A1 is a bad action, but for some reason it's still chosen, then [(A=A1 => U=U1) and U<=U1] is true, since the utility value U will in that case in fact be U1, which justifies (by the decision rule) choosing the bad action A1. In typical cases, this problem makes it difficult to prove that an agent will behave in the expected manner (i.e. won't choose a bad action), which is resolved by adding various complicated clauses to its decision algorithm. But even worse, it turns out that if an agent is hapless enough to take seriously a (formally correct) proof of such a statement supplied by an enemy (or if its own inference system is malicious), it can be persuaded to take any action at all, irrespective of its own preferences.
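The self-fulfilling character of these statements can be sketched with a toy model (the actions, utilities, and helper names below are hypothetical illustrations, not the formal proof system itself):

```python
# Toy model: once the chosen action is fixed, the criterion
# [(A=A1 => U=U1) and U<=U1] evaluates as true for whatever action was
# actually taken -- even a bad one.

UTILITY = {1: 10, 2: 5}  # utility of each possible action (assumed)

def criterion_holds(actual_action, a1, u1):
    """Evaluate [(A=a1 => U=u1) and U<=u1] given the actual action."""
    u = UTILITY[actual_action]
    implication = (actual_action != a1) or (u == u1)  # material implication
    return implication and u <= u1

# Legitimate argument: the good action 1 really yields U=10.
assert criterion_holds(actual_action=1, a1=1, u1=10)

# Spurious argument: if the agent somehow takes the bad action 2, then
# U really is 5, so [(A=2 => U=5) and U<=5] holds, "justifying" the bad
# choice after the fact.
assert criterion_holds(actual_action=2, a1=2, u1=5)
```

The point of the second assertion is that the criterion cannot distinguish a good justification from one that is only true because the bad action ends up being taken.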

Deciding which theorems to make valid

Given that an inference system can overpower the decision rules, causing an agent to follow a preference other than its own, perhaps decision-making should be happening inside the inference system in the first place, with the agent only following its decisions. What does an inference system decide? Directly, it decides which theorems to have. The set of valid theorems follows deterministically from the axioms, but this is not really a problem: it's possible to make decisions in deterministic settings.

Suppose an inference system wants to decide whether it should have a theorem S. How does it evaluate the consequences of S being its theorem? It can assume that it proved S, see what that would imply, and if it likes the consequences (in comparison to the consequences of proving Not-S, for example), then it concludes S. Decision rules that an inference system follows are the axioms of the theory it works with, so this discussion suggests the following axiom schema (of moral axioms):

For all statements S and possible utility values u,
[(Prf(S) => U=u) and U<=u] => S is an axiom.

(This particular schema has a lot of problems, as discussed below, but seems adequate for communicating the general idea. [Edit: Stuart points out an even worse problem that makes these axioms break for any easily-provably-false S. Not sure what can be salvaged from this problem yet.]) A moral axiom from this schema states that if statement S being provable implies that the best possible utility gets realized, then that statement is declared to be valid.

Suppose that an agent has to choose an action A among possible actions 1 and 2, and wants to follow this theory's decisions. Then all it needs to do is pick a new propositional symbol B and establish the following decision rules:

Prf(B) => A=1
Prf(~B) => A=2

B remains otherwise undefined; its only effect is on our agent, or on the definition of A. If it's true that, say, [A=1 => U=10], [A=2 => U=5], and also that in general [U<=10], then Prf(B) implies [U=10], which triggers the moral axiom for statement B and makes it valid/provable. As a result, the agent finds a proof of B and performs action 1.
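The example above can be sketched as a small simulation (all names and the two-action utility assignment are hypothetical; "provability" is collapsed into a direct check of consequences):

```python
# Sketch of the second-section example: the agent delegates its choice
# to an undefined symbol B via the rules Prf(B) => A=1, Prf(~B) => A=2,
# and the inference system adds a moral axiom for whichever of B, ~B
# makes the best possible utility provable.

U_MAX = 10  # assumed upper bound: [U<=10]

def action_if_proved(stmt):
    """The agent's decision rules: Prf(B) => A=1, Prf(~B) => A=2."""
    return 1 if stmt == "B" else 2

def utility_of(action):
    return {1: 10, 2: 5}[action]  # [A=1 => U=10], [A=2 => U=5]

def moral_axiom_triggers(stmt):
    """[(Prf(stmt) => U=U_MAX) and U<=U_MAX]: does proving stmt
    realize the best possible utility?"""
    return utility_of(action_if_proved(stmt)) == U_MAX

# The inference system proves whichever statement triggers its axiom...
theorems = [s for s in ("B", "~B") if moral_axiom_triggers(s)]
assert theorems == ["B"]
# ...and the agent, finding a proof of B, performs action 1.
assert action_if_proved(theorems[0]) == 1
```

The essential feature is that B carries no meaning of its own; it is proved purely because of how the agent responds to its proof.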

Agent-less decision theory

This formulation is different from the usual ones in that the consequentialist loop is operated entirely from within an abstract formal system (i.e. not an algorithm). The formal system doesn't have an intended interpretation or a privileged agent (definition of an action) that would enact its decisions. Instead, it looks for all possible agents (actions, facts) that respond to its arguments (and affect its utility value), and supplies the arguments (theorems) according to how those agents respond to various hypothetical arguments. If there are multiple agents that have to be coordinated, that calls for proving a theorem that simultaneously establishes the strategies of all agents involved. And the agents could well use their own inference systems or proof search algorithms.

For an agent, such a formal system plays the role of preference: it is an abstract computation that answers questions about what should be done in each particular situation.

Open problems

The axiom schema [(Prf(S) => U=u) and U<=u] => S is not adequate for many reasons. First, it's only capable of making knowably perfect decisions (which in particular requires the utility value to have a reachable upper bound). Second, it introduces a different kind of spurious argument that makes the formal system inconsistent: once a statement S triggers its moral axiom, it follows that Prf(S), and so U=u, which triggers all the other moral axioms at once. This isn't necessarily too bad, since it's irrelevant what happens once the utility value is optimal, but it also makes it harder to trigger moral axioms prior to making a decision.
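The runaway triggering can be sketched with a toy set of statements (names hypothetical; provability again collapsed into direct checks):

```python
# Sketch of the explosion: before any decision, only statements whose
# proof leads to maximal utility trigger their moral axiom; once one
# fires and U=u becomes provable, the antecedent (Prf(S) => U=u) holds
# for every S, so every moral axiom fires at once.

statements = ["B", "~B", "C", "anything"]
leads_to_max = {"B"}  # assumed: only proving B realizes U=u

# Before any axiom fires:
fired = [s for s in statements if s in leads_to_max]
assert fired == ["B"]

# After B's axiom fires, U=u is itself provable, so the antecedent of
# every other moral axiom is satisfied trivially:
u_max_provable = True
fired_after = [s for s in statements if u_max_provable or s in leads_to_max]
assert fired_after == statements
```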

For example, in Wei Dai's variant of coordination game, an agent is given indexical identification 1 or 2, and has to pick among actions A and B in such a way that its versions that observe 1 and 2 pick different actions. A natural way of setting up the agent using a consequentialist theory is to introduce propositional symbols T1 and T2, and establish decision rules

If I observe 1, Prf(T1) => action=A; Prf(~T1) => action=B
If I observe 2, Prf(T2) => action=A; Prf(~T2) => action=B

In this case, if either [T1 and ~T2] or [T2 and ~T1] is a theorem of the formal system, then the two versions of the agent (observing 1 and 2) will achieve the optimal utility value. The problem is that the moral axioms for both theorems can be triggered, and if both do get triggered, then absurdity is quickly proved, which makes it hard to predict which actions the agents will actually perform, and what utility would follow. And if the formal system can't predict the effect of triggering its moral axioms on utility, it won't trigger them, so it's unclear what would actually happen. Perhaps some different clever statement will get proved that predictably leads to the agents choosing the right actions.
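The structure of the game can be sketched as follows (the payoff encoding is a hypothetical simplification: utility 1 for anti-coordination, 0 otherwise):

```python
# Toy encoding of the coordination game: each joint proof state assigns
# truth values to T1 and T2; optimal utility requires the two versions
# of the agent to pick different actions.
from itertools import product

def action(proved_Ti):
    """Decision rule for each version: Prf(Ti) => A, Prf(~Ti) => B."""
    return "A" if proved_Ti else "B"

def utility(t1, t2):
    # Payoff: 1 if the two versions act differently, 0 otherwise.
    return 1 if action(t1) != action(t2) else 0

optimal = [(t1, t2) for t1, t2 in product([True, False], repeat=2)
           if utility(t1, t2) == 1]
# Exactly the two anti-correlated theorems achieve the optimum, which
# is why triggering the moral axioms for *both* leads to contradiction.
assert set(optimal) == {(True, False), (False, True)}
```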

Another issue is that the moral axiom schema should probably only consider theorems of some special kind, and compare their consequences with those of specific other theorems (not just with an unconditional upper bound).

What really changed?

The main technical difference appears to be that instead of using moral arguments of the form [A=A1 => U=U1], this approach uses moral arguments of the form [Prf(A=A1) => U=U1]. As a result, proving A=A2 (for A2<>A1) no longer makes the antecedent provably false (the antecedent is now Prf(A=A1) rather than A=A1), and so the usual path to spurious arguments, inferring the implication vacuously from a refuted antecedent, is closed. Perhaps focusing on just this distinction might be more fruitful than paying attention to the surrounding philosophical bells and whistles.
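The contrast can be sketched with toy booleans (a schematic illustration, not actual proof search):

```python
# Suppose the theory proves A=2. Under the old form, the antecedent
# A=1 is then provably false, so [A=1 => U=10] holds vacuously for any
# utility value; under the new form, the antecedent Prf(A=1) is merely
# false, and proving A=2 gives no way to derive the implication.

proved = {"A=2"}  # hypothetical set of the theory's theorems

# Old form [A=1 => U=10]: antecedent refuted, implication vacuous.
a_eq_1 = False                     # the theory refutes A=1 from A=2
old_form = (not a_eq_1) or False   # true regardless of whether U=10
assert old_form                    # spurious moral argument available

# New form [Prf(A=1) => U=10]: proving A=2 does not put "A=1" among
# the proofs, but neither does it refute the antecedent *inside* the
# theory, so the vacuous route is blocked.
prf_a_eq_1 = "A=1" in proved
assert not prf_a_eq_1              # antecedent false, but not exploitable
```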

Comments

Given that axiom schema, it seems easy for the agent to prove Prf(S) for all S.

Assumption: U is bounded, by some v that is easy to calculate.

Let C=(0=1), a contradiction.

Then consider [(Prf(C) -> U=v) and U<=v] -> C. By assumption U<=v is true and easy, so if Prf(C) is false, then (Prf(C) -> U=v) would be true and so would C. Hence ¬Prf(C) -> C. Taking the contrapositive: ¬C -> Prf(C). Since ¬C is a tautology, this implies Prf(C).
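The propositional skeleton of this step checks out mechanically (a toy truth-table over the atoms Prf(C) and U=v, with U<=v assumed true):

```python
# Truth-table check of the comment's argument: with C false and U<=v
# true, the moral axiom instance [(Prf(C) => U=v) and U<=v] => C can
# only hold if Prf(C) is true and U=v is false.
from itertools import product

def axiom(prf_c, u_eq_v, c, u_le_v=True):
    antecedent = ((not prf_c) or u_eq_v) and u_le_v
    return (not antecedent) or c

satisfying = [(p, e) for p, e in product([True, False], repeat=2)
              if axiom(p, e, c=False)]
assert satisfying == [(True, False)]  # forces Prf(C), denies U=v
```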

A short search will also produce Prf(¬C). Then for any S, since (¬C and C) -> S, the system can show Prf(S) (I'm assuming it's expressive enough that from Prf(A) and Prf(A->B) it can get Prf(B)).

Don't know if this blows up the system yet, but the fact that the system can prove all Prf(S) hints that something weird may be going on...

EDIT: Actually, here is how you blow up the system. Since it can demonstrate that Prf(S) is true, the axiom [(Prf(S) -> U=u) and U<=u] -> S reduces to U=u -> S. So as long as you can show that U takes one of finitely many values, you can prove any S (and if the system is omega-consistent, it's already blown up).

You're right, this shows that the moral axioms as stated don't work. Essentially [(Prf(C) -> U=v) and U<=v] -> C simplifies to (Prf(C) -> U=v) -> C, and if C is absurdity, then ~(Prf(C) -> U=v), that is (~U=v and Prf(C)). Both Prf(C) and ~U=v shouldn't hold. Thus, moral axioms in the present form shouldn't be added for any easily-provably-false statements. Will try to figure out if the damage can be contained.

(Updated the post and its summary to mention the problem.)

One immediate idea is to replace the conditional [(Prf(S) -> U=u) and U<=u] -> S with the rule of inference "from [(Prf(S) -> U=u) and U<=u], deduce S". That way you can't get a contrapositive, and you probably need to get Loebian to hope to find a contradiction.

Not confident at all that would work, though.

Yes, that was the intention, and the problem is that the implication can be tugged from the wrong side, but implication can't be one-sided. I'd prefer to stay with standard inference rules though, if at all possible.

Pulling on one side but not the other seems textbook of what relevance logics were designed for.

Would restricting the axiom schema to content-less proposition symbols like "B" solve the problem?

*stares at a huge inferential expanse...* Wonders if there is a simple example or two that can make it a bit less so.

“I was like a boy playing on the sea-shore, and diverting myself now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me.”

(Now why the hell would that be downvoted?)


(Now why the hell would that be downvoted?)

It doesn't seem to address the comment it is a response to?

Um, it does: I was ironically comparing Newton's sentiment to the frustration seemingly expressed in shminux's comment?


I didn't vote on your comments, but I assume the person (?) who did thought you should have given an example of the kind shminux wanted instead.

Why isn't building a decision theory equivalent to building a whole AI from scratch?


When I write the program "if a sensor is activated, then energize a light", I'm using a counterfactual to compactly specify a causal dependency that I want a software system to build into a hardware circuit. If the computation by which an agent influences a physical observable (corresponding to its decision) takes into account the environment's dependence on the value of that observable, the agent can similarly model the environment's dependence on its decision via counterfactuals of the form "if decision=d, then outcome=u". It can also model how its decision depends on environmental consequences via its decision computation using a counterfactual: "if [the computation which finds the decision value resulting in a world with maximal utility (through the environment's dependence on my decision)] = Make burritos, then decision = Make burritos".

Material implication is usually read as "if x, then y", which is superficially similar to "if sensor activated, then energize a light", but as near as I can tell, it doesn't capture counterfactual relations at all: "antecedent or not consequent" has practically nothing to do with causal relations by which values are assigned to observables. But if you want to reason about consequential decisions, you need consequence-simulating machinery.

It looks like you're trying to rebuild that consequential machinery in logic from scratch. I don't understand provability logic, and can't competently evaluate whether giving an agent procedures which work by never being evaluated, or the trick you're using in this post, will allow material implication to behave usefully. I have heard, however, that the Curry-Howard isomorphism establishes strong correspondences between proofs and programs, so it seems to me that if evaluating counterfactuals is not a problem for decision theories that use Pearl's causal calculus or the counterfactuals built into programming languages (or universal logic gates, or whatever), then it shouldn't be a problem for logical decision theories: just directly represent Pearl's already mathematically well specified causal calculus (or maybe lambda calculus, Hoare logic, or circuits of controlled-not gates) in logic.

Is that possible? Sensible? More difficult? Something?


I don't understand how "undefined propositional symbols" make coordination happen. In a PD played between two similar agents, will they figure out that their actions depend on the same undefined symbol?

So I think my basic problem here is I'm not familiar with this construct for decision making or why it would be favored over others. Specifically, why make logical rules about which actions to take? Why not take an MDP value-learning approach where the agent chooses an action based on which action has the highest predicted utility. If the estimate is bad, it's merely updated and if that situation arises again, the agent might choose a different action as a result of the latest update to it.

But, doesn't the whole setup mean that Prf(S)=>S for any statement S of the form A=Ai? Won't it immediately Löb-explode?

How do you get to Prf(S)=>S? It works with something more like [Prf(S)=>Q]=>S for non-obvious Q. This setup does have a problem with exploding after figuring out how to maximize utility, but it doesn't seem to explode prior to that point (and I hope there might be a less perfectionist variant that gradually improves the situation without exploding at any point, instead of making a single leap to maximal utility value).

No, I mean, the agent actually performs the action when it proved it, right? So Prf(A=A1) implies that A will indeed be =A1. Assuming the agent has its own source, it will know that.

The thing being proved is not the action, it's an undefined propositional constant symbol. The action responds to the fact that the propositional symbol gets proved. See the example at the end of the second section, and keep track of the distinction between A and B.

Thanks, I understand now. The implications:

Prf(B) => A=1
Prf(~B) => A=2

mislead me. I thought they were another set of on-the-fly axioms, but if they are decision rules, this means something like:

A():
    generate various proofs:
        if a proof of B is found, return 1
        if a proof of ~B is found, return 2

And then there's no Löbean issues. Cool! This agent can prove A=Ai without any problems. This should work great for ASP.

I thought they were another set of on-the-fly axioms, but if they are decision rules, this means something like:

They are both, since triggering the moral axiom for B requires having those implications in the consequentialist theory (they are part of the definition of A, and A is part of the definition of U, so the theory knows them).

It does seem that a consequentialist theory could prove what its agents' actions are, if we somehow modify the axiom schema so that it doesn't explode as a result of proving the maximality of U following from the statements (like B) that trigger those actions. At least the old reasons for why this couldn't be done seem to be gone, even if now there are new reasons for why this currently can't be done.