you should not reject the 'offer' of a field that yields an 'unfair' amount of grain! - Ultimatum Game (Arbital)

 

In this post, I demonstrate a problem in which an agent outperforms Logical Decision Theory (LDT), and show that for any agent you can construct a problem and a competing agent that outperforms it. Defining rationality as winning, this means that no agent is rational in every problem.

Symmetrical Ultimatum Game

We consider a slight variation on the ultimatum game to make it completely symmetrical. The symmetrical ultimatum game is a two-player game in which each player says how much money they want. The amount must be a positive integer number of dollars. If the sum is ≤ $10, both players get the amount of money they chose. Otherwise, they both get nothing.
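To make the rules concrete, here is a minimal sketch of the payoff rule (the function name and the Python framing are mine, not part of the original game description):

```python
def symmetrical_ultimatum(demand_a: int, demand_b: int) -> tuple[int, int]:
    """Payoffs for the symmetrical ultimatum game.

    Each player names a positive whole-dollar demand. If the demands sum
    to at most $10, each player gets exactly what they asked for;
    otherwise both players get nothing.
    """
    assert demand_a >= 1 and demand_b >= 1, "demands must be positive integers"
    if demand_a + demand_b <= 10:
        return demand_a, demand_b
    return 0, 0

print(symmetrical_ultimatum(9, 1))   # (9, 1): the sum is $10, so both demands are honoured
print(symmetrical_ultimatum(9, 9))   # (0, 0): the sum exceeds $10, so both get nothing
```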

Now consider the decision problem of playing the symmetrical ultimatum game against a logical decision theorist.

A causal decision theorist does particularly poorly in this problem, since the LDT agent always chooses $9, leaving the causal decision theorist with $1.

How does an LDT agent fare? Well, logical decision theory is still a bit underspecified. However, notice that this question reduces to "how does an LDT agent do against another LDT agent in a symmetrical game?". Whatever the details of LDT, the two agents' expected payoffs are equal by symmetry and sum to at most $10, so we must conclude that the expected value is at most $5.

What about a rock with $9 painted on it? The LDT agent in the problem reasons that the best action is to choose $1, so the rock gets $9. 
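As a toy illustration of this matchup, reusing `symmetrical_ultimatum` from above and modelling the LDT agent's concession, for this matchup only, as a best response to its prediction of the rock's demand (the names here are mine):

```python
def nine_rock() -> int:
    """Always demands $9, regardless of the opponent."""
    return 9

def conceding_best_response(predicted_opponent_demand: int) -> int:
    """Stand-in for the LDT agent facing a natural rock: demand whatever
    is left of the $10 rather than walk away with nothing."""
    return max(10 - predicted_opponent_demand, 1)

rock = nine_rock()
ldt_like = conceding_best_response(rock)          # concedes to $1
print(symmetrical_ultimatum(rock, ldt_like))      # (9, 1)
```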

Thus, $9 rock is more rational than LDT in this problem. □

You can't become $9 rock

Now, what makes this problem particularly difficult is how picky the LDT agent in the problem is. If, based on the previous section, you decide to "become $9 rock", the LDT agent will defect against you. If, based on the previous section, you build a robot that always chooses $9, the LDT agent will defect against that robot. Only a truly natural $9 rock can win.

No agent is rational in every problem

Consider an agent X. There are two cases:

  1. Against $9 rock, X always chooses $1. Consider the problem "symmetrical ultimatum game against X". By symmetry, X on average can get at most $5. But $9 rock always gets $9. So $9 rock is more rational than X.
  2. Against $9 rock, X sometimes chooses more than $1 (thus getting nothing). Consider the problem "symmetrical ultimatum game against $9 rock". X on average gets less than $1. But an agent that always picks $1 (that is, a $1 rock) always gets $1. So $1 rock is more rational than X.
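A toy check of both cases, reusing `symmetrical_ultimatum` and `nine_rock` from the sketches above (the `coin_flip_agent` is purely illustrative; by symmetry the mirror-match bound of $5 holds for any agent):

```python
import random

def mirror_match_average(agent, trials: int = 10_000) -> float:
    """Average payoff of an agent playing the symmetrical ultimatum game
    against an independent copy of itself (case 1)."""
    total = 0
    for _ in range(trials):
        payoff_a, _ = symmetrical_ultimatum(agent(), agent())
        total += payoff_a
    return total / trials

# Case 1: in a mirror match no agent can average more than $5.
coin_flip_agent = lambda: random.choice([1, 9])
print(mirror_match_average(coin_flip_agent))      # about 2.75, and never above 5

# Case 2: demanding more than $1 against the $9 rock yields nothing,
# while a $1 rock reliably collects $1.
print(symmetrical_ultimatum(2, nine_rock()))      # (0, 0)
print(symmetrical_ultimatum(1, nine_rock()))      # (1, 9)
```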

Implications

I still have an intuition that LDT is the "best" decision theory so far. See Integrity for consequentialists for the practical benefits of an LDT style of decision making.

However, there can be no theorem that LDT is always rational, since it isn't. And replacing LDT with a different agent cannot fix the problem. Notice that, as a special case, humans can never be rational.

This seems to suggest some sort of reformulation of rationality is needed. For example, given LDT's reasonableness, one option is to reject the thesis of Newcomb's Problem and Regret of Rationality and simply define rationality to be LDT.

Comments

Isn't this just a no-free-lunch theorem? For every computable decision procedure you can construct an environment that predicts that procedure's exact output and reacts so as to do maximum damage, making the decision procedure perform worse than random action selection.

Yes, this would be a no-free-lunch theorem for decision theory.

It is different from the "no free lunch in search and optimization" theorem, though. I think people had an intuition that LDT will never regret its decision theory, because if there is a better decision theory then LDT will just copy it. You can think of this as LDT acting as though it could self-modify. So the belief (which I am debunking) is that the environment can never punish the LDT agent; it just pretends to be the environment's favorite agent.

The issue with this argument is that in the problem I published above, the problem itself contains an LDT agent, and that LDT agent can "punish" the first agent for acting like, precommitting to become, or even literally self-modifying into $9 rock. It knows that the first agent didn't have to do that.

So the first LDT agent will literally regret not being hardcoded to "output $9".

This is very robust to what we "allow" agents to do (can they predict each other, how accurately can they predict each other, what counterfactuals are legit or not, etc...), because no matter what the rules are you can't get more than $5 in expectation in a mirror match.

Playing the ultimatum game against an agent that gives in to $9 from rocks but not from us is not in the fair problem class, as the payoffs depend directly on our algorithm and not just on our choices and policies.

https://arbital.com/p/fair_problem_class/

A simpler game is “if you implement or have ever implemented LDT, you get $0; otherwise, you get $100”.

LDT decision theories are probably the best decision theories for problems in the fair problem class.

(Very cool that you’ve arrived at the idea of this post independently!)

"LDT decision theories are probably the best decision theories for problems in the fair problem class."

The post demonstrates why this statement is misleading.

If "play the ultimatum game against a LDT agent" is not in the fair problem class, I'd say that LDT shouldn't be in the "fair agent class". It is like saying that in a tortoise-only race, the best racer is a hare because a hare can beat all the tortoises.

So based on the definitions you gave I'd classify "LDT is the best decision theory for problems in the fair problem class" as not even wrong.

In particular, consider a class of allowable problems S, but then also say that an agent X is allowable only if "play a given game with X" is in S. Then the argument in the No agent is rational in every problem section of my post goes through for allowable agents. (Note that the argument in that section is general enough to apply to agents that don't give in to $9 rock.)

Practically speaking: if you're trying to follow decision theory X, then playing against another agent that follows X is a reasonable problem.

It’s reasonable to consider two agents playing against each other. “Playing against your copy” is a reasonable problem. ($9 rocks get 0 in this problem, LDTs probably get $5.)

Newcomb, Parfit’s hitchhiker, smoking, etc. are all very reasonable problems that essentially depend on the buttons you press when you play the game. It is important to get these problems right.

But playing against LDT is not necessarily in the “fair problem class” because the game might behave differently depending on your algorithm/on how you arrive at taking actions, and not just depending on your actions.

Your version of it (playing against an LDT agent) is indeed different from playing against a game that looks at whether we're an alphabetizing agent that picks X instead of Y because X < Y, and not because we looked at the expected utility: we would want LDT to perform optimally in this game. But the reason an LDT-created rock loses to a natural rock here isn't fundamentally different from the reason LDT loses to an alphabetizing agent in the other game, and it is known that you can construct a game like that where LDT will lose to something else. You can make the game description sound more natural, but I feel like there's a sharp divide between the "fair problem class" problems and others.

(I also think that in real life, where this game might play out, there isn't really a choice we can make to turn our AI into a $9 rock instead of an LDT agent; because if we did that due to the rock's better performance in this game, our rock would get slightly less than $5 in EV instead of getting $9; LDT doesn't perform worse than other agents we could've chosen in this game.)

This is an important theorem.  There is no perfect decision theory, especially against equal-or-better opponents.  I tend to frame it as "the better predictor wins".  Almost all such adversarial/fixed-sum cases are about power, not fairness or static strategy/mechanism.  

We (humans, including very smart theorists) REALLY want to frame it as clever ways to get outcomes that fit our intuitions. But it's still all about "who goes first (in the logical/credible-commitment sense)".