How to Give in to Threats (without incentivizing them)

Mikhail Samin

LessWrong

12th Sep 2024

(Also available as a paper.)

TL;DR: using a simple mixed strategy, LDT can give in to threats, ultimatums, and commitments - while incentivizing cooperation and fair^[1] splits instead.

This strategy made it much more intuitive to many people I've talked to that smart agents probably won't do weird everyone's-utility-eating things like threatening each other or participating in commitment races.

1. The Ultimatum game

This part is taken from planecrash^[2]^[3].

You're in the Ultimatum game. You're offered 0-10 dollars. You can accept or reject the offer. If you accept, you get what's offered, and the offerer gets $(10-offer). If you reject, both you and the offerer get nothing.

The simplest strategy that incentivizes fair splits is to accept everything ≥ 5 and reject everything < 5. The offerer can't do better than by offering you 5. If you accepted offers of 1, the offerer that knows this would always offer you 1 and get 9, instead of being incentivized to give you 5. Being unexploitable in the sense of incentivizing fair splits is a very important property that your strategy might have.

With the simplest strategy, if you're offered 5..10, you get 5..10; if you're offered 0..4, you get 0 in expectation.

Can you do better than that? What is a strategy that you could use that would get more than 0 in expectation if you're offered 1..4, while still being unexploitable (i.e., still incentivizing splits of at least 5)?

I encourage you to stop here and try to come up with a strategy before continuing.

The solution, explained by Yudkowsky in planecrash (children split 12 jellychips, so the offers are 0..12):

When the children return the next day, the older children tell them the correct solution to the original Ultimatum Game.
It goes like this:
When somebody offers you a 7:5 split, instead of the 6:6 split that would be fair, you should accept their offer with slightly less than 6/7 probability. Their expected value from offering you 7:5, in this case, is 7 * slightly less than 6/7, or slightly less than 6. This ensures they can't do any better by offering you an unfair split; but neither do you try to destroy all their expected value in retaliation. It could be an honest mistake, especially if the real situation is any more complicated than the original Ultimatum Game.
If they offer you 8:4, accept with probability slightly-more-less than 6/8, so they do even worse in their own expectation by offering you 8:4 than 7:5.
It's not about retaliating harder, the harder they hit you with an unfair price - that point gets hammered in pretty hard to the kids, a Watcher steps in to repeat it. This setup isn't about retaliation, it's about what both sides have to do, to turn the problem of dividing the gains, into a matter of fairness; to create the incentive setup whereby both sides don't expect to do any better by distorting their own estimate of what is 'fair'.
[The next stage involves a complicated dynamic-puzzle with two stations, that requires two players working simultaneously to solve. After it's been solved, one player locks in a number on a 0-12 dial, the other player may press a button, and the puzzle station spits out jellychips thus divided.
The gotcha is, the 2-player puzzle-game isn't always of equal difficulty for both players. Sometimes, one of them needs to work a lot harder than the other.]
They play the 2-station video games again. There's less anger and shouting this time. Sometimes, somebody rolls a continuous-die and then rejects somebody's offer, but whoever gets rejected knows that they're not being punished. Everybody is just following the Algorithm. Your notion of fairness didn't match their notion of fairness, and they did what the Algorithm says to do in that case, but they know you didn't mean anything by it, because they know you know they're following the Algorithm, so they know you know you don't have any incentive to distort your own estimate of what's fair, so they know you weren't trying to get away with anything, and you know they know that, and you know they're not trying to punish you. You can already foresee the part where you're going to be asked to play this game for longer, until fewer offers get rejected, as people learn to converge on a shared idea of what is fair.
Sometimes you offer the other kid an extra jellychip, when you're not sure yourself, to make sure they don't reject you. Sometimes they accept your offer and then toss a jellychip back to you, because they think you offered more than was fair. It's not how the game would be played between dath ilan and true aliens, but it's often how the game is played in real life. In dath ilan, that is.

This allows even very different agents with very different notions of fairness to cooperate most of the time.

So, if in the game with $0..10, you're offered $4 instead of the fair $5, you understand that if you accept, the other player will get $6 - and so you accept with a probability of slightly less than 5/6, making the other player receive, in expectation, slightly less than the fair $5. You still get $4 most of the time when you're offered this unfair split, but you're incentivizing fair splits. Even if you're offered $1, you accept in slightly less than 5/9 of cases - which is more than half of the time, but still incentivizes offering you the fair 5-5 split instead.
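As a rough illustration of the rule above, here is a minimal Python sketch. The function names, the `eps` parameter, and the defaults (`total=10`, `fair=5`) are assumptions introduced for this example; they are not taken from planecrash.

```python
from fractions import Fraction

def accept_probability(offer, total=10, fair=5, eps=Fraction(1, 1000)):
    """Probability of accepting `offer` (the amount offered to you).

    Fair-or-better offers are always accepted. For an unfair offer the
    proposer would keep `total - offer`, so we accept with probability
    slightly below fair / (total - offer), which keeps the proposer's
    expected payoff just under `fair`.
    """
    if offer >= fair:
        return Fraction(1)
    proposer_share = total - offer
    return max(Fraction(0), Fraction(fair, proposer_share) - eps)

def expected_payoffs(offer, total=10, fair=5, eps=Fraction(1, 1000)):
    """Return (responder_ev, proposer_ev) for a given offer."""
    p = accept_probability(offer, total, fair, eps)
    return p * offer, p * (total - offer)

# Print the acceptance probability and both expected payoffs per offer.
for offer in range(0, 6):
    p = accept_probability(offer)
    you, them = expected_payoffs(offer)
    print(f"offer={offer}: p={float(p):.3f}, you={float(you):.3f}, them={float(them):.3f}")
```

For every unfair offer the proposer's expectation stays just below 5, while you still collect something most of the time.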

If the other player makes a commitment to offer you $4 regardless of what you do, it simply doesn't change what you do when you're offered $4. You want to accept $4 with the same probability regardless of what led to this offer. Otherwise, you'll incentivize offers of $4 instead of $5. This means other players don't make bad commitments (and if they do, you usually give in).

(This is symmetrical. If you're the offerer, and the other player accepts only offers of at least $6 and always rejects $5 or lower, you can offer $6 with p = 5/6 − ε, and otherwise offer less and be rejected.)

2. Threats, commitments, and ultimatums

You can follow the same procedure in all games. Figure out the fair split of gains, then try to coordinate on it; if the other agent is not willing to agree to the fair split and demands something else, agree to their ultimatum probabilistically, in a way that incentivizes the fair split instead.
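One way to write this procedure down, as a sketch under assumed notation (the symbols $g$, $d$, $f$, $p$, and $\epsilon$ are introduced here and do not appear in the original): let $g$ be the other agent's payoff if you give in to their demand, $d$ their payoff if you refuse, and $f$ their payoff at the split you consider fair, with $d < f < g$. If you give in with probability $p$, their expected payoff from making the demand is $p\,g + (1 - p)\,d$, and you want it to fall just below $f$:

$$
p\,g + (1 - p)\,d < f
\quad\Longleftrightarrow\quad
p < \frac{f - d}{g - d},
\qquad\text{so choose}\qquad
p = \frac{f - d}{g - d} - \epsilon .
$$

In the Ultimatum game with $10, $d = 0$, $f = 5$, and $g = 10 - \text{offer}$, which recovers the acceptance probability $5 / (10 - \text{offer}) - \epsilon$.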

2.1 Game of Chicken

Let's say the payoff matrix is:

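| | Other player goes straight | Other player swerves |
|---|---|---|
| You go straight | -100, -100 | 5, -1 |
| You swerve | -1, 5 | 0, 0 |

(Each cell lists your payoff first, then the other player's.)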

Let's assume we consider the fair payoff in this game to be 2 for each player; you can achieve it by coordinating on a fair coin toss to determine who swerves.

If the other player instead commits to not swerve, you calculate that if you give in, they get 5; the fair payoff is 2; so you simply give in and swerve with p=97%, making the other player get less than 2 in expectation; they would've done better by cooperating. Note that this decision procedure is much better than never giving in to threats - which would correspond to getting -100 every time instead of just 3% of the time - while still having the property that it's better for everyone to not threaten you at all.
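The 97% figure can be checked with a short Python sketch. The action labels ("straight", "swerve"), the dictionary layout, and the function names are assumptions made for this illustration; the payoff numbers and the fair value of 2 come from the text above.

```python
from fractions import Fraction

# Payoffs as (you, other player). The action labels are assumed from the
# usual presentation of Chicken.
PAYOFFS = {
    ("straight", "straight"): (-100, -100),
    ("straight", "swerve"): (5, -1),
    ("swerve", "straight"): (-1, 5),
    ("swerve", "swerve"): (0, 0),
}

def max_swerve_probability(fair=2):
    """Upper bound on your swerve probability against a committed non-swerver."""
    gain = PAYOFFS[("swerve", "straight")][1]     # they get 5 if you give in
    crash = PAYOFFS[("straight", "straight")][1]  # they get -100 if you do not
    return Fraction(fair - crash, gain - crash)

def expected_payoffs(p_swerve):
    """Return (your_ev, their_ev) when you swerve with probability p_swerve."""
    yours_swerve, theirs_swerve = PAYOFFS[("swerve", "straight")]
    yours_crash, theirs_crash = PAYOFFS[("straight", "straight")]
    yours = p_swerve * yours_swerve + (1 - p_swerve) * yours_crash
    theirs = p_swerve * theirs_swerve + (1 - p_swerve) * theirs_crash
    return yours, theirs

bound = max_swerve_probability()  # any swerve probability below this works
yours, theirs = expected_payoffs(Fraction(97, 100))
print(float(bound), float(yours), float(theirs))
```

Under these numbers the bound is 102/105 (about 0.971), so swerving 97% of the time leaves the committed player with 1.85 in expectation, below the fair 2, while your own expectation is about -3.97 rather than -100.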

2.2 Stones

If the other player is a stone^[4] with "Threat" written on it, you should do the same thing, even if it looks like the stone's behavior doesn't depend on what you'll do in response. Responding to actions and ignoring the internals when threatened means you'll get a lot fewer stones thrown at you.

2.3 What if I don't know the other player's payoffs?

You want to make decisions that don’t incentivize threatening you. If you receive a threat and know nothing about the other agent’s payoffs, simply don’t give in to the threat! (If you have some information, you can transparently give in with a probability low enough that you're certain transparently making decisions this way isn't incentivizing this threat.)
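One possible reading of "a probability low enough", sketched under strong assumptions: suppose you hold a finite set of candidate hypotheses about the threatener's payoffs and use the most conservative give-in probability across them. The `Hypothesis` type, its field names, and the worst-case rule are all hypothetical choices made for this illustration; the post does not specify them.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Hypothesis:
    """A candidate guess about the threatener's payoffs (hypothetical type)."""
    if_you_give_in: Fraction  # their payoff when you comply
    if_you_refuse: Fraction   # their payoff when the threat is carried out
    fair: Fraction            # their payoff at the split you consider fair

def give_in_bound(h):
    """Largest give-in probability that leaves them no better off than fair."""
    gain = h.if_you_give_in - h.if_you_refuse
    if gain <= 0:
        return Fraction(0)  # degenerate case: stay conservative, never give in
    bound = Fraction(h.fair - h.if_you_refuse) / gain
    return min(Fraction(1), max(Fraction(0), bound))

def safe_give_in_probability(hypotheses, eps=Fraction(1, 1000)):
    """Worst-case give-in probability; 0 when nothing is known."""
    if not hypotheses:
        return Fraction(0)
    worst = min(give_in_bound(h) for h in hypotheses)
    return max(Fraction(0), worst - eps)

# Usage: two guesses about the other agent; the stricter bound wins.
guesses = [Hypothesis(5, -100, 2), Hypothesis(9, 0, 5)]
print(float(safe_give_in_probability(guesses)))
```

With no hypotheses the function returns 0, matching the advice to simply not give in when nothing is known about the other agent's payoffs.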

2.4 What if the other player makes a commitment before I make any decisions?

Even without the above strategy, why would this matter? You can just make the right decisions you want to make. You can use information when you want to be using it and not use it when it doesn't make sense to use it. The time at which you receive the information doesn't have to be an input into what you consider if you think it doesn't matter when you receive it.

With the above algorithm, if you receive a threat, you simply look at it and give in to it most of the time in many games, all while incentivizing not threatening you, because the other player can get more utility if they don't threaten you.

(In reality, making decisions this way means you'll rarely receive threats. In most games, you'll coordinate with the other player on extracting the most utility. Agents will look at you, understand that threatening you means less utility, and you won't have to spend time googling random number generators and probabilistically giving in. It doesn't make sense for the other agent to make threatening commitments; and if they do, it's slightly bad for them.

It's never a good idea to threaten an LDT agent.)

^[1] Humans might use the Shapley value, the ROSE value, or their intuitive feeling of fairness. Other agents might use very different notions of fairness.

^[2] See ProjectLawful.com: Eliezer's latest story, past 1M words.

^[3] The idea of unexploitable cooperation with agents with different notions of fairness seems to have first been introduced by @Eliezer Yudkowsky in this 2013 post, with agents accepting unfair (according to them) bargains in which the other agent does worse than in the fair point on the Pareto frontier; but it didn’t suggest accepting unfair bargains probabilistically, to create new points where the other agent does just slightly worse in expectation than it would’ve in the fair point. One of the comments almost got there, but didn’t suggest adding $-\epsilon$ to the giving-in probability, so the result was considered exploitable (as the other agent was indifferent between making a threat and accepting the fair bargain). See also the Arbital page on the Ultimatum game.

^[4] A player with very deterministic behavior in a game with known payoffs, named this way after the idea of cooperate-stones in prisoner’s dilemma (with known payoffs).

Tags: Blackmail / Extortion, Decision theory, AI

Wei Dai (1y)

If the other player is a stone with “Threat” written on it, you should do the same thing, even if it looks like the stone’s behavior doesn’t depend on what you’ll do in response. Responding to actions and ignoring the internals when threatened means you’ll get a lot fewer stones thrown at you.

In order to "do the same thing" you either need the other player's payoffs, or according to the next section "If you receive a threat and know nothing about the other agent’s payoffs, simply don’t give in to the threat!" So if all you see is a stone, then presumably you don't know the other agent's payoffs, so presumably "do the same thing" means "don't give in".

But that doesn't make sense because suppose you're driving and suddenly a boulder rolls towards you. You're going to "give in" and swerve, right? What if it's an animal running towards you and you know they're too dumb to do LDT-like reasoning or model your thoughts in their head, you're also going to swerve, right? So there's still a puzzle here where agents have an incentive to make themselves look like a stone (i.e., part of nature or not an agent), or to never use LDT or model others in any detail.

Another problem is, do you know how to formulate/formalize a version of LDT so that we can mathematically derive the game outcomes that you suggest here?

Thomas Kwa (1y)

do you know how to formulate/formalize a version of LDT so that we can mathematically derive the game outcomes that you suggest here?

I recall Eliezer saying this was an open problem, at a party about a year ago.

Davidmanheim (11mo)

I'm a bit confused how this is a problem.

Either there is an agent that stands to benefit from my acceding to a threat, or there is not. If an agent "sufficiently" turns itself into a rock for a single interaction, but reaps the benefit as an agent, it's a full-fledged agent. Same if it sends a minion, where the relevant agent is the one who sent the rock, not the rock. And if we have uncertainty about the situation, that's part of the game.

If the question is whether other players can deceive you about the nature of the game or the probabilities, sure, that is a possibility, but it is not really a question about LDT, it's just a question about whether we should expand every decision into a recursive web of uncertainties about all other possible agents - and, I suspect, come to the conclusion that smarter agents can likely fool you, and you shouldn't allow others with misaligned incentives to manipulate your information environment, especially if they have more optimization power than you do. But as we all should know, once we make misaligned super-intelligent systems, we stop being meaningful players anyways.

In this world, maybe you want to suppose the agent's terminal value is to cause me to pay some fixed cost, and it permanently disables itself to that end - but that makes it either a minion sent by something else, or a natural feature of a Murphy-like universe where you started out screwed, in which case you should treat the natural environment as an adversary. But that's not our situation, again, at least until ASI shows up.

cc: @Mikhail Samin - does that seem right to you?

Mikhail Samin (11mo)

Seems right!

Mikhail Samin (1y)

By a stone, I meant a player with very deterministic behavior in a game with known payoffs, named this way after the idea of cooperate-stones in prisoner’s dilemma (with known payoffs).

I think to the extent there’s no relationship between giving in to a boulder/implementing some particular decision theory and having this and other boulders thrown at you, UDT and FDT by default swerve (and probably don't consider the boulders to be threatening them, and it’s not very clear in what sense this is “giving in”); to the extent it sends more boulders their way, they don’t swerve.

If making decisions some way incentivizes other agents to become less like LDTs and more like uncooperative boulders, you can simply not make decisions that way. (If some agents actually have an ability to turn into animals and you can’t distinguish the causes behind an animal running at you, you can sometimes probabilistically take out your anti-animal gun and put them to sleep.)

Do you have a realistic example where this would be a problem?

I’d be moderately surprised if UDT/FDT consider something to be a better policy than what’s described in the post.

Edit: to add, LDTs don't swerve to boulders that were created to influence the LDT agent's responses. If you turn into a boulder because you expect some agents among all possible agents to swerve, this is a threat, and LDTs don't give in to those boulders (and it doesn't matter whether or not you tried to predict the behavior of LDTs in particular). If you believed LDT agents or agents in general would swerve against a boulder, and that made you become a boulder, LDT agents obviously don't swerve to that boulder. They might swerve to boulders that are actually natural boulders caused by the very simple physics no one influenced to cause the agents to do something. They also pay their rent, because they'd be evicted otherwise: the landlord would evict them not in order to extract rent from them under the threat of eviction but in order to get rent from someone else, and they're sure there were no self-modifications to make it look this way.

Richard_Kennaway (1y)

If making decisions some way incentivizes other agents to become less like LDTs and more like uncooperative boulders, you can simply not make decisions that way.

Another way that those agents might handle the situation is not to become boulders themselves, but to send boulders to make the offer. That is, send a minion to present the offer without any authority to discuss terms. I believe this often happens in the real world, e.g. customer service staff whose main goal, for their own continued employment, is to send the aggrieved customer away empty-handed and never refer the call upwards.

Mikhail Samin (1y)

A smart agent can simply make decisions like a negotiator with restrictions on the kinds of terms it can accept, without having to spawn a "boulder" to do that.

You can just do the correct thing, without having to separate yourself into parts that do things correctly and a part that tries to not look at the world and spawns correct-thing-doers.

In Parfit's Hitchhiker, you can just pay once you're there, without precommitting/rewriting yourself into an agent that pays. You can just do the thing that wins.

Some agents can't do the things that win and would have to rewrite themselves into something better and still lose in some problems, but you can be an agent that wins, and gradient descent probably crystallizes something that wins into what is making the decisions in smart enough things.

Christopher King (1y)

Another problem is, do you know how to formulate/formalize a version of LDT so that we can mathematically derive the game outcomes that you suggest here?

There is a no free lunch theorem for this. LDT (and everything else) can be irrational.

Davidmanheim (11mo)

I think your post misses the point made here.

What about a rock with $9 painted on it? The LDT agent in the problem reasons that the best action is to choose $1, so the rock gets $9.
Thus, $9 rock is more rational than LDT in this problem.

The solution above addresses this; by playing probabilistically, the rock gets a payoff somewhat less than $5 in expectation, so it does worse than an LDT agent.

Christopher King (11mo)

Towards the end of the post, in the "No agent is rational in every problem" section, I provided a more general argument. I was assuming LDT would fall under case 1, but if not, case 2 will demonstrate it is irrational.

JamesFaville (1y)

It's definitely not clear to me that updatelessness + Yudkowsky's solution prevent threats. The core issue is that a target and a threatener face a prima facie symmetric decision problem of whether to use strategies that depend on their counterpart's strategy or strategies that do not depend on their counterpart's strategy.^[1]

In other words, the incentive targets have to use non-dependent strategies that incentivise favourable (no-threat) responses from threateners is the same incentive threateners have to use non-dependent strategies that incentivise favourable (give-into-threat) responses from targets. This problem is discussed in more detail in parts of "Responses to apparent rationalist confusions about game / decision theory" and in "Updatelessness doesn't solve most problems".

There are potential symmetry breakers that privilege a no-threat equilibrium, such as the potential for cooperation between different targets. However, there are also potential symmetry breakers in the other direction. I expect Yudkowsky is aware of the symmetry of this problem and either thinks the symmetry breakers in favour of no-threats seem very strong, or is just very confident in the superintelligences-should-figure-this-stuff-out heuristic. Relatedly, this post argues that mutually transparent agents should be able to avoid most of the harm of threats being executed, even if they are unable to avoid threats from being made.

But these are different arguments to the one you make here, and I'm personally unconvinced even these arguments are strong enough that it's not very important for us to work on preventing harmful threats from being made by or against AIs that humanity deploys.

FYI A lot of Center on Long-Term Risk's research is motivated by this problem; I suggest people reach out to us if you're interested in working on it!

^[1] Examples of non-dependent strategies would include:
- Refusing all threats regardless of why they were made
- Refusing threats to the extent prescribed by Yudkowsky's solution regardless of why they were made
- Making threats regardless of a target's refusal strategy when the target is incentivised to give in
An example of a dependent strategy would be
- Refusing threats more often when a threatener accurately predicted whether or not you would refuse in order to determine whether to make a threat; and refusing threats less often when they did not predict you, or did so less accurately

James Payor (1y)

For posterity, and if it's of interest to you, my current sense on this stuff is that we should basically throw out the frame of "incentivizing" when it comes to respectful interactions between agents or agent-like processes. This is because regardless of whether it's more like a threat or a cooperation-enabler, there's still an element of manipulation that I don't think belongs in multi-agent interactions we (or our AI systems) should consent to.

I can't be formal about what I want instead, but I'll use the term "negotiation" for what I think is more respectful. In negotiation there is more of a dialogue that supports choices to be made in an informed way, and there is less of this element of trying to get ahead of your trading partner by messing with the world such that their "values" will cause them to want to do what you want them to do.

I will note that this "negotiation" doesn't necessarily have to take place in literal time and space. There can be processes of agents thinking about each other that resemble negotiation and qualify to me as respectful, even without a physical conversation. What matters, I think, is whether the logical process that led to another agent's choices can be seen in this light.

And I think in cases when another agent is "incentivizing" my cooperation in a way that I actually like, it is exactly when the process was considering what the outcome would be of a negotiating process that respected me.

Multicore (1y)

If you receive a threat and know nothing about the other agent’s payoffs, simply don’t give in to the threat!

With an important caveat: if carrying out the threat doesn't cost the threatener utility relative to never making the threat, then it's not a threat, just a promise (a promise to do whatever is locally in their best interests, whether you do the thing they demanded or not).

You're going to have a bad time if you try to live out LDT by ignoring threats, and end up ignoring "threats" like "pay your mortgage or we'll repossess your house".

Mikhail Samin (1y)

Yep! If someone is doing things because it's in their best interests and not to make you do something, then this is not a threat (provided they're not the result of someone else shaping themselves into them to cause you to do something, where the previous agent wouldn't actually prefer the thing that the new one prefers and that you don't want to happen).

Richard_Kennaway (1y)

Not having read that part of planecrash, the solution I immediately thought of, just because it seemed so neat, was that if offered a fraction $p$ of the money, accept with probability $p$. The other player’s expectation is $p(1 - p)$, maximised at $p = 1/2$. Is Eliezer’s solution better than mine, or mine better than his?

One way in which Eliezer’s is better is that mine does not have an immediate generalisation to all threat games.

Mikhail Samin (1y)

Your solution works! It's not exploitable, and you get much more than 0 in expectation! Congrats!

Eliezer's solution is better/optimal in the sense that it accepts with the highest probability a strategy can use without becoming exploitable. If offered 4/10, you accept with p=40%; the optimal solution accepts with p=83% (or slightly less than 5/6); if offered 1/10, it's p=10% vs. p=55%. The other player's payout is still maximized at 5, but everyone gets the payout a lot more often!
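To make the comparison concrete, here is a small Python sketch of both acceptance rules for the $10 game. The names `proportional_accept` and `threshold_accept` are labels invented for this example, and ε is set to zero for readability.

```python
from fractions import Fraction

def proportional_accept(offer, total=10):
    """Accept with probability equal to the fraction of the money offered."""
    return Fraction(offer, total)

def threshold_accept(offer, total=10, fair=5):
    """Accept with probability fair / (proposer's share), ignoring epsilon."""
    if offer >= fair:
        return Fraction(1)
    return Fraction(fair, total - offer)

# Compare acceptance probability and both expected payoffs for each offer.
for offer in range(1, 6):
    for name, rule in (("proportional", proportional_accept),
                       ("threshold", threshold_accept)):
        p = rule(offer)
        print(f"offer={offer} {name:>12}: p={float(p):.2f}, "
              f"you={float(p * offer):.2f}, proposer={float(p * (10 - offer)):.2f}")
```

Under both rules the proposer does best by offering 5, but the threshold rule accepts at least as often for every offer, so less value is destroyed.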

Orioth (6mo)

When somebody offers you a 7:5 split, instead of the 6:6 split that would be fair, you should accept their offer with slightly less than 6/7 probability. Their expected value from offering you 7:5, in this case, is 7 * slightly less than 6/7, or slightly less than 6. This ensures they can't do any better by offering you an unfair split; but neither do you try to destroy all their expected value in retaliation. It could be an honest mistake, especially if the real situation is any more complicated than the original Ultimatum Game.
If they offer you 8:4, accept with probability slightly-more-less than 6/8, so they do even worse in their own expectation by offering you 8:4 than 7:5.

Having thought about this strategy a bit, one thing I remain uncertain about is how to generalize it to a situation where the other party can try offering you a better deal if you reject the initial offer (or one where you get to make counteroffers, for that matter, which might or might not have the same problem).

If you evaluate subsequent offers independently, then it would seem to be in their interest to open with an offer of 1:11. With slightly less than 6/11 probability, you accept and they profit; with slightly greater than 5/11 probability, you say "no deal" and they say "fine, how about 2:10", and so on until a deal is reached, which will never be better for you than fair, and may be worse.

So obviously you won't do that.

In this scenario, it's a pretty transparent exploit, and you can just consider yourself committed to not even consider subsequent worse-than-fair offers after you've rejected the first one. But in real life, there may be situations where there's a perfectly legitimate reason for your counterparty to make you a better offer once you've rejected their initial one, and committing to reject those seems like it'll result in missed opportunities.
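The escalating-offer scenario above can be quantified with a short Python sketch. It assumes the 12-jellychip game from the quoted passage, that every offer is judged independently by the single-shot rule, and ε = 0 for simplicity; the function names are invented for this illustration.

```python
from fractions import Fraction

def accept_probability(offer, total=12, fair=6, eps=Fraction(0)):
    """Single-shot acceptance rule: fair / (proposer's share) minus eps."""
    if offer >= fair:
        return Fraction(1)
    return max(Fraction(0), Fraction(fair, total - offer) - eps)

def proposer_ev_with_escalation(offers, total=12, fair=6, eps=Fraction(0)):
    """Proposer's expected payoff when each rejection is followed by the
    next offer in `offers` and every offer is judged independently."""
    still_negotiating = Fraction(1)  # probability that no deal was struck yet
    expected = Fraction(0)
    for offer in offers:
        p = accept_probability(offer, total, fair, eps)
        expected += still_negotiating * p * (total - offer)
        still_negotiating *= 1 - p
    return expected

single_fair_offer = proposer_ev_with_escalation([6])
escalating = proposer_ev_with_escalation([1, 2, 3, 4, 5, 6])
print(float(single_fair_offer), float(escalating))
```

Under these assumptions a single fair offer yields the proposer 6 chips, while opening at 1 and escalating one chip at a time yields roughly 10.3 in expectation, which is why judging repeated offers independently is exploitable.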

Mikhail Samin (6mo)

you can just consider yourself committed to not even consider subsequent worse-than-fair offers after you've rejected the first one

Note that if you do that, I can exploit you by offering a 9:1 split, getting it accepted roughly 50% of the time (slightly less than 5/9), and getting a 5:5 split the rest of the time, which leaves me with roughly 7 in expectation (and you with roughly 3 in expectation).

If you interact with the kind of entity that offers you a better deal if you reject their first offer, and they make an offer before you explain what you'll do, you just always reject their first offer, explain your strategy to them, and say that it's in their interest to make a fair offer, and if you probabilistically reject that one, they won't be able to make more offers.

Or, a thing you can do in reality might be, e.g., deterministically rejecting all of their initial offers, making counteroffers that are better-than-fair for you, and bargaining this way until they reach a point where they don't go lower because this is their best offer, and then probabilistically accepting or rejecting that best offer.

Orioth (6mo)

If you interact with the kind of entity that offers you a better deal if you reject their first offer, and they make an offer before you explain what you'll do, you just always reject their first offer, explain your strategy to them, and say that it's in their interest to make a fair offer, and if you probabilistically reject that one, they won't be able to make more offers.

I'm not convinced that this remains optimal when you're bargaining with an entity that might offer you a better deal if you reject their first offer for legitimate reasons, rather than in an attempt to exploit your bargaining strategy.

Suppose a deal falls through, your counterparty tries their BATNA, it goes worse than they thought, so they come back and make a better offer than they did the first time. Or maybe they, or you, learn something new regarding the value of the goods being negotiated for. Shouldn't it be possible to take this into account and reopen negotiations? If not, a lot of mutually beneficial trades become impossible.

But I don't see a specific strategy that allows for this without being exploitable, since you can't necessarily tell whether a new offer is prompted by new information or whether it's just a bargaining tactic.

Eli Tyre (1y)

Thank you for writing this!

I'm thinking about how one might apply this policy in real life.

Though my first thought is that I'm not sure that I'm regularly the target of threats? This seems like maybe a solution to a problem that I mostly don't have.

The government threatens to put me in jail if I don't pay taxes, I guess.
- That case is a sort of mix between a willing trade, a solution to a collective action problem, and a threat, such that I'm not sure how best to think about it.
- And realistically, I don't think it actually maximizes a citizen's EV to not pay their taxes with some probability to incentivize a government not to tax them. Governments have an overwhelming hard power advantage, and everyone else is capitulating to the pseudo-threat, so they would just crush any given tax-non-payer. (I'm not sure what this means about the payoff table; maybe that the "citizen doesn't pay"+"Government arrests" quadrant is way way lower utility for the citizen than it is for the government, on the margin, though it might be different if enough people coordinated.)
There's sort of a generalized miasmic threat not to state true, taboo facts, or a Twitter mob might attempt to ruin my life.
- But again, it doesn't seem like it would actually help me at all to capitulate with some probability. There's not really any way that I can strike back at a Twitter mob, to disincentivize them.

Are there places where I am, or could be, the target of a clear-cut threat?

Mikhail Samin (1y)

Thanks!

The post is mostly trying to imply things about AI systems and agents in a larger universe, like “aliens and AIs usually coordinate with other aliens and AIs, and ~no commitment races happen”.

For humans, it’s applicable to bargaining and threat-shape situations. I think bargaining situations are common; clearly threat-shaped situations are rarer.

I think while taxes in our world are somewhat threat-shaped, it’s not clear they’re “unfair”- I think we want everyone to pay them so that good governments work and provide value. But if you think taxes are unfair, you can leave the country and pay some different taxes somewhere else instead of going to jail.

The society’s stance towards crime- preventing it via the threat of punishment- is not what would work on smarter people: it makes sense to prevent people from committing more crimes by putting them in jails or not trading with them, but the threat of punishment that exists only to prevent an agent from doing something won’t work on smarter agents.

Eli Tyre (1y)

But if you think taxes are unfair, you can leave the country and pay some different taxes somewhere else instead of going to jail.

It's quite difficult to do that in the US, at least. You pay taxes if you're a citizen, even if you're not a resident, and you're required to pay taxes for the 10 years following your renouncing citizenship.

As far as I know, there's no way for US citizens to leave the US tax regime within a decade.

Bunthut (1y)

The society’s stance towards crime- preventing it via the threat of punishment- is not what would work on smarter people

This is one of two claims here that I'm not convinced by. Informal disproof: If you are a smart individual in today's society, you shouldn't ignore threats of punishment, because it is in the state's interest to follow through anyway, pour encourager les autres. If crime prevention is in people's interest, intelligence monotonicity implies that a smart population should be able to make punishment work at least this well. Now I don't trust intelligence monotonicity, but I don't trust its negation either.

The second one is:

You can already foresee the part where you're going to be asked to play this game for longer, until fewer offers get rejected, as people learn to converge on a shared idea of what is fair.

Should you update your idea of fairness if you get rejected often? It's not clear to me that that doesn't make you exploitable again. And I think this is very important to your claim about not burning utility: In the case of the ultimatum game, Eliezer's strategy burns very little over a reasonable-seeming range of fairness ideals, but in the complex, high-dimensional action spaces of the real world, it could easily be almost as bad as never giving in, if there's no updating.
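To put rough numbers on "burns very little", here is a minimal sketch for the post's $10 game. It assumes the responder considers 5 fair and accepts a lower offer x with probability just under 5/(10 - x), so that a lowball proposer expects slightly less than 5. The names `accept_probability`, `proposer_payoff`, `expected_burn`, and the margin `EPSILON` are illustrative choices, not anything specified in the post.

```python
from fractions import Fraction

TOTAL = 10                   # dollars to split, as in the post's Ultimatum game
FAIR = 5                     # responder's idea of a fair share (assumption)
EPSILON = Fraction(1, 100)   # hypothetical margin that keeps lowballing strictly worse


def accept_probability(offer):
    """Probability that the responder accepts an offer of `offer` dollars."""
    if offer >= FAIR:
        return Fraction(1)
    # Keep the proposer's expected payoff just below what a fair offer yields.
    p = Fraction(FAIR, TOTAL - offer) - EPSILON
    return max(p, Fraction(0))


def proposer_payoff(offer):
    """Proposer's expected payoff when making this offer."""
    return accept_probability(offer) * (TOTAL - offer)


def expected_burn(offer):
    """Expected dollars destroyed through rejection when this offer is made."""
    return (1 - accept_probability(offer)) * TOTAL


if __name__ == "__main__":
    print("offer  P(accept)  proposer EV  expected burn")
    for offer in range(0, FAIR + 1):
        print(
            f"{offer:5d}  {float(accept_probability(offer)):9.3f}"
            f"  {float(proposer_payoff(offer)):11.3f}"
            f"  {float(expected_burn(offer)):13.3f}"
        )
```

Under these assumptions, a proposer who sincerely believes 4 is fair loses under $2 in expectation per game, while one who believes 0 is fair loses about half the pot. The sketch covers only this one-dimensional case and says nothing about high-dimensional action spaces.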

[-]Mikhail Samin1y30

If you are a smart individual in today's society, you shouldn't ignore threats of punishment

If today's society consisted mostly of smart individuals, they would overthrow the government that does something unfair instead of giving in to its threats.

Should you update your idea of fairness if you get rejected often?

Only if you're a kid who's playing with other human kids (which is the scenario described in the quoted text), and converging on fairness possibly includes getting some idea of how much effort various things take different people.

If you're an actual grown-up (not that we have those) and you're playing with aliens, you probably don't update, and you certainly don't update in the direction of anything asymmetric.

[-]Dagon1y20

It's not how the game would be played between dath ilan and true aliens

This is a very important caveat. Many humans or CDT agents could be classified as “true aliens” by someone not part of their ingroup.

[-]Mikhail Samin1y10

It's not how the game would be played between dath ilan and true aliens

This is a reference to "Sometimes they accept your offer and then toss a jellychip back to you". Between dath ilan and true aliens, you do the same except for tossing the jellychip when you think you got more than what would've been fair. See True Prisoner's Dilemma.

[-]Roman Brovko9mo10

Is there any proof that you shouldn't incentivize threats? Consider two situations:

1. Someone tells you "Give me $5 or I'll destroy your $10".
2. You are the first player in the Ultimatum Game. The second player tells you "Offer me $5 or I'll reject your offer".

Should you ignore the first threat but give in to the second one (instead of offering $0)? If yes, what is the difference between these cases?

The only difference I see is that in the Ultimatum Game you get $10 from a third party (the experimenter) before the "threat". But is this important for the decision?

[-]IC Rainbow1y10

How one should signal their decision procedure in real life without getting their ass busted for "gambling with lives" etc.?

[-]Thomas Kwa1y42

Make your decision unpredictable to your counterparty but not truly random. This happens all the time in e.g. nuclear deterrence in real life.

[-]Dagon1y20

For singleton events (large-scale nuclear attack and counterattack), deception plays an important role. This isn't a problem, apparently, in dath ilan - everyone has common knowledge of others' rationality.

[-]Mikhail Samin1y10

(It is pretty important to very transparently respond with a nuclear strike to a nuclear strike. I think both Russia and the US are not really unpredictable in this question. But yeah, if you have nuclear weapons and your opponents don't, you might want to be unpredictable, so your opponent is more scared of using conventional weapons to destroy you. In real-life cases with potentially dumb agents, it might make sense to do this.)

[-]Thomas Kwa1y23

I think creating uncertainty in your adversary applies a bit more than you give it credit for, and assuring a second strike is an exception.

It has been crucial to Russia's strategy in Ukraine to exploit NATO's fear of escalation by making various counter-threats whenever NATO proposes expanding aid to Ukraine somehow. This has bought them 2 years without ATACMS missiles attacking targets inside Russia, and that hasn't required anyone to be irrational, just incapable of perfectly modeling the Kremlin.

Even when responding to a nuclear strike, you can essentially have a mixed strategy. I think China does not have enough missiles to assure a second strike, but builds extra decoy silos so they can't all be destroyed. They didn't have to roll a die, just be unpredictable.

[-]Mikhail Samin1y20

Very funny that we had this conversation a couple of weeks prior to transparently deciding that we should retaliate with p=.7!

[-]Mikhail Samin1y10

I guess when criminals and booing bystanders are not as educated as dath ilani children, some real-world situations might get complicated. Possibly, transparent stats about the actions you've taken in similar situations might serve the same purpose even if you don't broadcast throwing your dice on live TV. Or it might make sense to transparently never give in to some kinds of threats in some sorts of real-life situations.
