Unpacking the Concept of "Blackmail"

Vladimir_Nesov

Keep in mind: Controlling Constant Programs, Notion of Preference in Ambient Control.

There is a reasonable game-theoretic heuristic, "don't respond to blackmail" or "don't negotiate with terrorists". But what is actually meant by the word "blackmail" here? Does it have a place as a fundamental decision-theoretic concept, or is it merely an affective category, a class of situations activating a certain psychological adaptation that expresses disapproval of certain decisions and on the net protects (benefits) you, like those adaptation that respond to "being rude" or "offense"?

We, as humans, have a concept of "default", "do nothing strategy". The other plans can be compared to the moral value of the default. Doing harm would be something worse than the default, doing good something better than the default.

Blackmail is then a situation where by decision of another agent ("blackmailer"), you are presented with two options, both of which are harmful to you (worse than the default), and one of which is better for the blackmailer. The alternative (if the blackmailer decides not to blackmail) is the default.

Compare this with the same scenario, but with the "default" action of the other agent being worse for you than the given options. This would be called normal bargaining, as in trade, where both parties benefit from exchange of goods, but to a different extent depending on which cost is set.

Why is the "default" special here? If bargaining or blackmail did happen, we know that "default" is impossible. How can we tell two situations apart then, from their payoffs (or models of uncertainty about the outcomes) alone? It's necessary to tell these situations apart to manage not responding to threats, but at the same time cooperating in trade (instead of making things as bad as you can for the trade partner, no matter what it costs you). Otherwise, abstaining from doing harm looks exactly like doing good. A charitable gift of not blowing up your car and so on.

My hypothesis is that "blackmail" is what the suggestion of your mind to not cooperate feels like from the inside, the answer to a difficult problem computed by cognitive algorithms you don't understand, and not a simple property of the decision problem itself. By saying "don't respond to blackmail", you are pushing most of the hard work into intuitive categorization of decision problems into "blackmail" and "trade", with only correct interpretation of the results of that categorization left as an explicit exercise.

(A possible direction for formalizing these concepts involves introducing some kind of notion of resources, maybe amount of control, and instrumental vs. terminal spending, so that the "default" corresponds to less instrumental spending of controlled resources, but I don't see it clearly.)

(Let's keep on topic and not refer to powerful AIs or FAI in this thread, only discuss the concept of blackmail in itself, in decision-theoretic context.)

Keep in mind: Controlling Constant Programs, Notion of Preference in Ambient Control.

(Let's keep on topic and not refer to powerful AIs or FAI in this thread, only discuss the concept of blackmail in itself, in decision-theoretic context.)

This is a clever idea, but I don't think it works: you need to unpack the question of why a decision algorithm would deem cooperation non-optimal, and see if it coincides with a special class of problems where cooperation is generally non-optimal.

So I think what gets an offer labeled as blackmail is the recognition that cooperation would lead the other party to repeatedly use their discretion to force my next remaining options to be even worse. So blackmail and trade differ in that:

If I cooperate wth a blackmailer, they are more likely to spend resources "digging up dirt" on me, kidnapping my loved ones, etc. I don't want to be in that position, regardless of what I decide to do then.
If I trade with a trade-offerer, they are more likely to spend resources acquiring goods that I may want to trade for. I do want to be in the position where others make things available to me that I want (except for where I'd be competing with them in that process.)

And yes, these two situations are equivalent, except for what I want the offerer to do, which I think is what yields the distinction, not the concept of a baseline in the initial offer.

You can phrase blackmail as a sort of addiction situation where dynamic inconsistency potentially leaves me vulnerable to exploitation. My preferences at any time t are:

1) Not have an addiction.
2) Have an addiction, and take some more of the drug.
3) Have an addiction, and not take the drug.

where I'm addicted at time t, and taking the drug will make me addicted in time t+1 (and i otherwise won't be addicted in t+1).

In this light, one can view the classification of something as blackmail, as being any feeling or mechanism that makes me choose 3) over 2). "2 looks appealing, but I feel a strong compulsion to do 3." Agents with such a mechanism gain a resistance to dynamic inconsistency.

In contrast, if "addiction" were good, and the item in 1) were moved below 3) in my preference ranking, then I wouldn't benefit from a mechanism that makes me choose 3 over 2. That would feel like trade.

Time-inconsistency seems unrelated. It may be a problem in implementing the strategy "don't respond to blackmail", but one can certainly TRY to blackmail a time-consistent person, if one believes them to be irrational or if they have only one blackmail-worthy secret.

2Vladimir_Nesov15y

Yes, the distinction is in the way you prefer to acausally observation-counterfactually influence the other player. Not being offered a trade shouldn't be considered irrelevant by your decision algorithm, even if given the observations you have it is impossible. Like in Counterfactual Mugging, but with the other player instead of a fair coin. Newcomb's with transparent boxes is also relevant.

36

Unpacking the Concept of "Blackmail"

36

36

36

Unpacking the Concept of "Blackmail"

36

36