... My line of thought is unchanged if Omega simply learns your decision after you decide but before Omega decides. The game is now not symmetrical.
Currently, I have concluded that if cooperating is the best reply once the other player has cooperated, then it is best for the first player to cooperate against a rational opponent (3 points instead of 1). However, the second player does better by cooperating with only a >1/3 chance, and that still gives the first player a higher expected result from cooperating (3p > 1 whenever p > 1/3).
If, given the choice between 3 points and 5 points, 5 points is better, then it is best for the first player to defect (1 instead of 0).
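The 1/3 threshold can be checked with a little arithmetic. What follows is only a minimal sketch: it assumes the payoffs used in this comment (3 for mutual cooperation, 1 for mutual defection, 0 for a cooperator who gets defected against, 5 for the defector) and it assumes the second player always defects after a defection. The function name is made up for illustration.

```python
from fractions import Fraction

# Player 1's payoff for (player 1 move, player 2 reply), using the comment's numbers.
P1_PAYOFF = {("c", "C"): 3, ("c", "D"): 0, ("d", "C"): 5, ("d", "D"): 1}


def p1_expected_if_cooperating(p):
    """Expected payoff to player 1 for cooperating when player 2
    answers cooperation by cooperating with probability p."""
    return p * P1_PAYOFF[("c", "C")] + (1 - p) * P1_PAYOFF[("c", "D")]


# Assumption: player 2 always defects after a defection,
# so defecting is worth exactly 1 point to player 1.
p1_if_defecting = P1_PAYOFF[("d", "D")]

for p in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
    ev = p1_expected_if_cooperating(p)
    # Print the probability, the expected payoff, and whether cooperating beats defecting.
    print(p, ev, ev > p1_if_defecting)
```

At exactly p = 1/3 the first player is indifferent; cooperating only comes out ahead for p strictly above 1/3.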
In the end, the first player has 2 possible strategies, and the second player has 4 possible strategies, for a total of 8 possibilities:
| Player 1 \ Player 2 | {c:C;d:C} | {c:C;d:D} | {c:D;d:C} | {c:D;d:D} |
|---|---|---|---|---|
| c | c:C 3/3 | c:C 3/3 | c:D 0/5 | c:D 0/5 |
| d | d:C 5/0 | d:D 1/1 | d:C 5/0 | d:D 1/1 |
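The eight possibilities can also be generated mechanically, which is a quick way to double-check the table. This is a sketch under the same payoff numbers as above; the dictionary representation of a player 2 strategy and the names `PAYOFF`, `P2_STRATEGIES`, and `play` are illustrative choices, not anything fixed by the problem.

```python
from itertools import product

# (player 1 move, player 2 reply) -> (player 1 points, player 2 points)
PAYOFF = {("c", "C"): (3, 3), ("c", "D"): (0, 5),
          ("d", "C"): (5, 0), ("d", "D"): (1, 1)}

# A player 2 strategy maps player 1's observed move to a reply.
# product("CD", repeat=2) yields the four columns in the table's order.
P2_STRATEGIES = [dict(zip("cd", replies)) for replies in product("CD", repeat=2)]


def play(p1_move, p2_strategy):
    """Return (reply, payoffs) when player 1 moves first and player 2 reacts."""
    reply = p2_strategy[p1_move]
    return reply, PAYOFF[(p1_move, reply)]


table = {}
for p1_move in "cd":
    for strategy in P2_STRATEGIES:
        key = (p1_move, strategy["c"], strategy["d"])
        table[key] = play(p1_move, strategy)
        reply, (a, b) = table[key]
        print(f"{p1_move} vs {{c:{strategy['c']};d:{strategy['d']}}}: {p1_move}:{reply} {a}/{b}")
```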
My problem is that if quid pro quo {c:C;d:D} is the optimum strategy, two optimal players end up cooperating. But for any fixed move by player 1, quid pro quo never scores more than defectbot {c:D;d:D}, and it scores less when player 1 cooperates. However, if defectbot is the best strategy for player 2, then the best strategy for player 1 is to defect; if quid pro quo is the best strategy for player 2, then the best strategy for player 1 is to cooperate.
I have trouble understanding how the optimal strategy can be dominated by a competing strategy.
Iff quid pro quo is optimal, then optimal players score 3 points each.
However, iff quid pro quo is the optimal strategy, then defectbot scores more against an optimal player 1; the optimal player 1 strategy is to defect, and optimal players score 1 point each.
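One way to lay the tension out side by side, offered as a sketch rather than a resolution: compare the two player 2 strategies while holding player 1's move fixed, then let player 1 pick a best move against each strategy. The second step assumes player 1 somehow knows which strategy player 2 is running, which is exactly the assumption under dispute; the helper names are hypothetical.

```python
# (player 1 move, player 2 reply) -> (player 1 points, player 2 points)
PAYOFF = {("c", "C"): (3, 3), ("c", "D"): (0, 5),
          ("d", "C"): (5, 0), ("d", "D"): (1, 1)}

QUID_PRO_QUO = {"c": "C", "d": "D"}
DEFECTBOT = {"c": "D", "d": "D"}


def p2_score(strategy, p1_move):
    """Player 2's points when player 1's move is held fixed."""
    return PAYOFF[(p1_move, strategy[p1_move])][1]


def p1_best_move(strategy):
    """Player 1's best move against a known, fixed player 2 strategy (assumed known)."""
    return max("cd", key=lambda m: PAYOFF[(m, strategy[m])][0])


# Holding player 1's move fixed, defectbot is never behind quid pro quo.
diffs = {m: p2_score(DEFECTBOT, m) - p2_score(QUID_PRO_QUO, m) for m in "cd"}
print(diffs)

# But player 1's move is not fixed: it depends on which strategy player 1 expects.
for name, strategy in (("quid pro quo", QUID_PRO_QUO), ("defectbot", DEFECTBOT)):
    move = p1_best_move(strategy)
    print(name, move, PAYOFF[(move, strategy[move])])
```

Under these assumptions, defectbot gains 2 points when player 1 cooperates and nothing when player 1 defects, yet a player 1 who expects defectbot defects, so the pair lands on 1/1 instead of 3/3. The comparison that favors defectbot and the comparison that favors quid pro quo hold different things fixed.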
Please stop using the words "rational" and "optimal", and give me some sign that you've read the linked post on counterfactuals rather than asking counterfactual questions whose assumptions you refuse to spell out.
The only difficult question here concerns the imbalance in knowledge between Omega and a human, per comment by shminux. Because of this, I don't actually know what TDT does here (much less 'rationality').
Sometimes I see new ideas that, without offering any new information, offer a new perspective on old information, and a new way of thinking about an old problem. So it is with this lecture and the prisoner's dilemma.
Now, I've worked a lot with the prisoner's dilemma, with superrationality, negotiations, fairness, retaliation, Rawlsian veils of ignorance, etc. I've studied the problem, and its possible resolutions, extensively. But the perspective of that lecture was refreshing and new to me:
The prisoner's dilemma is resolved only when the off-diagonal outcomes of the dilemma are known to be impossible.
The "off-diagonal outcomes" are the "(Defect, Cooperate)" and the "(Cooperate, Defect)" squares, where one person walks away with all the benefit and the other has none.
Facing an identical (or near identical) copy of yourself? Then the off-diagonal outcomes are impossible, because you're going to choose the same thing. Facing Tit-for-tat in an iterated prisoner's dilemma? Well, the off-diagonal squares cannot be reached consistently. Is the other prisoner a Mafia don? Then the off-diagonal outcomes don't exist as written: there's a hidden negative term (you being horribly murdered) that isn't taken into account in that matrix. Various agents with open code are essentially publicly declaring the conditions under which they will not reach for the off-diagonal. The point of many contracts and agreements is to make the off-diagonal outcome impossible or expensive.
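The "blocking off" idea can be illustrated with a toy calculation. This is only a sketch under loud assumptions: the 3/0/5/1 payoffs are the common textbook numbers rather than anything stated here, the decision rule is a deliberately crude "best worst case over reachable outcomes", and the `penalty` argument is a hypothetical stand-in for a hidden cost such as the Mafia don's retaliation.

```python
# Row player's payoff for (own move, other's move); illustrative numbers only.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

ALL_OUTCOMES = list(PAYOFF)
DIAGONAL_ONLY = [("C", "C"), ("D", "D")]  # off-diagonal squares blocked off


def best_move(reachable, penalty=0):
    """Pick the row player's move with the best worst-case payoff over the
    reachable outcomes, after subtracting a hypothetical penalty from the
    off-diagonal outcome where the row player defects alone."""
    def value(mine, theirs):
        hidden = penalty if (mine, theirs) == ("D", "C") else 0
        return PAYOFF[(mine, theirs)] - hidden

    worst = {}
    for mine, theirs in reachable:
        v = value(mine, theirs)
        worst[mine] = min(worst.get(mine, v), v)
    return max(worst, key=worst.get)


print(best_move(ALL_OUTCOMES))              # every square reachable
print(best_move(DIAGONAL_ONLY))             # facing a copy of yourself
print(best_move(ALL_OUTCOMES, penalty=10))  # off-diagonal made expensive
```

With every square reachable, this rule picks defection. Removing the off-diagonal squares, or making the tempting one expensive enough, flips the choice to cooperation without any negotiation taking place.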
As I said, nothing fundamentally new, but I find the perspective interesting. To my mind, it suggests that when resolving the prisoner's dilemma with probabilistic outcomes allowed, I should be thinking in terms of "blocking off possible outcomes" rather than "reaching agreement".