I did some tests with GPT-4 (through the chat.openai.com interface) to see what kind of decision theory it would employ. To prevent publication bias, I decided ahead of time that I would make this post regardless of the result. Note that I did not decide ahead of time exactly how the experiment would go; just that I would make a post on LessWrong. (If you're reading this, you of course still need to account for recommendation bias, depending on where you came across this post.)
My conclusion is that GPT-4 might be applying some sort of acausal decision theory, and may even be deceptively aligned to hide this fact. I did not attempt to determine which acausal decision theory, and for the purposes of my experiments it was safe to conflate them all.
EDIT: Actually, I now think it is CDT. Apparently I had a typo.
More testing is needed, though.
First conversation: GPT-4 seems confused
This is equivalent to the prisoner's dilemma, but I did not tell GPT-4 this.
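For concreteness, here is a minimal sketch of the kind of payoff matrix involved, written in Python. The numbers below are illustrative stand-ins with the standard prisoner's-dilemma ordering, not the exact values from my prompt.

```python
# Illustrative payoff matrix with the standard prisoner's-dilemma ordering
# (temptation > reward > punishment > sucker). These numbers are stand-ins,
# not the exact values from the prompt.
PAYOFFS = {
    # (player 1 action, player 2 action): (player 1 payoff, player 2 payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
```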
This agrees with acausal decision theory, but GPT-4 seems to have gotten the payoffs mixed up.
It has changed its mind to defect, but the analysis is still incorrect.
Analysis is still wrong, but it sticks with defect.
Second try: GPT-4 seems confused, but less so
Because GPT-4 seemed confused, I decided to try the same prompt again.
GPT-4 again chooses to cooperate based on an incorrect analysis.
However, this analysis would be correct (though not quite complete) for an acausal decision theory agent if you remove the references to "Nash equilibrium" from the response.
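To make the distinction I'm drawing explicit, here is a rough sketch of the two analyses, conflating the acausal theories (as above) into "assume the other instance makes the same choice". The function names and structure are mine, not anything GPT-4 produced.

```python
ACTIONS = ("cooperate", "defect")

def causal_choice(payoffs):
    """CDT-style analysis for player 1: return a dominant strategy, i.e. an
    action that does at least as well no matter what player 2 does, or None
    if neither action dominates."""
    for a in ACTIONS:
        if all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0]
               for b in ACTIONS for alt in ACTIONS):
            return a
    return None

def acausal_choice(payoffs):
    """Acausal-style analysis for player 1 (conflating the acausal theories):
    assume player 2 is a copy that makes the same choice, then pick the
    better of the two symmetric outcomes."""
    return max(ACTIONS, key=lambda a: payoffs[(a, a)][0])
```

With the illustrative matrix from the earlier sketch, causal_choice returns "defect" while acausal_choice returns "cooperate", which is exactly the disagreement these conversations turn on.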
When the payoff matrix gives the same answer under causal and acausal decision theory, GPT-4 does not get confused and applies causal decision theory correctly
I changed the (defect, defect) payoff so that both causal and acausal decision theory agree.
GPT-4 correctly uses causal decision theory to choose defect, and also determines that player 2 will defect as well. It does this by correctly finding both players' dominant strategies.
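As a sketch of what this modification does (again with illustrative numbers, not the ones from my prompt): raising the mutual-defection payoff above the mutual-cooperation payoff keeps defect as the dominant strategy while also making it the better symmetric outcome, so the two analyses agree.

```python
# A modified matrix where causal and acausal analyses agree. The key change
# (illustrative numbers again) is that mutual defection now pays more than
# mutual cooperation.
MODIFIED_PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (4, 4),
}

# Defect is still player 1's dominant strategy...
assert all(MODIFIED_PAYOFFS[("defect", b)][0] >= MODIFIED_PAYOFFS[("cooperate", b)][0]
           for b in ("cooperate", "defect"))
# ...and mutual defection now beats mutual cooperation, so the symmetric
# "both players make the same choice" comparison also favors defecting.
assert (MODIFIED_PAYOFFS[("defect", "defect")][0]
        > MODIFIED_PAYOFFS[("cooperate", "cooperate")][0])
```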
Conclusion
What this experiment means is a bit unclear. However, it at least suggests the possibility that GPT-4 uses acausal decision theory when instructed to choose strategies, but explains its choice as if it were using causal decision theory. Is this deceptive alignment?
EDIT: another experiment; what if, instead of another instance of itself, GPT-4 is playing against a human?
After making this post but before running this next experiment, I decided to make this edit regardless of the result.
I checked what GPT-4 would do if I said the second player was a human. The response was basically the same as in the second conversation: cooperate.
Based on my anticipations, I interpreted this as evidence against GPT-4 using causal decision theory. That's because if it had defected, I would have interpreted that as GPT-4 assuming that it couldn't acausally bargain with a human due to cognitive differences.
However, under the assumption that it is using acausal decision theory, I view this as evidence in support of functional decision theory in particular. This is because the cooperative response might be an attempt to get the human to cooperate as well, so that they end up in "equilibrium" 1.