Austria is also the player instigating a plan of action in the dialogue, which seems to be a large part of why the AI is so effective: it wins by proposing mutually beneficial plans and then (mostly) following through on them.
Is that also useful for human players to do? (I have never played Diplomacy.) That is, in negotiations with other players, should you be the first to propose a plan and thereby set the agenda for the conversation?
Yes. From page 34 of the Supplementary Materials:
In practice, we observe that expert players tend to be very active in communicating, whereas those less experienced miss many opportunities to send messages to coordinate: the top 5% rated players sent almost 2.5 times as many messages per turn as the average in the WebDiplomacy dataset.
No-Press Diplomacy was solved by DeepMind in 2020: "Learning to Play No-Press Diplomacy with Best Response Policy Iteration"
Is this correct? The paper doesn't seem to say anything about No-Press Diplomacy being solved, or even about reaching human-level play. (I take "solved" to mean superhuman play.)
The paper does say
Our methods improve over the state of the art, yielding a consistent improvement of the agent policy. [...] Future work can now focus on questions like: (1) What is needed to achieve human-level No-Press Diplomacy AI? [...]
which seems to suggest they haven't achieved human-level play, let alone superhuman play.
Curious if any of the following are answered in the material around this.
If you're vocally obstinate about not going along with its plan, can the dialogue side feed that info back into the planning side? Can you talk it around to a different plan? And if you're dishonest does it learn not to trust you?
If you're vocally obstinate about not going along with its plan, can the dialogue side feed that info back into the planning side?
Yes. Figure 5 of the paper demonstrates this. Cicero (as France) has just said (to England): "Do you want to call this fight off? I can let you focus on Russia and I can focus on Italy". When the human agrees ("Yes! I will move out of ENG if you head back to NAO"), Cicero predicts England will move out of ENG 85% of the time, moves its fleet back to NAO as agreed, and moves its armies toward Italy. When the human disagrees ("You've been fighting me all game. Sorry, I can't trust you won't stab me"), Cicero predicts England will attack 90% of the time, moves the fleet to attack EDI instead, and keeps its armies in place.
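To make the mechanism concrete, here is a minimal toy sketch in Python of the loop Figure 5 illustrates: after each message, re-predict the opponent's action distribution conditioned on the dialogue, then best-respond to it. Every name, probability, and payoff below is made up for illustration; the real system conditions a learned action model on board state and dialogue and plans with piKL-regularized search, not a four-entry payoff table.

```python
# Toy sketch of dialogue-conditioned planning in the spirit of Figure 5.
# Nothing here is CICERO's actual code or API.

def predict_opponent_moves(dialogue: str) -> dict[str, float]:
    """Stand-in for the dialogue-conditional action model:
    P(opponent orders | board, dialogue), hard-coded to mirror the
    two branches described in Figure 5."""
    if "can't trust" in dialogue:
        return {"attack EDI": 0.90, "leave ENG": 0.10}
    return {"leave ENG": 0.85, "attack EDI": 0.15}

# Stand-in for the value model: a tiny payoff table over joint moves.
PAYOFF = {
    ("fleet to NAO, armies to Italy", "leave ENG"): 1.0,    # deal honored
    ("fleet to NAO, armies to Italy", "attack EDI"): -1.0,  # we got stabbed
    ("fleet attacks EDI, armies hold", "leave ENG"): 0.2,
    ("fleet attacks EDI, armies hold", "attack EDI"): 0.4,
}

CANDIDATES = ["fleet to NAO, armies to Italy", "fleet attacks EDI, armies hold"]

def expected_value(our_orders: str, opp_dist: dict[str, float]) -> float:
    return sum(p * PAYOFF[(our_orders, opp)] for opp, p in opp_dist.items())

def plan(dialogue: str) -> str:
    """Re-predict after every new message, then best-respond."""
    opp_dist = predict_opponent_moves(dialogue)
    return max(CANDIDATES, key=lambda o: expected_value(o, opp_dist))

print(plan("Yes! I will move out of ENG if you head back to NAO"))
# -> fleet to NAO, armies to Italy
print(plan("You've been fighting me all game. Sorry, I can't trust you"))
# -> fleet attacks EDI, armies hold
```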
And if you're dishonest does it learn not to trust you?
Yes, and this is also demonstrated in Figure 5. When the human tries to deceive Cicero ("Yes! I'll leave ENG if you move KIE -> MUN and HOL -> BEL"), Cicero judges the proposal unreasonable: it moves the fleet back to de-escalate, but does not move its armies as requested.
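Continuing the toy sketch above (reusing PAYOFF, CANDIDATES, and the two helper functions), one crude way to model the "judges it unreasonable" step is a threshold test: follow a proposed deal only if it is nearly as good for us as our own best plan under the current dialogue-conditioned prediction. This heuristic is an assumption of mine, not the paper's mechanism; CICERO filters implausible messages and plans using its learned value and policy models.

```python
# The deceptive ask ("move KIE -> MUN and HOL -> BEL") strips our
# defenses, so it is bad for us even if England really leaves ENG
# (toy numbers, as before):
PAYOFF[("armies to MUN and BEL", "leave ENG")] = -0.2
PAYOFF[("armies to MUN and BEL", "attack EDI")] = -1.5

def judge_proposal(requested_orders: str, dialogue: str,
                   tolerance: float = 0.3) -> str:
    """Assumed heuristic: follow a deal only if it is nearly as good as
    our own best plan under the dialogue-conditioned prediction;
    otherwise de-escalate where cheap, but keep the armies home."""
    opp_dist = predict_opponent_moves(dialogue)
    best = max(expected_value(o, opp_dist) for o in CANDIDATES)
    if best - expected_value(requested_orders, opp_dist) <= tolerance:
        return "follow the proposal"
    return "move fleet back, hold armies"

print(judge_proposal(
    "armies to MUN and BEL",
    "Yes! I'll leave ENG if you move KIE -> MUN and HOL -> BEL"))
# -> move fleet back, hold armies
```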
Key links:
What is Diplomacy?
How well did CICERO perform?
How does CICERO behave?
How does CICERO work?
How does CICERO compare to prior models?
Were people surprised by CICERO?
"For Press Diplomacy, as well as other settings that mix cooperation and competition, you need progress,” Bachrach says, “in terms of theory of mind, how they can communicate with others about their preferences or goals or plans. And, one step further, you can look at the institutions of multiple agents that human society has. All of this work is super exciting, but these are early days.”