How do AI agents work together when they can’t trust each other?
I investigated this question by having Claude play the advanced social deduction game Blood on the Clocktower. Clocktower is similar to Werewolf (or Mafia): players sit in a circle and are secretly divided into a good team and an evil team. The good players outnumber the evil players but don’t know who they are, so they have to share information with each other to deduce who is evil and execute them before it’s too late. Meanwhile, the evil players try to build trust with the good team and disrupt their deduction as much as possible.
Clocktower takes this formula a step further by giving each player a unique character that gives them extra information or lets them disrupt other players’ abilities. Each player is told the set of characters that might be in play, and deducing which characters are actually in play is an important part of the puzzle.
I created this website where you can scroll through an interactive timeline of games and view the players’ actions, reasoning, and notes. The website also has a full rules explanation if you are curious.
Agent Scaffolding
During the day, each player is given four opportunities to take an action, and the order in which they act is shuffled each cycle so that different players get a chance to act first. Players take actions using Claude’s tool use capability. During the day, those actions are: send a message, nominate for execution, pass, or use the Slayer power. Messages can be sent to any number of players, and a nomination starts a vote to execute the chosen player. Night actions are taken using a special night action tool whose prompt changes depending on the choice the player is making.
Each player has a history of recent events that is added to whenever an event they are privy to occurs, such as receiving a message. At the end of each night phase, players use their history to update their notes file with the most important events, and they are also prompted