TL;DR = write a python script to win this applied game theory contest for $1000. Based on Prisoner's Dilemma / Tragedy of the Commons but with a few twists. Deadline Sunday August 18.
https://brilliant.org/competitions/hunger-games/rules/
I. Food and Winning
Each player begins the game with 300(P−1) units of food, where P is the number of players.
If after any round you have zero food, you will die and no longer be allowed to compete. All players who survive until the end of the game will receive the survivor's prize.
The game can end in two ways. After a large number of rounds, there will be a small chance each additional round that the game ends. Alternatively, if there is only one person left with food then the game ends. In each case, the winner is the person who has the most food when the game ends.
II. Hunts
Each round is divided into hunts. A hunt is a game played between you and one other player. Each round you will have the opportunity to hunt with every other remaining player, so you will have P−1 hunts per round, where P is the number of remaining players.
The choices are H = hunt (cooperate) and S = slack (defect), and they use confusing wording here, but as far as I can tell the payoff matrix is (in units of food)
H / C | S / D | |
H / C | 0:0 | -3:1 |
S / D | 1:-3 | -2:-2 |
What's interesting is you don't get the entirety of your partner's history (so strategies like Tit-Tit-Tit for Tat don't work) instead you get only their reputation, which is the fraction of times they've hunted.
To further complicate the Nash equilibria, there's the option to overhunt: a random number m, 0 < m < P(P−1) is chosen before each round (round consisting of P−1 hunts, remember) and if the total number of hunt-choices is at least m, then each player is awarded 2(P−1) food units (2 per hunt).
Your python program has to decide at the start of each round whether or not to hunt with each opponent, based on:
- the round number
- your food
- your reputation
- m
- an array of the opponents' reputations
Then this game actually seems to be about getting strategy data out of reputation data. The zeroth-level strategy is to defect very time, because that's the nash equilibrium. An example of a first-level strategy is to cooperate against players with reputation higher than some cutoff, and defect against everyone else, because you want to coooperate against cooperaters to outlast the slackers. A second level strategy models the other players modeling you, so you might for example cooperate against players with reputation higher than some function of your own reputation.
So I think the way to go in an idealized version of this game is to model other players as simple third-level players and try to predict whether they'll cooperate with you, based on reputation and the past reputation matrices, and picking a strategy to maximize food over future rounds, which means keeping a reputation where the backlash from slacking has yet to outweigh the food gained form it.
But there are some non-idealities. As the game goes on, reputation varies more slowly, so you can actually start playing tit-for-tat strategies - and so can other people. Identifiability, and most of the slackers getting eliminated, increase the costs for defecting. Of course, you can test whether people have identified you or not by defecting against unknown-strategy players occasionally and seeing if they tit for tat you back. If they don't, then you can model them as third level players and not have to treat them as equals.
Dealing with the random factor m is a lot like the iterated prisoner's dilemma - you cooperate the first time or two, then can go to tit for tat. Which is to say, you start out by being more likely to cooperate when m will take cooperation to reach. Bu if other players don't reciprocate enough to make it worthwhile, you just ignore m. It's good for cleverer players if the goals are reached, though, because it keeps the slackers and exploitable players in the game, which is a source of relative advantage for cleverer players.
What do you mean by "third level players" here? You call "second-level strategy" already modeling other players, but
... makes me think you maybe meant first-level players?
Anyway, I would model most of my opponents as what you call first-level strategy, with some varying parameters (and extra randomness). And possibly include as part of the population whichever more exotic strategies can systematically beat (various mixes of) those.