LESSWRONG
LW

Comment Permalink

Then this game actually seems to be about getting strategy data out of reputation data. The zeroth-level strategy is to defect very time, because that's the nash equilibrium. An example of a first-level strategy is to cooperate against players with reputation higher than some cutoff, and defect against everyone else, because you want to coooperate against cooperaters to outlast the slackers. A second level strategy models the other players modeling you, so you might for example cooperate against players with reputation higher than some function of your own reputation.

So I think the way to go in an idealized version of this game is to model other players as simple third-level players and try to predict whether they'll cooperate with you, based on reputation and the past reputation matrices, and picking a strategy to maximize food over future rounds, which means keeping a reputation where the backlash from slacking has yet to outweigh the food gained form it.

But there are some non-idealities. As the game goes on, reputation varies more slowly, so you can actually start playing tit-for-tat strategies - and so can other people. Identifiability, and most of the slackers getting eliminated, increase the costs for defecting. Of course, you can test whether people have identified you or not by defecting against unknown-strategy players occasionally and seeing if they tit for tat you back. If they don't, then you can model them as third level players and not have to treat them as equals.

Dealing with the random factor m is a lot like the iterated prisoner's dilemma - you cooperate the first time or two, then can go to tit for tat. Which is to say, you start out by being more likely to cooperate when m will take cooperation to reach. Bu if other players don't reciprocate enough to make it worthwhile, you just ignore m. It's good for cleverer players if the goals are reached, though, because it keeps the slackers and exploitable players in the game, which is a source of relative advantage for cleverer players.

Emile12y00

What do you mean by "third level players" here? You call "second-level strategy" already modeling other players, but

you can model them as third level players and not have to treat them as equals.

... makes me think you maybe meant first-level players?

Anyway, I would model most of my opponents as what you call first-level strategy, with some varying parameters (and extra randomness). And possibly include as part of the population whichever more exotic strategies can systematically beat (various mixes of) those.

17

[LINK] EdTech startup hosts AI Hunger Games (cash prize $1k)

by MalcolmOcean

14th Aug 2013

2 min read

17

TL;DR = write a python script to win this applied game theory contest for $1000. Based on Prisoner's Dilemma / Tragedy of the Commons but with a few twists. Deadline Sunday August 18.

https://brilliant.org/competitions/hunger-games/rules/

I. Food and Winning

Each player begins the game with 300(P−1) units of food, where P is the number of players.

If after any round you have zero food, you will die and no longer be allowed to compete. All players who survive until the end of the game will receive the survivor's prize.

The game can end in two ways. After a large number of rounds, there will be a small chance each additional round that the game ends. Alternatively, if there is only one person left with food then the game ends. In each case, the winner is the person who has the most food when the game ends.

II. Hunts

Each round is divided into hunts. A hunt is a game played between you and one other player. Each round you will have the opportunity to hunt with every other remaining player, so you will have P−1 hunts per round, where P is the number of remaining players.

The choices are H = hunt (cooperate) and S = slack (defect), and they use confusing wording here, but as far as I can tell the payoff matrix is (in units of food)

	H / C	S / D
H / C	0:0	-3:1
S / D	1:-3	-2:-2

What's interesting is you don't get the entirety of your partner's history (so strategies like Tit-Tit-Tit for Tat don't work) instead you get only their reputation, which is the fraction of times they've hunted.

To further complicate the Nash equilibria, there's the option to overhunt: a random number m, 0 < m < P(P−1) is chosen before each round (round consisting of P−1 hunts, remember) and if the total number of hunt-choices is at least m, then each player is awarded 2(P−1) food units (2 per hunt).

Your python program has to decide at the start of each round whether or not to hunt with each opponent, based on:

the round number
your food
your reputation
m
an array of the opponents' reputations

Based on the fact they're giving you some values you'd already know if you had access to memory, I'm assuming it must be a memoryless script that gets run each round. (EDIT: I take that back, I looked at the sample code and while it says you don't need to track this stuff, it notes that you can use instance variables).

Submissions close on the 18th.

I think the contest could be both better designed and better explained in a number of ways, but thought I'd share it anyway because hey, money. Also, you'd be competing in an arena where they're giving explanations of what Nash equilibria are. Which is probably not really fair. But it's the Hunger Games, and of course it's not fair. (As far as I can tell, they are not enforcing anything related to fairness here.)

I'd be curious to hear ideas for strategies and thoughts about the design of the game.

17

New Comment

23 comments, sorted by

Click to highlight new comments since: Today at 7:31 PM

[-]Eliezer Yudkowsky12y60

We might compete in this if a program could see the opponent's source code. :)

[-]AlexMennen12y60

This is actually much more like "guess 2/3 of the average" than tragedy of the commons or the prisoner's dilemma, in that there's an obvious Nash equilibrium that there really shouldn't be any reason to deviate from, except that you need to take into account the foolishness of the other players. They don't offer any possible means for people to correlate their moves with their opponents' moves, so the only reason to ever cooperate would be if you expect other players who don't understand game theory to be more likely to cooperate with you if your reputation is greater than 0. You don't want to cooperate if and only if the fact that you are doing so implies that they will be more likely to cooperate with you, because you can't (or at least, any such effect would necessarily benefit people besides you just as much, and would thus be worthless, since it's a zero-sum game). And you don't want to cooperate with people if and only if you expect them to cooperate with you, because (C,C)+(D,D) gives you the same payoff as (C,D)+(D,C), so although you want as many cooperations as possible from your opponents, and as many defections as possible by you, there's no incentive to correlate them. So if you need to have a positive reputation in order to attract cooperations from players who don't know what they're doing, you may as well just cooperate against players that you think are doing poorly (which would probably be either players whose reputation is too high, and who are thus probably sacrificing too much, or whose reputation is too low, and who are thus not getting cooperations from people who think they should be cooperating with people with high reputations, or both), because these players won't be your main competition at the end, so helping them is harmless. Cooperation early in the game is better than cooperation late in the game because it affects your reputation for a larger portion of the game (plus, one cooperation has a larger effect on early-game reputation than it does on late-game reputation).

[-]solipsist12y20

There are ways of determining which player is which, vaguely. You can keep track of reputation between rounds, and infer that player 5 from last round couldn't possibly be player 23 from this round because that would require player 5 to have cooperated with more people than there are players. Alternatively, my bot could ensure the number of times its hunted in history is always a multiple of 17, and other smart bots could look for this.

the only reason to ever cooperate would be if you expect other players who don't understand game theory to be more likely to cooperate with you if your reputation is greater than 0.

If there are only two kinds of players, those who slack all the time, and those who cooperate on the first round, and then only with anybody with a positive reputation - than the second group will blow the first out of the water. Saying that the winners "don't understand game theory" sounds a bit silly.

[-]AlexMennen12y30

If there is a third kind of player, which cooperates on the first round and then slacks thereafter, then the third group will blow the second out of the water. The second group only wins because no one bothered exploiting them in your example, even though anyone easily could have.

Sure, but then you can add a fourth kind of player, who hunts with those with reputation equal or higher than themselves, it probably beats all three others (though the outcome might depend on the initial mix, if there are more 2 than 4, 3 might exploit enough 2 to beat 4).

And then other strategies can beat that. There are plenty of "nice" strategies that are less foolish than "always slack".

[-]AlexMennen12y20

Good call, I was pretty sure that there weren't any Nash equilibria other than constant slacking, but everyone using group 4's strategy is also a Nash equilibrium, as is everyone hunting with those with reputation is exactly equal to their own. This makes group 4 considerably harder to exploit, although it is possible in most likely distributions of players if you know it well enough. As you say, group 4 is less foolish than the slackers if there are enough of them. I still think that in practice, strategies that could be part of a Nash equilibrium won't win, because their success relies on having many identical copies of them.

[-]ThisSpaceAvailable12y00

If there are two kinds of players, those who throw rock, and those who throw paper, the latter will blow the former out of the the water.

You are engaging in two fallacies: you are cherry-picking conditions to favor your particular strategy, and you are evaluating the strategies at the wrong level. Strategies should be evaluated with respect to how the affect the success of the individual person employing them, not on how they affect the success of people, in general, who employ them. This fallacy is behind much of the cooperate/one-box arguments. Sure, if everyone in Group B cooperates with other members of Group B, then Group B will do better, and on a superficial level, it seems like this means "If you're in Group B, you should cooperate with other members of Group B", but that's fallacious reasoning. It's the sort of thing that lies behind identity politics. "If Americans buy American, then Americans will do better, and you're an American, so you will benefit from buying American". Even if we grant that buying American gives a net benefit to America (which is a rather flimsy premise to begin with), it doesn't follow that any American has a rational reason to buy American. In your scenario, the presence of people with the "cooperate with people who have a reputation greater than 0" provides a reason to cooperate in the first round, but there is no reason whatsoever to condition cooperation on someone having a reputation greater than 0. Anyone who, in this scenario, thinks that one should cooperate with people with reputation greater than 0 does indeed not understand game theory.

You are engaging in two fallacies: you are cherry-picking conditions to favor your particular strategy, and you are evaluating the strategies at the wrong level.

No, I'm simplifying for arguments' sake, using the example given by Alex (cooperating with any positive reputation). I discuss more complex strategies elsewhere in the thread, of course "cooperate only with people with > 0 reputation is a pretty stupid and exploitable strategy, my point is that even such a stupid strategy could beat Alex's "always defect".

[-]BlueSun12y20

If there is an 'm' reward, you get the same reward whether or not you choose to hunt? I'm confused how this adds incentive to hunt when your goal is to "get more food than other players," not "get food."

[-]Richard_Kennaway12y00

My reading of the rules is that the "m" reward is additional to the hunts, not instead of them.

The effect is to reduce the rate at which players on the way to starving get eliminated. If you are one of the leading players, you will want the weaker players to be eliminated as soon as possible and have an incentive to prevent the m reward from happening. If you are one of the losing players, you want the m reward, to get more time to recover.

ETA: But how do you tell where you rank, from the information your program is given? Every act of defection destroys 2 food -- all else is transfers from one player to another, and m rewards. The amount of food gained from m rewards is large enough that you can always tell when it happens. So from everyone else's reputation you can work out the number of defections, and so how much food there is left. Hence the average food per player, and how your food compares with that.

ETA2: Cf. providing humanitarian assistance in a war zone, whether impartially to both sides, or preferentially to where the suffering is greatest, i.e. to the losing side. Result: the war is prolonged, the suffering increased.

[-]MalcolmOcean12y00

I dunno, if you only have a small amount of food left, and m is low...

Also, if you can anticipate that other players are likely to account for m (e.g. by being more likely to hunt) then you can still potentially update usefully off of it.

Yeah, it's more like an indirect way of controlling whether you want the game to finish early or late - and since you only have very indirect information about how much food others have, it's not clear which one to prefer.

(until I can reliably tell whether ending the game earlier or loser is better, I plan on just ignoring that parameter)

Pretty interesting! You can't play real tit-for-tat, since you don't know the history of those you're playing against ... so you have to rely on reputation ... and so you have to be careful about your reputation, since it's pretty likely that nobody will cooperate with those of low reputation...

I'll probably submit something, will anybody else?

[-]BlueSun12y00

Can players coordinate strategies? There's an advantage if two or more submitter can identify themselves (in game) and cooperate.

[-]somervta12y30

From the FAQ

Q: "Is it allowed or disallowed to collude with other algorithms (potentially your other submissions, submissions of your friends, etc.) based on agreements outside the game? (e.g. make 1 out of 100 a 'winner' and have the other 99 just help this chosen winner)"

A: "Collusion based on agreements outside the game will result in disqualification. If however, your algorithm is smart enough to determine in game that another algorithm is signaling and you wish to respond in a cooperative manner, i.e. collude, this is allowed as it is, by definition, part of the game."

[-]Qiaochu_Yuan12y10

Given that there are likely to be many rounds, one strategy might be to write down a list of several strategies, randomly try each of them for awhile, and then choose strategies in the future based on which (strategy, my reputation, my opponent's reputation) combinations were most successful in the past. This assumes that most other strategies will be based largely on reputation rather than on other variables that people track (and also that your own reputation will vary sufficiently to notice what effect it might have, which may be difficult to ensure). I don't have a good grasp on what the bulk of the opponents will look like; what kind of people do brilliant.org competitions?

Edit: A related idea is to cooperate for awhile, defect for awhile, learn a function (my reputation, my opponent's reputation) -> (my opponent's move) using your favorite machine learning technique, and then use this function to predict future moves. This strategy seems to at least have the benefit that it should detect the most obvious patterns, e.g. almost everyone else defecting.

[-]MalcolmOcean12y00

That sounds to me like a great tactic. Brilliant.org, as far as I know, is largely frequented by high-IQ kids and teens... so they'll be smart, but not necessarily skilled at game theory already.

[-][anonymous]12y10

How effective would the following simple-superrationalist strategy be, if implementing by a large crowd of bots?

If opposing player is within n% of my reputation: cooperate. Else defect.

[-]Manfred12y00

Hm. And the game is populated with 1 bot-player per program submission?

Seems like we should expect the players to deviate from tit-for-tat in the direction of too much slacking, then. Is it an option to neither hunt nor slack with people with low reputation, but just avoid them?

[-]MalcolmOcean12y20

Is it an option to neither hunt nor slack with people with low reputation, but just avoid them?

No, each round, you have a go with each of the other remaining players.

[-]Manfred12y60

Then this game actually seems to be about getting strategy data out of reputation data. The zeroth-level strategy is to defect very time, because that's the nash equilibrium. An example of a first-level strategy is to cooperate against players with reputation higher than some cutoff, and defect against everyone else, because you want to coooperate against cooperaters to outlast the slackers. A second level strategy models the other players modeling you, so you might for example cooperate against players with reputation higher than some function of your own reputation.

So I think the way to go in an idealized version of this game is to model other players as simple third-level players and try to predict whether they'll cooperate with you, based on reputation and the past reputation matrices, and picking a strategy to maximize food over future rounds, which means keeping a reputation where the backlash from slacking has yet to outweigh the food gained form it.

But there are some non-idealities. As the game goes on, reputation varies more slowly, so you can actually start playing tit-for-tat strategies - and so can other people. Identifiability, and most of the slackers getting eliminated, increase the costs for defecting. Of course, you can test whether people have identified you or not by defecting against unknown-strategy players occasionally and seeing if they tit for tat you back. If they don't, then you can model them as third level players and not have to treat them as equals.

Dealing with the random factor m is a lot like the iterated prisoner's dilemma - you cooperate the first time or two, then can go to tit for tat. Which is to say, you start out by being more likely to cooperate when m will take cooperation to reach. Bu if other players don't reciprocate enough to make it worthwhile, you just ignore m. It's good for cleverer players if the goals are reached, though, because it keeps the slackers and exploitable players in the game, which is a source of relative advantage for cleverer players.

What do you mean by "third level players" here? You call "second-level strategy" already modeling other players, but

you can model them as third level players and not have to treat them as equals.

... makes me think you maybe meant first-level players?

Anyway, I would model most of my opponents as what you call first-level strategy, with some varying parameters (and extra randomness). And possibly include as part of the population whichever more exotic strategies can systematically beat (various mixes of) those.

More from MalcolmOcean

Curated and popular this week