My reading of the rules is that the "m" reward is additional to the hunts, not instead of them.
The effect is to reduce the rate at which players on the way to starving get eliminated. If you are one of the leading players, you will want the weaker players to be eliminated as soon as possible and have an incentive to prevent the m reward from happening. If you are one of the losing players, you want the m reward, to get more time to recover.
ETA: But how do you tell where you rank, from the information your program is given? Every act of defection destroys 2 food -- all else is transfers from one player to another, and m rewards. The amount of food gained from m rewards is large enough that you can always tell when it happens. So from everyone else's reputation you can work out the number of defections, and so how much food there is left. Hence the average food per player, and how your food compares with that.
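A minimal sketch of that bookkeeping, under some loud assumptions: the function name, the `start_food` parameter and the `(reputation, hunts_played)` input format are all made up for illustration, nobody has been eliminated yet, and you have been tallying the m rewards yourself.

```python
def estimate_average_food(start_food, players, bonus_per_player=0.0):
    """Estimate the average food per player.

    players: list of (reputation, hunts_played) pairs, one per player,
             including yourself (hypothetical input format).
    start_food: food each player started with (assumed equal for all).
    bonus_per_player: total m rewards each player has received so far.
    """
    # reputation is the fraction of hunts, so (1 - reputation) is the
    # fraction of slacks (defections).
    slacks = sum((1.0 - rep) * hunts for rep, hunts in players)
    # Each defection destroys 2 food; everything else is a transfer.
    destroyed = 2.0 * slacks
    return start_food + bonus_per_player - destroyed / len(players)


# Three players, four hunts each, reputations 1.0, 0.5 and 0.0:
# 0 + 2 + 4 = 6 defections, so 12 food destroyed, 4 per player.
average = estimate_average_food(100, [(1.0, 4), (0.5, 4), (0.0, 4)])
```

Comparing your own food with `average` then tells you roughly whether you are ahead or behind.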
ETA2: Cf. providing humanitarian assistance in a war zone, whether impartially to both sides, or preferentially to where the suffering is greatest, i.e. to the losing side. Result: the war is prolonged, the suffering increased.
TL;DR: write a Python script to win this applied game theory contest for $1000. It's based on the Prisoner's Dilemma / Tragedy of the Commons, but with a few twists. The deadline is Sunday, August 18.
https://brilliant.org/competitions/hunger-games/rules/
The choices are H = hunt (cooperate) and S = slack (defect), and they use confusing wording here, but as far as I can tell the payoff matrix is (in units of food)
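For concreteness, here is one way to write the payoffs down as a Python table. The numbers are my assumption, not a quote from the rules page: they are chosen so that mutual hunting breaks even and each slack destroys 2 food in total, so check them against the rules before relying on them. `PAYOFF` and `food_destroyed` are illustrative names.

```python
# Net food change for one hunt, keyed by (my_choice, partner_choice).
# ASSUMED values: verify against the official rules.
PAYOFF = {
    ("H", "H"): (0, 0),     # both hunt: nobody gains or loses
    ("H", "S"): (-3, 1),    # I hunt, partner slacks
    ("S", "H"): (1, -3),    # I slack, partner hunts
    ("S", "S"): (-2, -2),   # both slack
}


def food_destroyed(my_choice, partner_choice):
    """Total food lost by the pair in one hunt (2 per slack)."""
    mine, theirs = PAYOFF[(my_choice, partner_choice)]
    return -(mine + theirs)
```

With these numbers slacking is better for you by 1 food whatever your partner does, which is what makes it a Prisoner's Dilemma.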
What's interesting is that you don't get the entirety of your partner's history (so strategies like Tit-Tit-Tit for Tat don't work). Instead, you get only their reputation, which is the fraction of times they've hunted.
To further complicate the Nash equilibria, there's the option to overhunt: a random number m, with 0 < m < P(P−1), is chosen before each round (a round consists of P−1 hunts for each player, remember), and if the total number of hunt-choices in that round is at least m, then each player is awarded 2(P−1) food units (2 per hunt).
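A small sketch of the overhunt rule as I read it. The function names are hypothetical, and drawing m as a uniform integer is an assumption; the rules only say it is random.

```python
import random


def draw_m(P, rng=random):
    """Pick m with 0 < m < P*(P-1). Uniform integer is an ASSUMPTION."""
    return rng.randint(1, P * (P - 1) - 1)


def bonus_per_player(total_hunts, m, P):
    """Food awarded to every player if the round's hunt count reaches m."""
    return 2 * (P - 1) if total_hunts >= m else 0


# With P = 4 players there are 4 * 3 = 12 hunt-choices per round.
# If 10 of them were "hunt" and m = 8, everyone gets 2 * 3 = 6 food.
reward = bonus_per_player(total_hunts=10, m=8, P=4)
```

Note that the reward goes to every player, hunters and slackers alike, which is where the Tragedy of the Commons flavour comes from.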
Your Python program has to decide at the start of each round whether or not to hunt with each opponent, based on: