In your last table, shooting while opponent blocks should yield u(0,0), right? And both reloading would be u(2,1).
shooting while opponent blocks should yield u(0,0), right?
Well, I could make a table for the state where no one has any bullets, but it would just have one cell: both players reload and they go back to having one bullet each. In fact, the game actually starts with no one having any bullets, but I omitted this step.
Also, in both suggestions, you are telling me that the action that leads to state x should yield the expected utility of state x, which is correct, but my function u(x,y) yields the expected utility of the resulting state assuming that you're coming from the original, neutral one. Otherwise, it would need an additional argument to say what state you're currently in. Instead of writing the utility of each action as u(current state, next state), I wrote it as u(next state) − u(current state). Each state is an ordered pair of non-negative integers, the two players' bullet counts. So, to write it the way you suggested, the function would need four arguments instead of two.
What's the utility of going to the state where I have one bullet and the opponent has none?
If I try to do the matrix for that, I get this:
This made me think the last table was just for the (1,0) state. Is this not the case?
I'm not sure why the previous state would matter.
So the utility for S+B is 0 and the utility for R+R is 0.5. The equilibrium is where both players reload with probability = 2/3. The utility of the (1,0) state is +2/3.
Thanks. I now see my mistake. I shouldn't have subtracted the expected utility of the current state from the expected utility of the next.
There are two versions of zero-zero-sete, or 007. In the first one, there are two players. Each player has three options: shooting, reloading or blocking. They start with zero bullets. Each turn, both players move at the same time, at regular intervals (they achieve this by saying the name of the game in unison and making their move with hand signs as they say "sete", kind of like rock-paper-scissors). If you shoot as the other reloads, you win. If both shoot, it's a draw. Every time you shoot, you spend one bullet. Now, this version is really boring, since you can (and should) block forever without being punished.
The actually interesting version of the game is just like the first one, but if you hoard a set number of bullets, you get a rocket launcher that can break the shield, so from that point your opponent is playing for a draw.
I tried to solve the interesting version of the game using game theory. Specifically, the version where the rocket launcher costs only two bullets, because it's the simplest.
This is the pay-off matrix for player one when both players have one bullet:
S means shoot. B means block. R means reload. u(1,0) means the expected utility of going to the state where you have one bullet and the opponent has zero, and u(2,1) means the expected utility of going to the state where you have two bullets and he has one.
Now, let's calculate u(2,1).
This is the matrix for the state where I have two bullets and my opponent has one:
It's smaller because, since I already have the rocket launcher, I have no reason to reload, and my opponent has no reason to block. Blocking also wins me the game if the opponent shoots, because after that I have the rocket launcher and the opponent has no bullets, so he can't threaten a draw and I shoot him on the next turn. The Nash equilibrium is for each of us to shoot 50% of the time, which leaves me with an expected utility of 0.5. So, back to the main matrix, we have:
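As a side check on the two-bullets-versus-one subgame described above, here is a minimal Python sketch that solves a 2x2 zero-sum game by making the opponent indifferent. The helper name `solve_2x2` is made up for illustration. The original table is not reproduced here, so the matrix entries are an assumption: both shooting is a draw (0), any outcome that wins me the game is 1, and the cell where I block while the opponent reloads is taken to be 0 by symmetry (both players then hold two bullets).

```python
from fractions import Fraction as F

def solve_2x2(m):
    """Mixed equilibrium of a 2x2 zero-sum game without a saddle point.

    m[i][j] is the row player's payoff. Returns (p, q, value), where p is the
    probability of row 0, q the probability of column 0.
    """
    (a, b), (c, d) = m
    denom = a - b - c + d
    p = F(d - c, denom)            # makes the column player indifferent
    q = F(d - b, denom)            # makes the row player indifferent
    value = F(a * d - b * c, denom)
    return p, q, value

# Rows: I shoot, I block. Columns: opponent shoots, opponent reloads.
# Entries are assumed, see the paragraph above.
state_2_1 = [[0, 1],
             [1, 0]]

p_shoot, q_shoot, value = solve_2x2(state_2_1)
print(p_shoot, q_shoot, value)
```

Under those assumed entries, both players shoot half the time and the state is worth 0.5 to the player with two bullets, which agrees with the figures in the text.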
Now, here's where I'm stuck. What's the utility of going to the state where I have one bullet and the opponent has none?
If I try to do the matrix for that, I get this:
Here, the opponent can't shoot because he has no bullets, and thus I have no reason to block. Missing a shot gets us back to the start of the game, with no bullets on either side, and reloading either wins me the game on the next turn or gets me into the situation where I either win or draw (see the second table).
I call the probability that I shoot "a". To achieve the Nash equilibrium, the expected utilities of my opponent's actions must be equal.
a + (1 − a)/2 = 1 − a
a = (1 − a)/2
a = 1/2 − a/2
a + a/2 = 1/2
3a/2 = 1/2
3a = 1
a = 1/3

Therefore, I should shoot 1/3 of the time. If the opponent blocks, he loses the other 2/3 of the time. So his expected utility in this state is −2/3 and mine is 2/3, since this is a zero-sum game. And, since this subgame is symmetrical, the opponent should block 1/3 of the time.
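The same indifference argument can be checked numerically. The sketch below is only an illustration: the helper name `solve_2x2` is made up, and the payoffs are read off the description of this state (a shot into a block is worth 0, any outcome that wins me the game is worth 1, and both reloading is worth 0.5, the value of the two-bullets-versus-one state).

```python
from fractions import Fraction as F

def solve_2x2(m):
    """Mixed equilibrium of a 2x2 zero-sum game without a saddle point.

    m[i][j] is the row player's payoff. Returns (p, q, value), where p is the
    probability of row 0, q the probability of column 0.
    """
    (a, b), (c, d) = m
    denom = a - b - c + d
    return F(d - c, denom), F(d - b, denom), F(a * d - b * c, denom)

# Rows: I shoot, I reload. Columns: opponent blocks, opponent reloads.
state_1_0 = [[0, 1],
             [1, F(1, 2)]]

shoot_prob, block_prob, u_1_0 = solve_2x2(state_1_0)
print(shoot_prob, block_prob, u_1_0)
```

With these entries the solver gives a shooting probability of 1/3 for me, a blocking probability of 1/3 for the opponent, and a value of 2/3, matching the hand calculation.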
Now, back to the main table:
This time we have three options, which gives us two degrees of freedom. So I'm going to use two variables: a for the probability of shooting, and b for the probability of blocking given that we don't shoot.
2(1 − a)b/3 − (1 − a)(1 − b) = 0
−2a/3 + (1 − a)(1 − b)/2 = 0

Here, again, I find the Nash equilibrium by playing in such a way as to equalize his expected returns. But for this one I'm just going to plug the system of equations into an online solver because I'm feeling lazy. The result is that, when both players have one bullet, you should shoot 3/13 of the time and block 6/13 of the time.

And there you have it. We solved the game, and it only took a little help from this cool guy in the comments who corrected my mistakes. Now I can finally go ruin some kid's day by absolutely destroying them in this one game.
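For anyone who would rather not trust an online solver, here is a rough Python check of the one-bullet-each state using exact fractions. It is a sketch under assumptions: the full 3x3 table is not reproduced here, so the matrix below is inferred from the two equations and from symmetry (mirrored states get the negated value, and states where both players hold the same number of bullets are worth 0). All variable names are made up for illustration.

```python
from fractions import Fraction as F

u10 = F(2, 3)   # value of having one bullet against zero
u21 = F(1, 2)   # value of having two bullets against one

# Assumed payoffs to player one when both players have one bullet.
# Rows: I shoot, block, reload. Columns: opponent shoots, blocks, reloads.
M = [[0,   -u10, 1],
     [u10,  0,  -u21],
     [-1,   u21, 0]]

# First equation, divided by (1 - a): u10*b = 1 - b
b = 1 / (1 + u10)
# Second equation: u10*a = (1 - a) * k, with k = (1 - b) * u21
k = (1 - b) * u21
a = k / (u10 + k)

mix = [a, (1 - a) * b, (1 - a) * (1 - b)]   # P(shoot), P(block), P(reload)

# My expected payoff against each pure action of the opponent.
col_payoffs = [sum(mix[i] * M[i][j] for i in range(3)) for j in range(3)]
print(mix, col_payoffs)
```

The mix comes out as 3/13 shoot, 6/13 block and 4/13 reload, and the expected payoff is 0 against every pure reply, including the opponent reloading, which the two equations did not use directly.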