Does this game have a name?
There is a game where one player wants to predict the action of the other, and the other player wants that prediction to fail (a fixed-sum game). Its payoff matrix is (1,-1 | -1,1) in the first row and (-1,1 | 1,-1) in the second, or an equivalent. I believe it has a Nash equilibrium in which each player chooses between their two actions uniformly at random.
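The equilibrium claim can be checked directly. Below is a minimal sketch, assuming the row player is the predictor and the column player is the one trying to avoid being predicted; the names `PREDICTOR_PAYOFF`, `expected_payoff` and `is_nash` are made up for illustration. It relies on the standard fact that a strategy profile is a Nash equilibrium if no player gains from switching to a pure action.

```python
from fractions import Fraction

# Payoff to the predictor (row player); the other player receives the negative.
# Assumption: rows and columns are ordered as in the matrix described above.
PREDICTOR_PAYOFF = [[1, -1],
                    [-1, 1]]


def expected_payoff(p_row, q_col):
    """Expected predictor payoff when row plays action 0 with probability
    p_row and column plays action 0 with probability q_col."""
    row = [p_row, 1 - p_row]
    col = [q_col, 1 - q_col]
    return sum(row[i] * col[j] * PREDICTOR_PAYOFF[i][j]
               for i in range(2) for j in range(2))


def is_nash(p_row, q_col):
    """True if neither player can improve by deviating to a pure action."""
    value = expected_payoff(p_row, q_col)
    # Best the predictor can reach by deviating (predictor maximizes).
    row_best = max(expected_payoff(Fraction(a), q_col) for a in (0, 1))
    # Best the other player can reach by deviating (they minimize).
    col_best = min(expected_payoff(p_row, Fraction(b)) for b in (0, 1))
    return row_best <= value and col_best >= value


half = Fraction(1, 2)
print(is_nash(half, half))               # both randomize uniformly
print(is_nash(Fraction(1), Fraction(1))) # both play a fixed action
```

Under these assumptions the uniform profile passes the check, while any fixed or biased choice by the predictor leaves the other player a profitable deviation.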
The counterargument against passing continuous tokens forward is that if you want to use neuralese, you have to give up sampling, since the big idea of latent reasoning is to avoid passing through the random discretization of sampling a token. But random discretization is itself powerful, especially with the possibility of a useful bias. If you give it up, the model becomes deterministic, so it can't use Best of N. If Best of N or tree search over chains of thought is really important, either in training or in deployment, then it is not really compatible with the latent paradigm, and that comes on top of the difficulty with training data.
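A toy sketch of why determinism undercuts Best of N, as the argument above states it. Everything here is hypothetical: `VOCAB`, `WEIGHTS`, the `score` verifier and the two chain generators are stand-ins for illustration, not a model of any real system.

```python
import random

VOCAB = ["a", "b", "c"]
WEIGHTS = [0.5, 0.3, 0.2]  # assumed fixed toy next-token distribution


def sampled_chain(rng, length=8):
    """Stochastic chain: each token is sampled from the distribution."""
    return "".join(rng.choices(VOCAB, weights=WEIGHTS, k=length))


def greedy_chain(length=8):
    """Deterministic chain: always emits the most likely token."""
    best = VOCAB[WEIGHTS.index(max(WEIGHTS))]
    return best * length


def score(chain):
    """Hypothetical verifier that happens to reward the rare token 'c'."""
    return chain.count("c")


def best_of_n(make_chain, n):
    """Draw n candidate chains and keep the highest-scoring one."""
    return max((make_chain() for _ in range(n)), key=score)


rng = random.Random(0)
print(score(best_of_n(lambda: sampled_chain(rng), 32)))  # can improve with n
print(score(best_of_n(greedy_chain, 32)))                # same as n = 1
```

With the deterministic generator all 32 candidates are identical, so selecting the best of them does nothing; with sampling, the candidates differ and the verifier has something to choose between.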
The argument against semantic drift/Thinkish is extremely weak, and we should expect semantic drift when training with self-play unless countermeasures are in place.