Stuart_Armstrong comments on In partially observable environments, stochastic policies can be optimal - Less Wrong

5 Post author: Stuart_Armstrong 19 July 2016 10:42AM

Comment author: Stuart_Armstrong 23 July 2016 04:56:32PM 1 point

Take a game with a mixed strategy Nash equilibrium. If you and the other player both follow it, each using a source of randomness that remains random from the other player's point of view, then it is never to your advantage to deviate from it. Now suppose you play this game again and again, against another player or against the environment.

Consider an environment in which the opponent's strategies are in an evolutionary arms race, trying to beat you as well as they can; this is an environmental model. Under this model, you'd tend to follow the Nash equilibrium on average, but at (almost) any given turn there's a deterministic choice that's a bit better than being stochastic, and it's determined by the current equilibrium of strategies of the opponent/environment.
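
To make the "deterministic choice that's a bit better" concrete, here is a minimal sketch using matching pennies as a stand-in game (my choice of example, not something from the original discussion). The function names and the payoff convention (+1 if the coins match, -1 otherwise, from your point of view) are assumptions for illustration. Whenever the environment's current mix is not exactly 50/50, one pure action strictly beats any randomization.

```python
def expected_payoffs(q_heads):
    """Your expected payoff for each pure action in matching pennies,
    given that the environment currently plays heads with probability q_heads.
    Assumed payoffs: +1 if the coins match, -1 if they do not."""
    heads_payoff = 2 * q_heads - 1   # q*(+1) + (1-q)*(-1)
    tails_payoff = 1 - 2 * q_heads   # q*(-1) + (1-q)*(+1)
    return {"heads": heads_payoff, "tails": tails_payoff}


def best_response(q_heads):
    """Return the pure action that is best against the current mix,
    or "any" when the environment sits exactly at the 50/50 equilibrium."""
    payoffs = expected_payoffs(q_heads)
    if payoffs["heads"] == payoffs["tails"]:
        return "any"
    return max(payoffs, key=payoffs.get)


if __name__ == "__main__":
    # The environment drifts around the equilibrium; the best reply flips with it.
    for q in (0.45, 0.5, 0.55):
        print(q, best_response(q))
```

Only at exactly q = 0.5 are you indifferent; everywhere else the best reply is pure, which is why tracking the environment's current state can beat rolling dice.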

However, if you're facing another player and you make deterministic choices, you're vulnerable if they ever figure out your choice. They might do so because they can peer into your algorithm, not just track your previous actions. To avoid this, you have to be stochastic.
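
A rough sketch of that vulnerability, again using matching pennies as an assumed example game. The opponent here is a hypothetical one who can run a copy of your deterministic algorithm; the alternating policy and all names are illustrative only. Against the deterministic policy the opponent wins every round, while against a private coin flip they can do no better than break even on average.

```python
import random


def deterministic_policy(history):
    """A hypothetical deterministic rule: alternate H, T, H, T, ..."""
    return "H" if len(history) % 2 == 0 else "T"


def average_payoff(deterministic, rounds, rng):
    """Your average payoff per round (+1 on a match, -1 on a mismatch)
    against an opponent who wants the coins to mismatch."""
    history = []
    total = 0
    for _ in range(rounds):
        if deterministic:
            my_move = deterministic_policy(history)
            # The opponent has read your algorithm, so they simply run it
            # on the same history and play the opposite side.
            predicted = deterministic_policy(history)
            opp_move = "T" if predicted == "H" else "H"
        else:
            # Your coin flip is private, so the opponent has nothing to
            # simulate; here they just guess (an assumption of this sketch).
            my_move = rng.choice("HT")
            opp_move = rng.choice("HT")
        total += 1 if my_move == opp_move else -1
        history.append(my_move)
    return total / rounds


if __name__ == "__main__":
    rng = random.Random(0)
    print("deterministic:", average_payoff(True, 1000, rng))
    print("stochastic:   ", average_payoff(False, 10000, rng))
```

The deterministic run loses every round by construction, whereas the stochastic run hovers near zero, the value of the game.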

This seems like a potentially relevant distinction.