Lumifer comments on In partially observable environments, stochastic policies can be optimal - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (8)
It's deterministic, but not memoryless.
But it really does seem that there is a difference between facing an environment and another player - the other player adapts to your strategy in a way the environment doesn't. The environment only adapts to your actions.
I think for unbounded agents facing the environment, a deterministic policy is always optimal, but this might not be the case for bounded agents.
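To make the "deterministic, but not memoryless" distinction concrete, here is a rough sketch in Python. The corridor environment, the observation names, and every function name are assumptions made up for illustration; none of them come from the original post. Cells 1 and 3 produce the same observation, so a memoryless deterministic policy must act identically in both and gets stuck from one side, a memoryless stochastic policy eventually gets through, and a deterministic policy that remembers which end it last saw goes straight to the goal. It illustrates the intuition only and does not prove the general claim about unbounded agents.

```python
import random

# Hypothetical toy corridor: cells 0..4, goal at cell 2.
# Cells 1 and 3 are aliased: both emit the observation "gray".
GOAL = 2
OBS = {0: "left_end", 1: "gray", 3: "gray", 4: "right_end"}


def run_episode(make_policy, start, max_steps=1000):
    """Return the number of steps taken to reach GOAL, or None if the cap is hit."""
    act = make_policy()  # fresh policy (and fresh memory) for each episode
    pos = start
    for step in range(1, max_steps + 1):
        move = act(OBS[pos])              # +1 moves right, -1 moves left
        pos = min(4, max(0, pos + move))  # walls clamp the position
        if pos == GOAL:
            return step
    return None


def memoryless_deterministic():
    # One fixed action per observation, so "gray" gets the same move in cells 1 and 3.
    table = {"left_end": +1, "right_end": -1, "gray": +1}
    return lambda obs: table[obs]


def memoryless_stochastic(seed=0):
    # Same fixed moves at the ends, but a fair coin flip on the aliased observation.
    rng = random.Random(seed)

    def act(obs):
        if obs == "left_end":
            return +1
        if obs == "right_end":
            return -1
        return rng.choice((-1, +1))

    return act


def deterministic_with_memory():
    # Deterministic, but remembers which end of the corridor was seen last.
    last_end = None

    def act(obs):
        nonlocal last_end
        if obs != "gray":
            last_end = obs
        return +1 if last_end == "left_end" else -1

    return act


if __name__ == "__main__":
    for name, policy in [
        ("memoryless deterministic", memoryless_deterministic),
        ("memoryless stochastic", memoryless_stochastic),
        ("deterministic with memory", deterministic_with_memory),
    ]:
        print(name, [run_episode(policy, start) for start in (0, 4)])
```

In this toy setup the memoryless deterministic policy reaches the goal from cell 0 but bounces between cells 3 and 4 forever when started at cell 4, whereas the memory-using deterministic policy needs two steps from either end. The randomness only helps the agent that cannot remember where it came from.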
Is this how you define environment?
At least as an informal definition, it seems pretty good.