Lumifer comments on In partially observable environments, stochastic policies can be optimal - Less Wrong

5 Post author: Stuart_Armstrong 19 July 2016 10:42AM

Comments (8)

Comment author: Stuart_Armstrong 22 July 2016 06:51:28PM 1 point

ABABABABABAB...

It's deterministic, but not memoryless.
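To make the distinction concrete, here is a minimal Python sketch. It assumes the agent sees the same observation every step, so a memoryless deterministic policy is forced to repeat one action, while a deterministic policy with a single bit of memory can alternate. The function names are hypothetical and chosen only for illustration.

```python
from itertools import islice


def alternating_policy():
    """Deterministic policy with one bit of memory: alternate A and B."""
    last = "B"  # assumed initial memory state, so the first action is A
    while True:
        last = "A" if last == "B" else "B"
        yield last


def memoryless_policy(observation):
    """Deterministic and memoryless: the action depends only on the
    current observation, so a constant observation gives a constant action."""
    return "A"


actions = "".join(islice(alternating_policy(), 12))
constant = "".join(memoryless_policy(None) for _ in range(12))
print(actions)   # ABABABABABAB
print(constant)  # AAAAAAAAAAAA
```

Neither policy uses randomness; the only difference is that the first one carries state from one step to the next.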

But it really does seem that there is a difference between facing an environment and another player - the other player adapts to your strategy in a way the environment doesn't. The environment only adapts to your actions.
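One way to picture that difference is a toy mismatching game, sketched below in Python. This is only an illustration under my own assumptions, not a formal definition: the "environment" sees nothing but the agent's past actions, while the "adversary" is handed the policy itself and simulates it. All names are hypothetical.

```python
def alternating(history):
    """Deterministic policy with memory: A on even steps, B on odd steps."""
    return "A" if len(history) % 2 == 0 else "B"


def environment_reply(history):
    """Reacts only to past actions: repeats the agent's previous action."""
    return history[-1] if history else "A"


def adversary_reply(policy, history):
    """Adapts to the strategy itself: simulates the policy to predict
    the agent's next action."""
    return policy(history)


def play(policy, reply, rounds=10):
    """The agent scores a point whenever its action differs from the reply."""
    history, score = [], 0
    for _ in range(rounds):
        guess = reply(history)
        action = policy(history)
        score += int(action != guess)
        history.append(action)
    return score


env_score = play(alternating, environment_reply)
adv_score = play(alternating, lambda h: adversary_reply(alternating, h))
print(env_score, adv_score)  # 9 0
```

Against the action-driven environment the alternating policy wins every round after the first, but an opponent who can read the strategy beats any deterministic policy in this game, which is where randomization starts to matter.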

I think for unbounded agents facing the environment, a deterministic policy is always optimal, but this might not be the case for bounded agents.

Comment author: Lumifer 22 July 2016 07:44:20PM 1 point

"The environment only adapts to your actions."

Is this how you define environment?

Comment author: Stuart_Armstrong 23 July 2016 04:43:16PM 1 point

At least as an informal definition, it seems pretty good.