Lumifer comments on In partially observable environments, stochastic policies can be optimal - Less Wrong Discussion

5 Post author: Stuart_Armstrong 19 July 2016 10:42AM

Comments (8)

Comment author: Stuart_Armstrong 22 July 2016 06:51:28PM 1 point

ABABABABABAB...

It's deterministic, but not memoryless.
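As a minimal sketch of that distinction (the function names and the single observation "same" are hypothetical, not from the post): a policy that remembers its last action can alternate without any randomness, while a deterministic policy that sees only the current observation is stuck with one action whenever the observation never changes.

```python
from itertools import islice


def alternating_policy():
    """Deterministic policy with one bit of memory: it remembers its last action."""
    last = "B"  # chosen so that the first emitted action is "A"
    while True:
        last = "A" if last == "B" else "B"
        yield last


def memoryless_policy(observation):
    """Deterministic memoryless policy: a fixed map from observation to action."""
    return "A"  # hypothetical choice; any fixed action illustrates the same point


# Twelve steps of the policy with memory.
actions = "".join(islice(alternating_policy(), 12))
print(actions)

# Twelve steps of the memoryless policy under an unchanging observation.
constant = "".join(memoryless_policy("same") for _ in range(12))
print(constant)
```

The first policy never consults a random number generator, so it is deterministic; it only needs state carried from one step to the next.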

But there really does seem to be a difference between facing an environment and facing another player: the other player adapts to your strategy in a way the environment doesn't. The environment only adapts to your actions.

I think for unbounded agents facing the environment, a deterministic policy is always optimal, but this might not be the case for bounded agents.
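One way to see the intuition, as a rough sketch rather than a proof: if the environment responds only to the actions taken, a stochastic policy can be viewed as a mixture over deterministic action sequences, and its expected reward is then a weighted average that cannot exceed the best single sequence. The toy reward function and the horizon below are assumptions made up for illustration.

```python
from itertools import product

HORIZON = 4  # hypothetical episode length


def reward(actions):
    """Hypothetical environment: pays 1 for each switch between consecutive actions.

    The reward depends only on the actions, never on how they were chosen.
    """
    return sum(a != b for a, b in zip(actions, actions[1:]))


# Every deterministic open-loop policy is just a fixed action sequence.
values = {seq: reward(seq) for seq in product("AB", repeat=HORIZON)}
best = max(values.values())

# A stochastic policy that picks each action uniformly at random is a
# uniform mixture over those sequences; its value is the plain average.
mixed_value = sum(values.values()) / len(values)

print(best, mixed_value)
```

In this toy case the alternating sequence is among the best deterministic choices, and the mixture does strictly worse. The argument relies on the agent being able to represent and evaluate every deterministic policy, which is where boundedness could change the picture.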

Comment author: Lumifer 22 July 2016 07:44:20PM 1 point

"The environment only adapts to your actions."

Is this how you define environment?

Comment author: Stuart_Armstrong 23 July 2016 04:43:16PM 1 point

At least as an informal definition, it seems pretty good.