
Gram_Stone comments on In partially observable environments, stochastic policies can be optimal - Less Wrong Discussion

5 Post author: Stuart_Armstrong 19 July 2016 10:42AM


Comments (8)


Comment author: Gram_Stone 19 July 2016 02:19:44PM * 4 points

Is the Absent-minded Driver an example of a single-player decision problem whose optimal policy is stochastic? Isn't the optimal policy to condition your decision on an unbiased coin?

I ask because it seems like it might make a good intuitive example, as opposed to the POMDP in the OP. But I'm not sure who your intended audience is.

Comment author: Stuart_Armstrong 19 July 2016 05:16:43PM 3 points

Yes, you can see this POMDP as a variant of the absent-minded driver, and get that result.
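
For readers who want to check the absent-minded driver numerically, here is a minimal sketch. It assumes the payoffs usually quoted for the problem (0 for exiting at the first intersection, 4 for exiting at the second, 1 for driving past both); the thread itself does not state payoffs, so those values, and the names `expected_payoff`, `grid`, and `best_p`, are illustrative assumptions. Because the driver cannot tell the intersections apart, a policy is a single probability `p` of continuing, used at both.

```python
from fractions import Fraction

# Assumed payoffs (not given in the thread): the values usually quoted
# for the absent-minded driver problem.
def expected_payoff(p, exit_first=0, exit_second=4, drive_on=1):
    """Expected payoff when the driver continues with probability p
    at every intersection (the two are indistinguishable to the driver)."""
    return (
        (1 - p) * exit_first          # exits at the first intersection
        + p * (1 - p) * exit_second   # continues once, then exits
        + p * p * drive_on            # continues past both intersections
    )

# Exact grid search over p in steps of 1/300 (the grid contains 2/3).
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=expected_payoff)
best_value = expected_payoff(best_p)

# The only deterministic policies are "always exit" (p = 0)
# and "always continue" (p = 1).
deterministic_best = max(expected_payoff(Fraction(0)), expected_payoff(Fraction(1)))

# A fair coin corresponds to p = 1/2.
fair_coin_value = expected_payoff(Fraction(1, 2))

print(best_p, best_value, deterministic_best, fair_coin_value)
```

Under these assumed payoffs the search lands on p = 2/3 with expected payoff 4/3, versus 5/4 for a fair coin and 1 for the best deterministic policy. So the optimal policy is stochastic, though with these particular numbers the coin is biased rather than fair.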