Gram_Stone comments on In partially observable environments, stochastic policies can be optimal - Less Wrong Discussion
Is the Absent-minded Driver an example of a single-player decision problem whose optimal policy is stochastic? Isn't the optimal policy to condition your decision on an unbiased coin?
I ask because it seems like it might make a good intuitive example, as opposed to the POMDP in the OP. But I'm not sure who your intended audience is.
Yes, you can see this POMDP as a variant of the absent-minded driver, and get that result.
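For anyone who wants to check the absent-minded driver claim numerically, here is a minimal sketch. The payoffs are an assumption taken from the usual textbook version of the problem (0 for exiting at the first intersection, 4 for exiting at the second, 1 for never exiting); the thread does not specify them, and the function names are made up for illustration. Since the driver cannot tell the intersections apart, a policy is just a single probability p of continuing.

```python
from fractions import Fraction

# Assumed payoffs (the usual textbook version; not stated in this thread).
EXIT_FIRST = 0   # driver exits at the first intersection
EXIT_SECOND = 4  # driver continues once, then exits
NEVER_EXIT = 1   # driver continues at both intersections


def expected_payoff(p):
    """Expected payoff when the driver continues with probability p
    at every intersection (the two intersections look identical)."""
    return ((1 - p) * EXIT_FIRST
            + p * (1 - p) * EXIT_SECOND
            + p * p * NEVER_EXIT)


def best_continue_probability(steps=300):
    """Grid search over p in [0, 1] using exact fractions."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    return max(grid, key=expected_payoff)


best_p = best_continue_probability()
print(best_p, expected_payoff(best_p))         # 2/3 4/3
print(expected_payoff(0), expected_payoff(1))  # 0 1
```

Under these assumed payoffs both deterministic policies (always exit, always continue) do worse than the randomized one, which is the point of the example. The maximizing probability here is 2/3 rather than 1/2, so whether the coin comes out unbiased depends on the payoffs chosen.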