Stuart_Armstrong comments on In partially observable environments, stochastic policies can be optimal - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (8)
Yes, you can see this POMDP as a variant of the absent-minded driver problem, and get that result.
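For a rough illustration of why the absent-minded driver gives a stochastic optimum, here is a minimal sketch. It assumes the payoffs usually quoted for the classic version of that problem (0 for exiting at the first intersection, 4 for exiting at the second, 1 for driving past both), not the specific POMDP from the post. The function name and parameters are hypothetical. The driver cannot tell the two intersections apart, so the policy is a single probability of continuing, used at both.

```python
def expected_payoff(p_continue, exit_first=0.0, exit_second=4.0, drive_on=1.0):
    """Expected payoff when the driver continues with the same probability
    at both intersections (they are indistinguishable to the driver).

    The default payoffs are an assumption: the classic textbook values,
    not numbers taken from the original post.
    """
    p = p_continue
    return (
        (1 - p) * exit_first          # exit at the first intersection
        + p * (1 - p) * exit_second   # continue once, then exit
        + p * p * drive_on            # continue at both intersections
    )


# The two deterministic policies: always exit, always continue.
always_exit = expected_payoff(0.0)
always_continue = expected_payoff(1.0)

# Coarse grid search over stochastic policies.
grid = [i / 1000 for i in range(1001)]
best_p = max(grid, key=expected_payoff)
best_value = expected_payoff(best_p)

print(f"always exit:     {always_exit:.3f}")
print(f"always continue: {always_continue:.3f}")
print(f"best stochastic: p = {best_p:.3f}, payoff = {best_value:.3f}")
```

With these assumed payoffs the expected value is 4p - 3p^2, which is maximised at p = 2/3 with value 4/3, strictly better than either deterministic policy.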