paulfchristiano comments on Creating a satisficer - Less Wrong Discussion
Comments (26)
It seems more principled, equally effective, and much more practical to simply take the policy that maximizes E[u] - (E[v] - v0)^2, where v0 is the expected value of v under some baseline "do nothing" policy. You can sum the penalty term over many different v's to impose a harsher requirement. I don't know whether the machinery with counterfactuals etc. adds much beyond this.
Yep, that seems sensible (I assume you meant E[u] - (E[v] - v0)^2 ?)
Yes, fixed.
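
For concreteness, here is a minimal sketch of the objective E[u] - (E[v] - v0)^2 discussed above, assuming a finite set of candidate policies whose outcome distributions are known exactly. The policy names, the outcome numbers, and the helper functions are all hypothetical illustrations and not anything from the original post; the penalty is summed over however many v's each outcome carries.

```python
from typing import Dict, List, Sequence, Tuple

# One possible outcome of a policy: (probability, u value, value of each v).
Outcome = Tuple[float, float, Sequence[float]]


def expected_u(outcomes: List[Outcome]) -> float:
    """E[u] under a policy's outcome distribution."""
    return sum(p * u for p, u, _ in outcomes)


def expected_vs(outcomes: List[Outcome]) -> List[float]:
    """E[v] for each auxiliary variable v under a policy."""
    n = len(outcomes[0][2])
    return [sum(p * vs[i] for p, _, vs in outcomes) for i in range(n)]


def score(outcomes: List[Outcome], baseline_vs: Sequence[float]) -> float:
    """E[u] minus the sum over all v's of (E[v] - v0)^2."""
    penalty = sum(
        (ev - v0) ** 2 for ev, v0 in zip(expected_vs(outcomes), baseline_vs)
    )
    return expected_u(outcomes) - penalty


def best_policy(
    policies: Dict[str, List[Outcome]], baseline: str = "do_nothing"
) -> str:
    """Pick the policy with the highest penalized score.

    v0 is taken to be E[v] under the baseline policy (an assumed name).
    """
    v0s = expected_vs(policies[baseline])
    return max(policies, key=lambda name: score(policies[name], v0s))


# Hypothetical toy numbers, chosen only to show the trade-off.
POLICIES: Dict[str, List[Outcome]] = {
    "do_nothing": [(1.0, 0.0, (0.0,))],                  # E[u]=0,  E[v]=0
    "modest": [(0.5, 1.0, (0.0,)), (0.5, 1.0, (1.0,))],  # E[u]=1,  E[v]=0.5
    "extreme": [(1.0, 10.0, (4.0,))],                    # E[u]=10, E[v]=4
}

if __name__ == "__main__":
    print(best_policy(POLICIES))
```

In this toy setup the "extreme" policy has the largest E[u] but moves E[v] far from the baseline, so the squared penalty outweighs its gain; adding more entries to each outcome's tuple of v values tightens the requirement in the way the comment suggests.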