paulfchristiano comments on Creating a satisficer - Less Wrong Discussion

4 Post author: Stuart_Armstrong 11 March 2015 03:03PM

Comments (26)

Comment author: paulfchristiano 28 October 2015 06:38:27PM * 1 point

It seems more principled, equally effective, and much more practical to simply take the policy that maximizes E[u] - (E[v] - v0)^2, where v0 is the expected value of v under some baseline "do nothing" policy. You can sum the penalty term over many different v's to give a harsher requirement. I don't know if the machinery with counterfactuals etc. is adding much beyond this.
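
As a rough illustration of the objective above, here is a minimal Python sketch that scores a handful of candidate policies by E[u] - sum_i (E[v_i] - v0_i)^2 and picks the highest-scoring one. Everything concrete in it is an assumption made for the example: the function names `penalized_score` and `best_policy`, the policy names, and the numbers are hypothetical, and the expectations are simply given as inputs rather than estimated from a model.

```python
from typing import Dict, List, Sequence


def penalized_score(
    expected_u: float,
    expected_vs: Sequence[float],
    baseline_vs: Sequence[float],
) -> float:
    """Return E[u] - sum_i (E[v_i] - v0_i)^2 for one policy.

    expected_vs holds E[v_i] under the policy; baseline_vs holds the
    corresponding v0_i, i.e. E[v_i] under the "do nothing" baseline.
    """
    penalty = sum((ev - v0) ** 2 for ev, v0 in zip(expected_vs, baseline_vs))
    return expected_u - penalty


def best_policy(policies: Dict[str, Dict[str, List[float]]], baseline: str) -> str:
    """Pick the policy name with the highest penalized score.

    The baseline policy's own E[v_i] values are used as the v0_i.
    """
    baseline_vs = policies[baseline]["E_v"]
    return max(
        policies,
        key=lambda name: penalized_score(
            policies[name]["E_u"][0], policies[name]["E_v"], baseline_vs
        ),
    )


# Hypothetical toy numbers, chosen only to show the trade-off.
policies = {
    "do_nothing": {"E_u": [0.0], "E_v": [5.0, 2.0]},
    "modest": {"E_u": [3.0], "E_v": [5.5, 2.5]},
    "aggressive": {"E_u": [10.0], "E_v": [9.0, 2.0]},
}

choice = best_policy(policies, baseline="do_nothing")
print(choice)
```

In this toy setup the "aggressive" policy has the highest E[u] but moves the first v far from its baseline, so the quadratic penalty outweighs its gain. Adding more v's to the list only adds non-negative penalty terms, which matches the "harsher requirement" remark.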

Comment author: Stuart_Armstrong 28 October 2015 06:41:35PM 0 points

Yep, that seems sensible (I assume you meant E[u] - (E[v] - v0)^2?)

Comment author: paulfchristiano 28 October 2015 07:14:09PM 1 point

Yes, fixed.