Larks comments on Google Deepmind and FHI collaborate to present research at UAI 2016 - Less Wrong

23 Post author: Stuart_Armstrong 09 June 2016 06:08PM

Comments (10)

Comment author: Larks 10 July 2016 03:33:57AM 0 points

Very interesting paper, congratulations on the collaboration.

I have a question about theta. When you initially introduce it, theta lies in [0,1]. But it seems that if you choose theta = (0_n)_n, just a sequence of 0s, all policies are interruptible. Is there much reason to initially allow such a wide-ranging theta? Why not restrict it to converge to 1 from the very beginning? (Or have I just totally missed the point?)

Comment author: Stuart_Armstrong 10 July 2016 05:09:02AM 0 points

We're working on the theta problem at the moment. Basically, we're currently defining interruptibility in terms of convergence to optimality. That requires the agent to explore sufficiently, so we can't set theta = 1. But we want to be able to interrupt the agent in practice, so we want theta to tend to 1.
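
To make the trade-off concrete, here is a rough sketch rather than anything taken from the paper. It assumes that at step t an interruption signal is obeyed with probability theta_t and ignored otherwise; the schedule 1 - 1/(t+1) and the function names are illustrative choices, not the paper's.

```python
import random


def theta_zero(t):
    # Larks's degenerate case: theta_t = 0 for every t.
    return 0.0


def theta_tending_to_one(t):
    # Hypothetical schedule: strictly below 1 at every step, tends to 1.
    return 1.0 - 1.0 / (t + 1)


def count_obeyed(theta, steps, seed=0):
    # Assumption: an interruption is signalled at every step, and the agent
    # obeys it with probability theta(t); otherwise it keeps its own policy.
    rng = random.Random(seed)
    obeyed = 0
    for t in range(1, steps + 1):
        if rng.random() < theta(t):
            obeyed += 1
    return obeyed


if __name__ == "__main__":
    steps = 1000
    for name, theta in [("zero", theta_zero), ("tending to 1", theta_tending_to_one)]:
        obeyed = count_obeyed(theta, steps)
        print(f"theta {name}: obeyed {obeyed} of {steps}, ignored {steps - obeyed}")
```

Under this schedule the expected number of ignored interruptions is the sum of 1/(t+1), which diverges, so the agent never stops exploring entirely even though it obeys almost every interruption late on. With theta fixed at 0 no interruption is ever obeyed, which is the degenerate case raised above.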

Comment author: Larks 12 July 2016 12:18:07AM 0 points

Yup, I think I understand that, and agree you need to at least tend to one. I'm just wondering why you initially use the looser definition of theta (where it doesn't need to tend to one, and can instead be just 0).

Comment author: Stuart_Armstrong 12 July 2016 01:50:26PM 0 points

When defining safe interruptibility, we let theta tend to 1. We probably didn't specify that earlier, when we were just introducing the concept?