V_V comments on Google Deepmind and FHI collaborate to present research at UAI 2016 - Less Wrong

23 Post author: Stuart_Armstrong 09 June 2016 06:08PM


Comment author: V_V 13 June 2016 05:26:55PM 0 points [-]

Talking of yourself in third person? :)

Cool paper!

Anyway, I'm a bit bothered by the theta thing, the probability that the agent complies with the interruption command. If I understand correctly, you can make it converge to 1, but if it converges too quickly the agent learns a biased model of the world, while if it converges too slowly the agent is, of course, unsafe.
I'm not sure whether this is just a technicality that can be circumvented or a fundamental issue: in order for the agent to learn what happens after the interruption switch is pressed, it must ignore the switch with some non-negligible probability, which means you can't trust the interruption switch as a failsafe mechanism.
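To make the tradeoff concrete, here's a minimal sketch (my own illustration, not the schedule used in the paper): the expected number of times the agent ignores the interruption switch is the sum of the per-step ignore probabilities 1 - theta_t. If that sum diverges (e.g. theta_t = 1 - 1/t), the agent keeps observing post-interruption transitions forever, which helps it learn an unbiased model but means unboundedly many safety failures; if it converges (e.g. theta_t = 1 - 2^-t), the agent ignores the switch only finitely often in expectation, but may never gather enough post-interruption data.

```python
def expected_ignores(theta, horizon):
    """Expected number of times the agent ignores the interruption
    command under compliance schedule theta(t), over `horizon` steps."""
    return sum(1.0 - theta(t) for t in range(1, horizon + 1))

# Slow schedule: theta_t = 1 - 1/t. Ignore probabilities form a
# harmonic series, so the expected ignore count grows without bound.
slow = lambda t: 1.0 - 1.0 / t

# Fast schedule: theta_t = 1 - 2**(-t). Ignore probabilities sum to
# at most 1, so the switch is ignored only finitely often in expectation.
fast = lambda t: 1.0 - 2.0 ** (-t)

for horizon in (100, 10_000):
    print(horizon,
          round(expected_ignores(slow, horizon), 3),   # keeps growing
          round(expected_ignores(fast, horizon), 3))   # plateaus near 1
```

The function names and schedules here are purely hypothetical choices for illustration; the point is just that the same sum controls both how much the agent can learn about the switch and how often it disobeys it.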