This is a linkpost for https://medium.com/@huwjames81/the-value-alignment-problem-as-an-interactive-game-18799a1ea3c7#.3dh6wavi6
If the robot uses a stationary policy, then the human is effectively solving a standard reinforcement learning problem. Since many standard reinforcement learning approaches have been proven to converge, I think it reasonable to assume that, given sufficient time, the human will be able to find a policy which is optimal with respect to any stationary policy adopted by the robot.
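As a minimal sketch of this claim (the toy cooperative game, rewards, and robot policy below are made up for illustration, not taken from the post): once the robot's stationary policy is folded into the dynamics, the human faces an ordinary MDP, and tabular Q-learning settles on a best response to that policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy cooperative game: 2 states, 2 human actions, 2 robot actions.
# The robot plays a fixed (stationary) policy; folding it into the dynamics
# leaves the human with a standard single-agent MDP.
N_STATES, N_H_ACTIONS, N_R_ACTIONS = 2, 2, 2
robot_policy = np.array([0, 1])          # robot's action in each state (stationary)

# Shared reward depends on state, human action, robot action.
reward = rng.normal(size=(N_STATES, N_H_ACTIONS, N_R_ACTIONS))
# Transition probabilities P[s, a_h, a_r, s'].
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_H_ACTIONS, N_R_ACTIONS))

gamma, alpha, eps = 0.9, 0.1, 0.1
Q = np.zeros((N_STATES, N_H_ACTIONS))    # human's Q-values over its own actions only

s = 0
for t in range(100_000):
    a_h = rng.integers(N_H_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
    a_r = robot_policy[s]                # robot action is a fixed function of state
    r = reward[s, a_h, a_r]
    s_next = rng.choice(N_STATES, p=P[s, a_h, a_r])
    # Standard Q-learning update against the MDP induced by the robot's policy.
    Q[s, a_h] += alpha * (r + gamma * Q[s_next].max() - Q[s, a_h])
    s = s_next

print("Human's learned best response per state:", Q.argmax(axis=1))
```

With fixed step size and exploration this only approximates the optimum, but it illustrates the sense in which the human's problem reduces to standard RL when the robot's policy never changes.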
I think this is not a reasonable assumption, and it is where most of the problem lies (the cooperation part seems pretty easy if standard IRL already works for inferring goals). See: "The easy goal inference problem is still hard" and "Model mis-specification and inverse reinforcement learning".
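To illustrate the worry, here is a minimal sketch with made-up numbers: if the robot's model of the human is mis-specified (here it assumes Boltzmann rationality while the human is systematically biased), even a maximum-likelihood fit of the reward from demonstrations ranks the human's goals incorrectly.

```python
import numpy as np

true_reward = np.array([1.0, 0.0])       # action 0 is actually better for the human

# The human systematically misjudges action 0's payoff and mostly picks action 1.
human_action_probs = np.array([0.2, 0.8])
rng = np.random.default_rng(0)
demos = rng.choice(2, size=1000, p=human_action_probs)

# The robot assumes a Boltzmann-rational human: P(a) proportional to exp(beta * R(a)).
# Under that (wrong) model, the maximum-likelihood reward is recovered, up to an
# additive constant, from the empirical action frequencies.
beta = 1.0
counts = np.bincount(demos, minlength=2) / len(demos)
inferred_reward = np.log(counts) / beta

print("true best action:    ", true_reward.argmax())      # 0
print("inferred best action:", inferred_reward.argmax())  # 1: goal inference fails
```

The inference step itself is exact here; the failure comes entirely from the mismatch between the assumed human model and the actual human, which is the point of the posts cited above.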