I'm not too concerned about the rational agent case. If we have a fully rational agent whose values I endorse, the Friendliness problem has either been solved or turns out to be irrelevant.
It's just a way to pin down the problem. If we can show that the AI in a box could misinform an idealized rational agent via selective evidence, then we know it can do so to us. If it can't misinform the idealized agent, then there exists some method by which we can resist it.
Also, I don't think idealized rational agents can actually exist anyway. All riddles involving them are for the sake of narrowing down some other problem.
AI Box Experiment Update #3
Tuxedage (AI) vs Alexei (GK) - Gatekeeper Victory
Tuxedage (AI) vs Anonymous (GK) - Gatekeeper Victory
I have won a second game of AI Box against a gatekeeper who wished to remain anonymous.
This puts my AI Box Experiment record at 3 wins and 3 losses.