Could you clarify who you're referring to by "strong" agents? You refer to humans as weak agents at the start, yet claim later that AIs should have human inductive biases from the beginning, which makes me think humans are the strong ones.
I appreciate you making the point about Boltzmann rationality. Indeed, I think this is where my lack of familiarity with actually implementing IRL systems begins to show. Would it be fair to claim that, even with a model that accounts for the fact that humans aren't perfect, the system still assumes there is an ultimate human reward function? The error model would then just be another tool for helping the system get at that reward function: it assumes humans don't always act according to the reward function, but that such a function nonetheless exists to be recovered.
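To make sure I'm picturing the assumption correctly, here is a minimal sketch of what I understand Boltzmann-rational reward inference to look like in a one-step setting (everything below is my own hypothetical illustration, not anything from your post): the human is modeled as choosing actions with probability proportional to exp(beta * reward), where beta is the "error model" temperature, and inference tries to recover the reward parameters theta that are assumed to generate all the behavior.

```python
# Minimal sketch of Boltzmann-rational reward inference (one-step setting).
# All features, weights, and numbers are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Each action is described by a feature vector; the human's "true" reward
# is assumed to be linear in these features with fixed weights theta_true.
action_features = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.7],
])
theta_true = np.array([1.0, 2.0])   # the assumed underlying reward function
beta = 3.0                          # Boltzmann temperature: the "error model"

def boltzmann_probs(theta, beta):
    """P(a) proportional to exp(beta * reward(a)): noisy optimization of a fixed reward."""
    logits = beta * action_features @ theta
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Simulate noisy human demonstrations: the human usually, but not always,
# picks the highest-reward action.
demos = rng.choice(len(action_features), size=500, p=boltzmann_probs(theta_true, beta))

# Maximum-likelihood estimation of theta by gradient ascent.
# The noise term beta explains deviations from optimality, but a single
# reward vector theta is still presumed to generate all of the behavior.
theta_hat = np.zeros(2)
for _ in range(2000):
    p = boltzmann_probs(theta_hat, beta)
    # Gradient of the average log-likelihood: observed minus expected features.
    observed = action_features[demos].mean(axis=0)
    expected = p @ action_features
    theta_hat += 0.05 * beta * (observed - expected)

print("recovered reward weights:", theta_hat)
```

If that sketch is roughly right, then the point I'm trying to check is that human error only ever enters as beta; the optimizer's whole job is still to pin down one true theta.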