SilentCal comments on Goal completion: the rocket equations - Less Wrong

Post author: Stuart_Armstrong 20 January 2016 01:54PM




Comment author: SilentCal 21 January 2016 06:16:18PM 1 point

I think we need to provide some kind of prior over the unknown features of the model and reward if we want the given model and reward to mean anything. Otherwise, for all the AI knows, the true reward has a +2-per-step term that reverses the reward-over-time feature. The AI can still infer the algorithm generating the sample trajectories, but the known reward is no help at all in doing so.
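To make the "+2-per-step term" concrete, here is a minimal sketch (the step counts and reward values are hypothetical, chosen only for illustration): a stated reward of -1 per step favors short trajectories, but if the true reward contains an unmodeled +2-per-step term, the ranking over the same trajectories reverses.

```python
# Illustrative only: without a prior over unknown reward terms, the
# stated reward and the true reward can rank trajectories oppositely.

def total_return(steps, per_step_reward):
    """Return of a trajectory with a constant per-step reward."""
    return steps * per_step_reward

short_traj, long_traj = 5, 20   # hypothetical trajectory lengths
stated = -1                     # stated reward: -1 per step
true = stated + 2               # true reward with an unknown +2-per-step term

# Under the stated reward, the short trajectory looks better...
assert total_return(short_traj, stated) > total_return(long_traj, stated)
# ...but under the true reward, the long trajectory is better.
assert total_return(long_traj, true) > total_return(short_traj, true)
```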

I think what we want is for the stated reward to function as a hint. One interpretation might be to expect the stated reward to approximate the true reward well over the problem and solution domains humans have thought about. This works, for instance, in the case where you put an AI in charge of a paper clip factory with the stated reward '+1 per paper clip produced'.
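One way to picture "stated reward as a hint" is a sketch like the following (the particular true reward and the 10,000-clip cutoff are assumptions for illustration, not anything from the post): the stated reward matches the true reward on output scales humans have considered, but diverges far outside them.

```python
# Hypothetical sketch: the stated reward approximates the true reward
# over the domain humans have thought about, and diverges outside it.

def stated_reward(clips):
    return clips  # '+1 per paper clip produced'

def true_reward(clips, familiar_max=10_000):
    # Assumed true reward: agrees with the stated one on familiar scales,
    # but stops valuing clips beyond anything humans contemplated.
    return min(clips, familiar_max)

# Over the familiar domain, the stated reward is a good approximation:
assert all(stated_reward(c) == true_reward(c) for c in range(0, 10_001, 1_000))
# Far outside that domain, the two come apart:
assert stated_reward(10**9) != true_reward(10**9)
```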

Comment author: Stuart_Armstrong 22 January 2016 10:19:39AM 0 points

Indeed. But I want to see if I can build up to this in the model.