I read the latter one, and from a brief glance I think the first one is essentially the same paper. It uses some tricks to get a relatively efficient algorithm for the special case where all the agent has to do is recognize some simple patterns in the environment, patterns somewhat simpler than regular expressions. It would never be able to learn that, in general, the distance the ball bounces up is 50% of the distance that it fell, but if the falling distance were quantized and the bouncing distance were quantized and there were a maximum height the ball could fall ...
I searched the posts but didn't find much relevant information. Has anyone taken a serious crack at it, preferably someone willing to share their thoughts? Is the material worthwhile? Are there any dubious portions, or sections one might want to skip (whether because of bad ideas or just to save time)? I'm considering investing a chunk of time in investigating Legg's work, so any feedback would be much appreciated, and it seems likely that others would like some perspective on it as well.