I read the latter one, and from a brief glance I think the first one is essentially the same paper. It uses some tricks to get a relatively efficient algorithm in the special case where all the agent has to do is recognize some simple patterns in the environment, patterns somewhat simpler than regular expressions. It would never be able to learn that, in general, the distance the ball bounces up is 50% of the distance it fell. But if the falling distance and the bouncing distance were both quantized, and there was a maximum height the ball could fall from, it could eventually learn all possible combinations.
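To make the bouncing-ball point concrete, here is a minimal sketch of the difference between memorizing quantized cases and learning the general rule. It is only an illustration, not the paper's algorithm: `true_bounce`, `TableLearner`, and `MAX_HEIGHT` are hypothetical names, and the environment (integer heights, bounce equals half the fall) is assumed for the example.

```python
# Hypothetical toy environment: heights are quantized to integers and the
# ball bounces to half the height it fell from (rounded down).
MAX_HEIGHT = 10

def true_bounce(fall_height):
    return fall_height // 2

class TableLearner:
    """Memorizes (fall height -> bounce height) pairs it has seen.

    It never forms the general rule "bounce = 50% of fall"; it can only
    answer for fall heights it has already observed.
    """
    def __init__(self):
        self.table = {}

    def observe(self, fall_height, bounce_height):
        self.table[fall_height] = bounce_height

    def predict(self, fall_height):
        # Returns None for any height that was never observed.
        return self.table.get(fall_height)

learner = TableLearner()

# Show the learner only the even fall heights.
for h in range(0, MAX_HEIGHT + 1, 2):
    learner.observe(h, true_bounce(h))

seen_prediction = learner.predict(8)    # observed, so it is in the table
unseen_prediction = learner.predict(7)  # never observed, so no answer

# With a finite maximum height, enough experience fills in every entry.
for h in range(MAX_HEIGHT + 1):
    learner.observe(h, true_bounce(h))
covers_everything = all(
    learner.predict(h) == true_bounce(h) for h in range(MAX_HEIGHT + 1)
)
```

The table only becomes complete because the set of possible fall heights is finite. Drop the quantization or the maximum height and there is always another case it has not seen.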
They also gave up on the idea, central to AIXI, that experimentation does not need to be treated as a special case. Instead they use separate heuristics to decide how much to experiment and how often to take the best known action for a short-term reward.
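For readers who have not seen this kind of heuristic, here is a minimal sketch of one common example, epsilon-greedy action selection. This is an assumption chosen for illustration and not necessarily the heuristic the paper uses. The function name and parameters are hypothetical.

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """Pick an action index given estimated rewards per action.

    With probability `epsilon`, experiment by choosing a random action.
    Otherwise exploit by choosing the action with the best known estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

rng = random.Random(0)           # seeded so runs are repeatable
estimates = [0.2, 0.9, 0.5]      # hypothetical reward estimates

# epsilon = 0.0 never experiments: always the best known action.
greedy_choices = [epsilon_greedy(estimates, 0.0, rng) for _ in range(20)]

# epsilon = 1.0 always experiments: every choice is uniformly random.
random_choices = [epsilon_greedy(estimates, 1.0, rng) for _ in range(20)]
```

The point of the contrast is that `epsilon` is a knob bolted on from outside, whereas in AIXI the amount of experimentation is supposed to fall out of maximizing expected reward.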
I believe they're doing the best they can, and for all I know it might be state of the art, but it isn't general intelligence.
I searched the posts but didn't find a great deal of relevant information. Has anyone taken a serious crack at it, preferably someone who would like to share their thoughts? Is the material worthwhile? Are there any dubious portions, or any sections one might want to skip (either because the ideas are bad or to save time)? I'm considering investing a chunk of time into investigating Legg's work, so any feedback would be much appreciated, and it seems likely that others would like some perspective on it as well.