For the longest while I have been trying to figure out what AIXI is about. Tell me if I have this right:
We are an agent in an unknown world, trying to maximise a reward (or utility) signal. For instance, we are playing a Pac-Man game and trying to get the highest score possible.
Based on its previous observations and rewards, AIXI forms different models for predicting which action will maximise future rewards. It chooses the model that promises the greatest rewards while having a small program size.
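To make the idea concrete, here is a toy, computable sketch under heavy simplifying assumptions: a hand-written list of deterministic candidate models, each given a made-up "program length" in bits, and a one-step horizon. Real AIXI ranges over all programs, plans over a long horizon, and is incomputable. Note that this sketch does not pick a single model: it keeps every model consistent with the history, weights each by 2^-length, and picks the action with the highest expected reward under that mixture. The names `Model`, `consistent`, and `choose_action` are hypothetical, invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A history is a list of (action, observation, reward) triples.
History = List[Tuple[str, str, float]]


@dataclass
class Model:
    name: str
    length: int  # hypothetical "program length" in bits; shorter means simpler
    # Deterministic prediction: given the history so far and an action,
    # return the (observation, reward) the model expects.
    predict: Callable[[History, str], Tuple[str, float]]


def consistent(model: Model, history: History) -> bool:
    """True if the model would have produced every observation and reward seen so far."""
    for t, (action, obs, reward) in enumerate(history):
        if model.predict(history[:t], action) != (obs, reward):
            return False
    return True


def choose_action(models: List[Model], history: History, actions: List[str]) -> str:
    """Pick the action with the highest expected one-step reward under a 2^-length mixture."""
    # Simplicity prior 2^-length, keeping only models not contradicted by the history.
    weights = {m.name: 2.0 ** -m.length for m in models if consistent(m, history)}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no candidate model is consistent with the history")

    def expected_reward(action: str) -> float:
        # Posterior-weighted average of each surviving model's predicted reward.
        return sum(
            weights[m.name] / total * m.predict(history, action)[1]
            for m in models
            if m.name in weights
        )

    return max(actions, key=expected_reward)


# Two toy Pac-Man-like hypotheses: the dot is to the right (simpler) or to the left.
models = [
    Model("always_right", 2, lambda h, a: ("dot", 1.0) if a == "right" else ("wall", 0.0)),
    Model("always_left", 4, lambda h, a: ("dot", 1.0) if a == "left" else ("wall", 0.0)),
]
print(choose_action(models, [], ["left", "right"]))  # → right
print(choose_action(models, [("right", "wall", 0.0)], ["left", "right"]))  # → left
```

With no history, the shorter "always_right" model dominates the mixture, so the agent moves right. Once moving right has been observed to hit a wall, that model is ruled out and the agent switches to the remaining hypothesis.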
Paul Christiano was about to give a tutorial on AIXI and TDT, so I whipped out my iPhone and recorded it. His tutorial wasn't carefully planned or executed, but it may still be useful to some. Note that when Paul writes "UDT" on a piece of paper, he actually means "TDT." :)
HD video download links: 1, 2.