chegra comments on [video] Paul Christiano's impromptu tutorial on AIXI and TDT - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (13)
For the longest while I have been trying to figure what AIXI is about. Tell me if I got it correct:
We are in an unknown world that has a utility function to maximise. For instance, we are in a pacman game and we are trying to gain the maximum score possible.
Based on the previous observation and rewards, AIXI forms different model for predicting which action will maximise future rewards. It chooses the model with the greats rewards with a small program size.