chegra comments on [video] Paul Christiano's impromptu tutorial on AIXI and TDT - Less Wrong

7 Post author: lukeprog 19 March 2012 05:20PM

Comment author: chegra 15 November 2013 11:17:32PM

For the longest while I have been trying to figure out what AIXI is about. Tell me if I have it correct:

  1. We are in an unknown world and have a utility function to maximise. For instance, we are in a Pac-Man game and are trying to get the maximum score possible.

  2. Based on the previous observations and rewards, AIXI forms different models for predicting which action will maximise future rewards. It chooses the model with the greatest rewards and a small program size.
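To make step 2 concrete, here is a toy sketch in Python. It is not AIXI itself, which is incomputable. It assumes a tiny, hand-written set of candidate models, a made-up `length_bits` field standing in for program size, and one-step rewards only. It also uses a 2^-length weighted mixture over all models consistent with the history, rather than picking one single model, since that is how the size preference is usually formalised. All names here (`Model`, `posterior_weights`, `best_action`, `MODELS`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A history is a list of (action, reward) pairs seen so far.
History = List[Tuple[str, int]]


@dataclass
class Model:
    name: str
    length_bits: int               # hypothetical stand-in for program size
    predict: Callable[[str], int]  # reward this model predicts for an action


def consistent(model: Model, history: History) -> bool:
    """A model survives only if it reproduces every reward seen so far."""
    return all(model.predict(action) == reward for action, reward in history)


def posterior_weights(models: List[Model], history: History) -> Dict[str, float]:
    """Weight each surviving model by 2^-length, then normalise."""
    raw = {m.name: 2.0 ** -m.length_bits for m in models if consistent(m, history)}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no candidate model is consistent with the history")
    return {name: weight / total for name, weight in raw.items()}


def best_action(models: List[Model], history: History, actions: List[str]) -> str:
    """Pick the action with the highest weighted expected reward."""
    weights = posterior_weights(models, history)
    by_name = {m.name: m for m in models}

    def expected_reward(action: str) -> float:
        return sum(w * by_name[name].predict(action) for name, w in weights.items())

    return max(actions, key=expected_reward)


# Made-up candidate models for a two-action world.
MODELS = [
    Model("A", 3, lambda a: 1 if a == "left" else 0),
    Model("B", 5, lambda a: 1 if a == "left" else 10),
    Model("C", 2, lambda a: 0),
]
ACTIONS = ["left", "right"]

# After seeing that "left" paid 1, model C is ruled out.
history: History = [("left", 1)]
print(best_action(MODELS, history, ACTIONS))  # prints "right"
```

In this toy run the shorter model A gets most of the weight (0.8 against 0.2 for B), but B's large predicted reward for "right" still tips the expected value, so small program size acts as a prior rather than a hard filter.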