Wei_Dai comments on Does Solomonoff always win? - Less Wrong

11 Post author: cousin_it 23 February 2011 08:42PM

Comment author: Wei_Dai 23 February 2011 11:45:43PM 0 points

counted over world-programs that "contain" corresponding agent.

How do you formalize this? I couldn't figure it out when I tried this.

Comment author: Vladimir_Nesov 24 February 2011 12:10:02AM *  0 points

Select the worlds whose world-history is ambiently controlled by the agent, that is, the worlds where the ambient dependence is non-constant: the conclusion about which world-history a given world-program implements depends on which strategy we assume the agent implements. Then read out the utility of the reward channel from that strategy in that world.

Hmm... This is problematic if the same world contains multiple agent-instances that received different rewards (by following the same strategy but encountering different observations). What is the utility of such a world? Answering this is a necessary part of specifying the decision problem. Perhaps it is a point where the notion of reinforcement learning breaks.
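
To make the selection rule and the multi-instance problem concrete, here is a minimal toy sketch in Python. It is not a formalization of ambient control: everything in it is an assumption made for illustration. World-programs are modeled as plain functions from a strategy to a world-history, a world-history is a tuple of (observation, reward) pairs with one pair per agent-instance, and "non-constant ambient dependence" is approximated by checking whether the history changes across a small finite set of strategies. All names (`STRATEGIES`, `is_controlled`, `rewards_in`, the three example worlds) are hypothetical.

```python
from itertools import product

# Hypothetical toy setup: two possible observations, two possible actions.
OBSERVATIONS = ("a", "b")
ACTIONS = (0, 1)

# A strategy maps each observation to an action; enumerate all four.
STRATEGIES = [
    dict(zip(OBSERVATIONS, acts))
    for acts in product(ACTIONS, repeat=len(OBSERVATIONS))
]

# A "world-program" here is just a function: strategy -> world-history.
# A world-history is a tuple of (observation, reward) pairs, one per
# agent-instance the world contains.

def world_ignores_agent(strategy):
    # The history is the same whatever the strategy is.
    return (("a", 0),)

def world_single_instance(strategy):
    # One instance observes "a"; its reward equals its action.
    return (("a", strategy["a"]),)

def world_two_instances(strategy):
    # Two instances run the same strategy but see different observations,
    # so they can end up with different rewards.
    return (("a", strategy["a"]), ("b", 1 - strategy["b"]))

WORLDS = [world_ignores_agent, world_single_instance, world_two_instances]

def is_controlled(world):
    # Toy stand-in for "the ambient dependence is non-constant":
    # the history differs for at least two assumed strategies.
    return len({world(s) for s in STRATEGIES}) > 1

def rewards_in(world, strategy):
    # Read out the reward channel of every agent-instance in the world.
    return [reward for _, reward in world(strategy)]

controlled = [w.__name__ for w in WORLDS if is_controlled(w)]
two_instance_rewards = rewards_in(world_two_instances, {"a": 1, "b": 1})
```

In this sketch `controlled` keeps the two worlds whose history varies with the strategy and drops `world_ignores_agent`. For `world_two_instances`, the readout is a list of rewards rather than a single number, and the sketch deliberately does not pick an aggregation rule (sum, average, or anything else), since that choice is exactly the open question about the utility of such a world.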