duwease comments on Reinforcement Learning: A Non-Standard Introduction (Part 1) - Less Wrong

20 Post author: royf 29 July 2012 12:13AM



Comment author: duwease 02 August 2012 04:52:06PM 0 points

I'm having a hard time understanding what the arrows from W-node to W-node and M-node to M-node represent in the chess example, given the premise that the world and memory states take turns changing.

If I understand correctly, W is the board state at the start of the player's turn, and M is the state of the memory containing the model of the board and the possible moves and outcomes. W(t) is the state that precedes M(t); likewise, the action that results once the memory has finished remodelling at M(t), together with the opposing player's action, produces the new world state W(t+1).

This interpretation seems to suggest a simple, linear linked list of alternating W and M nodes, rather than the idea that, for example, the W(t-1) node is the direct precursor to W(t). The reason is that it seems one could generate W(t) from the memory model in M(t-1) alone, regardless of what W(t-1) was, and the same goes for M(t) and W(t-1).

Perhaps it's that the arrow from one W-node to another does not represent the causal/precursor relationship that a W-node to M-node arrow represents, but a different relationship? If so, what is that relationship? Sorry if this seems picky, but I do think that the model is causing some confusion as to whether I properly understand your point.

Comment author: Johnicholas 02 August 2012 06:09:50PM 0 points

The arrows all mean the same thing, which is roughly 'causes'.

Chess is a perfect-information game, so you could build the board entirely from the player's memory of the board, but in general, the state of the world at time t-1, together with the player, causes the state of the world at time t.
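To make the general case concrete, here is a minimal sketch of why the W-to-W arrow is needed once the memory does not capture the whole world. Everything in it is an assumption for illustration and not taken from the post: the `World` and `Memory` types, the split into `visible` and `hidden` parts, and the toy update rules.

```python
from typing import NamedTuple, Tuple


class World(NamedTuple):
    visible: int  # the part of the world the agent can observe
    hidden: int   # the part the agent never sees (hypothetical)


class Memory(NamedTuple):
    last_seen: int  # the agent's model of the visible part
    turns: int      # how many observations it has made so far


def update_memory(prev: Memory, world: World) -> Memory:
    # Arrows M(t-1) -> M(t) and W(t) -> M(t): only the visible part gets in.
    return Memory(last_seen=world.visible, turns=prev.turns + 1)


def update_world(prev: World, memory: Memory) -> World:
    # Arrows W(t) -> W(t+1) and M(t) -> W(t+1): the action comes from memory,
    # but the hidden part of the old world also shapes the new world.
    action = memory.last_seen % 2
    return World(visible=prev.visible + action + prev.hidden,
                 hidden=prev.hidden + 1)


def step(world: World, memory: Memory) -> Tuple[World, Memory]:
    memory = update_memory(memory, world)
    world = update_world(world, memory)
    return world, memory


start = Memory(last_seen=0, turns=0)
world_a, memory_a = step(World(visible=4, hidden=0), start)
world_b, memory_b = step(World(visible=4, hidden=5), start)
print(memory_a == memory_b)  # identical memories
print(world_a, world_b)      # different next worlds
```

The two runs end with the same memory but different next world states, so the next world cannot be computed from the memory alone. With `hidden` fixed at zero the memory would determine the next world by itself, which is the perfect-information situation chess is in.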

Comment author: duwease 02 August 2012 06:34:47PM 0 points

Ah, so what we're really talking about here is situations where the world state keeps changing as the memory builds its model, or even just situations where the memory holds an incomplete subset of the world information. Having read the second article's example, which makes the limitations of the memory explicit, I understand. I'd say the chess example is a bit misleading in this case, since the discrepancies between the memory and the world are a big part of the discussion and, as you said, chess is a perfect-information game.