RaelwayScot comments on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning - Less Wrong Discussion
Comments (122)
Credit assignment and reward delay are nonexistent? What do you think happens when one diffs the board strength of two potential boards?
"Nonexistent problems" was meant as hyperbole: they weren't solved in interesting ways, and they are extremely simple in this setting because the states and rewards are noise-free. I am not sure what you mean by the second question. They just apply gradient descent over the entire history of moves in the current game so that expected reward is maximized.
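To make "gradient descent over the entire history of moves" concrete, here is a minimal REINFORCE-style sketch. It is a toy under loud assumptions, not AlphaGo's actual training code: the policy is a single state-independent softmax over a handful of moves, and `softmax`, `reinforce_update`, `moves`, `outcome`, and `lr` are hypothetical names chosen for illustration. The point it shows is that every move in a game is credited with the same final outcome.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_update(logits, moves, outcome, lr=0.1):
    """One REINFORCE-style step for a toy, state-independent policy.

    logits  -- one logit per possible move (hypothetical toy policy)
    moves   -- indices of the moves actually played in this game
    outcome -- +1 for a win, -1 for a loss; shared by every move
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a in moves:
        for i, p in enumerate(probs):
            # d/d logit_i of log pi(a) is 1[i == a] - p_i
            grad[i] += outcome * ((1.0 if i == a else 0.0) - p)
    # Step uphill on expected reward (equivalently, descend on its negative).
    return [x + lr * g for x, g in zip(logits, grad)]

# After a win, moves that were played more often become more likely.
new_logits = reinforce_update([0.0, 0.0, 0.0], moves=[0, 0, 1], outcome=1)
```

Because the only learning signal is the end-of-game result, credit assignment here amounts to multiplying each move's log-probability gradient by the same number.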
It seems to me that assigning values to boards ("What's the edge for W or B if the game state looks like this?") is basically a solution to that problem, since it gives you the counterfactual information you need to answer those questions (how much would placing a stone here improve my edge?).
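A rough sketch of that "diff two boards" idea, in Python. Everything here is an illustrative assumption: `toy_value` is a hypothetical stand-in that just counts stones (a real value function would be learned), boards are lists of lists with `1` for one player, `-1` for the other, and `0` for empty, and `move_credit` is a made-up helper name.

```python
def toy_value(board, player):
    """Hypothetical stand-in for a learned value function.

    Returns the stone-count edge for `player`, scaled to [-1, 1].
    """
    own = sum(row.count(player) for row in board)
    opp = sum(row.count(-player) for row in board)
    total = own + opp
    return 0.0 if total == 0 else (own - opp) / total

def move_credit(board, move, player, value_fn=toy_value):
    """Credit for a move: value of the board after it minus value before it."""
    r, c = move
    after = [row[:] for row in board]  # copy so the input board is untouched
    after[r][c] = player
    return value_fn(after, player) - value_fn(board, player)

board = [
    [1, 0, 0],
    [0, -1, 0],
    [0, 0, 0],
]
credit = move_credit(board, (2, 2), player=1)
```

With a value function in hand, the per-move credit is just this difference, with no need to wait for the end of the game.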
I agree that it's a much simpler problem here than it is in a more complicated world, but I don't think it's trivial.