skeptical_lurker comments on AlphaGo versus Lee Sedol - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (183)
Naively, pruning seems like it would cause a mistake at move 77 (allowing the brilliant follow-up at 78), not at 79 (by which point you can't accidentally prune 78, because it's already on the board). But people have been saying that it made a mistake at 79.
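To make the pruning intuition concrete, here's a minimal sketch (with hypothetical move names and numbers, not AlphaGo's actual code) of how a policy prior can prune a low-probability reply out of the search: moves whose prior falls below a cutoff get little or no exploration, so a brilliant but unlikely move like 78 may never be examined before it's played.

```python
# Toy sketch of prior-based pruning in tree search.
# Move names and probabilities are hypothetical.

def expand(priors, cutoff=0.01):
    """Return only the moves the search will actually explore."""
    return {move: p for move, p in priors.items() if p >= cutoff}

# Hypothetical policy-net priors over replies at some position;
# "wedge_78" stands in for a brilliant but low-probability move.
priors = {"A": 0.55, "B": 0.30, "C": 0.14, "wedge_78": 0.007}
explored = expand(priors)
# "wedge_78" falls below the cutoff, so the search never reads it out.
```

Once 78 is on the board, though, it can't be pruned away, which is why a blunder at 79 is the more puzzling case.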
I don't recall much detail about AlphaGo, but I thought the self-play training was there to improve the policy net? If the policy net was only trained on amateur games, what was it learning from self-play?
Of course, but I can't remember which was the other very low-probability move, so perhaps it was one of the later moves in that sequence?
I thought the self-play only trained the value net (because they want the policy net to predict human moves, not its own moves), but I might be remembering incorrectly. Pity that the paper is behind a paywall.
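Either way, the value-net part of the story is that self-play games supply (position, outcome) pairs, and the value net is regressed against the outcomes. A toy tabular stand-in (hypothetical positions and data; the real thing is a deep conv net) shows the objective:

```python
# Toy stand-in for value-net training: regress positions against
# self-play game outcomes (+1 win / -1 loss) by minimizing squared error.

def train_value(samples, steps=200, lr=0.1):
    """Fit a per-position value estimate to observed outcomes."""
    values = {}
    for _ in range(steps):
        for pos, outcome in samples:
            v = values.get(pos, 0.0)
            # gradient step on (outcome - v)^2 / 2
            values[pos] = v + lr * (outcome - v)
    return values

# Hypothetical self-play data: position "a" is mostly won, "b" is lost.
samples = [("a", +1), ("a", +1), ("a", -1), ("b", -1)]
values = train_value(samples)
# values["a"] ends up positive, values["b"] negative.
```

If that's right, the policy net's role in self-play would just be to generate the games, with the regression target being the final result rather than the moves themselves.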