not at 79 (when you can't accidentally prune 78 because it's already on the board
Of course, but I can't remember which was the other very low-probability move, so perhaps it was one of the later moves in that sequence?
I don't recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateurs, what was it learning from self-play?
I thought the self-play only trained the value net (because they want it to predict human moves, not its own moves), but I might be remembering incorrectly. Pity that the paper is behind a paywall.
There have been a couple of brief discussions of this in the Open Thread, but it seems likely to generate more so here's a place for it.
The original paper in Nature about AlphaGo.
Google Asia Pacific blog, where results will be posted. DeepMind's YouTube channel, where the games are being live-streamed.
Discussion on Hacker News after AlphaGo's win of the first game.