Open thread, Jan. 25 - Jan. 31, 2016 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (169)
lol no. The pruning ('policy') network is trained entirely by supervised learning on human games. The other network is used to evaluate game states.
Your other ideas are more interesting, but they are not specific to AlphaGo; they apply to deep neural networks in general.
If I understood correctly, that is only the first stage of training the policy network. Then (quoting from Nature):
Except that they don't seem to use the resulting network in actual play; its only use is in deriving their state-evaluation network.
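The pipeline being debated, a policy network fit by supervised learning on expert moves plus a separate network that evaluates states, can be caricatured in a toy numpy sketch. Everything here is an illustrative assumption: linear models stand in for deep networks, the feature and move dimensions are made up, and the "game outcomes" are synthetic, so this shows only the shape of the two-stage setup, not AlphaGo's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions standing in for board features and candidate moves (assumptions).
N_FEATURES, N_MOVES = 8, 4

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stage 1: supervised "policy network" -- a linear softmax classifier
# trained by cross-entropy to predict expert moves from board features.
def train_policy(states, expert_moves, lr=0.5, epochs=200):
    W = np.zeros((N_FEATURES, N_MOVES))
    onehot = np.eye(N_MOVES)[expert_moves]
    for _ in range(epochs):
        probs = softmax(states @ W)
        # Gradient of mean cross-entropy loss w.r.t. W.
        grad = states.T @ (probs - onehot) / len(states)
        W -= lr * grad
    return W

# Synthetic "expert games": move labels come from a hidden linear rule.
true_W = rng.normal(size=(N_FEATURES, N_MOVES))
states = rng.normal(size=(200, N_FEATURES))
expert_moves = (states @ true_W).argmax(axis=1)

W_policy = train_policy(states, expert_moves)
acc = ((states @ W_policy).argmax(axis=1) == expert_moves).mean()

# Stage 2: a separate "value network" -- here plain least-squares regression
# from board features to (fake) game outcomes in (-1, 1), standing in for
# the state-evaluation network discussed above.
outcomes = np.tanh(states @ rng.normal(size=N_FEATURES))
w_value, *_ = np.linalg.lstsq(states, outcomes, rcond=None)

print(f"policy accuracy on expert moves: {acc:.2f}")
```

The point of the separation is the one made in the thread: the policy model is fit to imitate human moves, while the value model is fit to predict outcomes, and the two are trained and used independently.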