bogus comments on Open thread, Jan. 25 - Jan. 31, 2016 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (169)
What follows are random spurts of ideas that emerged while thinking about AlphaGo. I make no claim of validity, soundness, or even sanity. But they are random interesting directions that are fun for me to investigate, and they might turn out to be interesting for you too:
lol no. The pruning ('policy') network is entirely the result of supervised learning from human games. The other network is used to evaluate game states.
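To make the two roles concrete: in an MCTS-style search, the policy network's move priors concentrate ("prune") the search onto a handful of promising moves, while the value network scores positions. Here is a minimal, hedged sketch of PUCT-style move selection with toy stub networks; all names and numbers are illustrative, not AlphaGo's actual code or API:

```python
import math

# Toy stand-ins for the two networks (purely for illustration):
# the policy network assigns prior probabilities to candidate moves,
# and the value network scores a state from the current player's view.
def policy_net(state):
    # Pretend only three legal moves receive meaningful prior mass.
    return {"A": 0.6, "B": 0.3, "C": 0.1}

def value_net(state):
    # Toy evaluation in [-1, 1].
    return 0.2

def select_move(state, visit_counts, total_values, c_puct=1.0):
    """PUCT-style selection: priors from the policy net focus the
    search; Q-values come from value-net evaluations accumulated
    in total_values during earlier simulations."""
    priors = policy_net(state)
    total_visits = sum(visit_counts.get(m, 0) for m in priors)
    best_move, best_score = None, -float("inf")
    for move, prior in priors.items():
        n = visit_counts.get(move, 0)
        q = total_values.get(move, 0.0) / n if n else 0.0
        u = c_puct * prior * math.sqrt(total_visits + 1) / (1 + n)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move
```

With no visits yet, the highest-prior move wins; once a move accumulates visits with poor value estimates, the search shifts to alternatives, which is the pruning effect in action.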
Your other ideas are more interesting, but they are not related to AlphaGo specifically, just deep neural networks.
If I understood correctly, this is only the first stage in the training of the policy network. Then (quoting from Nature):
Except that they don't seem to use the resulting network in actual play; its only use is deriving their state-evaluation network.
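That derivation step can be sketched as follows: positions sampled from self-play games of the RL policy become regression examples (position, final outcome) for fitting the value network. This is a hedged toy sketch of that idea; the function names, the one-position-per-game sampling, and the single-parameter "network" are my simplifications, not the paper's implementation:

```python
import random

def self_play_positions(num_games, moves_per_game=5):
    """Pretend self-play: yield (position, outcome) pairs, sampling one
    position per game to reduce correlation between training examples."""
    data = []
    for g in range(num_games):
        outcome = random.choice([-1, 1])      # final winner of the game
        sampled_move = random.randrange(moves_per_game)
        position = (g, sampled_move)          # stand-in for a board state
        data.append((position, outcome))
    return data

def train_value_net(data, lr=0.1, epochs=50):
    """Fit a single bias parameter v by gradient steps on the squared
    regression loss (v - outcome)**2 -- the simplest possible 'value
    network', purely to illustrate the training signal."""
    v = 0.0
    for _ in range(epochs):
        for _, outcome in data:
            v -= lr * 2 * (v - outcome)
    return v
```

On data where one side always wins, v converges to that outcome; on balanced data it converges toward the mean outcome, which is exactly the "expected game result from this position" that a value network is meant to estimate.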