bogus comments on Open thread, Jan. 25 - Jan. 31, 2016 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (169)
If I understood correctly, this is only the first stage in the training of the policy network. Then (quoting from Nature):
Except that they don't seem to use the resulting network in actual play; the only use is for deriving their state-evaluation network.