bogus comments on Open thread, Jan. 25 - Jan. 31, 2016 - Less Wrong

Post author: username2 25 January 2016 09:07PM




Comment author: bogus 01 February 2016 08:04:53PM 0 points

"The second stage of the training pipeline aims at improving the policy network by policy gradient reinforcement learning (RL)."

Except that they don't seem to use the resulting network in actual play; its only use is to generate the self-play games on which their state-evaluation (value) network is trained.
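For anyone unfamiliar with the two stages being discussed, here is a toy sketch of the data flow described above: a policy improved by policy gradient, then used to play games whose outcomes become regression targets for a state-evaluation function. It is not the paper's code. The three-position "game", the winning moves in `BEST_MOVE`, the tabular softmax policy and the tabular value estimate are all hypothetical stand-ins for the deep networks, and the update shown is plain REINFORCE with a win/loss outcome z as the reward.

```python
import math
import random

# HYPOTHETICAL toy "game": 3 positions, 3 legal moves each.
# One move per position wins (z = +1); every other move loses (z = -1).
N_STATES, N_ACTIONS = 3, 3
BEST_MOVE = [2, 0, 1]  # illustrative winning move for each position


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def play(state, probs, rng):
    """Sample a move from the policy; return (move, outcome z in {+1, -1})."""
    move = rng.choices(range(N_ACTIONS), weights=probs)[0]
    z = 1.0 if move == BEST_MOVE[state] else -1.0
    return move, z


def train_policy(episodes=3000, lr=0.1, seed=0):
    """REINFORCE on a tabular softmax policy: theta += lr * z * grad log pi(move | s)."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        probs = softmax(theta[s])
        move, z = play(s, probs, rng)
        # For a softmax policy, d log pi(move) / d theta[s][b] = 1[b == move] - pi(b).
        for b in range(N_ACTIONS):
            theta[s][b] += lr * z * ((1.0 if b == move else 0.0) - probs[b])
    return theta


def train_value(theta, games=2000, lr=0.05, seed=1):
    """Fit V(s) to outcomes of games played by the trained policy (SGD on squared error)."""
    rng = random.Random(seed)
    value = [0.0] * N_STATES
    for _ in range(games):
        s = rng.randrange(N_STATES)
        _, z = play(s, softmax(theta[s]), rng)
        # Gradient step on 0.5 * (z - V(s))**2 for a tabular value function.
        value[s] += lr * (z - value[s])
    return value


theta = train_policy()
value = train_value(theta)
for s in range(N_STATES):
    print(s, round(softmax(theta[s])[BEST_MOVE[s]], 3), round(value[s], 3))
```

Note that in this sketch, as in the comment's reading of the pipeline, the trained policy is only consumed by `train_value`; nothing forces it to be the policy used at play time.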