skeptical_lurker comments on AlphaGo versus Lee Sedol - Less Wrong

Post author: gjm 09 March 2016 12:22PM (17 points)




Comment author: skeptical_lurker 15 March 2016 01:40:10PM 0 points

> not at 79 (when you can't accidentally prune 78 because it's already on the board)

Of course, but I can't remember which was the other very low-probability move, so perhaps it was one of the later moves in that sequence?
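To make "accidentally pruning" a move concrete, here is a toy version of the prior-weighted (PUCT-style) selection rule as I understand it from the AlphaGo paper: the exploration bonus for a move is proportional to its policy-net prior P and shrinks as its visit count N grows. This is a minimal sketch under loud assumptions. Every action value Q is held at zero so that only the prior steers the search, the four priors and c_puct = 5 are made up for illustration, and `puct_select` / `simulate` are hypothetical names, not anything from DeepMind's code.

```python
import math

def puct_select(priors, visits, q_values, c_puct=5.0):
    """Return the action maximising Q + U, with
    U = c_puct * P * sqrt(total visits) / (1 + N)."""
    total = sum(visits)
    best, best_score = None, -math.inf
    for a, (p, n, q) in enumerate(zip(priors, visits, q_values)):
        u = c_puct * p * math.sqrt(total) / (1 + n)
        score = q + u
        if score > best_score:  # ties go to the lower-indexed action
            best, best_score = a, score
    return best

def simulate(priors, n_sims, c_puct=5.0):
    """Toy search (assumption): Q stays 0 for every move,
    so visits are allocated purely according to the priors."""
    visits = [0] * len(priors)
    q_values = [0.0] * len(priors)
    for _ in range(n_sims):
        a = puct_select(priors, visits, q_values, c_puct)
        visits[a] += 1
    return visits

# The last move has a prior of 1 in 10,000 (illustrative value).
print(simulate([0.6, 0.3, 0.0999, 0.0001], 1000))
```

With these made-up numbers the last move gets no visits at all in 1,000 simulations: it would only be tried once the 0.6-prior move had been visited roughly 6,000 times. So a move the policy net considers very unlikely can be effectively pruned even though nothing explicitly removes it. Once the move is actually on the board, the question no longer arises.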

> I don't recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateurs, what was it learning from self-play?

I thought the self-play only trained the value net (because they want the policy net to predict human moves, not its own moves), but I might be remembering incorrectly. Pity that the paper is behind a paywall.