philh comments on AlphaGo versus Lee Sedol - Less Wrong

Post author: gjm 09 March 2016 12:22PM




Comment author: philh 15 March 2016 01:19:00PM 0 points

Naively, pruning seems like it would cause a mistake at 77 (allowing the brilliant followup 78), not at 79 (when you can't accidentally prune 78 because it's already on the board). But people have been saying that it made a mistake at 79.
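The "pruning" at issue here isn't a hard cutoff: in AlphaGo-style tree search, the policy net supplies a prior over moves, and moves with near-zero prior receive almost no exploration bonus, so the search effectively never looks at them. A minimal sketch of that selection rule (PUCT-style; the constants and numbers below are illustrative, not AlphaGo's actual values):

```python
# Hedged sketch: how a policy-net prior can de facto "prune" a move
# during Monte Carlo tree search. Values here are made up for illustration.
import math

def puct_score(prior, value, visits, parent_visits, c_puct=1.0):
    """PUCT selection score: estimated value plus an exploration bonus
    scaled by the policy prior. A near-zero prior keeps the bonus tiny,
    so the move is almost never explored -- effectively pruned."""
    return value + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# Two unvisited candidate moves under the same parent node (100 visits):
likely   = puct_score(prior=0.30,  value=0.0, visits=0, parent_visits=100)
unlikely = puct_score(prior=0.001, value=0.0, visits=0, parent_visits=100)
print(likely, unlikely)  # the low-prior move scores far lower
```

This is why a move like 78, if assigned very low probability by the policy net, could be invisible to the search until it was actually played.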

I don't recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateurs, what was it learning from self-play?

Comment author: skeptical_lurker 15 March 2016 01:40:10PM 0 points

not at 79 (when you can't accidentally prune 78 because it's already on the board)

Of course, but I can't remember which was the other very low-probability move, so perhaps it was one of the later moves in that sequence?

I don't recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateurs, what was it learning from self-play?

I thought the self-play only trained the value net (because they want it to predict human moves, not its own moves), but I might be remembering incorrectly. Pity that the paper is behind a paywall.
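For what it's worth, the Nature paper (Silver et al. 2016) describes the pipeline the other way around on one point: the policy net was first trained by supervised learning on human expert games (strong KGS players), then further improved by policy-gradient self-play; the value net was then trained to predict outcomes of positions drawn from those self-play games. A schematic of the three stages, with stub bodies (the function names and return values are illustrative, not from the paper):

```python
# Hedged sketch of the AlphaGo training pipeline per Silver et al. 2016.
# Function names and dict contents are illustrative stubs, not real APIs.

def train_sl_policy(human_games):
    """Stage 1: policy net trained by supervised learning to
    predict expert moves from human (KGS) games."""
    return {"stage": "SL policy", "data": human_games}

def train_rl_policy(sl_policy):
    """Stage 2: policy net improved by policy-gradient self-play,
    initialised from the supervised policy."""
    return {"stage": "RL policy", "init": sl_policy["stage"]}

def train_value_net(rl_policy):
    """Stage 3: value net regressed on game outcomes of positions
    sampled from the self-play games."""
    return {"stage": "value net", "data_from": rl_policy["stage"]}

pipeline = train_value_net(train_rl_policy(train_sl_policy("KGS games")))
print(pipeline["stage"])  # the final stage trained is the value net
```

So both halves of the exchange above are partly right: self-play trained the value net's data, but it also directly improved the policy net.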