philh comments on AlphaGo versus Lee Sedol - Less Wrong

Post author: gjm 09 March 2016 12:22PM




Comment author: philh 15 March 2016 01:19:00PM 0 points

Naively, pruning seems like it would cause a mistake at 77 (allowing the brilliant followup 78), not at 79 (when you can't accidentally prune 78 because it's already on the board). But people have been saying that it made a mistake at 79.
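The "pruning" at issue here isn't a hard cutoff: in AlphaGo-style tree search, the policy net supplies a prior over moves, and moves with near-zero prior receive almost no exploration bonus, so the search effectively never looks at them. A minimal sketch of that selection rule (PUCT-style; the constants and numbers below are illustrative, not AlphaGo's actual values):

```python
# Hedged sketch: how a policy-net prior can de facto "prune" a move
# during Monte Carlo tree search. Values here are made up for illustration.
import math

def puct_score(prior, value, visits, parent_visits, c_puct=1.0):
    """PUCT selection score: estimated value plus an exploration bonus
    scaled by the policy prior. A near-zero prior keeps the bonus tiny,
    so the move is almost never explored -- effectively pruned."""
    return value + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# Two unvisited candidate moves under the same parent node (100 visits):
likely   = puct_score(prior=0.30,  value=0.0, visits=0, parent_visits=100)
unlikely = puct_score(prior=0.001, value=0.0, visits=0, parent_visits=100)
print(likely, unlikely)  # the low-prior move scores far lower
```

This is why a move like 78, if assigned very low probability by the policy net, could be invisible to the search until it was actually played.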

I don't recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateurs, what was it learning from self-play?

Comment author: skeptical_lurker 15 March 2016 01:40:10PM 0 points

not at 79 (when you can't accidentally prune 78 because it's already on the board)

Of course, but I can't remember which was the other very low-probability move, so perhaps it was one of the later moves in that sequence?

I don't recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateurs, what was it learning from self-play?

I thought the self-play only trained the value net (because they want it to predict human moves, not its own moves), but I might be remembering incorrectly. Pity that the paper is behind a paywall.
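For what it's worth, the Nature paper (Silver et al. 2016) describes the pipeline the other way around on one point: the policy net was first trained by supervised learning on human expert games (strong KGS players), then further improved by policy-gradient self-play; the value net was then trained to predict outcomes of positions drawn from those self-play games. A schematic of the three stages, with stub bodies (the function names and return values are illustrative, not from the paper):

```python
# Hedged sketch of the AlphaGo training pipeline per Silver et al. 2016.
# Function names and dict contents are illustrative stubs, not real APIs.

def train_sl_policy(human_games):
    """Stage 1: policy net trained by supervised learning to
    predict expert moves from human (KGS) games."""
    return {"stage": "SL policy", "data": human_games}

def train_rl_policy(sl_policy):
    """Stage 2: policy net improved by policy-gradient self-play,
    initialised from the supervised policy."""
    return {"stage": "RL policy", "init": sl_policy["stage"]}

def train_value_net(rl_policy):
    """Stage 3: value net regressed on game outcomes of positions
    sampled from the self-play games."""
    return {"stage": "value net", "data_from": rl_policy["stage"]}

pipeline = train_value_net(train_rl_policy(train_sl_policy("KGS games")))
print(pipeline["stage"])  # the final stage trained is the value net
```

So both halves of the exchange above are partly right: self-play trained the value net's data, but it also directly improved the policy net.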