I think its policy net was trained only on amateur games, not on professionals or self-play, which makes it a little weak. Normally, I suppose, reading large numbers of game trees compensates, but AlphaGo reportedly put the odds of Lee playing his brilliant move 78 (and one other move, but I can't remember which) at 1 in 10,000, so I suspect AlphaGo never even analysed the first move of that sequence.
In other words:
David Ormerod of GoGameGuru stated that although an analysis of AlphaGo's play around 79–87 was not yet available, he believed it was a result of a known weakness in play algorithms which use Monte Carlo tree search. In essence, the search attempts to prune sequences which are less relevant. In some cases a play can lead to a very specific line of play which is significant, but which is overlooked when the tree is pruned, and this outcome is therefore "off the search radar".[56]
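Ormerod's point can be illustrated with a toy PUCT-style selection loop (the kind of rule AlphaGo's tree search uses to trade off estimated value against the policy prior). This is only a sketch: the move names, priors, and the equal-value assumption are made up for illustration, but it shows how a move with a 1-in-10,000 prior ends up effectively off the search radar even after a hundred thousand simulations.

```python
import math

def puct_score(prior, value, visits, parent_visits, c_puct=1.5):
    # PUCT-style selection (sketch): exploitation term plus an
    # exploration bonus scaled by the policy net's prior for the move.
    return value + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# Hypothetical priors: a couple of "normal" moves and one brilliant
# move the policy net rates at 1 in 10,000 (like Lee's move 78).
priors = {"obvious": 0.40, "ordinary": 0.5999, "brilliant": 0.0001}
values = {m: 0.5 for m in priors}   # assume equal true value for simplicity
visits = {m: 0 for m in priors}

# Run 100,000 greedy PUCT selections and count visits per move.
for parent_visits in range(1, 100_001):
    best = max(priors, key=lambda m: puct_score(priors[m], values[m],
                                                visits[m], parent_visits))
    visits[best] += 1

print(visits)
```

With equal values, visit counts converge roughly in proportion to the priors, so the "brilliant" move receives only a handful of the 100,000 visits; the sequence it opens is barely explored.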
I wonder if Google could publish an SGF file showing the most probable lines of play as calculated at each move, along with the estimated probability of each of Lee's moves?
I wonder if the best approach would be to train nets on: strong amateur games (many games, but perhaps lower-quality moves?); pro games (fewer games, but higher quality?); and self-play (high quality, but perhaps not entirely human-like?), and then average the outputs of the three nets?
Of course, this triples the GPU cycles needed, but perhaps it could be applied only to the first few moves of the game tree?
I don't think the issue is that 78 was a human-like move. It's just a move that's hard to see, for humans and machines alike.
There have been a couple of brief discussions of this in the Open Thread, but it seems likely to generate more so here's a place for it.
The original paper in Nature about AlphaGo.
Google Asia Pacific blog, where results will be posted.
DeepMind's YouTube channel, where the games are being live-streamed.
Discussion on Hacker News after AlphaGo's win of the first game.