ChristianKl comments on AlphaGo versus Lee Sedol - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (183)
I think its policy net was only trained on amateurs, not professionals or self-play, making it a little weak. Normally, I suppose that reading large numbers of game trees compensates, but the odds of Lee making his brilliant move 78 (and one other move, but I can't remember which) were 1/10000, so I think that AG never even analysed the first move of that sequence.
In other words:
I wonder if Google could publish a sgf showing the most probable lines of play as calculated at each move, as well as the estimated probability of each of Lee's moves?
I wonder if the best thing to do would be to train nets on: strong amateur games (lots of games, but perhaps lower quality moves?); pro games (fewer games but higher quality?); and self-play (high quality, but perhaps not entirely human-like?) and then take the average of the three nets?
Of course, this triples the GPU cycles needed, but it could perhaps be implemented just for the first few moves in the game tree?
I don't think the issue is that 78 was a human like move. It's just a move that's hard to see both for humans and non-humans.