Vaniver comments on AlphaGo versus Lee Sedol - Less Wrong

17 Post author: gjm 09 March 2016 12:22PM

Comments (183)

Comment author: Vaniver 09 March 2016 02:01:50PM 5 points

Amazing match. Well worth staying up to 2 AM to watch.

Comment author: Vaniver 09 March 2016 02:35:27PM 10 points

Several things I thought were interesting:

  1. The commentator (on the DeepMind channel) calling out several of AlphaGo's moves as conservative. Essentially, it would play an additional stone to settle or augment some group that he wouldn't necessarily have played around. What I'm curious about is how much this reflects an attempt by AlphaGo to conserve computational resources. "I think move A is a 12 point swing, and move B is a 10 point swing, but move B narrows the search tree for future moves in a way that I think will net me at least 2 more points." (It wouldn't be verbalized like that, since it's not thinking verbally, but you can get this effect naturally from the tree search and position evaluator.)

  2. Both players took a long time to play "obvious" moves. (Typically, by this I mean something like a response to a forced move.) 이 sometimes didn't--there were a handful of moves he played immediately after AlphaGo's move--but I was still surprised by the amount of thought that went into some of the moves. This may be typical for tournament play--I haven't watched any live before this.

  3. AlphaGo's willingness to play aggressively and get involved in big fights with 이, and then not lose. I'm not sure that all the fights developed to AlphaGo's advantage, but evidently enough of them did, and by enough of a margin.

  4. I somewhat regret 이 not playing the game out to the end; it would have been nice to know the actual score. (I'm sure estimates will be available soon, if not already.)

Comment author: V_V 09 March 2016 04:29:21PM 7 points

What I'm curious about is how much this reflects an attempt by AlphaGo to conserve computational resources.

If I understand correctly, at least according to the Nature paper, it doesn't explicitly optimize for this. Game-playing software is often perceived as playing "conservatively"; this is a general property of minimax search, and in the limit the Nash equilibrium consists of maximally conservative strategies.
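
To make the minimax point concrete, here is a minimal sketch in Python. It is plain one-ply minimax on a toy tree, not AlphaGo's actual search (which combines Monte Carlo tree search with neural networks), and the move names and point values are invented for illustration.

```python
# Toy game tree: each of our moves maps to the outcomes (our point margin)
# that follow from each possible opponent reply. All numbers are made up.
tree = {
    "sharp": [15, -20],  # great if the opponent errs, a disaster if they refute it
    "solid": [3, 2],     # a small but safe edge whatever the opponent does
}


def minimax_choice(tree):
    """Pick the move whose worst-case outcome is best (assumes a perfect opponent)."""
    return max(tree, key=lambda move: min(tree[move]))


def hopeful_choice(tree):
    """Pick the move with the best best-case outcome (assumes the opponent errs)."""
    return max(tree, key=lambda move: max(tree[move]))


print(minimax_choice(tree))  # solid
print(hopeful_choice(tree))  # sharp
```

Because minimax assumes the opponent always finds the best reply, it never gambles on a mistake, which is what tends to look "conservative" to a human observer.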

but I was still surprised by the amount of thought that went into some of the moves.

Maybe these obvious moves weren't so obvious at that level.

Comment author: Error 09 March 2016 06:16:03PM 3 points

I don't know about that level, but I can think of at least one circumstance where I think far longer than would be expected over a forced move. If I've worked out the forced sequence in my head and determined that the opponent doesn't gain anything by it, but they play it anyway, I start thinking "Danger, Danger, they've seen something I haven't and I'd better re-evaluate."

Most of the time it's nothing and they just decided to play out the position earlier than I would have. But every so often I discover a flaw in the "forced" defense and have to start scrabbling for an alternative.

Comment author: WalterL 09 March 2016 06:34:51PM 4 points

This is very true in Go. If you are both playing down a sequence of moves without hesitation, anticipating a payoff, one of you is wrong (kind of; it's hard to put into words). It is always worth making double sure that it isn't you.

Comment author: Vaniver 09 March 2016 07:20:34PM 2 points

Maybe these obvious moves weren't so obvious at that level.

Sure. And I'm pretty low as amateurs go--what I found surprising was that there were ~6 moves where I thought "obviously play X," and 이 immediately played X in half of them and spent 2 minutes to play X in the other half of them. It wasn't clear to me if 이 was precomputing something he would need later, or was worried about something I wasn't, or so on.

Most of the time I was thinking something like "well, I would play Y, but I'm pretty unconfident that's the right move" and then 이 or AlphaGo would play something that was, in retrospect, superior to Y, or I was thinking something like "I have only the vaguest sense of what to do in this situation." So I guess I'm pretty well-calibrated, even if my skill isn't that great.

Comment author: SquirrelInHell 10 March 2016 01:40:49AM 3 points

The commentator (on the DeepMind channel) calling out several of AlphaGo's moves as conservative. Essentially, it would play an additional stone to settle or augment some group that he wouldn't necessarily have played around. What I'm curious about is how much this reflects an attempt by AlphaGo to conserve computational resources. "I think move A is a 12 point swing, and move B is a 10 point swing, but move B narrows the search tree for future moves in a way that I think will net me at least 2 more points."

If the search tree is narrowed, it is narrowed for both players, so why would it be a gain?

Comment author: Vaniver 10 March 2016 01:45:15AM 6 points

If the search tree is narrowed, it is narrowed for both players, so why would it be a gain?

There may be an asymmetry between successful modes of attack and successful modes of defense--if there's a narrow thread that white can win through, and a thick thread that black can threaten through, then white wins computationally by closing off that tree.

But thanks for asking: I was confused somewhat because I was thinking about AI vs. human games, but the AI is trained mostly on human vs. human and AI vs. AI games, neither of which will have the AI vs. human feature. Well, except for bots playing on KGS.

Comment author: Vaniver 21 March 2016 06:22:56PM 0 points

But thanks for asking: I was confused somewhat because I was thinking about AI vs. human games, but the AI is trained mostly on human vs. human and AI vs. AI games, neither of which will have the AI vs. human feature. Well, except for bots playing on KGS.

As it turns out, we learned later that Fan Hui started working with DeepMind on AlphaGo after their match, and played a bunch of games against it as it improved. So it did have a number of AI vs. human training games.

Comment author: gjm 09 March 2016 04:51:14PM 2 points

I'm sure estimates will be available soon

I saw some blog comment from someone claiming to be (IIRC) an amateur 3-4 dan -- i.e., good enough to estimate this sort of thing pretty well -- reckoning probably 3.5 or 4.5 points in white's favour. That would be after the komi of 7.5 points given to white as compensation for moving second, or so I assume from the half-points in the figure. So that would correspond to black being ahead by 3-4 points before komi.
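
The arithmetic behind that last step, as a quick sketch in Python (the function name is mine, and the margins are just the estimates quoted above, not a verified score):

```python
KOMI = 7.5  # points given to white for moving second, as assumed above


def black_board_lead(white_margin_after_komi, komi=KOMI):
    """Black's lead on the board before komi, given white's final winning margin."""
    return komi - white_margin_after_komi


for margin in (3.5, 4.5):
    print(f"white wins by {margin} -> black ahead by {black_board_lead(margin)} before komi")
```

A white win by 3.5 corresponds to black leading by 4 on the board, and a white win by 4.5 to black leading by 3.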

Comment author: ChristianKl 10 March 2016 09:04:58AM 1 point

I somewhat regret 이 not playing the game out to the end; it would have been nice to know the actual score. (I'm sure estimates will be available soon, if not already.)

That wouldn't have given you the actual score as AlphaGo didn't care to maximize the score in the endgame.
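
A minimal sketch of why that is, assuming the program picks moves by estimated probability of winning rather than by expected margin. The candidate moves and their numbers are hypothetical, chosen only to show the two objectives disagreeing.

```python
# Hypothetical endgame candidates: (estimated win probability, expected margin in points).
candidates = {
    "safe": (0.99, 1.5),    # gives up points to remove any remaining risk
    "greedy": (0.95, 6.5),  # wins by more on average, but loses more often
}


def pick_by_win_probability(cands):
    """Choose the move that maximizes the chance of winning, ignoring the margin."""
    return max(cands, key=lambda move: cands[move][0])


def pick_by_margin(cands):
    """Choose the move that maximizes the expected final margin."""
    return max(cands, key=lambda move: cands[move][1])


print(pick_by_win_probability(candidates))  # safe
print(pick_by_margin(candidates))           # greedy
```

Under the first objective a win by half a point is as good as a win by twenty, so the final score of a played-out game would understate how far ahead the winner actually was.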

Comment author: ChristianKl 10 March 2016 09:05:17AM 0 points

Both players took a long time to play "obvious" moves. (Typically, by this I mean something like a response to a forced move.)

Which specific moves do you mean?

Comment author: Vaniver 10 March 2016 07:34:48PM 0 points

I would have to rewatch the game, since the easily available record doesn't have the time it took them to make each move.

Comment author: ChristianKl 09 March 2016 04:58:53PM 0 points

"I think move A is a 12 point swing, and move B is a 10 point swing, but move B narrows the search tree for future moves in a way that I think will net me at least 2 more points."

No. 2 points is a lot at that level. If the commentator thought a move cost 2 points, he wouldn't call it conservative; he would call it an error.

Not playing out every move is more about keeping aji open and not wasting possible ko threats. Unfortunately I don't know how to translate aji into English.

Comment author: Vaniver 09 March 2016 07:28:32PM *  1 point

No. 2 points is a lot at that level. If the commentator thought a move cost 2 points, he wouldn't call it conservative; he would call it an error.

I think B actually results in more points overall, which is why it would play it; my curiosity is what fraction is due to direct effects vs. indirect effects.

For example, one could imagine the board position evaluation function being different for different timing schemes. If you're playing a blitz game where both players have 10 seconds to play each turn, some positions might move from mildly favoring black to strongly favoring black because white needs to do a bunch of thinking to navigate the game tree successfully.

Comment author: ChristianKl 10 March 2016 09:21:56AM 0 points

It's not a blitz game, and there is plenty of time to think through moves.

Just for the record, in my prime I used to play Go at around 2 kyu.

Comment author: polymathwannabe 09 March 2016 07:11:48PM 0 points

I understand aji as potential for future moves that is currently not too usable but may be after the board configuration has evolved.

Comment author: ChristianKl 10 March 2016 12:47:39PM 0 points

It goes in that direction but moves don't have to be used directly to constrain movements elsewhere on the board.

When playing around with Fold.it there was a similar scenario. It's often possible to run a script to reach a higher local maximum. However, that made the fold more "rigid". The experienced folders only ran the script to search for local maxima at the end, after they had manually done everything that could be done. In my usage of Go vocabulary, running the script to optimize locally beforehand would also be a case of aji-keshi.

Aji is for me a phenomenological primitive that I learned while playing Go and that I can use outside of Go, but which doesn't have an existing English or German word.
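
As a loose illustration only (this is not Fold.it's script or scoring, and the landscape below is invented), a greedy hill climber started too early settles on a nearby low peak, while the same climber started after some manual repositioning reaches a higher one:

```python
# Hypothetical one-dimensional score landscape: index = configuration, value = score.
landscape = [0, 2, 3, 1, 0, 4, 7, 9, 6]


def hill_climb(pos, scores):
    """Greedy local search: step to the better neighbor until none improves the score."""
    while True:
        neighbors = [p for p in (pos - 1, pos + 1) if 0 <= p < len(scores)]
        best = max(neighbors, key=lambda p: scores[p])
        if scores[best] <= scores[pos]:
            return pos  # local maximum reached
        pos = best


early = hill_climb(1, landscape)  # optimize locally right away
late = hill_climb(5, landscape)   # first move by hand to position 5, then optimize

print(early, landscape[early])  # 2 3
print(late, landscape[late])    # 7 9
```

Once the climber has committed to the low peak, every single step away from it scores worse, which is roughly the "rigid" feeling described above.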

Comment author: Vaniver 10 March 2016 02:01:37AM *  0 points

The way I think about aji is something fragile on a ledge--sure, it's safe now, but as things shift around, it may suddenly become unsafe.