
gjm (6y)

Either your understanding is incorrect or mine is: AlphaGo Zero and AlphaZero _do_ do a tree search that the DM papers call "Monte Carlo Tree Search", but it doesn't involve actual Monte Carlo playouts and doesn't match e.g. the description on the Wikipedia page about MCTS.


[ Question ]

Is AlphaZero any good without the tree search?

by Steven Byrnes

30th Jun 2019


One component of AlphaZero is a neural net that takes a board position as input and outputs a guess about how good the position is and what a good next move would be. AlphaZero combines this neural net with Monte Carlo Tree Search (MCTS), which plays out different ways the game could go before choosing a move. The MCTS is used both during self-play to train the neural net and at test time in competitive play. I'm mainly curious about whether the latter is necessary.

So my question is: Once you have the fully-trained AlphaZero system, if you then turn off the MCTS and just choose moves directly with the neural net policy head, is it any good? Is it professional-level, amateur-level, child-level?
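To make the question concrete, "choosing moves directly with the neural net policy head" could look roughly like the sketch below. The function name, the dict-based policy format, and the toy move labels are all hypothetical and only for illustration; the real network emits a probability vector over every move, and this is not the authors' code.

```python
def pick_move_without_search(policy, legal_moves):
    """Return the legal move with the highest policy probability.

    `policy` maps move -> probability from a single forward pass
    (a hypothetical format); no lookahead of any kind is performed.
    """
    if not legal_moves:
        raise ValueError("no legal moves available")
    # Moves missing from the policy dict are treated as probability 0.
    return max(legal_moves, key=lambda move: policy.get(move, 0.0))


# Toy example: made-up probabilities, not real network output.
toy_policy = {"D4": 0.55, "Q16": 0.30, "K10": 0.15}
print(pick_move_without_search(toy_policy, ["Q16", "K10"]))  # prints Q16
```

The only game knowledge used at move time here is the legality mask; everything else comes from whatever the network learned during training.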

(I think this would be a fun little data-point related to discussions of how powerful an AI can be with and without mesa-optimization / search-processes using a generative environmental model.)




gwern

Jun 30, 2019


The paper includes the ELO for just the NN. I believe it's professional level but not superhuman, but you should check if you really need to know. However, note that AlphaZero's actual play doesn't use MCTS at all; it uses a simple tree search which only descends a few ply.

Steven Byrnes (6y)

Thanks for your answer! But I'm afraid I'm confused on both counts.

I couldn't, and still can't, find "ELO for just the NN" in the paper... :-( I checked the arxiv version and preprint version.

As for "actual play doesn't use MCTS at all", well the authors say it does use MCTS... Am I misunderstanding the authors, or are you saying that the "thing the authors call MCTS" is not actually MCTS? (For example, I understand that it's not actually random.)

gwern (6y)

You want the original 'AlphaGo Zero' paper, not the later 'AlphaZero' papers, which merely simplify it and reuse it in other domains; the AGZ paper is more informative than the AZ papers. See Figure 6b, and pg25 for the tree search details.

So the raw NN - a single forward pass and selecting the max - is 3k ELO, about 100 ELO under AlphaGo Fan, which soundly defeated a human professional (Fan Hui). I'm not sure whether −100 ELO is enough to demote it to 'amateur' status, but it's at least clearly not that far from professional in the worst case.

EDIT: for a much more thorough and rigorous discussion of how you can exchange training for runtime tree search, see Jones 2021; this lets you calculate how much you'd have to spend to train a (probably larger) AlphaZero to close that 100 ELO gap, or to try to get up to 4,858 ELO with solely a forward pass and no search.
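For a rough sense of what a 100 ELO gap means in games won, here is a minimal sketch using the standard logistic Elo expected-score formula. It assumes the paper's ratings sit on that usual 400-point scale; the function name is made up for illustration.

```python
def expected_score(rating_diff):
    """Expected score for a player rated `rating_diff` Elo points above
    the opponent (negative if rated below), per the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Raw network rated about 100 points below its opponent.
print(f"{expected_score(-100):.2f}")  # prints 0.36
```

Under that assumption, the weaker side of a 100-point gap would still be expected to score roughly a third of the points, so it is behind but far from hopeless.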


Douglas_Knight (6y)

But it does use MCTS in training. You might say that it uses MCTS to generate a better player to learn from.

gwern (6y)

Sure. But the final player does not use MCTS, and it's interesting that it's not necessary then. (It's even more interesting that the way they discovered they didn't need MCTS is by hyperparameter optimization, but that's a different discussion.)

David Fendrich (6y)

This is incorrect. It is International Master-level without tree search. Good amateur, but there are >1000 players in the world that are better.

And it is neither MCTS nor a "simple tree search": it uses PUCT, often calculating very deeply in a few lines.

dxu (6y)

International masters are emphatically not amateurs. Indeed, IMs are at the level where they can offer coaching services to amateur players, and reasonably expect to be paid something on the order of $100 per session.

To elaborate on this point: The total number of FIDE-rated chess players is over 500,000. The number of IMs, meanwhile, totals fewer than 3,000. IMs are quite literally in the 99th percentile of chess ability, and that's actually being extremely restrictive with the population--there are many casual players who don't have FIDE ratings at all, since only people who play in at least one FIDE-rated tournament will be assigned a rating.

gwern (6y)

I didn't say anything about chess or shogi because I don't recall any ablation for A0, I just remember the one in the AG0 paper for Go. The AG0 is definitely at or close to professional level and better than 'good amateur'. And I would consider a non-distributed PUCT with no rollouts or other refinements to be a 'simple tree search': it doesn't do any rollouts, and the depth is seriously limited by running on only a single machine w/4 TPUs with a few seconds for search: as the AG0 paper puts it, "Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte-Carlo rollouts...we chose to use the simplest possible search algorithm".
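For readers unfamiliar with PUCT, here is a minimal sketch of the selection rule as the AG0 paper describes it: pick the child maximizing Q + U, where U is proportional to the network prior and shrinks with the child's visit count. The dict layout, the function name, and the c_puct value of 1.5 are assumptions for illustration, not DeepMind's implementation.

```python
import math


def puct_select(children, c_puct=1.5):
    """Pick the child move maximizing Q + U.

    `children` maps move -> {"prior": P, "visits": N, "value_sum": W}
    (a hypothetical layout). Q = W / N is the mean value from network
    evaluations; no rollout is ever played.
    """
    total_visits = sum(child["visits"] for child in children.values())

    def score(child):
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        u = c_puct * child["prior"] * math.sqrt(total_visits) / (1 + child["visits"])
        return q + u

    return max(children, key=lambda move: score(children[move]))


# Toy statistics, made up for illustration.
toy_children = {
    "a": {"prior": 0.6, "visits": 10, "value_sum": 2.0},
    "b": {"prior": 0.3, "visits": 0, "value_sum": 0.0},
    "c": {"prior": 0.1, "visits": 2, "value_sum": 1.4},
}
print(puct_select(toy_children))  # prints b
```

In the toy data the unvisited move "b" wins because its exploration bonus is not yet discounted by any visits; as visits accumulate, the choice shifts toward moves with high mean value.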
