gwern comments on AlphaGo versus Lee Sedol - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I don't think that's true. The distributed system for playing uses multiple copies of the CNN value network, so each one can do board evaluation during MCTS on its own, without the performance disaster of shipping positions over the network to a remote GPU or something crazy like that; it is not a single network sharded over two hundred servers (CPU != computer). Similarly for training: each worker was training a full copy of the same network in parallel, not 1/200th of the full NN. (You could train something like AlphaGo on your laptop's GPU; it'd just take something like 2 years by their wallclock numbers.)
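To illustrate the distinction, here's a toy sketch of data parallelism, which is my reading of "training the same network in parallel": every worker holds a full copy of the model, only the data is split, and gradients are averaged. (This is an illustrative assumption about the setup, not DeepMind's actual code.)

```python
# Toy data-parallel training: each worker has the FULL model (here a single
# weight w for y = w * x); only the training data is sharded across workers.

def grad(w, batch):
    # gradient of mean squared error for the 1-parameter model y = w * x
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, shards, lr=0.05):
    # each "worker" computes a gradient on its own shard of the data...
    grads = [grad(w, shard) for shard in shards]
    # ...and every copy of the model applies the same averaged update,
    # so all copies stay identical -- no copy ever holds 1/Nth of the model
    return w - lr * sum(grads) / len(grads)

# target relationship y = 3x, with the data split across 4 workers
shards = [[(x, 3.0 * x)] for x in (1.0, 2.0, 3.0, 4.0)]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
# w converges to 3.0
```

Model-parallel sharding, by contrast, would put different layers (or slices of layers) on different machines, which is only worth the communication cost when the network doesn't fit on one device.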
The actual CNN is going to be something like 10MB-1GB, because anything much bigger than that won't fit on 1 GPU for training. Reading the paper, it seems fairly comparable in size to ImageNet competitors:
So 500MB would be a reasonable guess if you don't want to work out how many parameters that 13-layer network translates to. Not large at all, and model compression would at least halve that.
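For a sense of how a network of comparable depth reaches that size, here's the back-of-the-envelope for VGG-16, an ImageNet competitor with published layer shapes (a stand-in, not AlphaGo's actual architecture):

```python
# Parameter count for VGG-16 (an ImageNet network of comparable depth),
# NOT AlphaGo itself -- just to show how ~13 conv layers plus FC layers
# turn into hundreds of MB of float32 weights.

# (in_channels, out_channels) for each 3x3 conv layer, weights + biases
convs = [(3, 64), (64, 64), (64, 128), (128, 128),
         (128, 256), (256, 256), (256, 256),
         (256, 512), (512, 512), (512, 512),
         (512, 512), (512, 512), (512, 512)]
conv_params = sum(3 * 3 * cin * cout + cout for cin, cout in convs)

# fully-connected layers: 7x7x512 -> 4096 -> 4096 -> 1000
fcs = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]
fc_params = sum(cin * cout + cout for cin, cout in fcs)

total_params = conv_params + fc_params          # ~138M parameters
size_mb = total_params * 4 / 1e6                # float32: ~553MB
```

Note that the conv layers are only ~15M of the ~138M parameters; the fully-connected layers dominate the size, which is also where model compression buys the most.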
200 GPUs is not that expensive: Amazon will rent you 1 GPU at spot for ~$0.2/hour, so 200 of them come to <$1k/day.
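The arithmetic, with the price in cents to keep it exact (the ~$0.2/hour figure is the spot price cited above, not a current quote):

```python
# back-of-the-envelope AWS spot cost for a 200-GPU cluster
gpus = 200
cents_per_gpu_hour = 20   # ~$0.2/hour spot price cited above
hours_per_day = 24

cost_per_day_dollars = gpus * cents_per_gpu_hour * hours_per_day / 100
# -> 960.0, i.e. under $1k/day
```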
Thanks for the clarification. If the size is really 500MB, it could easily be stolen or exfiltrated, and at ~$1k a day the compute seems affordable to a dedicated hacker.