gwern comments on AlphaGo versus Lee Sedol - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (183)
It is also interesting to consider the size of AlphaGo.
Wiki says: "The distributed version in October 2015 was using 1,202 CPUs and 176 GPUs" (and it was developed by a team of 100 scientists). Assuming these were the best GPUs on the market in 2015, with around 1 teraflop each, the total power of AlphaGo was around 200 teraflops or more. (I would estimate 100 teraflops to 1 petaflop with 75% probability.) I also think the size of the program is on the order of terabytes, but I infer this only from the number of computers in use.
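That estimate can be sketched as simple arithmetic. This is only a back-of-envelope check under the comment's own assumption of ~1 TFLOP/s per 2015-era GPU; the ~10 GFLOP/s-per-CPU figure is a further rough assumption of mine:

```python
# Back-of-envelope throughput of the distributed AlphaGo.
# Assumes ~1 TFLOP/s per 2015-era GPU (the comment's own figure)
# and ~10 GFLOP/s per CPU -- both rough assumptions.
gpus, cpus = 176, 1202
gpu_tflops = 1.0    # assumed per-GPU throughput, TFLOP/s
cpu_tflops = 0.01   # assumed per-CPU throughput, TFLOP/s

total = gpus * gpu_tflops + cpus * cpu_tflops
print(f"~{total:.0f} TFLOP/s")  # ~188 TFLOP/s
```

On these assumptions the GPUs dominate, and the total lands near the low end of the 100 teraflop–1 petaflop range guessed above.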
This could give us the minimal size of an AI at the current level of technology. Fooming for such an AI would not be easy, as it would require sizeable new resources and rewriting of its complicated inner structure.
And it is also not computer-virus-sized yet, so it can't run away. A private researcher probably doesn't have such computational resources, but a hacker could use a botnet.
But if such an AI were used to create more effective master algorithms, it might foom.
I don't think that's true. The distributed system for playing is using multiple copies of the CNN value network so each one can do board evaluation during the MCTS on its own without the performance disaster of sending it over the network to a GPU or something crazy like that, not a single one sharded over two hundred servers (CPU!=computer). Similarly for training: each was training the same network in parallel, not 1/200th of the full NN. (You could train something like AlphaGo on your laptop's GPU, it'd just take like 2 years by their wallclock numbers.)
The actual CNN is going to be something like 10MB-1GB, because more than that and you can't fit it on 1 GPU to do training. Reading the paper, it seems to be fairly comparable in size to ImageNet competitors:
So 500M would be a reasonable guess if you don't want to work out how many parameters that 13-layer network translates to. Not large at all, and model compression would at least halve that.
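The step from parameter count to on-disk size is just parameters times bytes per parameter. A minimal sketch, where the 130M-parameter figure is an illustrative VGG-scale ImageNet assumption (not AlphaGo's actual count, which the comment deliberately doesn't work out):

```python
def model_size_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """On-disk size of a dense parameter dump, in megabytes."""
    return n_params * bytes_per_param / 1e6

# Illustrative VGG-scale net: ~130M params (an assumption, not
# AlphaGo's actual count) at float32, float16, and int8.
for bpp in (4, 2, 1):
    print(f"{bpp} bytes/param -> {model_size_mb(130_000_000, bpp):.0f} MB")
```

At float32 this gives ~520 MB, matching the 500M ballpark, and the float16 row shows why simple model compression would at least halve that.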
200 GPUs is not that expensive. Amazon will rent you 1 GPU at spot for ~$0.2/hour, so <$1k/day.
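The "<$1k/day" figure follows directly from the quoted spot price; a quick check using the comment's own ~$0.2/GPU-hour assumption:

```python
# Daily rental cost of 200 spot GPUs at the quoted rate.
gpus = 200
spot_price = 0.20  # assumed $/GPU-hour, per the comment's figure
daily = gpus * spot_price * 24
print(f"${daily:.0f}/day")  # $960/day
```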
Thanks for the clarification. If the size is really 500 MB, it could easily be stolen or run away, and $1k a day seems affordable for a dedicated hacker.