Gunnar_Zarncke comments on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning - Less Wrong Discussion

14 Post author: ESRogs 27 January 2016 09:04PM

Comment author: Gunnar_Zarncke 28 January 2016 10:31:35PM 2 points

There are other big deals, too. Microsoft's ImageNet win also contained frightening progress at the meta level of training.

The other issue is that constructing this kind of mega-neural net is tremendously difficult. Landing on a particular set of algorithms—determining how each layer should operate and how it should talk to the next layer—is an almost epic task. But Microsoft has a trick here, too. It has designed a computing system that can help build these networks.

As Jian Sun explains it, researchers can identify a promising arrangement for massive neural networks, and then the system can cycle through a range of similar possibilities until it settles on this best one. “In most cases, after a number of tries, the researchers learn [something], reflect, and make a new decision on the next try,” he says. “You can view this as ‘human-assisted search.’”

-- extracted from a very readable summary at Wired: http://www.wired.com/2016/01/microsoft-neural-net-shows-deep-learning-can-get-way-deeper/
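The "human-assisted search" Jian Sun describes sounds like a local search seeded by a researcher's guess. Here is a minimal sketch under that assumption: the architecture is reduced to a hypothetical (depth, width) pair, `toy_score` is a made-up stand-in for "train the network and measure validation accuracy", and the search greedily moves to the best neighboring configuration until none improves. This is a generic hill climb, not Microsoft's actual system, whose details the article does not give.

```python
def neighbors(config):
    """Yield configurations one step away from `config` in depth or width."""
    depth, width = config
    for d_depth, d_width in [(-1, 0), (1, 0), (0, -16), (0, 16)]:
        depth2, width2 = depth + d_depth, width + d_width
        if depth2 >= 1 and width2 >= 16:
            yield (depth2, width2)


def toy_score(config):
    """Hypothetical stand-in for 'train this network, return its validation score'.
    Peaks at depth 8, width 64."""
    depth, width = config
    return -((depth - 8) ** 2) - ((width - 64) / 16) ** 2


def local_search(start, score, max_steps=100):
    """Greedy hill climbing from a researcher-chosen starting configuration."""
    best, best_score = start, score(start)
    for _ in range(max_steps):
        top_score, top = max((score(c), c) for c in neighbors(best))
        if top_score <= best_score:
            break  # no neighbor improves: stop at this (local) optimum
        best, best_score = top, top_score
    return best, best_score


best, best_score = local_search((4, 32), toy_score)
print(best, best_score)
```

Starting from a researcher's guess such as `(4, 32)`, the search walks one step at a time to the best configuration of the toy score. In a real setting every call to the score function is a full training run, which is why a good human-chosen starting point matters so much.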

Comment author: gwern 29 January 2016 12:29:56AM 2 points

Going by that description, it is much, much less important than residual learning, because hyperparameter optimization is not new. There are a lot of approaches: grid search, random search, Bayesian optimization with Gaussian processes. Some hyperparameter optimization baked into MSR's deep learning framework would certainly save researchers some time and effort, but I don't know that it would've made any big difference unless they have something quite unusual going on.
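For comparison, here is a minimal sketch of two of the older approaches listed above, grid search and random search, on a made-up objective where only the learning rate really matters. The objective, the ranges, and the budget of nine trials are all assumptions for illustration; a real objective would be a full training run. Gaussian-process methods are left out to keep it short.

```python
import math
import random


def toy_objective(lr, reg):
    """Hypothetical validation score: depends strongly on lr, barely on reg."""
    return (-(math.log10(lr) - math.log10(3e-3)) ** 2
            - 0.01 * (math.log10(reg) + 4) ** 2)


def grid_search(objective, lrs, regs):
    """Evaluate every (lr, reg) pair on a fixed grid."""
    return [(objective(lr, reg), lr, reg) for lr in lrs for reg in regs]


def random_search(objective, n_trials, seed=0):
    """Sample n_trials configurations log-uniformly from the same ranges."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, -2)
        reg = 10 ** rng.uniform(-6, -2)
        trials.append((objective(lr, reg), lr, reg))
    return trials


grid = grid_search(toy_objective, [1e-4, 1e-3, 1e-2], [1e-6, 1e-4, 1e-2])
rand = random_search(toy_objective, n_trials=9)
print("grid:   best", max(grid), "distinct lrs", len({lr for _, lr, _ in grid}))
print("random: best", max(rand), "distinct lrs", len({lr for _, lr, _ in rand}))
```

With the same nine trials, the grid only ever tries three distinct learning rates, while random search tries nine. When one hyperparameter dominates, that is the usual argument for preferring random search over a grid.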

(I liked one paper which took a Bayesian multi-armed bandit approach and treated error curves as partial information about final performance: it would switch between the different networks being trained based on their performance, regularly 'freezing' and 'thawing' networks as the probability that each network would become the best performer changed with information from additional mini-batches/epochs.) Probably the single coolest one is that last year some researchers showed it is possible to backpropagate on hyperparameters somewhat efficiently! So hyperparameters just become more parameters to learn, and you can load up on all sorts of them without worrying that this makes your hyperparameter optimization futile or forces you to train a billion times. That would both save people a lot of time (for using vanilla networks) and allow exploring extremely complicated and heavily parameterized families of architectures, so it would be a big deal. Unfortunately, it's still not efficient enough for the giant networks we want to train. :(
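A minimal sketch of what differentiating through training with respect to a hyperparameter means, on a toy problem: training is five steps of gradient descent on a one-dimensional quadratic, and the derivative of the validation loss with respect to the learning rate is carried through every update, so the learning rate itself can be tuned by gradient descent. This uses forward-mode differentiation rather than backpropagation (reverse mode) as in the work mentioned above, and the loss functions, constants, and function names are all assumptions for illustration.

```python
def train_and_hypergrad(lr, steps=5, h=4.0, a=1.0, b=1.0, w0=0.0):
    """Run `steps` of gradient descent on the toy training loss 0.5*h*(w - a)**2,
    then return the toy validation loss 0.5*(w - b)**2 and its exact derivative
    with respect to the learning rate `lr` (forward-mode differentiation)."""
    w, dw_dlr = w0, 0.0
    for _ in range(steps):
        grad = h * (w - a)  # training-loss gradient at the current w
        # Chain rule through the update w <- w - lr * grad, using the old w:
        # d(w_new)/d(lr) = (1 - lr*h) * d(w)/d(lr) - grad
        dw_dlr = (1.0 - lr * h) * dw_dlr - grad
        w = w - lr * grad
    val_loss = 0.5 * (w - b) ** 2
    return val_loss, (w - b) * dw_dlr


def tune_learning_rate(lr=0.05, meta_lr=0.05, meta_steps=50):
    """Treat the learning rate as one more parameter: gradient descent on it."""
    for _ in range(meta_steps):
        _, hypergrad = train_and_hypergrad(lr)
        lr -= meta_lr * hypergrad
    return lr


loss_before, hypergrad = train_and_hypergrad(0.05)
eps = 1e-6
finite_diff = (train_and_hypergrad(0.05 + eps)[0]
               - train_and_hypergrad(0.05 - eps)[0]) / (2 * eps)
tuned_lr = tune_learning_rate()
loss_after, _ = train_and_hypergrad(tuned_lr)
print(f"hypergradient {hypergrad:.6f} vs finite difference {finite_diff:.6f}")
print(f"lr 0.05 -> {tuned_lr:.4f}, validation loss {loss_before:.4g} -> {loss_after:.4g}")
```

The finite-difference check confirms that the carried derivative is the true hypergradient. Forward mode needs one extra derivative per weight for every hyperparameter, which is trivial here but grows quickly with many hyperparameters on a large network; reverse mode avoids that but must run backward through the whole training trajectory, stored or reconstructed.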

Comment author: Gunnar_Zarncke 29 January 2016 01:00:57PM 1 point

The key point is that machine learning is starting to happen at the hyperparameter level, which is one more step toward systems that optimize themselves.

Comment author: gwern 29 January 2016 05:02:02PM 1 point

That step was taken a long time ago and does not seem to have played much of a role in recent developments; for the most part, people don't bother with extensive hyperparameter tuning. The gains have come from better initialization, better algorithms like dropout or residual learning, and better architectures, not from hyperparameters.