"Human-level control through deep reinforcement learning" - computer learns 49 different games

skeptical_lurker

This seems like an impressive first step towards AGI. The games, like Pong and Space Invaders, are perhaps not the most cerebral, but given that Deep Blue can only play chess, this is far more impressive IMO. They didn't even need to adjust hyperparameters between games.

I'd also like to see whether they can train a network that plays the same game on different maps without re-training, which seems a lot harder.

full text

Regardless, it's amazing how simple DNNs are. People have been working on computer vision and AI for about 60 years, and then a program like this comes along which is only around 500 lines of code, conceptually simple enough to explain to anyone with a reasonable mathematical background, but can nevertheless beat humans at a reasonable range of tasks.

Beware, there is a lot of non-obvious complexity in these models:
"Traditional" machine learning models (e.g. logistic regression, SVM, random forests) have only a few hyperparameters and are not terribly sensitive to their values, hence you can usually tune them coarsely and quickly.
These fancy deep neural networks can easily have tens, if not hundreds, of hyperparameters, and they are often quite sensitive to them. A bad choice can easily make your training procedure quickly stop making progress (insufficient capacity/vanishing gradients), diverge (exploding gradients), or converge to something which doesn't generalize well to unseen data (overfitting).
Finding a good choice of hyperparameters can really be a non-trivial optimization problem on its own (and a combinatorial one, since many of these hyperparameters are discrete and you can't really expect model performance to depend monotonically on their values).
Unfortunately, in these DNN papers, especially the "better than humans" ones, hyperparameter values often seem to appear out of nowhere.
There are some research and tools for doing that systematically, but they are not often discussed in the papers presenting novel architectures and results.
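
To make the "optimization problem on its own" point concrete, here is a minimal sketch of random search over a mixed discrete/continuous hyperparameter space. Everything in it is an assumption for illustration: the search space, the ranges, and especially `toy_validation_loss`, which is a made-up stand-in for "train the network and measure validation loss". None of it comes from the paper or its code.

```python
import math
import random

# Hypothetical search space: names and ranges are illustrative only.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-5, 1e-1),
    "batch_size": ("choice", [16, 32, 64, 128]),
    "num_layers": ("choice", [1, 2, 3, 4]),
    "activation": ("choice", ["sigmoid", "relu"]),
}


def sample_config(rng):
    """Draw one configuration uniformly from the search space."""
    config = {}
    for name, spec in SEARCH_SPACE.items():
        if spec[0] == "log_uniform":
            low, high = spec[1], spec[2]
            # Sample uniformly in log space so each decade is equally likely.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.choice(spec[1])
    return config


def toy_validation_loss(config):
    """Made-up objective standing in for a real training run.

    It is deliberately non-monotonic in the discrete parameters:
    2 layers beats both 1 and 3, and batch size 32 beats 16 and 64.
    """
    loss = (math.log10(config["learning_rate"]) + 3.0) ** 2
    loss += 0.5 * abs(config["num_layers"] - 2)
    loss += 0.3 if config["activation"] == "sigmoid" else 0.0
    loss += 0.1 * abs(math.log2(config["batch_size"]) - 5)
    return loss


def random_search(num_trials, seed=0):
    """Evaluate num_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        loss = toy_validation_loss(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


if __name__ == "__main__":
    config, loss = random_search(num_trials=200)
    print(config, loss)
```

With a real network, each call to the objective is a full training run, which is why the number of trials you can afford matters so much and why the cost of tuning is easy to underestimate when the final values are just listed in a table.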

SVMs are pretty bad for hyperparameters too. If you want a simple model, use random forests or naive Bayes.

I struggle to see how DNNs can have hundreds of hyperparameters. Looking at the code for the paper I linked to, they seem to have learning rate, 2 parameters for simulated annealing, weight cost and batch size. That's 5, not counting a few others which only apply to reinforcement learning DNNs. Admittedly, there is the choice of sigmoid/rectified linear, and of the number of neurons, layers and epochs, but these last few are largely determined by what hardw...
