SVMs are pretty bad for hyperparameters too; if you want a simple model, use random forests or naive Bayes.
I struggle to see how DNNs can have hundreds of hyperparameters - looking at the code for the paper I linked to, they seem to have learning rate, 2 parameters for simulated annealing, weight cost and batch size. That's 5, not counting a few others which only apply to reinforcement learning DNNs. Admittedly, there is also the choice of sigmoid vs. rectified linear units, and of the number of neurons, layers and epochs, but these last few are largely determined by what hardware you have and how much time you are willing to spend training.
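To make that count concrete, here is a minimal sketch of the five tunables named above as a config dict. The names and values are my own placeholders for illustration, not the paper's actual settings:

```python
# Hypothetical hyperparameter set for a DNN of the kind discussed above.
# All names and values are illustrative placeholders, not the paper's settings.
hyperparams = {
    "learning_rate": 1e-3,   # gradient-descent step size
    "anneal_start": 1.0,     # simulated-annealing parameter 1 (assumed name)
    "anneal_end": 0.1,       # simulated-annealing parameter 2 (assumed name)
    "weight_cost": 1e-4,     # L2 weight decay
    "batch_size": 32,        # minibatch size
}
print(len(hyperparams))  # 5 - a far cry from "hundreds"
```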
Having skimmed the paper you linked to, it seems they have hundreds of parameters because they are using a rather more complex network topology with SVMs fitting the neuron activation to the targets. And that's interesting in itself.
Unfortunately, in these DNN papers, especially the "better than humans" ones, hyperparameter values often seem to appear out of nowhere.
The general problem of hyperparameter values is one of the things that worries me about academia. So you have a statistically significant effect, which is an improvement, I suppose.
Oh, and this paper was published in Nature.
There is some research, and there are tools, to do this systematically, but it is not often discussed in papers presenting novel architectures and results.
I'd be surprised if this could work with DNNs - AFAIK, Monte Carlo optimization, for instance, generally takes thousands of evaluation steps, yet with DNNs each evaluation step would require days of training, so the whole search would require thousands of GPU-days. Indeed, the paper you linked to ran 1200 evaluations, so I'm guessing they had a lot of hardware.
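For a sense of scale, here is a minimal random-search sketch (function names and the toy objective are my own; in reality each call to the objective would be a full training run taking days of GPU time):

```python
import random

def random_search(objective, space, n_evals, seed=0):
    """Minimal random search: sample each hyperparameter uniformly
    from its candidate list and keep the best-scoring config."""
    rng = random.Random(seed)
    best_score, best_cfg = float("-inf"), None
    for _ in range(n_evals):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)  # in reality: days of GPU training per call
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg

# Toy stand-in objective; the point is the evaluation count, not the model.
space = {"lr": [1e-4, 1e-3, 1e-2], "batch": [32, 64, 128]}
score, cfg = random_search(lambda c: -abs(c["lr"] - 1e-3), space, n_evals=1200)

# 1200 evaluations at ~1 GPU-day each is ~1200 GPU-days of compute.
print(1200 * 1.0, "GPU-days")
```

Even this crude arithmetic shows why 1200 evaluations implies serious hardware.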
SVMs are pretty bad for hyperparameters too
How so? A linear SVM's main hyperparameter is the regularization coefficient. There is also the choice of loss and regularization penalty, but these are only a couple of bits.
A non-linear SVM also has the choice of kernel (in practice it's either RBF or polynomial, unless you are working on special types of data such as strings or trees) and one or two kernel hyperparameters.
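A back-of-the-envelope sketch of how small that search space is (the candidate grids below are my own illustrative choices, not canonical ones):

```python
import itertools
import math

# Illustrative linear-SVM search space: C on a log grid, plus the
# discrete loss/penalty choices mentioned above.
Cs = [0.01, 0.1, 1.0, 10.0, 100.0]
losses = ["hinge", "squared_hinge"]
penalties = ["l1", "l2"]

configs = list(itertools.product(Cs, losses, penalties))
print(len(configs))  # 20 configurations in total

# The loss/penalty choices together carry only log2(2 * 2) = 2 bits.
print(math.log2(len(losses) * len(penalties)))  # 2.0
```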
...I struggle to see how DNNs can have hundreds of hyperparameters - looking at the code for the paper I linked to, they seem to have learning rate, 2 parameters for simulated annealing, weight cost and batch size...
This seems like an impressive first step towards AGI. Games like Pong and Space Invaders are perhaps not the most cerebral, but given that Deep Blue can only play chess, this is far more impressive IMO. They didn't even need to adjust hyperparameters between games.
I'd also like to see whether they can train a network that plays the same game on different maps without re-training, which seems a lot harder.