"Human-level control through deep reinforcement learning" - computer learns 49 different games

skeptical_lurker

This seems like an impressive first step towards AGI. The games, like Pong and Space Invaders, are perhaps not the most cerebral, but given that Deep Blue can only play chess, this is far more impressive IMO. They didn't even need to adjust hyperparameters between games.

I'd also like to see whether they can train a network that plays the same game on different maps without re-training, which seems a lot harder.

full text

I'd also like to see whether they can train a network that plays the same game on different maps without re-training, which seems a lot harder.

Really? I was under the impression that training the whole network with gradient descent was impossible, because the propagated error becomes infinitesimally small. In fact, I thought that training layers individually was the insight that made DNNs possible.

Do you have a link about how they managed to train the whole network?

I was under the impression that training the whole network with gradient descent was impossible, because the propagated error becomes infinitesimally small.

If you do it naively, yes. But researchers figured out how to attack that problem from multiple angles: from the choice of the non-linear activation function, to specifics of the optimization algorithm, to the random distribution used to sample the initial weights.
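To make the "propagated error becomes infinitesimally small" point concrete, here is a rough sketch (not anything from the paper) that backpropagates a gradient through a stack of random fully connected layers and reports its norm at the input. The function name, the depth and width, and the weight scalings (variance 1/width for the logistic case, 2/width for ReLU) are all assumptions chosen for illustration.

```python
import math
import random

def backprop_gradient_norm(activation, depth=20, width=32, seed=0):
    """Norm of the gradient reaching the input of a random deep network.

    activation: "sigmoid" or "relu". Weights are Gaussian with variance
    1/width (sigmoid) or 2/width (relu); both scalings are assumptions
    made for this sketch.
    """
    rng = random.Random(seed)
    scale = math.sqrt((2.0 if activation == "relu" else 1.0) / width)
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    layers = []  # (weights, local derivatives) per layer, for the backward pass

    # Forward pass: a <- f(W a), remembering f'(z) for every unit.
    for _ in range(depth):
        W = [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]
        z = [sum(w * x for w, x in zip(row, a)) for row in W]
        if activation == "relu":
            a = [max(0.0, v) for v in z]
            d = [1.0 if v > 0.0 else 0.0 for v in z]
        else:
            a = [1.0 / (1.0 + math.exp(-v)) for v in z]
            d = [s * (1.0 - s) for s in a]  # logistic derivative, at most 0.25
        layers.append((W, d))

    # Backward pass: start from a gradient of all ones at the output and
    # apply the chain rule, g <- W^T (g * f'(z)), layer by layer.
    g = [1.0] * width
    for W, d in reversed(layers):
        delta = [gi * di for gi, di in zip(g, d)]
        g = [sum(W[i][j] * delta[i] for i in range(width)) for j in range(width)]
    return math.sqrt(sum(v * v for v in g))

print("sigmoid:", backprop_gradient_norm("sigmoid"))
print("relu:   ", backprop_gradient_norm("relu"))
```

In this setup the logistic network's gradient should come out many orders of magnitude smaller than the ReLU one, since every logistic layer multiplies it by derivatives of at most 0.25.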

Do you have a link about how they managed to train the whole network?

The batch normalization paper cited above is one example of that.

jkrause

That was indeed one of the hypotheses about why it was difficult to train the networks: the vanishing gradient problem. In retrospect, one of the main reasons this happened was the use of saturating nonlinearities in the network, i.e. nonlinearities like the logistic function or tanh, which flatten out toward fixed asymptotes (0 and 1 for the logistic function, -1 and 1 for tanh). Because they flatten out, their derivatives end up being really small whenever a unit's input is far from zero, and the deeper your network the more this effect compounds. The first large-scale network that fixed this was by Krizhevsky et al., which used a Rectified Linear Unit (ReLU) for their nonlinearity, given by f(x) = max(0, x). The earliest reference I can find to using ReLUs is Jarrett et al., but since Krizhevsky's result pretty much everyone uses ReLUs (or some variant thereof). In fact, the first result I've seen showing that logistic/tanh nonlinearities can work is the batch normalization paper Sean_o_h linked, which gets around the problem by normalizing the input to the nonlinearity, which presumably prevents the units from saturating too much (though this is still an open question).
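As a toy illustration of the "normalizing the input to the nonlinearity" idea, the sketch below takes one unit's pre-activations over a batch, shifts and scales them to zero mean and unit variance, and compares the average logistic derivative before and after. This is only the normalization step: the learned scale and shift parameters that batch normalization also uses are left out, and the function names and the batch statistics (mean 8, standard deviation 5) are made up for the example.

```python
import math
import random

def normalize_batch(values, eps=1e-5):
    """Shift and scale one unit's pre-activations to zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def mean_logistic_derivative(values):
    """Average of f'(z) = f(z) * (1 - f(z)) over a batch; 0.25 is the maximum."""
    total = 0.0
    for z in values:
        s = 1.0 / (1.0 + math.exp(-z))
        total += s * (1.0 - s)
    return total / len(values)

rng = random.Random(0)
# Hypothetical pre-activations for one unit over a batch of 256 examples,
# deliberately far from zero so that the logistic saturates.
raw = [rng.gauss(8.0, 5.0) for _ in range(256)]
normalized = normalize_batch(raw)

print("mean derivative, raw:       ", mean_logistic_derivative(raw))
print("mean derivative, normalized:", mean_logistic_derivative(normalized))
```

The normalized batch sits near zero, where the logistic is steepest, so its average derivative should be much closer to the 0.25 ceiling than the raw batch's.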
