V_V comments on "Human-level control through deep reinforcement learning" - computer learns 49 different games - Less Wrong Discussion

11 Post author: skeptical_lurker 26 February 2015 06:21AM

Comment author: V_V 27 February 2015 03:48:40PM 0 points

I was under the impression that training the whole network with gradient descent was impossible, because the propagated error becomes infinitesimally small.

If you do it naively, yes. But researchers have figured out how to attack that problem from multiple angles: the choice of non-linear activation function, the specifics of the optimization algorithm, and the random distribution used to sample the initial weights.
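To make the activation-function point concrete, here is a minimal sketch, not taken from any of the papers discussed. It assumes a toy "network" that is just a chain of 20 one-unit layers, all sharing a weight of 1.0; the function name `gradient_through_chain` and these values are hypothetical choices for illustration. It compares how much gradient survives the trip back through sigmoid layers versus ReLU layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_through_chain(depth, weight, activation):
    """Return d(output)/d(input) for a chain of `depth` one-unit layers.

    Each layer computes h = f(weight * h_prev), so by the chain rule the
    gradient is the product of weight * f'(z) over all layers.
    """
    h = 1.0     # input fed to the first layer (illustrative value)
    grad = 1.0  # running product of the per-layer derivatives
    for _ in range(depth):
        z = weight * h
        if activation == "sigmoid":
            h = sigmoid(z)
            local = h * (1.0 - h)  # sigmoid'(z), never larger than 0.25
        elif activation == "relu":
            h = max(0.0, z)
            local = 1.0 if z > 0 else 0.0  # ReLU passes gradient unchanged when active
        else:
            raise ValueError("unknown activation: " + activation)
        grad *= weight * local
    return grad


sigmoid_grad = gradient_through_chain(20, 1.0, "sigmoid")
relu_grad = gradient_through_chain(20, 1.0, "relu")
print("sigmoid:", sigmoid_grad)
print("relu:   ", relu_grad)
```

In this toy setting the sigmoid chain shrinks the gradient by at least a factor of four per layer, while the ReLU chain leaves it intact as long as every unit stays active. Real networks are wide rather than one unit per layer, so the weight initialization and the optimizer matter as well, and this sketch does not model those.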

Do you have a link about how they managed to train the whole network?

The batch normalization paper cited above is one example of that.
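For readers who have not seen it, the core operation of batch normalization can be sketched roughly as follows. This is a simplified illustration of the training-time forward pass for a single unit, not the paper's full method: the function name `batch_norm` and the sample values are hypothetical, and running statistics, the backward pass, and multi-dimensional inputs are all omitted.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one unit's activations across a mini-batch.

    The batch is shifted to zero mean and scaled to (roughly) unit variance,
    then rescaled by gamma and shifted by beta, which would be learned
    parameters in a real network.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# Hypothetical pre-activation values for one unit over a batch of four examples.
activations = [100.0, 102.0, 98.0, 104.0]
normalized = batch_norm(activations)
print(normalized)
```

Keeping each layer's inputs in a well-scaled range like this is one way to stop activations from drifting into regions where the non-linearity saturates and its derivative collapses.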