jsteinhardt comments on [link] New essay summarizing some of my latest thoughts on AI safety - Less Wrong

Post author: Kaj_Sotala 01 November 2015 08:07AM




Comment author: jsteinhardt 10 November 2015 08:29:52AM 1 point

Yeah, I should be a bit more careful about number 4. The point is that many papers arguing that a given NN learns "natural" representations do so by looking at what an individual hidden unit responds to (as opposed to looking at the space spanned by the hidden layer as a whole). Any such argument seems dubious to me without further support, since it relies on a sort of delicate symmetry-breaking which can only come from either the training procedure or noise in the data, rather than the model itself. But I agree that if such an argument were accompanied by a justification of why the training procedure, data noise, or some other factor led to the symmetry being broken in a natural way, then I would potentially be happy.
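[A numerical sketch of the symmetry argument, not part of the original thread: for a two-layer *linear* network, any orthogonal rotation R of the hidden space can be absorbed into the adjacent weight matrices without changing the network's function, so the model itself assigns no meaning to individual hidden units.]

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer linear network: y = W2 @ (W1 @ x).
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

# A random orthogonal matrix R rotating the 4-dim hidden space.
R, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Rotating the hidden basis and absorbing the rotation into the weights
# leaves the computed function identical, even though every individual
# hidden unit now responds to a completely different direction in input space.
y_original = W2 @ (W1 @ x)
y_rotated = (W2 @ R.T) @ ((R @ W1) @ x)

assert np.allclose(y_original, y_rotated)
```

So any claim that a *particular* unit's response is meaningful must appeal to whatever breaks this rotational symmetry.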

Comment author: paulfchristiano 15 November 2015 01:15:19AM 0 points

delicate symmetry-breaking which can only come from either the training procedure or noise in the data, rather than the model itself

I'm still not convinced. The pointwise nonlinearities introduce a preferred basis, and cause the individual hidden units to be much more meaningful than linear combinations thereof.
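[Again a sketch of my own, not from the thread: a pointwise nonlinearity like ReLU does not commute with rotations of the hidden space, so the rotation can no longer be absorbed into the weights and the unit axes become a preferred basis.]

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hidden pre-activations with mixed signs, and a 90-degree rotation
# of the 2-dim hidden space.
h = np.array([1.0, -1.0])
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# Pointwise ReLU does not commute with rotation:
# relu(R @ h) = [1, 1] while R @ relu(h) = [0, 1].
print(relu(R @ h))   # [1. 1.]
print(R @ relu(h))   # [0. 1.]
```

Because relu(R @ h) differs from R @ relu(h), the trick from the linear case fails: replacing W1 with R @ W1 and W2 with W2 @ R.T changes the function, so the model itself singles out the coordinate axes of the hidden layer.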

Comment author: jsteinhardt 15 November 2015 07:48:24AM 0 points

Yeah, I discussed this with some others and came to the same conclusion. I do still think that one should explain why the preferred basis ends up being as meaningful as it does, but agree that this is a much more minor objection.