All of jem-mosig's Comments + Replies

Hi jylin04. Fantastic post! It touches on many more aspects of interpretability than my post about the book. I also enjoyed your summary PDF!

I'd love to contribute to any theory work in this direction, if I can. Right now I'm stuck around p. 93 of the book. (I've read everything, but I'm now trying to re-derive the equations and have trouble figuring out where a certain term goes. I am also building a Mathematica package that takes care of some of the more tedious parts of the calculations.) Maybe we could get in touch?

The predictions laid out in the book are mostly about how to build a perceptron such that representation learning works well in practice and the generalisation error is minimised. For example,

  1. When you train with (stochastic) gradient descent, you have to scale the learning rate differently for different layers, and also differently for weights and biases. The theory tells you specifically how to scale them, and how this depends on the activation functions (see the sketch after this list). If you don't do that, the theory predicts, among other things, that the change in performance of you
...
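To make point 1 a bit more concrete, here is a minimal sketch (my own illustration, not code from the book) of what per-layer, per-parameter-type learning rates look like in a standard framework. The specific choices, a base rate, tanh activations, and a 1/fan_in factor on the weight learning rates with unscaled bias learning rates, are assumptions for illustration only; the factors the theory actually prescribes depend on the parameterization and on the activation function.

```python
# Sketch only: layer- and parameter-type-dependent learning rates in PyTorch.
# The 1/fan_in factor on the weights is an illustrative assumption, not the
# book's exact prescription (which depends on parameterization and activation).
import torch
import torch.nn as nn

def make_param_groups(model: nn.Sequential, base_lr: float = 0.1):
    """One optimizer parameter group per weight/bias tensor, each with its own lr."""
    groups = []
    for layer in model:
        if not isinstance(layer, nn.Linear):
            continue  # activation modules carry no parameters
        groups.append({"params": [layer.weight], "lr": base_lr / layer.in_features})
        groups.append({"params": [layer.bias], "lr": base_lr})
    return groups

model = nn.Sequential(
    nn.Linear(784, 1024), nn.Tanh(),
    nn.Linear(1024, 1024), nn.Tanh(),
    nn.Linear(1024, 10),
)
optimizer = torch.optim.SGD(make_param_groups(model), lr=0.1)  # per-group lr overrides the default
```

The point is only the mechanics: each parameter tensor gets its own learning rate via its own optimizer parameter group.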

To be clear: I don't have strong confidence that this works, but I think this is something worth exploring.

One more thing I should probably have added: I am only talking about distributional shift in the input data, which is important. But I think Eliezer is also talking about another kind of distributional shift, one that comes from a change in ontology. I am confused about how to think about this. Intuitively it is "the world hasn't changed, just how I look at it", whereas I discuss "the world has changed" (because the agent is doing things that haven't occurred during training).
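To pin down in symbols what I mean by the first kind of shift (my notation, just for concreteness): the inputs are drawn from a different distribution at deployment than during training,

```latex
x \sim p_{\mathrm{train}}(x) \ \text{(training)}, \qquad
x \sim p_{\mathrm{deploy}}(x) \ \text{(deployment)}, \qquad
p_{\mathrm{train}} \neq p_{\mathrm{deploy}},
```

e.g. because the agent's own actions take it to states that never occurred during training. In the ontology case, roughly speaking, p(x) need not change at all; what changes is how the agent carves x up into concepts.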

I think 3blue1brown's videos give a good first introduction to neural nets (the "atomic" description).

Does this help?
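
In case a code-level picture is useful alongside the videos: here is the neuron-level ("atomic") description written out as a toy forward pass, just a sketch with made-up sizes and random parameters.

```python
# Minimal sketch of the "atomic" description: a fully connected net is just
# alternating affine maps (weights and biases) and elementwise nonlinearities.
import numpy as np

def forward(x, weights, biases, activation=np.tanh):
    for W, b in zip(weights[:-1], biases[:-1]):
        x = activation(W @ x + b)          # preactivation z = W x + b, then nonlinearity
    return weights[-1] @ x + biases[-1]    # linear readout layer

# Toy 2 -> 3 -> 1 network with random parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [rng.normal(size=3), rng.normal(size=1)]
print(forward(np.array([0.5, -1.0]), weights, biases))
```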

I did not write down the list of quantities because you need to go through the math to understand most of them. One very central object is the neural tangent kernel, but there are also algorithm projectors, universality classes, etc., each of which requires a lengthy explanation that I decided was beyond the scope of this post.
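To give at least the central object a concrete handle: the empirical neural tangent kernel at two inputs is just the inner product of the parameter gradients of the network output at those inputs. Here is a toy sketch for a one-hidden-layer scalar network (my own example, not code from the book; the width, output scaling and tanh activation are arbitrary choices).

```python
# Empirical neural tangent kernel of a toy one-hidden-layer network
#   f(x) = sum_j v_j * tanh(w_j * x + b_j)
# Theta(x1, x2) = grad_params f(x1) . grad_params f(x2)
import numpy as np

def grad_params(x, w, b, v):
    """Gradient of f(x) with respect to all parameters (w, b, v), flattened."""
    s = np.tanh(w * x + b)       # hidden activations
    ds = 1.0 - s ** 2            # tanh'
    return np.concatenate([v * ds * x,   # df/dw_j
                           v * ds,       # df/db_j
                           s])           # df/dv_j

def ntk(x1, x2, w, b, v):
    return grad_params(x1, w, b, v) @ grad_params(x2, w, b, v)

rng = np.random.default_rng(0)
n = 1000                                # hidden width
w, b = rng.normal(size=n), rng.normal(size=n)
v = rng.normal(size=n) / np.sqrt(n)     # 1/sqrt(width) output scaling
print(ntk(0.3, -0.7, w, b, v))
```

Roughly, as the width goes to infinity this kernel becomes deterministic and stops changing during training, and much of the book is about the finite-width corrections to that picture.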

Hmm, you may be right, sorry. I somehow read the opaqueness problem as a sub-problem of lie-detection. To do lie-detection, we need to formulate mathematically what lying means, and for that we need a theoretical understanding of what's going on in a neural net in the first place, so that we have the right concepts to work with.

I think lie-detection in general is very hard, although it might be tractable in specific cases. The general problem seems hard because I find it difficult to define lying mathematically. Thinking about it for five minutes, I hit sever...

ADifferentAnonymous
I would count that as substantial progress on the opaqueness problem.