I have come across various people (including my past self) who meet up regularly to study and discuss, e.g., Alignment Forum posts. This helps people bond over their common beliefs, fears, and interests, which I think is good, but in no way is this ever going to lead anyone to find a solution to the alignment problem. In this post I'll explain why this doesn't help, and what I think we should do instead.
The cult
Reading good papers can be fun. You learn something interesting and, if the topic is hard but well presented by the authors, you get a kick from finally understanding something complicated. But is what you learned actually...
Hi jylin04. Fantastic post! It touches on many more aspects of interpretability than my post about the book. I also enjoyed your summary PDF!
I'd love to contribute to any theory work in this direction, if I can. Right now I'm stuck around p. 93 of the book. (I've read everything, but I'm now trying to re-derive the equations and have trouble figuring out where a certain term goes. I am also building a Mathematica package that takes care of some of the more tedious parts of the calculations.) Maybe we could get in touch?