In the same way that cells were understood to be indivisible, atomic units of biology hundreds of years ago (before the discovery of sub-cellular structures like organelles, proteins, and DNA), we currently understand features to be the fundamental units of the neural network representations we are examining with tools from mechanistic interpretability.
This is not to say that the definition of what constitutes a "feature" is at all clear; in fact, the lack of consensus on it reflects the extremely immature (but exciting!) state of interpretability research tod...
This was an interesting read, and it points to a simple truth that I think is often forgotten: Newton's first law applies to basically everything in life, not just physical systems. The "resets" you describe are definitely valid, but they are by no means a comprehensive list of the "opposing" forces that can help push you in the other direction and reverse your momentum (in a positive way). The two other main ones that I believe are missing, yet fundamental, are:
- Diet: the food we eat affects our mental/emotional tendencies to procrastinate vs get things done through ...
On second thought, I agree that gazing at the cosmos is not a fair comparison. Rather, I would compare mechanistic interpretability to the early experiments of the Dutch microbiologist van Leeuwenhoek when he first looked at protozoa and bacteria under a microscope. They weren't the most accurate or informative experiments in the grand scheme of things, but they were necessary for others to develop a more sophisticated understanding of biology.
It's very likely that the field of mechanistic interpretability will grow beyond simply examining weights in a mode...
It would be perverse to try to understand a king in terms of his molecular configuration, rather than in the contact between the farmer and the bandit. The molecules of the king are highly diminished phenomena, and if they have information about his place in the ecology, that information is widely spread out across all the molecules and easily lost just by missing a small fraction of them.
Agreed, but in the same way that empirical observations and low-tech experiments gazing at the cosmos laid the foundation upon which we were able to build grander and mo...
I agree that it is dubious at the moment. I just think it's too early to tell, and the field itself will undoubtedly grow in complexity over the coming years.
Your point about the spontaneity of cells forming stands, although I wasn't phrasing the analogy at the level of thermodynamics or physics.