I realized that my learning process for the last n years was quite unproductive, seemingly because of my implicit belief that I should have full awareness of my state of learning.
I.e., when I tried to learn something complex, I expected to walk away with a full understanding of the topic right after the lesson. When I didn't get it, I abandoned the topic. And in reality it was more like:
Imagine the following reasoning by an AI:
I am a paperclip maximizer. The human is a part of me. If the human learns that I am a paperclip maximizer, they will freak out and I won't produce paperclips. But that would be detrimental both to me and to the human, since they are a part of me. So I won't tell the human about the paperclips, for their own good.
I don't have a deep understanding of the modern mechanistic interpretability field, but my impression is that MI should mostly explore new possible methods rather than trying to scale or improve existing ones. MI is spiritually similar to biology, and a lot of progress in biology came from the development of microscopy, tissue staining, and similar tools.
I don't think "hostile takeover" is a meaningful distinction in case of AGI. What exactly prevents AGI from pulling plan consisting of 50 absolutely legal moves which ends up with it as US dictator?