Constructing Neural Network Parameters with Downstream Trainability
I have recently done some preliminary experiments related to mechanistic interpretability. Researchers in this subfield often seem to post their results on the AI Alignment Forum / LessWrong (e.g., causal scrubbing, emergent world representations, monosemanticity), and the AI Alignment Forum Q&A suggests posting on LessWrong. Therefore, I am posting here.