Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
This is a project submission post for the AI Safety Fundamentals course from BlueDot Impact. Some of its sections (mainly the Introduction) are therefore intended to be beginner-friendly; readers already familiar with the topic may find them overly verbose and can freely skip them.

TLDR (Executive Summary)

* We explored whether Sparse Autoencoders (SAEs)...
Sep 29, 2024