Makes sense, thanks! In that case, we could potentially reduce the width, which (along with a smaller dataset) might help scale SAEs to understanding mechanisms in big models?
Great work! Is there such a thing as too narrow a dataset? For refusal, what do you think happens if we train specifically on a bunch of examples that show signs of refusal?
Couldn't agree more with this post! I used to be afraid of long notebooks, but they are powerful in that they let me just think.
Although, when writing a script, I tend to use VS Code's "#%%" cell markers to run cells inside the script and test things. My notebooks usually contain a bunch of analysis code that doesn't need to be run but should stay.
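
For anyone who hasn't tried this, here is a minimal sketch of what such a script can look like. The data and the `summarize` helper are made up purely for illustration. I've written the marker as `# %%` here, and as far as I know VS Code's Python extension treats it the same as `#%%`.

```python
# %% Cell 1: a small, hypothetical sample to play with
values = [3, 1, 4, 1, 5, 9, 2, 6]

# %% Cell 2: the part that belongs in the final script
def summarize(xs):
    """Return basic statistics for a non-empty list of numbers."""
    return {"n": len(xs), "mean": sum(xs) / len(xs), "max": max(xs)}

summary = summarize(values)

# %% Cell 3: a scratch cell for poking at results while developing
print(summary)
```

Each `# %%` line is an ordinary comment, so the file still runs top to bottom as a normal Python script, while the editor lets you run one cell at a time.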