Hello! I've made two quick improvements, mainly to the prompt and tokens.
TL;DR I changed the prompt to
prompt = '<start_of_turn>user\n "<unk>"?<end_of_turn>\n<start_of_turn>model\n "<unk>" "'
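For context, here's a minimal sketch of how a prompt like this gets used, assuming a HuggingFace-style Gemma checkpoint and an SAE feature direction as a torch vector. The Colab notebook has its own loading and injection code; the function name `self_explain`, the checkpoint name, and the defaults below are my own stand-ins.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "google/gemma-2b-it"  # assumption; the post works with Gemma-1-2B
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

DEFAULT_PROMPT = '<start_of_turn>user\n "<unk>"?<end_of_turn>\n<start_of_turn>model\n "<unk>" "'

@torch.no_grad()
def self_explain(feature_vec: torch.Tensor, scale: float,
                 prompt: str = DEFAULT_PROMPT, max_new_tokens: int = 30) -> str:
    """Swap the <unk> token embeddings for a scaled SAE feature direction,
    then let the model 'explain' the feature in its own words."""
    inputs = tokenizer(prompt, return_tensors="pt")
    unk_positions = (inputs.input_ids[0] == tokenizer.unk_token_id).nonzero(as_tuple=True)[0]

    # Start from the normal token embeddings, then overwrite the <unk> slots.
    embeds = model.get_input_embeddings()(inputs.input_ids).clone()
    embeds[0, unk_positions] = scale * feature_vec.to(embeds.dtype)

    out = model.generate(inputs_embeds=embeds,
                         attention_mask=inputs.attention_mask,
                         max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Sweeping `scale` then just means calling `self_explain(feature_vec, scale=s)` for a few values of `s`.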
I noticed that the scales were being affected by prior words in the prompt/context itself. I tried out feature 4088 and replaced some words. For example, replacing "word" with "concept" and "number" resulted in slightly different explanations at the higher scale. Intuitively, I s...
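To make that wording experiment concrete, here's an illustrative sweep over prompt variants and scales, reusing the `self_explain` sketch above. The template strings, the `sae.W_dec` lookup (SAELens-style), and the scale values are stand-ins rather than the notebook's exact ones.

```python
templates = {
    "word":    '<start_of_turn>user\n What does the word "<unk>" mean?<end_of_turn>\n<start_of_turn>model\n The word "<unk>" means "',
    "concept": '<start_of_turn>user\n What does the concept "<unk>" mean?<end_of_turn>\n<start_of_turn>model\n The concept "<unk>" means "',
    "number":  '<start_of_turn>user\n What does the number "<unk>" mean?<end_of_turn>\n<start_of_turn>model\n The number "<unk>" means "',
}
feature_4088 = sae.W_dec[4088]  # assumes an SAELens-style SAE with a W_dec matrix

for label, template in templates.items():
    for scale in (4.0, 8.0, 16.0):  # illustrative scale values
        print(label, scale, self_explain(feature_4088, scale, prompt=template))
```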
I was thinking about the practical implication of this. As others have mentioned, models in production pretty much all use the prompt "you are an AI assistant". From a model training perspective, it makes sense to build with this assumption in mind.
However, it occurs to me that I have never explicitly referred to any of my AI assistants as an AI assistant. Instead, I treat them more as an inner monologue, and I suspect many other users do this as well. If the AI makes an error, I essentially correct them the way I would correct my own inner monologue...
[crossposted from EA Forum, to emphasise an important point. hope that's OK! will delete if it isn't]
How do we prevent the methodology of exclusively seeking and publishing negative information, without fact checking, from becoming an acceptable norm?
Adding on as a former Nonlinear intern who has been aware of a "falling out" between Alice and Nonlinear for almost a year now:
Further Optimisation Update Log for 28th August:
I am working from here: Minh's Copy of Gemma SAE self-explanation - Colab (google.com)
What worked: Multi-Feature Combination and Replacing Earlier Layers
Multi-feature combination works! I managed to combine feature 7656 ("France") and feature 7154 ("capital cities") from Neuronpedia's Gemma-1-2B [1] feature directory to elicit outputs for Paris, France. I'm just taking the sum of the vectors and dividing to find the average, so this should work the same as before even if you have 1 feature. Weighing should
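As a rough sketch of the combination step, assuming an SAELens-style `sae` object with a `W_dec` matrix (the notebook may fetch the Neuronpedia feature vectors differently): averaging reduces to the single-feature case when only one vector is passed, which is why the earlier behaviour is unchanged.

```python
def combine_features(vectors):
    """Average a list of SAE feature directions into one injection vector."""
    return torch.stack(list(vectors)).mean(dim=0)

france   = sae.W_dec[7656]   # feature 7656: "France"
capitals = sae.W_dec[7154]   # feature 7154: "capital cities"
combined = combine_features([france, capitals])

print(self_explain(combined, scale=8.0))  # scale value is illustrative
```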