User Comment Replies

Hello! A great write-up and fascinating investigation. Well done with such a great result from a hackathon.

I'm trying to understand your plot titled 'Proportion of Top Predictions that are " an" by Layer 31 Neuron 892 Activation'. Can you explain what the y-axis is in this plot? It's not clear what the y-axis is a proportion of.

I read through the code, but couldn't quite follow the logic for this plot. It seems that the y-axis is computed with these lines:

neuron_act_top_pred_proportions = [dict(sorted([(k / bin_granularity, v["top_pred"] / v["count"])

... (read more)

We Found An Neuron in GPT-2

aaronsnoswell2y10

1Joseph Miller2y

Hi! For each token prediction we record the activation of the neuron and whether or not " an" has a greater logit than any other token (i.e. whether it was the top prediction). We group the activations into buckets of width 0.2. For each bucket we plot

$$\frac{\text{Number of times " an" was the top prediction (for activations in this bucket)}}{\text{Number of activations in this bucket}}$$

Does that clarify things for you?
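
To make the bucketing concrete, here is a minimal sketch of that computation. It is not the code from the post: the function name `top_pred_proportion_by_bucket` and its arguments are hypothetical, and it assumes you already have one neuron activation and one "was `" an"` the top prediction?" flag per token prediction. The `top_pred` and `count` keys mirror the ones in the snippet quoted above.

```python
import math
from collections import defaultdict


def top_pred_proportion_by_bucket(activations, is_top_pred, bucket_width=0.2):
    """Map each bucket's lower edge to the fraction of predictions in that
    bucket for which the target token was the top prediction.

    Hypothetical helper for illustration only, not the authors' code.
    """
    # Per-bucket tallies: how many activations fell in the bucket, and for
    # how many of those the target token had the highest logit.
    tallies = defaultdict(lambda: {"top_pred": 0, "count": 0})

    for act, top in zip(activations, is_top_pred):
        bucket = math.floor(act / bucket_width)  # integer bucket index
        tallies[bucket]["count"] += 1
        tallies[bucket]["top_pred"] += int(top)

    # Convert bucket indices back to activation values (lower bucket edge);
    # rounding removes floating-point noise such as 0.6000000000000001.
    return {
        round(bucket * bucket_width, 10): v["top_pred"] / v["count"]
        for bucket, v in sorted(tallies.items())
    }
```

For example, calling it with activations `[0.05, 0.15, 0.25, 0.35, 1.1]` and flags `[False, True, True, True, False]` puts two activations in the bucket starting at 0.0 (one of them a top prediction), two in the bucket starting at 0.2 (both top predictions), and one in the bucket starting at 1.0 (not a top prediction). Buckets that receive no activations simply do not appear in the result, so no division by zero can occur.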


All of aaronsnoswell's Comments + Replies