Review
Winner = first correct solution, or winner = best / highest-quality solution over what time period?
Winner = highest-quality solution over the time period of a month (solutions get posted at the start of the next month, along with a new problem).
Note that we're slightly de-emphasising the competition side of these challenges, now that occasional hints get dropped in the Slack group during the month. I'll still credit the best solution in the Slack group & the next LW post, but the choice to drop hints was made to make the problems more accessible, and hopefully to increase the overall reach of this series.
I'm writing this post to discuss solutions to the October challenge, and present the challenge for this November.
If you've not read the first post in this sequence, I'd recommend starting there - it outlines the purpose behind these challenges and the recommended prerequisite material.
November Problem
The problem for this month is interpreting a model which has been trained to classify the cumulative sum of a sequence.
The model is fed sequences of integers, and is trained to classify the cumulative sum at a given sequence position. There are 3 possible classifications: the cumulative sum up to that position is negative, zero, or positive.
For example, if the sequence is [1, -2, 3], then the cumulative sums are [1, -1, 2], so the classifications would be [positive, negative, positive].
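To make the labelling concrete, here's a minimal sketch of it in code. The 0/1/2 encoding of the three classes is an assumption on my part (check the Streamlit page for the exact setup); the logic just maps each prefix sum to its sign.

```python
import torch

def classify_cumsum(tokens: torch.Tensor) -> torch.Tensor:
    """Label each position by the sign of the cumulative sum so far.
    Assumed class encoding: 0 = negative, 1 = zero, 2 = positive."""
    return torch.sign(tokens.cumsum(dim=-1)).long() + 1

# classify_cumsum(torch.tensor([1, -2, 3])) -> tensor([2, 0, 2])
```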
The model is not attention-only: it has one attention layer with a single head, followed by one MLP layer. There is no layernorm at the end of the model. It was trained with weight decay, using the Adam optimizer with a linearly decaying learning rate.
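For concreteness, here's roughly how a model with this architecture could be defined in TransformerLens. Only the structural facts (one layer, one head, an MLP, no layernorm, 3 output classes) come from the description above; every dimension below is a placeholder guess, not the real hyperparameters.

```python
from transformer_lens import HookedTransformer, HookedTransformerConfig

cfg = HookedTransformerConfig(
    n_layers=1,               # one attention layer + one MLP layer
    n_heads=1,                # a single attention head
    d_model=32,               # placeholder guess
    d_head=32,                # placeholder guess
    d_mlp=128,                # placeholder guess
    d_vocab=21,               # placeholder: e.g. integers in [-10, 10]
    d_vocab_out=3,            # the 3 cumulative-sum classes
    n_ctx=20,                 # placeholder max sequence length
    act_fn="relu",            # placeholder guess
    normalization_type=None,  # no layernorm at the end of the model
)
model = HookedTransformer(cfg)
```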
I don't expect this problem to be as difficult as some of the others in this sequence; however, the presence of MLPs does provide a different kind of challenge.
You can find more details on the Streamlit page. Feel free to reach out if you have any questions!
October Problem - Solutions
In the second half of the sequence, the attention heads perform the algorithm "attend back to (and copy) the first token which is larger than me". For example, in a sequence like [5, 3, 7, SEP, 3, 5, 7], we would have the second 3 token attending back to the first 5 token (because it's the first one that's larger than itself), the second 5 attending back to 7, etc. The SEP token just attends to the smallest token.
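As a sanity check on this description, here's a hypothetical reference implementation of the head's algorithm (the function name is mine, and I'm reading "the first token which is larger than me" as the token's successor in sorted order):

```python
def attends_to(tok: int, unsorted: list[int]) -> int | None:
    """Value a sorted-half token attends back to (and copies): the smallest
    value in the unsorted half that is strictly larger than `tok`, i.e.
    `tok`'s successor in sorted order."""
    larger = [u for u in unsorted if u > tok]
    return min(larger) if larger else None  # no larger token: rule doesn't apply

# With unsorted half [5, 3, 7]:
# attends_to(3, [5, 3, 7]) -> 5
# attends_to(5, [5, 3, 7]) -> 7
```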
Some more refinements to this basic idea:
- In a sequence like x < y < z where the three numbers are close together, x will often attend to z rather than to y. So why isn't this an adversarial example, i.e. why does the model still correctly predict y follows x? The answer is that when we attend to (and copy) a token s, we also boost things slightly less than s, and suppress things slightly more than s. So in the case x < y < z, we have:
  - Attention to y will boost y a lot, and suppress z a bit.
  - Attention to z will boost z a lot, and boost y a bit.
  So even if z gets slightly more attention than y, it might still be the case that y gets predicted with higher probability.
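To see how this works numerically, here's a toy calculation. All the numbers are invented; the point is just that the "boost slightly-smaller, suppress slightly-larger" effect can outweigh a small attention gap.

```python
import torch

# Invented numbers: z gets slightly more attention than y (0.55 vs 0.45).
# Attending to y boosts y's logit (+4) and suppresses z's (-1);
# attending to z boosts z's logit (+4) and also boosts y's a bit (+1).
attn_y, attn_z = 0.45, 0.55
logit_y = attn_y * 4 + attn_z * 1    # = 2.35
logit_z = attn_y * -1 + attn_z * 4   # = 1.75
print(torch.softmax(torch.tensor([logit_y, logit_z]), dim=0))
# -> tensor([0.6457, 0.3543]): y wins despite receiving less attention
```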
Best Submissions
We received more submissions for this month's problem than for any other in the history of the series, so thanks to everyone who attempted it! The best solution to this problem was by Vlad K, who correctly identified the model's tendency to produce unexpected attention patterns when 3 numbers are close together, and figured out how the model manages to make correct predictions anyway.
Best of luck for this and future challenges!