researchoor
Thank you for the post! To Neel, or anyone else in the comments: what datasets have you found most useful for testing tiny image transformers?
The vit-pytorch repo uses the Cats vs. Dogs dataset from Kaggle, but I'm wondering whether that's too complex for very simple image transformers.
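For context, by "very simple" I mean something on roughly this scale, using the `ViT` class from the vit-pytorch README (all hyperparameters below are made-up placeholders to illustrate the size, not a recommended configuration):

```python
import torch
from vit_pytorch import ViT

# A deliberately tiny ViT -- every hyperparameter here is an
# illustrative placeholder, just to show the scale I mean.
tiny_vit = ViT(
    image_size=64,   # small input images
    patch_size=8,    # 8x8 patches -> 64 tokens per image
    num_classes=2,   # e.g. cats vs. dogs
    dim=64,          # small embedding dimension
    depth=1,         # a single transformer block
    heads=4,
    mlp_dim=128,
)

img = torch.randn(1, 3, 64, 64)  # dummy batch of one image
preds = tiny_vit(img)            # shape: (1, 2)
print(preds.shape)
```

My worry is that a model this small might not be able to do anything interesting on a dataset as hard as Cats vs. Dogs.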
Hi Neel! Thanks so much for all these online resources. I've been finding them really interesting and helpful.
I have a question about research methods, specifically about this suggested exercise: "How far can you get with really deeply reverse engineering a neuron in a 1 layer (1L) model? (solu-1l, solu-1l-pile or gelu-1l in TransformerLens)."
I've loaded up solu-1l in my Jupyter notebook, but I'm now feeling a bit lost. Your IOI tutorial had a very specific benchmark and error signal; when I'm just playing around with a model without a clear capability in mind, it's harder to know how to measure performance. I could make a list of capabilities/benchmarks, systematically run the model on them, then pick a capability, start ablating parts of the model, and see the effect on performance (roughly the loop sketched below). But then I'm restricted to those predefined capabilities, and I'm not even sure what the capabilities of solu-1l are.
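Concretely, the ablation loop I have in mind is something like this (the prompt and neuron index are arbitrary placeholders, and I'm assuming I've understood the standard TransformerLens hook API correctly):

```python
from transformer_lens import HookedTransformer
import transformer_lens.utils as utils

model = HookedTransformer.from_pretrained("solu-1l")

prompt = "The cat sat on the mat"  # placeholder text
NEURON = 42                        # placeholder neuron index

# Baseline loss on the prompt.
clean_loss = model(prompt, return_type="loss")

# Zero-ablate one MLP neuron in the (only) layer and re-measure loss.
def zero_ablate(mlp_post, hook):
    mlp_post[:, :, NEURON] = 0.0
    return mlp_post

ablated_loss = model.run_with_hooks(
    prompt,
    return_type="loss",
    fwd_hooks=[(utils.get_act_name("post", 0), zero_ablate)],
)
print(f"clean: {clean_loss.item():.4f}, ablated: {ablated_loss.item():.4f}")
```

But this only tells me something if I already know which prompts the neuron matters for, which is the part I'm stuck on.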
Alternatively, I could start feeding solu-1l random inputs and just "looking" at the attention patterns, along the lines of the snippet below. But I'm wondering whether there's a more efficient way to do this, or another strategy where research does feel like play, as you describe in your notebook.
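That is, something like this (again assuming the standard TransformerLens caching API; the input text is arbitrary):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("solu-1l")

text = "Mechanistic interpretability of a one layer model"  # arbitrary input
logits, cache = model.run_with_cache(text)

# Attention patterns for layer 0: shape [batch, n_heads, seq_len, seq_len]
pattern = cache["pattern", 0]
print(pattern.shape)

# e.g. how much each head attends to the BOS token, per query position
print(pattern[0, :, :, 0])
```

I know your demo notebooks usually visualize these with circuitsvis, but even staring at the raw tensors, I'm not sure what I should be looking for.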
Thank you!
Thank you for this response!