Comments
asher10

Oh shoot, yeah. I'm probably just looking at the rotary embeddings, then. Forgot about that, thanks.

asher10

I'm pretty confused; this doesn't seem to happen for any other models, and I can't think of a great explanation.
Has anyone investigated this further?
 

Here are graphs I made for GPT-2, Mistral 7B, and Pythia 14M.
Three dimensions indeed explain almost all of the information in GPT-2's positional embeddings, whereas Mistral 7B and Pythia 14M both seem to make use of all the dimensions (see the sketch below).

 

[This comment is no longer endorsed by its author]
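
For anyone who wants to poke at this themselves, here is a minimal sketch of the kind of dimensionality analysis described above, assuming the Hugging Face transformers library; the model name and the 3-component cutoff are just illustrative. Note that GPT-2 has a learned absolute positional embedding table (`model.wpe`), while Mistral 7B and Pythia use rotary position embeddings, so there isn't a directly analogous matrix to pull out of those models, which seems to be the point of the rotary-embeddings comment above.

```python
# Minimal sketch: how much of the variance in GPT-2's learned positional
# embeddings is captured by the top few principal components (PCA via SVD).
# Assumes numpy and the Hugging Face transformers library are installed.
import numpy as np
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
# wpe is GPT-2's learned absolute positional embedding table, shape (1024, 768).
pos_emb = model.wpe.weight.detach().cpu().numpy()

# Mean-center and take the singular values; their squares are proportional
# to the variance explained by each principal component.
centered = pos_emb - pos_emb.mean(axis=0, keepdims=True)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

print("variance explained by top 3 components:", explained[:3].sum())
print("components needed for 99% of the variance:",
      int(np.searchsorted(np.cumsum(explained), 0.99)) + 1)
```

Running the same SVD on rotary models wouldn't give a comparable picture, since rotary embeddings are applied to queries and keys inside attention rather than added to the residual stream as a single embedding matrix.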
asher10

Is all the money gone by now? I'd be very happy to take a bet if not.