yihe20

Thank you so much for the post! I'm starting to get a sense of induction heads.

Probably an unrelated question: can a single attention head store multiple pieces of orthogonal information? For example, in this post, the layer-0 head may store the information "I follow 'D'". Can it also store information like "I am a noun"?

Or, to put it another way, should an attention head have a single, dedicated function?
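
For intuition, here is a minimal numpy sketch (not from the post; the dimensions, feature names, and direction vectors are all hypothetical) of how two orthogonal feature directions could coexist in one head's output and still be read out independently downstream:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # hypothetical residual-stream width

# Two hypothetical feature directions: "I follow 'D'" and "I am a noun".
follows_d = rng.normal(size=d_model)
follows_d /= np.linalg.norm(follows_d)

is_noun = rng.normal(size=d_model)
is_noun -= (is_noun @ follows_d) * follows_d  # Gram-Schmidt: orthogonalize
is_noun /= np.linalg.norm(is_noun)

# A single head's output at one position can be a superposition of both.
head_output = follows_d + is_noun

# A downstream reader that projects onto one direction recovers that feature;
# the orthogonal feature contributes (near) zero to the dot product.
print(head_output @ follows_d)  # ~1.0 -> "I follow 'D'" is recoverable
print(head_output @ is_noun)    # ~1.0 -> "I am a noun" is recoverable
```

If something like this holds, a head would not need a single dedicated function: it could write several features at once, as long as later components read from different subspaces.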