All of Kallistos's Comments + Replies

Oh yes, you're right... hmm. I'm unsure what to do about that; I'd love a neat graphical solution. I wasn't quite sure how to represent repeating units without just duplicating subdiagrams!

Many thanks! I have residuals going from before to after the MHA block, and from before to after MLP+FFNN. Should I have others in other places? 

edit: Those links are super!

mishka
Ah, it's mostly your first figure that is counter-intuitive: when one looks at it, one gets the intuition of f(g(h(...(x)))), so it de-emphasizes the fact that each of these Transformer Block transformations is shaped like x = x + function(x).
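
To make the contrast concrete, here is a minimal sketch of the two shapes. The `attention` and `mlp` functions are hypothetical toy stand-ins (not real sublayers), chosen only so the arithmetic is easy to follow, and layer norm is left out entirely.

```python
# Toy stand-ins for the two sublayers (assumptions for illustration only;
# a real block would use multi-head attention and a feed-forward network).
def attention(x):
    return [0.5 * v for v in x]

def mlp(x):
    return [v * v for v in x]

def add(a, b):
    # Element-wise sum of two equal-length vectors.
    return [p + q for p, q in zip(a, b)]

def block_composed(x):
    # The f(g(x)) reading: each sublayer replaces its input.
    return mlp(attention(x))

def block_residual(x):
    # The x = x + function(x) reading: each sublayer adds an update
    # to the running stream instead of replacing it.
    x = add(x, attention(x))
    x = add(x, mlp(x))
    return x

x = [2.0, 4.0]
print(block_composed(x))  # [1.0, 4.0]
print(block_residual(x))  # [12.0, 42.0]
```

In the residual version the input is still part of the output after every sublayer, which is the property a plain chain of boxes tends to hide.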