FireToDust
4
2
FireToDust has not written any posts yet.

A bit late to the party. Love the article, but I believe it is somewhat misleading when you say that transformers run in constant time.
If we take the number of tokens in the input sentence as the input size, which I'm sure you can agree is the obvious choice, then the transformer encoder has to be run on each token in the sentence. That can happen in parallel if needed, but it still has to do all of its computations for each input token, which immediately gives at least O(n) time.
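To make that concrete, here is a rough sketch that counts multiply-adds for one toy encoder layer. The function name `encoder_layer_ops`, the default `d_model=64`, and the cost formulas are my own simplifying assumptions (constants, multiple heads and the like are ignored); it is only meant to show that the work grows with the number of tokens.

```python
def encoder_layer_ops(n_tokens: int, d_model: int = 64) -> dict:
    """Rough multiply-add counts for one toy encoder layer (illustrative only)."""
    # Per-token work: every token goes through the same d_model x d_model
    # transformation, so this term scales linearly with n_tokens.
    per_token = n_tokens * d_model * d_model
    # Self-attention scores every token against every other token,
    # so this term scales quadratically with n_tokens.
    attention = n_tokens * n_tokens * d_model
    return {
        "per_token": per_token,
        "attention": attention,
        "total": per_token + attention,
    }


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, encoder_layer_ops(n))
```

Even if the per-token part is spread across parallel hardware, the total amount of computation still scales with n in this count, and that is the quantity I mean here.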
I do think that the point you are trying to make is different, though. Correct me if I'm wrong, but it seems like you are saying...
Alright. Interested to see the new post. Your content is great btw.