This is a linkpost for https://arxiv.org/abs/2405.17394
Paper authors: Yash Sarrof, Yana Veitsman, Michael Hahn.
Context: architectures with weak forward passes can be differentially transparent; see e.g. this comment / the whole thread and research agendas like externalized reasoning or the translucent thought hypothesis.
Summary thread: https://x.com/yashYRS/status/1795340993757352402. Summary of the summary thread: like transformers, which have weak forward passes, SSMs are also in the TC0 computational complexity class, 'but cover distinct fragments within it'. 'SSMs can track hierarchical structures with optimal memory [...] suggesting that SSMs, while being more parallelizable, maintain sufficient power to handle the hierarchical structure of language.'
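To make the "more parallelizable" point a bit more concrete, here is a minimal toy sketch (not the paper's construction or any library's API). It assumes a scalar, time-invariant linear recurrence h_t = a * h_{t-1} + b * x_t with h_0 = 0; real SSM layers use vector states and often input-dependent parameters. All function names are hypothetical. Because each step is an affine map and composing affine maps is associative, the whole sequence of states can be computed with a parallel prefix scan in about log2(n) rounds instead of n sequential steps.

```python
from typing import Callable, List, Tuple

# Toy assumption: scalar, time-invariant linear SSM h_t = a * h_{t-1} + b * x_t, h_0 = 0.
Affine = Tuple[float, float]  # (A, B) represents the map h -> A * h + B


def ssm_sequential(xs: List[float], a: float, b: float) -> List[float]:
    """Reference implementation: one step at a time, like an RNN."""
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def compose(left: Affine, right: Affine) -> Affine:
    """Apply `left` first, then `right`: h -> A_r * (A_l * h + B_l) + B_r."""
    a_l, b_l = left
    a_r, b_r = right
    return (a_r * a_l, a_r * b_l + b_r)


def inclusive_scan(elems: List[Affine], op: Callable[[Affine, Affine], Affine]) -> List[Affine]:
    """Hillis-Steele scan: about log2(n) rounds; within a round every update is
    independent, so it could run in parallel (simulated sequentially here)."""
    res = list(elems)
    step = 1
    while step < len(res):
        res = [res[i] if i < step else op(res[i - step], res[i]) for i in range(len(res))]
        step *= 2
    return res


def ssm_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Same states as ssm_sequential, computed via the associative scan."""
    prefixes = inclusive_scan([(a, b * x) for x in xs], compose)
    # Applying each prefix map to h_0 = 0 leaves only its offset B.
    return [offset for _, offset in prefixes]


if __name__ == "__main__":
    xs = [1.0, 2.0, -1.0, 0.5, 3.0]
    print(ssm_sequential(xs, 0.9, 0.5))
    print(ssm_scan(xs, 0.9, 0.5))
```

Both functions should print the same states up to floating-point rounding. The scan trades a little extra total work for logarithmic depth, which is roughly the sense in which such recurrences are "more parallelizable" than a generic nonlinear RNN.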
Abstract: