Great post; a few short comments:
Closing the action loop of active inference
There is a sense in which this loop is already closed - the sensory interface for an LLM is a discrete space of size context window × vocabulary that it observes and acts upon. The environment is whatever else writes to this space, e.g., a human interlocutor. This description contains the necessary variables and dependencies to get an action-perception loop off the ground. One caveat is that action-perception loops usually have actions that influence the environment to generat...
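To make the shared-interface picture concrete, here is a minimal sketch in which an agent and an environment take turns writing token ids into the same bounded context. Everything in it is an assumption for illustration: `VOCAB_SIZE`, `CONTEXT_WINDOW`, and the two deterministic rules `agent_step` and `environment_step` are hypothetical stand-ins for an LLM and an interlocutor, not a model of either.

```python
from collections import deque

VOCAB_SIZE = 4       # hypothetical toy vocabulary size
CONTEXT_WINDOW = 6   # hypothetical context length

def agent_step(context):
    """Stand-in for the LLM: observes the shared context, returns a token id."""
    last = context[-1] if context else 0
    return (last + 1) % VOCAB_SIZE

def environment_step(context):
    """Stand-in for the interlocutor: reads the same context, returns a token id."""
    last = context[-1] if context else 0
    return (last * 2) % VOCAB_SIZE

def run_loop(n_rounds):
    """Alternate agent and environment writes into one bounded token buffer."""
    context = deque(maxlen=CONTEXT_WINDOW)  # oldest tokens fall out of the window
    for _ in range(n_rounds):
        context.append(agent_step(context))        # action: the agent writes
        context.append(environment_step(context))  # observation: the environment writes
    return list(context)

print(run_loop(3))
```

The only point of the sketch is the dependency structure: both parties read and write the same discrete space, so what the agent observes next depends on what it wrote.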
Interesting. Iterative attention mechanisms have always reminded me of predictive coding, where cross-attention encodes a kind of prediction error between the latent and the data. But I can also see how self-attention could be read as a type of prediction error between tokens {0,...,n} and {1,...,n+1}.
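The shifted pairing in that second reading can be sketched as follows. This is not an attention layer; it only shows the shift-by-one alignment between tokens {0,...,n} and {1,...,n+1}, with prediction error taken to be surprisal (negative log-probability). The predictor `toy_predict` and `VOCAB_SIZE` are hypothetical and chosen purely for illustration.

```python
import math

VOCAB_SIZE = 4  # hypothetical toy vocabulary size

def toy_predict(token):
    """Hypothetical predictor: 0.7 mass on token + 1 (mod vocab), 0.1 on each other id."""
    probs = [0.1] * VOCAB_SIZE
    probs[(token + 1) % VOCAB_SIZE] = 0.7
    return probs

def prediction_errors(tokens):
    """Surprisal of each token at positions 1..n+1 given its predecessor at 0..n."""
    inputs, targets = tokens[:-1], tokens[1:]  # the two shifted views of one sequence
    return [-math.log(toy_predict(x)[y]) for x, y in zip(inputs, targets)]

errors = prediction_errors([0, 1, 2, 0])
print(errors)
```

Under this toy predictor, the transitions 0→1 and 1→2 are expected and get low error, while 2→0 is unexpected and gets a higher one.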
There is some work comparing residual connections and iterative inference that may be of relevance; they show that such architectures "naturally encourage features to move along the negative gradient of loss during the feedforward phase", I ...