Any computer program can be presented in the form of an equation. Specifically, you define a function named step such that step(s, input) = (s2, output), where s and s2 are "states", i.e., mathematical representations of the RAM, cache and registers. To run the computer program, you apply step to some starting state, yielding (s2, output), then you apply step to s2, yielding (s3, output2), then apply step to s3, and so on for billions of repetitions.
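To make the idea concrete, here is a minimal Python sketch of such a step function. The integer state and the update rule are invented purely for illustration; a real machine state would be enormously larger, but the shape of the loop would be the same.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for "RAM, cache and registers": a single integer.
State = int


def step(s: State, inp: int) -> Tuple[State, int]:
    """One transition: (state, input) -> (next state, output)."""
    s2 = s + inp        # made-up update rule: accumulate the input
    output = s2 * 2     # made-up output derived from the new state
    return s2, output


def run(s0: State, inputs: Iterable[int]) -> Tuple[State, List[int]]:
    """Apply step repeatedly, threading the state through each call."""
    s, outputs = s0, []
    for inp in inputs:
        s, out = step(s, inp)
        outputs.append(out)
    return s, outputs


final_state, outputs = run(0, [1, 2, 3])
```

Because step has no hidden state, the whole run is just step composed with itself, which is what makes it an "equation" in the sense above.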
Another reply to your question asserts that equations cannot handle non-determinism. Untrue. To handle it, all we need to do is add another argument to step, say rand, that describes the non-deterministic influences on the program. This is routinely done in formalisms for modelling causality, e.g., the structural equation models used in economics.
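A rough sketch of that extra argument, again in Python with an invented update rule: step stays an ordinary deterministic function, and everything non-deterministic is passed in through rand.

```python
import random


def step(s, inp, rand):
    """Deterministic in (state, input, rand); all randomness enters via rand."""
    s2 = s + inp
    output = s2 + (1 if rand < 0.5 else 0)   # made-up "noisy" output
    return s2, output


def run(s0, inputs, rands):
    """Thread the state through step, consuming one rand value per input."""
    s, outputs = s0, []
    for inp, r in zip(inputs, rands):
        s, out = step(s, inp, r)
        outputs.append(out)
    return s, outputs


# The caller supplies the non-deterministic influences explicitly.
rands = [random.random() for _ in range(3)]
first = run(0, [1, 2, 3], rands)
second = run(0, [1, 2, 3], rands)   # same rand values, so the same result
```

Replaying the same rands reproduces the run exactly; the unpredictability lives entirely in where the rand values come from, not in step itself.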
So, in summary, your question has some implicit assumptions that would need to be made explicit before I can answer.
It depends on what you include in the definition of an LLM. The NN itself? Sure, it can. The caveat is hardware and software limitations: we aren't dealing with EXACT math here. Floating-point rounding and the non-deterministic order of completion in parallel computation will introduce slight differences from run to run, even though the underlying math stays the same.
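The order-of-completion point can be seen in a few lines of Python. Floating-point addition is not associative, so summing the same numbers in a different order can give a slightly different result. The three values below are just a convenient example, not anything taken from a real model.

```python
a, b, c = 0.1, 0.2, 0.3

# Same three numbers, two different orders of summation.
left_to_right = (a + b) + c
right_to_left = a + (b + c)

same = left_to_right == right_to_left             # False with IEEE 754 doubles
difference = abs(left_to_right - right_to_left)   # tiny, but not zero
```

A parallel reduction on a GPU effectively picks one of these orders depending on which partial sums finish first, which is one way run-to-run differences can creep in.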
The system that preprocesses information, feeds it into the NN and postprocesses the NN output into readable form? That is trickier, because it usually involves some form of randomness; otherwise the LLM output would be exactly the same given exactly the same inputs, which is generally frowned upon as not very AI-like behavior. But if the system uses pseudo-random generators for that, those can also be described in mathematical terms, provided you know the random generator's seed.
If they use a non-deterministic source for their randomness, then no. But that is rarely required and makes the system really difficult to debug, so I doubt it.
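As an illustration of the seeded case, here is a small Python sketch. The token probabilities and the sample_tokens helper are hypothetical stand-ins for a model's output distribution and a sampling stage, not any real system's API.

```python
import random


def sample_tokens(probs, seed, n):
    """Draw n tokens from a fixed distribution using a seeded PRNG."""
    rng = random.Random(seed)            # private generator, fully determined by seed
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return [rng.choices(tokens, weights=weights)[0] for _ in range(n)]


# Hypothetical next-token probabilities, invented for this example.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.2}

run_a = sample_tokens(probs, seed=42, n=5)
run_b = sample_tokens(probs, seed=42, n=5)   # same seed, so the same sequence
```

With the seed known, the sampled sequence is a deterministic function of (probs, seed, n), so it fits into the same mathematical description as the rest of the system.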
Can an arbitrary LLM (or LxM) be presented in the form of an equation? I realised it would need to be some crazy big equation with billions of parameters, but is it theoretically possible? The way I see it, the weights are static once the model is trained, so why not?