All of gideonite's Comments + Replies

gideonite-2-2

Likewise, LLMs are produced by a relatively simple training process (minimizing loss on next-token prediction, using a large training set from the internet, GitHub, Wikipedia, etc.), but the resulting 175-billion-parameter model is extremely inscrutable.
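The training objective the quoted passage refers to can be sketched in miniature. This is a toy illustration of next-token cross-entropy loss, not actual LLM training code; the tiny vocabulary and the uniform "model" here are stand-ins:

```python
import math

# Toy next-token prediction loss: the average negative log-probability
# the model assigns to each actual next token. LLM training minimizes
# this same cross-entropy, just over billions of tokens and parameters.

def next_token_loss(model_probs, tokens):
    """model_probs(prefix) returns a dict: candidate next token -> probability."""
    losses = []
    for i in range(1, len(tokens)):
        p = model_probs(tokens[:i]).get(tokens[i], 1e-12)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

# A hypothetical "model" that is uniform over a three-word vocabulary:
vocab = ["the", "cat", "sat"]
uniform = lambda prefix: {w: 1.0 / len(vocab) for w in vocab}
loss = next_token_loss(uniform, ["the", "cat", "sat"])  # -log(1/3) per token
```

The point the quote is making is that this objective is simple to state, while the parameters that result from minimizing it are not simple to interpret.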

So the author is confusing the training process with the model. It’s like saying “although it may appear that humans are telling jokes and writing plays, all they are actually doing is optimizing for survival and reproduction”. This fallacy occurs throughout the paper.


The train/test framework is not hel... (read more)

4Bill Benzon
What I'm arguing is that what LLMs do goes way beyond predicting the next word. That's just the proximal means to an end, the end being a coherent statement.

We’re all more or less doing that when we speak or write, though there are times when we may set out to be deliberately surprising – but we can set such complications aside.


We're all more or less doing that when we speak or write?

1Bill Benzon
This:

If you think of the LLM as a complex dynamical system, then the trajectory is a valley in the system’s attractor landscape.


The real argument here is that you can construct dynamical systems that are simple, in the sense that the equations are quite simple, yet have complex behavior. The Lorenz system is one example, though there should be an even simpler example of, say, ergodic behavior.
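The Lorenz system makes the point concrete: three short differential equations, chaotic trajectories. A minimal sketch, using plain Euler integration (adequate for illustration, not a careful simulation) and the classic chaotic parameter values:

```python
# The Lorenz system: three simple ODEs whose trajectories are chaotic.
# sigma, rho, beta are the classic values for which the system is chaotic.

def lorenz_step(x, y, z, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the Lorenz equations."""
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return x + dx * dt, y + dy * dt, z + dz * dt

# Two trajectories starting 1e-8 apart diverge to macroscopic separation,
# while both remain bounded on the attractor: simple rules, complex behavior.
a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.0 + 1e-8)
for _ in range(5000):
    a = lorenz_step(*a)
    b = lorenz_step(*b)
separation = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
```

The "attractor landscape" language in the quoted comment is of this kind: the equations defining the system are compact, but the geometry of the trajectories they produce is not.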

1Bill Benzon
When was the last time someone used the Lorenz system to define justice?