All of Stanley Ihesiulo's Comments + Replies

I do not get your argument here; it doesn't track. I am not an expert in transformer systems or the in-depth architecture of LLMs, but I do know enough to feel that your argument is very off.

You argue that training is different from inference, as part of your argument that LLM inference has a global plan. While training is different from inference, it seems to me that you may not have a clear picture of how they actually differ.

You quote the accurate statement that "LLMs are produced by a relatively simple training process (minimizing loss on next-...
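To make the distinction concrete, here is a minimal sketch of the point at issue, assuming nothing about any production system (the tiny model and all names below are illustrative): the same forward pass underlies both modes; training adds a next-token-prediction loss and a gradient step, while inference just samples from the same pass autoregressively.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    """Toy stand-in for an LLM: embed tokens, predict the next one."""
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                # tokens: (batch, seq)
        return self.out(self.embed(tokens))   # logits: (batch, seq, vocab)

model = TinyLM()
tokens = torch.randint(0, 100, (1, 8))

# Training step: minimize loss on next-token prediction.
# Note that this *contains* a forward (inference) pass.
logits = model(tokens[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 100), tokens[:, 1:].reshape(-1))
loss.backward()  # gradients flow back; an optimizer would update the weights

# Inference: the same forward pass, run autoregressively, one token
# at a time; no loss, no gradients, no weight updates.
with torch.no_grad():
    seq = tokens[:, :4]
    for _ in range(4):
        next_logits = model(seq)[:, -1, :]                  # last position only
        next_token = next_logits.argmax(-1, keepdim=True)   # greedy pick
        seq = torch.cat([seq, next_token], dim=1)
```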

Bill Benzon
Thanks for reminding me that training uses inference. As for ChatGPT having a global plan: as you can see from the comments I've made earlier today, I have come around to that view. The people who wrote the stories ChatGPT consumed during training had plans, and those plans are reflected in the stories they wrote. That structure is "smeared" over all those parameter weights and gets "reconstructed" each time ChatGPT generates a new token.

In his last book, The Computer and the Brain, John von Neumann noted, quite correctly, that each neuron is both a memory store and a processor. Subsequent research has made it clear that the brain stores specific things – objects, events, plans, whatever – in populations of neurons, not individual neurons. These populations operate in parallel.

We don't yet have the luxury of such processors, so we have to make do with programming a virtual neural net to run on a processor having far more memory units than processing units. And so our virtual machine has to visit each memory unit every time it takes one step in its virtual computation.
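A toy sketch of that last point, assuming nothing about real architectures: one forward step of a "virtual" layer on a conventional processor, written as explicit loops so the sequential weight visits are visible.

```python
import random

# A small weight matrix: the layer's entire stored "memory".
n_in, n_out = 8, 4
weights = [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
x = [random.gauss(0.0, 1.0) for _ in range(n_in)]

def step(x):
    """One step of the virtual machine: every stored weight (memory
    unit) is visited in sequence. A biological population of neurons
    would do this in parallel, each neuron being both store and
    processor."""
    out = []
    for row in weights:            # visit every output unit...
        s = 0.0
        for w, xi in zip(row, x):  # ...and every weight stored for it
            s += w * xi
        out.append(max(0.0, s))    # ReLU-style nonlinearity
    return out

print(step(x))  # one step costs n_out * n_in sequential weight visits
```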