All of dhar174's Comments + Replies

You're missing the possibility that the model trained had more parameters than the models used for inference. It is now common practice to train a large model and then distill it into a series of smaller models that can be deployed according to task needs.
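For concreteness, here is a minimal sketch of the knowledge-distillation idea in PyTorch, in the style of Hinton et al. (2015): the small student is trained to match the large teacher's softened output distribution. The `teacher`/`student` names and shapes are hypothetical, not from any specific system.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: push the student toward the teacher's softened distribution."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 so gradients stay comparable across temperatures
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

# Hypothetical usage: `teacher` is the large trained model, `student` the smaller one.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits)
# loss.backward()
```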

To those who believe language models do not have internal representations of concepts:

I can help at least partially disprove the assumptions behind that.

There is convincing evidence otherwise, demonstrated in an actual experiment with Othello-GPT:

https://thegradient.pub/othello/ The researchers' conclusion:

"Our experiment provides evidence supporting that these language models are developing world models and relying on the world model to generate sequences." )

Bill Benzon
Thanks for this. I've read that piece and think it is interesting and important work. The concept of story trajectory that I am using plays a role in my thinking similar to that of the Othello game-board model in your work.