danten31

GPT-4o... like Sora not that long ago, is a major triumph of the Sutskeverian vision of just scaling up sequence prediction for everything, and which OA has been researching for years

Could you elaborate on why you see GPT-4o as continuous with the scaling strategy? My understanding is that it's a significantly smaller model than GPT-4, designed to reduce latency and cost, with the gap then "compensated for" by multimodality and presumably many other improvements in architecture, data, etc.

Isn't GPT-4o a clear break (a temporary one, I assume) with the singular focus on scaling of the past few years?