Without a detailed model card for GPT-4o it is impossible to know "for sure" why models drift in performance over time, but drift they do.
It is entirely possible that OpenAI started with a version of GPT-4 Turbo, parallelized processing, and performed an extensive "fine tune" to improve its multimodal capabilities.
Essentially, the model could "forget" how to complete prompts that worked just a week ago, because some of its "memory" was overwritten with instructions for completing multimodal replies.
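This "overwriting" effect is known as catastrophic forgetting. Here is a hypothetical toy sketch (in no way OpenAI's actual training setup) using a single linear parameter as the "model": it is first fit to task A, then fine-tuned on task B with no replay of task A's data, and its error on task A climbs back up.

```python
import numpy as np

# Toy illustration of catastrophic forgetting: a single linear weight w
# is trained on task A, then fine-tuned on task B without revisiting
# task A. Its task-A error rises because the weight was "overwritten".

def mse(w, x, y):
    return float(np.mean((w * x - y) ** 2))

def train(w, x, y, lr=0.1, steps=200):
    for _ in range(steps):
        grad = np.mean(2 * (w * x - y) * x)  # gradient of MSE w.r.t. w
        w -= lr * grad
    return w

x = np.linspace(-1, 1, 50)
y_a = 2.0 * x    # task A: learn slope +2
y_b = -1.0 * x   # task B: learn slope -1

w = train(0.0, x, y_a)           # pretrain on task A
loss_a_before = mse(w, x, y_a)   # near zero: task A is learned

w = train(w, x, y_b)             # fine-tune on task B only
loss_a_after = mse(w, x, y_a)    # task-A error has grown sharply

print(f"task-A loss before fine-tune: {loss_a_before:.4f}")
print(f"task-A loss after  fine-tune: {loss_a_after:.4f}")
```

Real mitigations (replay buffers, regularization toward old weights, adapter layers) exist precisely because plain fine-tuning behaves this way.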