- Until now ChatGPT dealt with audio through a pipeline of 3 models: audio transcription, then GPT-4, then text-to-speech. GPT-4o is apparently trained on text, voice and vision so that everything is done natively. You can now interrupt it mid-sentence.
- It has GPT-4 level intelligence according to benchmarks. 16-shot GPT-4o is somewhat better at transcription than Whisper (that's a weird comparison to make), and 1-shot GPT-4o is considerably better at vision than previous models.
- It's also somehow been made significantly faster at inference time. Might be mainly driven by an improved tokenizer. Edit: Nope, English tokenizer is only 1.1x.
- It's confirmed it was the "gpt2" model found at LMSys arena these past weeks, a marketing move. It has the highest ELO as of now.
- They'll be gradually releasing it for everyone, even free users.
- Safety-wise, they claim to have run it through their Preparedness framework and the red-team of external experts, but have published no reports on this. "For now", audio output is limited to a selection of preset voices (addressing audio impersonations).
- The demos during the livestream still seemed a bit clanky in my opinion. Still far from naturally integrating in normal human conversation, which is what they're moving towards.
- No competitor of Google search, as had been rumored.
I have just used it for coding for 3+ hours and found it quite frustrating. Definitely faster than GPT 4.0 but less capable. More like an improvement for 3.5. To me a seems a lot like LLM progress is plateauing.
Anyway in order to be significantly more useful a coding assistant needs to be able to see debug output, in mostly real time, have the ability to start/stop the program, automatically make changes, keep the user in the loop and read/use GUI as that is often an important part of what we are doing. I havn't used any LLM that are even low-average ability at debugging kind of thought processes yet.
Thanks!
Interesting. I see a lot of people reporting their coding experience improving compared to GPT-4, but it looks like this is not uniform, that experience differs for different people (perhaps, depending on what they are doing)...