We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while worse than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
Full paper available here: https://cdn.openai.com/papers/gpt-4.pdf
It seems pretty unfortunate to me that ARC wasn't given fine-tuning access here, as I think it pretty substantially undercuts the validity of their survive and spread eval. From the text you quote it seems like they're at least going to work on giving them fine-tuning access in the future, though it seems pretty sad to me for that to happen post-launch.
More on this from the paper: