We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while worse than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
Full paper available here: https://cdn.openai.com/papers/gpt-4.pdf
They tried to detect test questions that had also appeared in the training set, so that memorized questions wouldn't inflate the exam results. It didn't seem to matter much: see Table 10, “contamination data for exams”. The contaminated questions are a pretty tiny fraction of the data, and removing them barely changed the scores.
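For anyone wondering what a check like that can look like mechanically, here's a rough sketch of a substring-match contamination filter. This is my own illustration under assumptions, not the paper's code: the function names, the 50-character window, and the "three random probes per question" rule are all hypothetical choices for the example. The idea is to normalize away spacing and punctuation, sample a few fixed-length substrings from each exam question, and flag the question if any sample appears verbatim in the training text. You'd then score the model on the full set and on the "clean" subset and compare.

```python
import random

WINDOW = 50  # hypothetical probe length in characters (an assumption for this sketch)
PROBES = 3   # hypothetical number of random substrings sampled per question


def normalize(text: str) -> str:
    """Lowercase and keep only letters/digits so spacing and punctuation can't hide a match."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def sample_substrings(text: str, k: int = PROBES, window: int = WINDOW, rng=None) -> list:
    """Pick k random windows from text; short texts are used whole."""
    if not text:
        return []  # nothing to probe, so an empty question never counts as contaminated
    if len(text) <= window:
        return [text]
    rng = rng or random.Random(0)  # fixed seed by default so results are reproducible
    starts = [rng.randrange(len(text) - window + 1) for _ in range(k)]
    return [text[s:s + window] for s in starts]


def is_contaminated(question: str, training_docs: list, k: int = PROBES,
                    window: int = WINDOW, rng=None) -> bool:
    """True if any sampled substring of the question occurs in any training document."""
    probes = sample_substrings(normalize(question), k, window, rng)
    docs = [normalize(d) for d in training_docs]
    return any(p in d for p in probes for d in docs)


def split_by_contamination(questions: list, training_docs: list):
    """Partition exam questions into (clean, contaminated) lists."""
    clean, dirty = [], []
    for q in questions:
        (dirty if is_contaminated(q, training_docs) else clean).append(q)
    return clean, dirty


if __name__ == "__main__":
    training = ["Q: What is the capital of France? A: Paris. Lots of other web text here..."]
    exam = [
        "What is the capital of France?",
        "Compute the derivative of x^3 + 2x with respect to x.",
    ]
    clean, dirty = split_by_contamination(exam, training)
    print("clean:", clean)
    print("contaminated:", dirty)
```

Real training corpora are far too large for a naive `in` scan over every document; an actual pipeline would need an index (hashed n-grams or similar), but the flag-then-compare logic is the same.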