We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while worse than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
Full paper available here: https://cdn.openai.com/papers/gpt-4.pdf
Why doesn't it improve on AP English Literature and AP English Language?
I think performance on AP english might be a quirk of how they dealt with dataset contamination. English and Literature exams showed anomalous amount of contamination (lots of the famous texts are online and referenced elsewhere) so they threw out most of the questions, leading to a null conclusion about performance.