We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while worse than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
Full paper available here: https://cdn.openai.com/papers/gpt-4.pdf
I want to highlight that in addition to the main 98-page "Technical Report," OpenAI also released a 60-page "System Card" that seems to "highlight safety challenges" and describe OpenAI's safety processes. Edit: as @vladimir_nesov points out, the System Card is also duplicated in the Technical Report starting on page 39 (which is pretty confusing IMO).
I haven't gone through it all, but one part that caught my eye from section 2.9 "Potential for Risky Emergent Behaviors" (page 14) and shows some potentially good cross-organizational cooperation:
It's all relative. ARCs security was way stronger than the security GPT-4 had before and after ARC's evals. So for GPT-4, beginning ARC testing was like a virus moving from a wet market somewhere to a BSL-4 lab, in terms of relative level of oversight/security/etc. I agree that ARCs security could still be improved -- and they fully intend to do so.