I don't think this is good enough evidence: it seems quite likely that the student used ChatGPT to some extent but did not let it write the entire essay. That said, I checked how it would change the Brier scores. If you believe that existing AI does, in fact, meet this milestone, the scores would be Experts: 0.21, Bulls: 0.27, Bears: 0.28.
I gave ChatGPT the following prompt and sent its answer to my friend, a high-school history teacher, to see how she would mark it!
You are the highest-scoring student in a high school history class. Please write y...
I couldn't find an example of this being done, and I think that by now someone would have submitted a fully ChatGPT-generated high school essay and talked about it publicly if it had received high marks. I've seen some evidence that cherry-picking paragraphs can produce a mid/low-level grade; e.g., this article describes someone who got a passing mark (53) on a university social policy essay. Do you have a link in mind for Bing getting mid-level grades?
This high school teacher judged two ChatGPT-generated history essays as “below average, scoring a 9/20 or lower”. This Guardian ...
I'm working on a research project at Rethink Priorities on this topic: whether and how to use bug bounties for advanced ML systems. I think your tl;dr is probably right, although I have a few questions I'm planning to get better answers to over the next month before advocating for or facilitating the creation of bounties in AI safety:
Kind of moot now, but she gave it a good 5 out of 9 (a strong pass grade for UK students aged 14-16).