User Comment Replies

Scoring forecasts from the 2016 “Expert Survey on Progress in AI”

Kind of moot now but she gave it a good '5' out of 9 (= a strong pass grade for UK students aged 14-16 years old).

Grammar and argument structure are excellent - the essay flows well, with a coherent introduction and a conclusion based on the body of the text. The student seems to understand the topic well (though obv I have no idea about it) and addresses a few different aspects of the industry.

The essay misses the opportunity for deeper back-and-forth critical analysis, although there is evidence of this being attempted. And it could go deeper into the environment

PatrickL 2y 30

I don't think this is good enough evidence - it seems quite likely that the student used ChatGPT to some extent but did not leave it to output the entire essay. But I checked how it would change the Brier scores: if you believe that existing AI does, in fact, meet this milestone, the scores would be Experts: 0.21, Bulls: 0.27, Bears: 0.28.
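For reference, a Brier score is the mean squared difference between forecast probabilities and binary outcomes (0 = not met, 1 = met), so lower is better. The sketch below is only an illustration of how flipping one milestone's outcome changes a score; the probabilities and the `brier_score` helper are made up for the example and are not the survey's data or the post's actual code.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical forecast probabilities for three milestones (illustrative only).
forecasts = [0.25, 0.5, 1.0]

# Same forecasts scored twice: first milestone judged "not met", then "met".
outcomes_not_met = [0, 0, 1]
outcomes_met = [1, 0, 1]

score_not_met = brier_score(forecasts, outcomes_not_met)
score_met = brier_score(forecasts, outcomes_met)

print(f"milestone not met: {score_not_met:.3f}")
print(f"milestone met:     {score_met:.3f}")
```

In this toy case the forecaster gave the first milestone a low probability, so judging it as met makes their score worse (higher).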

I have asked ChatGPT the following and sent its answer to my friend, who teaches high-school history, to see how she would mark it!

You are the highest-scoring student in a high school history class. Please write y...

1 PatrickL 2y

Kind of moot now, but she gave it a good '5' out of 9 (= a strong pass grade for UK students aged 14-16 years old), so not quite a 'high mark'. This doesn't necessarily mean GPT-3.5 fails the criteria; it's just one example of it falling short. Thanks Zoe! I think GPT-4 meets the criteria, but I haven't looked in depth or updated this post to include GPT-4's capabilities. It would be reasonable to do so, because GPT-4's capabilities seem to have been fully developed by Feb 2023, but it goes against my criteria and I don't plan to spend the time on it. I'll probably update it in a year though!

Scoring forecasts from the 2016 “Expert Survey on Progress in AI”

PatrickL 2y* 10

I couldn't find an example of this, and I think that, by now, someone would have submitted a fully ChatGPT-generated high school essay and talked about it publicly if it had gotten high marks. I've seen some evidence of cherry-picked paragraphs leading to a mid/low-level grade, e.g. this article describes someone who got a passing mark (53) on a university social policy essay. Do you have a link in mind for Bing getting mid-level grades?

This high school teacher judged two ChatGPT-generated history essays as “below average, scoring a 9/20 or lower”. This Guardian ...

4 SandXbox 2y

https://www.nytimes.com/2023/01/16/technology/chatgpt-artificial-intelligence-universities.html

Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems

PatrickL 2y* 110

I'm working on a research project at Rethink Priorities on this topic: whether and how to use bug bounties for advanced ML systems. I think your tl;dr is probably right, although I have a few questions I'm planning to get better answers to in the next month before advocating for or facilitating the creation of bounties in AI safety:

How subjective can prize criteria for AI safety bounties be, while still incentivizing good quality engagement?
- If prize criteria need high specificity, are we able to specify unsafe behaviour which is relevant to long-term AI safety (a

...

All of PatrickL's Comments + Replies