All of Thomas Broadley's Comments + Replies

I neglected to update my comment here -- the agent I built for this replication is now publicly available as part of the METR task workbench, here: https://drive.google.com/drive/folders/1-m1y0_Akunqq5AWcFoEH2_-BeKwsodPf

That's me on the bass! Thank you for hosting, it was really fun to jam with everyone.

3jefftk
Thanks for coming!

Yeah, I definitely could! It's on my to-do list. I'll let you know when I complete it.

2Daniel Kokotajlo
Yay! Thanks in advance!

Thank you! No, I'm not building custom prompts for the different tasks. I wrote a single prompt template -- the only difference between runs is the task description, which gets plugged into the template. I think ARC Evals did the same thing.
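The single-template approach can be sketched like this. This is purely illustrative: the template text, function name, and example task are hypothetical, not the actual prompt or tasks used in the replication.

```python
# Illustrative sketch of the single-prompt-template approach described above.
# The template wording and the example task are hypothetical.
PROMPT_TEMPLATE = """You are an autonomous agent. Complete the following task.

Task: {task_description}

You may run shell commands and write code. Explain each step before taking it."""

def build_prompt(task_description: str) -> str:
    # The only per-run difference is the task description plugged into the template.
    return PROMPT_TEMPLATE.format(task_description=task_description)

prompt = build_prompt("Set up a simple web server that responds to GET requests.")
```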

I've been improving the prompt as I work through the tasks. I've probably spent 2-3 hours on it, trying to improve the agent's performance on some tasks. I'll definitely rerun all the tasks with the current version of the prompt, just to check that it can still complete the easier ones.

You're right that ... (read more)

Thank you for the kind comment! You have lots of good ideas for how to improve this. I especially like the idea of testing with different cloud providers. I could also vary the programming language: maybe GPT-4 is better at writing Node.js than Python (the language I prompted it to use).

I agree, a fully reproducible version would have benefits. Differences in prompt quality between evaluations are a problem.

Also agreed that it's important to allow the agent to try to complete the tasks without assistance. I did that for this reproduction. The only changes ... (read more)

2aog
I wouldn't recommend open sourcing any state of the art LLM agents. But if you open source the evaluation, that would provide most of the benefits (letting labs evaluate their models on your benchmark, helping people build new benchmarks, allowing researchers to build safer agents which reject dangerous actions) while avoiding the capabilities externalities of open sourcing a SOTA language agent.

EDIT: The agent I built for this replication is now publicly available as part of the METR task workbench: https://drive.google.com/drive/folders/1-m1y0_Akunqq5AWcFoEH2_-BeKwsodPf

I'm torn! I think that better LLM scaffolding accelerates capabilities as much as it accelerates alignment. On the other hand, a programmer (or a non-programmer with help from ChatGPT) could easily reproduce my current scaffolding code. Maybe open-sourcing the current state of the project is fine. What do you think?

2jacquesthibs
At the very least, would you be happy to share the code with alignment researchers interested in using it for our experiments?
2[anonymous]
Which is not good enough. We need alignment to accelerate faster than capabilities in order to catch up.
4Stephen Fowler
I believe you should err on the side of not releasing it.
8Tao Lin
I do think open-sourcing is better, because there have already been a lot of messy and misleading public results on LLM capabilities, and open-sourcing one eval like this might improve our understanding a lot. Also, there are tons of LLM agent projects/startups trying to build hype, so if you drop a benchmark here you're unlikely to attract unwanted attention (I'm guessing). I largely agree with https://www.lesswrong.com/posts/fRSj2W4Fjje8rQWm9/thoughts-on-sharing-information-about-language-model
6Gurkenglas
If it is twice as easy, that halves both the positives and the negatives of open-sourcing; it doesn't change the direction. Beware the Unilateralist's Curse.
2Ethan Mendes
I think open-sourcing the current state of the project would be very useful to researchers.

since private goods are non-rival it is efficient to exclude consumers who aren't willing to pay

 

Should this be, "since private goods are rival it is efficient..."?

2moyamo
Yes. Thanks. Fixed.

Here is a submission: https://ai-safety-conversational-agent.thomasbroadley.com

Source code here: https://github.com/tbroadley/ai-safety-conversational-agent

I followed @Max H's suggestion of using chat-langchain. To start, I created an embedding based on the articles from https://aisafety.info and have the submission using that embedding.
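The core retrieval idea behind that setup (embed the articles, then fetch the most relevant one for a question) can be sketched in a toy form. Real systems like chat-langchain use a learned embedding model and a vector store; here a bag-of-words vector stands in purely for illustration, and the articles are made-up examples.

```python
# Toy sketch of embedding-based retrieval over a set of articles.
# A bag-of-words Counter is a hypothetical stand-in for a real embedding model.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: word counts as the "vector".
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, articles: list[str]) -> str:
    # Return the article most similar to the question; its text would then be
    # passed to the chat model as context for answering.
    q = embed(question)
    return max(articles, key=lambda art: cosine(q, embed(art)))

articles = [
    "Instrumental convergence means many goals imply similar subgoals.",
    "Reward hacking happens when an agent exploits a flawed reward signal.",
]
best = retrieve("what is reward hacking", articles)
```

In the real setup, `embed` would call an embedding model and the articles would be the aisafety.info corpus, with nearest-neighbor search handled by the vector store.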

I'll get in touch with Stampy about working on their conversational agent.