This is probably not the first barrier to getting into evals, but I have an AI safety startup that designs evals, and we don't have the capacity to also do good elicitation. I think we lose a lot of signal from our evals because our agent is too weak to explore properly. We're currently using Inspect's basic_agent. METR's modular_public agent is better, but we otherwise prefer Inspect over Vivaria. I think open-sourcing a better agent would be positive for the evals community without contributing to capabilities.
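For readers who haven't touched Inspect: a minimal sketch of how an agent plugs into an eval task, assuming the current inspect_ai API (the sample, prompt, and limits below are made-up placeholders, not from any real eval). The point is that basic_agent is just one solver, and a stronger open-source agent could be a drop-in replacement for it.

```python
# Minimal sketch of an Inspect task wired to basic_agent (inspect_ai).
# The sample, system prompt, and limits are illustrative placeholders.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import basic_agent, system_message
from inspect_ai.tool import bash


@task
def demo_eval():
    return Task(
        dataset=[
            Sample(
                input="Find the largest file under /var/log and report its name.",
                target="syslog",  # placeholder answer, for illustration only
            )
        ],
        # basic_agent is a simple tool-use loop with a submit() tool;
        # a stronger open-source agent would be swapped in right here.
        solver=basic_agent(
            init=system_message("You are an autonomous agent. Use bash to solve the task."),
            tools=[bash(timeout=180)],
            max_attempts=3,
            message_limit=50,
        ),
        scorer=includes(),
        sandbox="docker",  # bash() runs inside a sandbox environment
    )
```

Run with something like `inspect eval demo_eval.py --model <provider/model>`. Better elicitation would mostly mean replacing that one solver= line, which is why a shared, stronger agent seems so cheap to adopt.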
As someone with very little working knowledge of evals, I think open-source pedagogical resources would be useful, maybe similar in style to https://www.neelnanda.io/mechanistic-interpretability/quickstart.
It's also hard to overstate the importance of good tooling.
I suspect TransformerLens and its associated Colab walkthroughs have had a huge impact in popularising mechanistic interpretability.
Hi there! I'm Ameya, currently at the University of Tübingen. I share similar broad interests and am particularly enthusiastic about working on evaluations. I'd love to be part of a broader evals group if one gets created (Slack/Discord)!
We organized an evals workshop recently! It had a broader focus and wasn't specifically related to AI safety, but it was a great experience -- we plan to keep running iterations of it and sharpen its focus.
I want to make a serious effort to grow the evals field. I’m very interested in which resources you think would be most helpful. I’m also looking for others to contribute and potentially for some funding.
I wrote a more detailed post here (relevant parts copied below) but I’m primarily interested in other people’s ideas and preferences.
##########################################################
Future plans and missing resources
Note: the following is largely “copy-paste what Neel Nanda did for the mechanistic interpretability community but for evals”. I think his work was great for the field, so why not give it a shot?
Broadly speaking, I want to make it easy and attractive for people to get involved with evals.
I think the following resources would be good:
If you’re keen to be involved in any of the above, please reach out. If you want to spend a few months producing evals materials, I might be able to find funding for it (no promises). If you’re a funder and want to support these kinds of efforts, please contact me. I would only serve as a regrantor and not take any cut myself.