Lukas Petersson

Posts

Sorted by New

Wiki Contributions

Comments

Sorted by

This is probably not the first barrier to getting into evals, but I have an AI safety startup that designs evals. However, we don't have the capacity to also do good elicitation. I think we lose a lot of signal from our evals because our agent is too weak to explore properly. We're currently using Inspect's basic_agent. Metr's modular_public is better, but we prefer inspect over vivaria otherwise. I think open-sourcing a better agent would be positive for the evals community without contributing to capabilities.