Send us example gnarly bugs
Update: We are no longer accepting gnarly bug submissions. However, we are still accepting submissions for our Task Bounty!

Tl;dr: Looking for hard debugging tasks for evals, paying greater of $60/hr or $200 per example.

METR (formerly ARC Evals) is interested in producing hard debugging tasks for models to attempt as part of an agentic capabilities evaluation. To create these tasks, we’re seeking repos containing extremely tricky bugs. If you send us a codebase that meets the criteria for submission (listed below), we will pay you $60/hr for time spent putting it into our required format, or $200, whichever is greater. (We won’t pay for submissions that don’t meet these requirements.) If we’re particularly excited about your submission, we may also be interested in purchasing IP rights to it.

We expect to want about 10-30 examples overall depending on the diversity. We're likely to be putting bounties on additional types of tasks over the next few weeks.

Criteria for submission:

* Contains a bug that would take at least 6 hours for an experienced programmer to solve, and ideally >20hrs
  * More specifically, ">6 hours for a decent engineer who doesn't have context on this particular codebase". E.g. a randomly selected engineer who's paid $100-$200 per hour who's familiar with the language and overall stack that's being used, but not the person who wrote the code, and not an expert in the particular component that is causing the bug.
* Ideally, has not been posted publicly in the past
  * (Though note that we may still accept submissions from public repositories given that they are not already in a SWE-bench dataset and meet the rest of our requirements. Check with us first.)
* You have the legal right to share it with us (e.g. please don’t send us other people’s proprietary code or anything you signed an NDA about)
* Ideally, the task should work well with static resources - e.g. you can have a local copy of the documentation for all the relevant libra
Maybe "accountability" is just the mixture of responsibility and dominance, and of those two I find dominance more motivating.