Why Is No One Trying To Align Profit Incentives With Alignment Research?
A lot of alignment work seems to be resource-constrained. Many funders have said they were only able to fund a small percentage of the projects they found promising, and many researchers earn a small fraction of what they could make in the for-profit sector (Netflix recently offered $900k for an ML position). The pipeline of recruiting, training, and hiring talent could be greatly accelerated if it weren't contingent on a continuing stream of nonprofit donations.

Possible Ideas

AI Auditing Companies

We've already seen a bit of this with ARC's evals of GPT-4, but why isn't there more of it? Many companies are, or will be, training their own models, or using existing models in ways beyond what they were intended for. Even starting with non-cutting-edge models could provide insight and train people to develop the proper Security Mindset and understanding needed to audit larger ones.

Furthermore, there has been a push to regulate and make auditing a required practice. Whether that regulation is made into law will likely depend on the infrastructure for it already existing, so it makes sense to act toward this now if we want those auditing teams to be as useful as possible, not merely to satisfy a governmental requirement. Existential concerns would also be taken more seriously coming from a company that has already built a reputation for auditing models.

Evals reporting

Companies don't want to see their models doing things that weren't intended (for example, giving out credit card information, as was recently demonstrated). And as time goes on, companies will want some way of showcasing that their models have been rigorously tested. An audit report covering a large, diverse set of vulnerabilities is something many will probably want.

Red teaming

Jailbreaking has become common practice, carried out by a wide range of people once a model is released. As with an evals report, many will want a separate entity that can red team their models.
What does AI Safety currently need most that isn't being done?
I'd love to get as many takes from as many people as possible about what they think is most needed in AI Safety right now but isn't currently being done. I'm trying to determine what would be best for me to work on.