Edit: In case it's not obvious, I have done limited research on AI alignment organizations and the goal of my post is to ask questions from the point of view of someone who wants to contribute and is unsure how. Read down to the comments for some great info on the topic.
I was introduced to the topic of AI alignment when I joined this very forum in 2014. Two years and one "Superintelligence" later, I decided that I should donate some money to the effort. I knew about MIRI, and I looked forward to reading some research comparing their work to that of the other organizations in this space. The only problem is... there really isn't any.
MIRI recently announced a new research agenda focused on "agent foundations". Yet even the Open Philanthropy Project, made up of people who at least share MIRI's broad worldview, can't decide whether that research direction is promising or useless. The Berkeley Center for Human-Compatible AI doesn't seem to have a specific research agenda beyond Stuart Russell's own work. The AI100 Center at Stanford is just kicking off. That's it.
I think that there are two problems here:
- There's no way to tell which current organization is going to make the most progress towards solving AI alignment.
- These organizations are likely to be very similar to each other, not least because they practically share a zip code. I don't think that MIRI and the academic centers will do the exact same research, but in the huge space of potential approaches to AI alignment they will likely end up pretty close together. Where's the group of evo-psych-savvy philosophers who don't know anything about computer science but are working to spell out an approximation of universal human moral intuitions?
How to evaluate progress in AI alignment?
Any answer to that question, even if not perfectly comprehensive or objective, will enable two things. First of all, it will allow us to direct money (and the best people) to the existing organizations where they'll make the most progress.
More importantly, it will enable us to open up the problem of AI alignment to the world and crowdsource it.
For example, the XPrize Foundation is a remarkable organization that creates competitions around achieving goals beneficial to humanity, from lunar rovers to ecological monitoring. The prizes have two huge benefits over direct investment in solving an issue:
- They usually attract a lot more effort than what the prize money itself would pay for. Competitors often spend in aggregate 2-10 times the prize amount in their efforts to win the competition.
- The XPrizes attract a wide variety of creative entrants from around the world, because they only describe what needs to be done, not how.
Stuart Russell was the primary author of the FLI research priorities document, so I'd expect CHCAI's work to focus on some of the problems sketched there. Based on CHCAI's publication page, their focus areas will probably include value learning, human-robot cooperation, and theories of bounded rationality. Right now, Russell's group is spending a lot of time on cooperative inverse reinforcement learning and corrigibility.
This slide from a recent talk by Critch seems roughly right to me: https://intelligence.org/wp-content/uploads/2017/01/hotspot-slide.png
A prize fund is one of the main side-projects MIRI has talked about wanting to run for the last few years, if we could make one sufficiently large -- more or less for the reasons you mention. Ideally the AI safety community would offer a diversity of prizes representing different views about what kinds of progress we'd be most excited by.
If funds for this materialize at some point, the main challenge will be that the most important conceptual breakthroughs right now involve going from mostly informal ideas to crude initial formalisms. This introduces subjectivity into deciding whether a formalism really captures the key original idea, and it also makes it harder for outside researchers to understand what kinds of work we're looking for. (MIRI's research team focuses on exactly the parts of the problem that are hardest to design a prize for.) It's easier to come up with benchmarks in areas where there's already been a decent amount of technical progress. Prizes there would still be quite valuable, though they risk neglecting the most important things to work on.