Edit: In case it's not obvious, I have done limited research on AI alignment organizations and the goal of my post is to ask questions from the point of view of someone who wants to contribute and is unsure how. Read down to the comments for some great info on the topic.
I was introduced to the topic of AI alignment when I joined this very forum in 2014. Two years and one "Superintelligence" later, I decided that I should donate some money to the effort. I knew about MIRI, and I looked forward to reading some research comparing their work to that of the other organizations working in this space. The only problem is... there really isn't any.
MIRI recently announced a new research agenda focused on "agent foundations". Yet even the Open Philanthropy Project, made up of people who at least share MIRI's broad worldview, can't decide whether that research direction is promising or useless. The Berkeley Center for Human-Compatible AI doesn't seem to have a specific research agenda beyond Stuart Russell. The AI100 Center at Stanford is just kicking off. That's it.
I think that there are two problems here:
- There's no way to tell which current organization is going to make the most progress towards solving AI alignment.
- These organizations are likely to be very similar to each other, not least because they practically share a zip code. I don't think that MIRI and the academic centers will do the exact same research, but in the huge space of potential approaches to AI alignment they will likely end up pretty close together. Where's the group of evo-psych-savvy philosophers who don't know anything about computer science but are working to spell out an approximation of universal human moral intuitions?
How to evaluate progress in AI alignment?
Any answer to that question, even if not perfectly comprehensive or objective, will enable two things. First of all, it will allow us to direct money (and the best people) to the existing organizations where they'll make the most progress.
More importantly, it will enable us to open up the problem of AI alignment to the world and crowdsource it.
For example, the XPrize Foundation is a remarkable organization that creates competitions around achieving goals beneficial to humanity, from lunar rovers to ecological monitoring. The prizes have two huge benefits over direct investment in solving an issue:
- They usually attract a lot more effort than what the prize money itself would pay for. Competitors often spend in aggregate 2-10 times the prize amount in their efforts to win the competition.
- The XPrizes attract a wide variety of creative entrants from around the world, because they only describe what needs to be done, not how.
I came to a similar conclusion a while ago: it is hard to make progress in a complex technical field when progress itself is unmeasurable or, worse, ill-defined.
Part of the problem may be cultural: most people working in the AI safety field have math or philosophy backgrounds. Progress in math and philosophy is intrinsically hard to measure objectively; success is mostly about having great breakthrough proofs/ideas/papers that are widely read and well regarded by peers. If your main objective is to convince the world, then this academic system works fine - e.g., Bostrom. If your main objective is to actually build something, a different approach is perhaps warranted.
The engineering-oriented branches of academia (and I include comp sci in this) have a very different reward structure. You can publish to gain social status just as in math/philosophy, but if your idea also has commercial potential there is the powerful additional motivator of huge financial rewards. So naturally there is far more human intellectual capital going into comp sci than math, and more into deep learning than AI safety.
In a sane world we'd realize that AI safety is a public good of immense value, and that steering the tech economy towards providing it probably requires large-scale coordination. The XPrize approach is essentially to decompose a big long-term goal into subgoals which are then contracted out to the private sector.
The high-level abstract goal for the Ansari XPrize was "to usher in a new era of private space travel". The specific derived prize subgoal was then "to build a reliable, reusable, privately financed, manned spaceship capable of carrying three people to 100 kilometers above the Earth's surface twice within two weeks".
AI safety is a huge bundle of ideas, but perhaps the essence could be distilled down to: "create powerful AI which continues to do good even after it can take over the world."
For the Ansari XPrize, the longer-term goal of "space travel" led to the more tractable short-term goal of "100 kilometers above the Earth's surface twice within two weeks". Likewise, we can replace "the world" in the AI safety example:
AI Safety "XPrize": create AI which can take over a sufficiently complex video game world but still tends to continue to do good according to a panel of human judges.
To be useful, the video game world should be complex in the right ways: it needs to have rich physics that agents can learn to control, it needs to permit/encourage competitive and cooperative strategic complexity similar to that in the real world, etc. So more complex than Pac-Man, but simpler than the Matrix. Something in the vein of a Minecraft mod might have the right properties - but there are probably even more suitable open-world MMO games.
The other constraint on such a test is that we want the AI to be superhuman in the video game world, but not in our world (yet). Clearly this is possible - à la AlphaGo. But naturally, the closer the video game world's complexity gets to that of our world, the harder and the more dangerous the goal becomes.
Note also that the AI should not know that it is being tested; it shall not know it inhabits a simulation. This isn't likely to be any sort of problem for the AI we can actually build and test in the near future, but it becomes an interesting issue later on.
DeepMind is now focusing on StarCraft and OpenAI has Universe, so we are already on a related path. Competent AI for open-ended 3D worlds with complex physics - like Minecraft - is still not quite here, but is probably realizable in just a few years.