We (Zvi Mowshowitz and Vladimir Slepnev) are happy to announce the results of the fourth round of the AI Alignment Prize, funded by Paul Christiano. From July 15 to December 31, 2018 we received 10 entries, and are awarding four prizes for a total of $20,000.
The winners
We are awarding two first prizes of $7,500 each. One of them goes to Alexander Turner for Penalizing Impact via Attainable Utility Preservation; the other goes to Abram Demski and Scott Garrabrant for the Embedded Agency sequence.
We are also awarding two second prizes of $2,500 each: to Ryan Carey for Addressing three problems with counterfactual corrigibility, and to Wei Dai for Three AI Safety Related Ideas and Two Neglected Problems in Human-AI Safety.
We will contact each winner by email to arrange transfer of money. Many thanks to everyone else who participated!
Moving on
This concludes the AI Alignment Prize for now. It has stimulated a lot of good work during its year-long run, but participation has been slowing down from round to round, and we don't think it's worth continuing in its current form.
Once again, we'd like to thank everyone who sent us articles! And special thanks to Ben and Oliver from the LW2.0 team for their enthusiasm and help.
Could there be some kind of mentorship incentive? Another widespread problem in alignment research seems to be a lack of mentors, since most of the people skilled enough to fill this role are desperately working against the clock. A naïve solution would be to offer a smaller prize to the mentor of a newer researcher whenever the newcomer's submission credits them with a significant amount of help. Obviously, dishonest people could throw a friend's name on the submission because "why not", but I'm not sure how damaging that would be.
What would be nice is some incentive for high-quality mentorship / for bringing new people into the contest and the research field, one that encourages mentors to recruit their friends into the contest even though doing so might increase the competition against their own proposal.
This might also modestly improve social incentives for mentors, since people like being associated with success and being seen as helpful / altruistic.
ETA: What about a flat prize (a few thousand dollars) that you can only win once, after which you can mentor others and receive a somewhat smaller sum for any prizes they win? If sufficiently selective, it might help kickstart people's alignment careers / give them the confidence to continue working. We'd have to worry about the details of what counts as mentorship, depending on how much people would try to game it.