Stronger-than-human artificial intelligence would be dangerous to humanity. It is vital that any such intelligence's goals are aligned with humanity's goals. Maximizing the chance that this happens is a difficult, important, and understudied problem.
To encourage more and better work on this important problem, we (Zvi Mowshowitz and Vladimir Slepnev) are announcing a $5000 prize for publicly posted work advancing understanding of AI alignment, funded by Paul Christiano.
This prize will be awarded based on entries gathered over the next two months. If the prize is successful, we will award further prizes in the future.
The prize is not backed by or affiliated with any organization.
Rules
Your entry must be published online for the first time between November 3 and December 31, 2017, and contain novel ideas about AI alignment. Entries have no minimum or maximum size. Important ideas can be short!
Your entry must be written by you, and submitted before 9pm Pacific Time on December 31, 2017. Submit your entries either as links in the comments to this post, or by email to apply@ai-alignment.com. We may provide feedback on early entries to allow improvement.
We will award $5000 to between one and five winners. The first place winner will get at least $2500. The second place winner will get at least $1000. Other winners will get at least $500.
Entries will be judged subjectively. Final judgment will be by Paul Christiano. Prizes will be awarded on or before January 15, 2018.
What kind of work are we looking for?
AI alignment focuses on ways to ensure that future smarter-than-human intelligence will have goals aligned with the goals of humanity. Many approaches to AI alignment deserve attention. This includes technical and philosophical topics, as well as strategic research about related social, economic or political issues. A non-exhaustive list of technical and other topics can be found here.
We are not interested in research dealing with the dangers of existing machine learning systems, commonly called AI, that do not have smarter-than-human intelligence. These concerns are also understudied, but they are not the subject of this prize except in the context of future smarter-than-human intelligence. We are also not interested in general AI research. We care about AI alignment, which may or may not also advance the cause of general AI research.
(Addendum: the results of the prize and the rules for the next round have now been announced.)
You don't mention decision theory in your list of topics, but I guess it doesn't hurt to try.
I have thought a bit about what one might call the "implementation problem of decision theory". Let's say you believe that some theory of rational decision making, e.g., evidential or updateless decision theory, is the right one for an AI to use. How would you design an AI to behave in accordance with such a normative theory? Conversely, if you just go ahead and build a system in some existing framework, how would that AI behave in Newcomb-like problems?
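To make that last question concrete, here is a minimal, purely illustrative sketch of how evidential and causal decision theory come apart on the standard Newcomb problem. The payoff amounts and predictor accuracy are assumptions I picked for illustration, not numbers from anything above or from the linked papers.

```python
# Minimal sketch contrasting evidential and causal decision theory on the
# standard Newcomb problem. Payoffs and predictor accuracy are illustrative
# assumptions: the opaque box contains $1,000,000 iff the predictor foresaw
# one-boxing; the transparent box always contains $1,000.

def edt_value(action, accuracy=0.99):
    """Evidential DT: treat the chosen action as evidence about the prediction."""
    if action == "one-box":
        return accuracy * 1_000_000
    return 1_000 + (1 - accuracy) * 1_000_000


def cdt_value(action, p_full):
    """Causal DT: the prediction is already fixed; the action cannot change it."""
    base = p_full * 1_000_000
    return base + (1_000 if action == "two-box" else 0)


if __name__ == "__main__":
    for action in ("one-box", "two-box"):
        print(f"{action}: EDT = {edt_value(action):,.0f}, "
              f"CDT (box assumed full w.p. 0.5) = {cdt_value(action, 0.5):,.0f}")
```

With these assumed numbers, EDT prefers one-boxing while CDT prefers two-boxing for any fixed belief about the box's contents; that divergence is the kind of behavior difference I have in mind when asking how an AI built in an existing framework would act in Newcomb-like problems.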
There are two pieces that I uploaded/finished on this topic in November and December. The first is a blog post noting that futarchy-type architectures would, per default, implement evidential decision theory. The second is a draft titled "Approval-directed agency and the decision theory of Newcomb-like problems".
For anyone who's interested in this topic, here are some other related papers and blog posts:
So far, my research and the papers by others I linked have focused on classic Newcomb-like problems. One could also discuss how existing AI paradigms relate to other issues of naturalized agency, in particular self-locating beliefs and naturalized induction, though here it seems more as though existing frameworks just lead to really messy behavior.
Send comments to firstnameDOTlastnameATfoundational-researchDOTorg. (Of course, you can also comment here or send me a LW PM.)
Caspar, thanks for the amazing entry! Acknowledged.