AI researchers and others are increasingly looking for an introduction to the alignment problem that is clearly written, credible, and supported by evidence and real examples. The Wikipedia article on AI Alignment has become such an introduction.
Link: https://en.wikipedia.org/wiki/AI_alignment
Aside from me, it has contributions from Mantas Mazeika, Gavin Leech, Richard Ngo, Thomas Woodside (CAIS), Sidney Hough (CAIS), other Wikipedia contributors, and copy editor Amber Ace. It also had extensive feedback from this community.
In the last month, it had ~20k unique readers and was cited by Yoshua Bengio.
We've tried hard to keep the article accessible for non-technical readers while also making sense to AI researchers.
I think Wikipedia is a useful format because it can include videos and illustrations (unlike papers) and is more credible than blog posts. However, Wikipedia has strict rules, and the article can be changed by anyone.
Note that we've announced this effort on the Wikipedia talk page and shared public drafts to let other editors give feedback and contribute.
If you edit the article, please keep in mind Wikipedia's rules, use reliable sources, and consider that we've worked hard to keep it concise because most Wikipedia readers spend <1 minute on the page. For the latter goal, it helps to focus on edits that reduce or at least don't increase length. To give feedback, feel free to post on the talk page or message me. Translations would likely be impactful.
Yeah, generally when competent people hear a new term (e.g. AI Alignment, Effective Altruism, etc.), they go to Wikipedia to get a first-impression overview of what it's all about.
When you look at it like that, lots of pages, e.g. Nick Bostrom and Effective Altruism, seem to have been surprisingly efficiently vandalized to inoculate new people against longtermism and EA, whereas Eliezer Yudkowsky and MIRI are basically fine.
EDIT: I didn't mean to imply anything against Yud or MIRI here; I was being absentminded, and if I had been paying more attention to that sort of thing when I wrote that, I would have gone and found a non-Yud third example of a Wikipedia article that was fine (which is most Wikipedia articles). In fact, I strongly think that if Yud and MIRI are being hated on by the forces of evil, people should mitigate/reject that by supporting them, and label/remember the people who personally gained status by hopping on the hate train.
Likely referring to the "Racist e-mail controversy" section on Bostrom and the pervasive FTX and Bankman-Fried references throughout the EA article.