🥞 Apply now! (First step takes < 15 min if you have a résumé ready.) What is Ashgro? https://www.ashgro.org/ Ashgro helps AI safety projects focus on AI safety. We offer fiscal sponsorship to AI safety projects, saving them time and allowing them to access more funding. We save them time...
This is the process I used for maintaining the Predicted AI alignment event/meeting calendar. I shared it with Linda Linsefors, since she wanted to create an AI Safety Google Calendar. And I think it might be useful for others, too, so I'm sharing it here. I did the following every...
If you strap this device to your forearm, it vibrates when you raise your arm to touch your face. Useful? Not sure, because one gets many false positives. Cheap? Yes. Applicable everywhere? Yes. Update 2020-03-19: I've been using it all day while working at my desk and it has warned...
This diagram summarizes the requirements for independent AI alignment research and how they are connected. In this post I'll outline my four-year-long attempt at becoming an AI alignment researcher. It's an ‘I did X [including what I did wrong], and here's how it went’ post (see also jefftk's More writeups!)....
Paul Christiano published his and Buck Shlegeris' implementation at https://github.com/paulfchristiano/amplification. It's the code behind the article Supervising strong learners by amplifying weak experts. With William Saunders' permission, I published a version modified by him and later by me: https://github.com/rmoehn/amplification This one has changes and more documentation that allow you to run...
Link: https://github.com/rmoehn/farlamp/blob/master/build/tiny-supfail.pdf This is the first experiment report of my AI alignment research project. A reminder of what it is about: I'm studying the impact of overseer failure on RL-based IDA, because I want to know under what conditions amplification and distillation increase or decrease the failure rate, in order to...
I came up with these research project definitions when I read the iterated amplification sequence. Last year I put five of them up for voting (see Which of these five AI alignment research project ideas are no good?) and chose no. 23 to work on (see IDA with RL and...