Sounds really exciting! I'm wondering which kind of theoretical computer science do you have in mind specifically? Like which part of that do you think has the most uses for alignment? (Still trying to find a way to use my PhD in the theory of distributed computing for something alignment related ^^)
What do you see as central examples of work on "alignment forecasting"? I'm unclear on what that means.
(Cross-post from ai-alignment.com)
I’m now working full-time on the Alignment Research Center (ARC), a new non-profit focused on intent alignment research.
I left OpenAI at the end of January and I’ve spent the last few months planning, doing some theoretical research, doing some logistical set-up, and taking time off.
For now it’s just me, focusing on theoretical research. I’m currently feeling pretty optimistic about this work: I think there’s a good chance that it will yield big alignment improvements within the next few years, and a good chance that those improvements will be integrated into practice at leading ML labs.
My current goal is to build a small team working productively on theory. I’m not yet sure how we’ll approach hiring, but if you’re potentially interested in joining you can fill out this tiny form to get notified when we’re ready.
Over the medium term (and maybe starting quite soon) I also expect to implement and study techniques that emerge from theoretical work, to help ML labs adopt alignment techniques, and to work on alignment forecasting and strategy.