Though they're both somewhat outdated at this point, there are certainly still some interesting concrete experiment ideas to be found in my “Towards an empirical investigation of inner alignment” and “Concrete experiments in inner alignment.”
I wrote a research agenda that suggests additional work that needs doing and that I'm not doing myself.
https://www.lesswrong.com/posts/k8F8TBzuZtLheJt47/deconfusing-human-values-research-agenda-v1
I co-authored a paper suggesting that we use AI's superhuman abilities in chess to create trustworthy and untrustworthy chess oracles, which could help develop strategies for dealing with possibly unfriendly oracles. https://arxiv.org/abs/2010.02911
We have developed AI Safety Ideas, a collaborative AI safety research platform with many research project ideas.
Thanks Aryeh for collecting these! I added them to a new Project Ideas section in my AI Safety Resources list.
Can we compile a list of good project ideas related to AI safety that people can work on? There are occasions at work when I have the opportunity to propose interesting project ideas for potential funding, and it would be really useful if there was somewhere I could look for projects that people here would really like someone to work on, even if they themselves don't have the time or resources to do so. I also keep meeting people who are searching for useful alignment-related projects they can work on for school, work, or as personal projects, and I think a list of project ideas might be helpful for them as well.
I'm particularly interested in project ideas that, to your knowledge, nobody is currently working on but that it would be great for someone to take up. I'm also interested in ideas that are currently being worked on but that have variations nobody has yet attempted and someone should.
Occasionally someone will post an idea or set of ideas on the Alignment Forum, for example, Ajeya Cotra's "sandwiching" idea or the recent list of ideas from Stuart Armstrong and Owain Evans. I also sometimes come across ideas mentioned towards the end of a paper or buried somewhere in a research agenda. But I think having a larger list somewhere could be really useful.
(Note: I am not looking for lists of open problems, challenges, or very general research directions. I'm looking for suggestions that at least point towards a concrete project idea, and where an individual or small team might be able to produce useful results given current technology and with sufficient time and resources.)
Please post ideas or links / references to published ideas in the comments, if you know of any. Ideas mentioned as part of a larger post or paper would count, but please point to the section where the idea is mentioned.
If I get enough links or references, maybe I'll try to compile a list that others can use.