I have recently started working towards a research-focused MSc in AI under the supervision of Dr. Pilehvar and Dr. Soleymani at Sharif University of Technology. (CV, LinkedIn)
Since we get essentially no funding here in Iran, I have ample freedom in choosing which research topics to pursue. What are some interesting alignment-adjacent papers in the NLP/transformer space that I could read? I have taken a cursory look at the previously aggregated resources, but the papers there pattern-matched in my brain to non-concrete abstractions that speak in hypotheses (i.e., "philosophy"). I am inclined towards more mainstream, technical work that I can apply to the models we already have. I like capabilities research, if that matters.
The fields that remind me of alignment:
interpretability
AI ethics
privacy
fairness and biases
out-of-distribution generalization
robustness to adversarial attacks
P.S.: Recent events in Iran might have created novel opportunities for effective altruism. Please take a moment to review the situation if you are not already aware of the elephant in the room.
P.P.S.: I assume I can't apply for any grants because I am Iranian and thus under sanctions. If you know otherwise, please let me know.
Related:
TAI Safety Bibliography
AI safety starter pack - EA Forum
Resources I send to AI researchers about AI safety - LessWrong
2021 AI Alignment Literature Review and Charity Comparison - LessWrong 2.0 viewer
AI Alignment Research Overview (Oct 2019) by Jacob Steinhardt
AI Alignment Curriculum - AGI Safety Fundamentals
colah's blog
Information-Theoretic Probing with Minimum Description Length