Something I believe could also be helpful is a non-archival peer review system that helps improve the quality of safety writings or publications, and optionally turns them into a readable blog post, etc.
LessWrong/Alignment Forum essentially has this for users with >100 karma. If you have a draft, you can click on the feedback button and ask for this kind of feedback.
Hey Antb, I'm Pablo, Spanish and based in Spain. As far as I know, these are the AI Safety researchers:
For the record, I think Jose Orallo (and his lab), in Valencia, Spain and at CSER, Cambridge, is quite interested in exactly these same topics (evaluation of AI models, specifically towards safety). Jose is a really good researcher, part of the FLI existential risk faculty community, and has previously organised AI Safety conferences. Perhaps it would be interesting for you to get to know each other.
Ok, so perhaps specific tips on how to become a distiller: https://www.lesswrong.com/posts/zo9zKcz47JxDErFzQ/call-for-distillers In particular:
Are there examples or best practices you would recommend for this?
I think value learning might be a causal problem because human preferences cannot be observed and can therefore act as a confounder, similar to the work in
Zhang, J., Kumor, D., Bareinboim, E. Causal Imitation Learning with Unobserved Confounders. In Advances in Neural Information Processing Systems 2020.
At least that was one of my motivations.
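To make the confounding point concrete, here is a quick toy sketch (mine, not taken from the Zhang et al. paper): if a latent human preference drives both the observed context and the expert's action, then naively regressing actions on the observed context alone attributes part of the preference's effect to the context. The variable names (`u`, `x`, `a`) and coefficients are purely illustrative.

```python
# Toy illustration of how an unobserved confounder biases naive behavioral cloning.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

u = rng.normal(size=n)                      # unobserved human preference (confounder)
x = 0.5 * u + rng.normal(size=n)            # observed context, partly driven by u
a = 1.0 * x + 2.0 * u + rng.normal(size=n)  # expert action depends on x and u

# Naive behavioral cloning: least-squares fit of a on x only.
slope_naive = np.polyfit(x, a, 1)[0]

# Fit that also conditions on u (only possible if u were observable).
coef_adjusted, *_ = np.linalg.lstsq(np.column_stack([x, u]), a, rcond=None)

print(f"naive effect of x on a:    {slope_naive:.2f}")       # ~1.8, biased upward
print(f"adjusted effect of x on a: {coef_adjusted[0]:.2f}")  # ~1.0, the true coefficient
```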
I think predicting things you have no data on ("what if the AI does something we didn't foresee") is sort of an impossible problem via tools in "data science." You have no data!
Sure, I agree. I think I was quite inaccurate...
While I enjoyed this post, I wanted to point out a couple of reasons why you may want to stay in academia or industry rather than become an independent researcher:
Now, many will argue that Safety is still preparadigmatic and that, consequently, there might be contributions that do not fit well into standard academic journals...
Hi Ilya! Thanks a lot for commenting :)
(a) I think "causal representation learning" is too vague; this overview (https://arxiv.org/pdf/2102.11107.pdf) talks about a lot of different problems I would consider fairly unrelated under this same heading.
Yes, you're right. I had found this, and other reviews by similar authors. In this one, I was mostly thinking of section VI (Learning causal variables) and its applications to RL (section VII-E). Perhaps section V on causal discovery is also relevant.
...(b) I would try to read "classical causal inference" stuff...
Hey Koen, thanks a lot for the pointers! The literature I am most aware of is https://crl.causalai.net/, https://githubmemory.com/repo/zhijing-jin/Causality4NLP_Papers and Bernhard Scholkopf's webpage.
Alternatively you may want to join here: https://join.slack.com/t/ai-alignment/shared_invite/zt-fkgwbd2b-kK50z~BbVclOZMM9UP44gw
My prediction. Some comments:
The main problem with wireheading, manipulation, etc. seems related to a confusion between the goal in the world and its representation inside the agent. Perhaps a way to deal with this is to use the fact that the agent may be aware that it is an embedded agent. That means it could be aware that the goal represents an external fact about the world, and we could potentially penalize the divergence between the goal and its internal representation during training.
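As a rough illustration of that last sentence, here is a minimal, hypothetical sketch in PyTorch: the training loss combines an ordinary task loss with a penalty on the divergence between an externally specified goal embedding and the agent's internal goal representation. All names (`GoalConditionedAgent`, `goal_head`, the 0.1 weight) are illustrative assumptions, not an established method.

```python
# Sketch: add a divergence penalty between the external goal and its internal representation.
import torch
import torch.nn as nn

class GoalConditionedAgent(nn.Module):
    def __init__(self, obs_dim: int, goal_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.goal_head = nn.Linear(64, goal_dim)     # internal goal representation
        self.policy_head = nn.Linear(64, n_actions)  # action logits

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy_head(h), self.goal_head(h)

obs_dim, goal_dim, n_actions = 8, 4, 3
agent = GoalConditionedAgent(obs_dim, goal_dim, n_actions)

obs = torch.randn(32, obs_dim)
expert_actions = torch.randint(0, n_actions, (32,))
external_goal = torch.randn(goal_dim)  # fixed embedding of the "goal in the world"

logits, internal_goal = agent(obs)
task_loss = nn.functional.cross_entropy(logits, expert_actions)
# Penalize drift of the internal representation away from the external goal.
alignment_penalty = nn.functional.mse_loss(
    internal_goal, external_goal.expand_as(internal_goal)
)
loss = task_loss + 0.1 * alignment_penalty
loss.backward()
```

Whether such a penalty actually prevents wireheading (rather than just regularizing the representation) is exactly the open question; the sketch is only meant to show where the divergence term would enter the training objective.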