EleutherAI's #alignment channels are good to ask questions in. Some specific answers:
I understand that a reward maximiser would wirehead (take control over the reward provision mechanism), but I don’t see why training an RL agent would necessarily result in a reward-maximising agent. TurnTrout’s Reward is Not the Optimisation Target clarified this somewhat, but I definitely have remaining questions.
Leo Gao's Toward Deconfusing Wireheading and Reward Maximization sheds some light on this.
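As a toy illustration of the distinction the question is pointing at (a minimal sketch, not taken from either post): in standard policy-gradient training, reward enters only as a signal that nudges the policy parameters, and the trained policy need not contain anything that represents or seeks reward. The bandit, payoffs, and learning rate below are made up for illustration.

```python
# Minimal REINFORCE-style sketch on a two-armed bandit (illustrative only).
# Reward is used purely as a training signal that adjusts the logits;
# the resulting policy is just a pair of action preferences and has no
# internal objective of "maximise reward".
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)                # policy parameters over two actions
true_means = np.array([0.2, 0.8])   # hypothetical arm payoffs
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)
    r = rng.normal(true_means[a], 0.1)   # reward observed for the chosen arm
    # REINFORCE: push up the log-probability of the taken action in proportion to reward.
    grad_logp = -probs
    grad_logp[a] += 1.0
    logits += lr * r * grad_logp

print(softmax(logits))  # the policy ends up preferring arm 1 without representing "reward" anywhere
```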
I agree with this suggestion. EleutherAI's alignment channels have been invaluable for my understanding of the alignment problem. I typically get insightful responses and explanations on the same day as posting. I've also been able to answer other folks' questions to deepen my inside view.
There is an alignment-beginners channel and an alignment-general channel. Your questions seem similar to what I see in alignment-general. For example, I received helpful answers when I asked this question about inverse reinforcement learning there yesterday.
Question: When I read Human Compatible a while back, I had the takeaway that Stuart Russell was very bullish on Inverse Reinforcement Learning being an important alignment research direction. However, I don’t see much mention of IRL on EleutherAI and the Alignment Forum; I see much more content about RLHF. Are IRL and RLHF the same thing? If not, what are folks’ thoughts on IRL?
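For context on the question: IRL and RLHF are related but not the same. RLHF typically fits a reward model to pairwise human preference comparisons and then optimises a policy against it, while IRL infers a reward function from demonstrations. Below is a minimal sketch of the pairwise preference loss commonly used for RLHF reward models; the scores are made up for illustration.

```python
# Bradley-Terry preference loss sketch (illustrative scores, not real data).
# The reward model is trained so that the human-preferred completion scores
# above the rejected one; this is the "learn the reward" step of RLHF.
import numpy as np

def preference_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Negative log-likelihood of the chosen completion under a Bradley-Terry model."""
    return float(np.mean(np.log1p(np.exp(-(r_chosen - r_rejected)))))

# Hypothetical reward-model scores for three preference pairs.
r_chosen = np.array([1.2, 0.3, 2.0])
r_rejected = np.array([0.5, 0.4, 1.1])
print(preference_loss(r_chosen, r_rejected))  # lower when chosen completions score higher
```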
Hey, this is me. I’d like to understand AI X-risk better. Is anyone interested in being my “alignment tutor”, for maybe 1 h per week, or 1 h every two weeks? I’m happy to pay.
Fields I want to understand better:
Fields I’m not interested in (right now):
My level of understanding:
Example questions I wrestled with recently, and that I might bring up during tutoring:
You don’t need to have very crisp answers to these to be my tutor, but you should probably have at least some good thoughts.