I'm interested in how this post should update us re: "How does pursuing RLHF help/hurt our chances of creating aligned transformative AI?", assuming we take the post at face value and treat it as true, reasonable, etc. (FYI I'm interested in this for somewhat idiosyncratic reasons, including personal curiosity and investigating whether alignment forum posts can be used to answer alignment questions on https://elicit.org/)
Currently I think this update is: The following is true of imitation learning but not RLHF: "you are mimicking human actions, and doing so is useful precisely..."