All of Joseph Emerson's Comments + Replies

Hey Milan, I’m broadly sympathetic to the argument in Proposition 1, Reason 2: if we want to understand whether models perform a human-derived cognitive operation X, we need to define what X is, and the best validation of that definition will come from testing it in humans. But recently I’ve been wrestling with whether, to get alignment of model behavior, we need to define the cognition that models are doing in the same terms that we use to define human cognition.
 

For instance you could take the definition of deception given in this paper: “the systematic induceme...