"What the hell is a representation, anyway?" | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents
AI interpretability researchers want to understand how models work. One popular approach is to try to figure out which features of an input a model detects and uses to generate its outputs. For instance, researchers interested in understanding how an image classifier distinguishes animals from inanimate objects might try to uncover...