"What the hell is a representation, anyway?" | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents
AI interpretability researchers want to understand how models work. One popular approach is to try to figure out which features of an input a model detects and uses to generate its outputs. For instance, researchers interested in understanding how an image classifier distinguishes animals from inanimate objects might try to uncover...