tricky_labyrinth

LEAst-squares Concept Erasure (LEACE)

"Ever wanted to mindwipe an LLM? Our method, LEAst-squares Concept Erasure (LEACE), provably erases all linearly-encoded information about a concept from neural net activations. It does so surgically, inflicting minimal damage to other concepts. ... LEACE has a closed-form solution that fits on a T-shirt. This makes it orders of...

Jun 7, 2023

LESSWRONG

tricky_labyrinth's Shortform