graphpatch: a Python Library for Activation Patching
This post is an announcement for a software library. It is likely only relevant to those working, or looking to start working, in mechanistic interpretability.

What is graphpatch?

graphpatch is a Python library for activation patching on arbitrary PyTorch neural network models. It is designed to minimize the amount of...
Jun 5, 2024
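
For readers new to the term, here is a minimal sketch of what activation patching means. It does not use graphpatch, whose API is not shown in this excerpt. It uses plain PyTorch forward hooks instead. The toy model, the choice of patch site, and every name below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in model (an assumption; any nn.Module with a hookable submodule works).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

clean_input = torch.randn(1, 4)
corrupted_input = torch.randn(1, 4)

# Hypothetical patch site: the output of the ReLU layer.
patch_site = model[1]
cache = {}


def save_hook(module, inputs, output):
    # Record the activation produced on the clean run.
    cache["activation"] = output.detach().clone()


def patch_hook(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return cache["activation"]


with torch.no_grad():
    # 1. Clean run: cache the activation at the patch site.
    handle = patch_site.register_forward_hook(save_hook)
    clean_logits = model(clean_input)
    handle.remove()

    # 2. Corrupted run without intervention, as a baseline.
    corrupted_logits = model(corrupted_input)

    # 3. Corrupted run with the clean activation patched in.
    handle = patch_site.register_forward_hook(patch_hook)
    patched_logits = model(corrupted_input)
    handle.remove()

print("clean:    ", clean_logits)
print("corrupted:", corrupted_logits)
print("patched:  ", patched_logits)
```

In this toy model everything downstream of the patch site depends only on the patched activation, so the patched run reproduces the clean output. In a real model with residual connections or multiple paths, the patched output would generally fall somewhere between the clean and corrupted results, and that difference is what the technique measures.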