We wrote this briefing for UK politicians, to help them quickly get their heads around the AI safety laws that already exist in the US and EU. We found that it was also clarifying for us and we hope it will be useful for others. These laws are too long...
PauseAI organised an open letter from UK lawmakers and civil society organisations to Demis Hassabis, CEO of Google DeepMind. PauseAI UK members emailed their MPs asking them to sign the letter. > A cross-party group of 60 U.K. parliamentarians has accused Google DeepMind of violating international pledges to safely develop artificial...
CGP Grey describes a phenomenon I think of as 'dreams of ideas' that I find useful as a tool to know when to stop working on a project. > You can be working on something and you are thinking about what it could be, but what is hard to know...
We present gradient routing, a way of controlling where learning happens in neural networks. Gradient routing applies masks to limit the flow of gradients during backpropagation. By supplying different masks for different data points, the user can induce specialized subcomponents within a model. We think gradient routing has the potential...
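To make the masking idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The toy model (a hidden layer `h = W x` summed into a scalar output), the squared-error loss, and the names `routed_sgd_step`, `mask`, and `lr` are all assumptions made for illustration. The only point is to show how a per-example mask on the hidden-layer gradient decides which part of the network learns from that example.

```python
def routed_sgd_step(W, x, target, mask, lr=0.1):
    """One SGD step on a toy model, with the hidden-layer gradient masked.

    Hypothetical illustration: W is a list of rows (one row per hidden unit),
    x is the input vector, and mask holds one 0/1 entry per hidden unit.
    """
    # Forward pass: h_i = sum_j W[i][j] * x[j], output y = sum_i h_i.
    h = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    y = sum(h)

    # Backward pass for the squared error L = (y - target) ** 2.
    dL_dy = 2.0 * (y - target)

    # Unmasked, dL/dh_i equals dL/dy for every hidden unit.
    # The mask zeroes the gradient for units this example should not train.
    dL_dh = [dL_dy * m for m in mask]

    # dL/dW[i][j] = dL/dh_i * x[j]; rows with a zero mask entry stay unchanged.
    return [
        [w_ij - lr * g_i * x_j for w_ij, x_j in zip(row, x)]
        for row, g_i in zip(W, dL_dh)
    ]


W = [[0.5, 0.5], [0.5, 0.5]]
# Route this example to hidden unit 0 only.
W_after = routed_sgd_step(W, x=[1.0, 2.0], target=0.0, mask=[1, 0])
```

In this sketch, supplying `mask=[1, 0]` for one kind of data and `mask=[0, 1]` for another would leave each hidden unit shaped only by the examples routed to it, which is the sense in which different masks can induce specialized subcomponents.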
When you think you've found a circuit in a language model, how do you know if it does what you think it does? Typically, you ablate / resample the activations of the model in order to isolate the circuit. Then you measure if the model can still perform the task...
This post outlines an efficient implementation of Edge Patching that massively outperforms common hook-based implementations. This implementation is available to use in my new library, AutoCircuit, and was first introduced by Li et al. (2023). What is activation patching? I introduce new terminology to clarify the distinction between different types...