But is it really in Rome? An investigation of the ROME model editing technique
Thanks to Andrei Alexandru, Joe Collman, Michael Einhorn, Kyle McDonell, Daniel Paleka, and Neel Nanda for feedback on drafts and/or conversations which led to useful insights for this work. In addition, thank you to both William Saunders and Alex Gray for exceptional mentorship throughout this project. The majority of this work was carried out this summer.

Many people in the community were surprised when I mentioned some of the limitations of ROME (Rank-One Model Editing), so I figured it was worth writing a post about them, along with other insights I gained from looking into the paper. Most tests were done with GPT-2; some were done with GPT-J.

The ROME paper (Locating and Editing Factual Associations in GPT) has been one of the most influential papers in the prosaic alignment community, and it contains several important insights. The main findings are:

1. Factual associations such as "The Eiffel Tower is in Paris" seem to be stored in the MLPs of the early-middle layers of a GPT model. As the "Tower" token passes through the network, the MLPs of the early-middle layers write information (e.g. the Eiffel Tower's location) into the residual stream so that the model can later read that information to generate a token about that fact (e.g. "Paris").
2. Editing/updating the MLP of a single layer for a given (subject, relationship, object) association allows the model to generate text with the updated fact when given new prompts/sentences that include the subject tokens. For example, editing "The Eiffel Tower is in Paris → Rome" results in a model that outputs "The Eiffel Tower is right across from St Peter's Basilica in Rome, Italy."

In this post, I show that the ROME edit has many limitations:

* The ROME edit doesn't generalize in the way you might expect. It's true that if the subject tokens used for the edit are found in the prompt, the model will try to generalize from the updated fact. However, it doesn't "generalize" in the following ways:
  * It is not direction-agnostic
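The single-layer edit in point 2 can be sketched as a rank-one weight update. This is a deliberately simplified toy version (the actual ROME method selects the key and value vectors via causal tracing and scales the update by a key covariance term; all variable names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden dimension

W = rng.normal(size=(d, d))   # stand-in for one MLP projection matrix
k = rng.normal(size=d)        # key vector associated with the subject ("Eiffel Tower")
v_new = rng.normal(size=d)    # desired value vector (encoding "Rome" instead of "Paris")

# Rank-one edit: W' = W + (v_new - W k) k^T / (k^T k)
# so that W' @ k == v_new exactly, while inputs orthogonal to k are unchanged.
delta = np.outer(v_new - W @ k, k) / (k @ k)
W_edited = W + delta

assert np.allclose(W_edited @ k, v_new)
```

Because the update is an outer product, it has rank one: it rewrites the value the layer emits for this particular key while leaving the rest of the weight matrix's behavior (almost) untouched, which is why the edit only fires when the subject tokens appear in the prompt.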
I've been posting on my personal blog about topics that are likely less interesting to the regular LessWrong user, but I wanted to link to it here for those interested in the intersection of AI safety and startups.
https://jacquesthibodeau.com/
Recent post: https://jacquesthibodeau.com/when-execution-gets-cheap-does-taste-become-the-moat/
When Execution Gets Cheap, Does Taste Become the Moat?
For 20 years, the startup mantra has been: "Ideas are worthless. Execution is everything."
That era is ending.
When AI executes 10x faster and cheaper than a team of engineers, execution stops being the differentiator. The game shifts to taste. Which problems are worth solving. Which solutions are good. What to build before the market tells you. How your internal systems allow you to accelerate faster...