Neuronpedia
Edit - Neuronpedia has pivoted to be a research tool for Sparse Autoencoders, so most of this post is outdated. Please read the new post, Neuronpedia: Accelerating Sparse Autoencoders Research.

Neuronpedia is an AI safety game that documents and explains each neuron in modern AI models. It aims to be the Wikipedia for neurons, where the contributions come from users playing a game. Neuronpedia wants to connect the general public to AI safety, so it's designed to not require any technical knowledge to play.

Neuronpedia is in experimental beta: it is getting its first users in order to collect feedback and ideas and to build an initial community.

OBJECTIVES

1. Increase understanding of AI to help build safer AI
2. Increase public engagement, awareness, and education in AI safety

CURRENT STATUS

* I started working on Neuronpedia three weeks ago, and I'm posting on LessWrong to develop an initial community and for feedback and testing. I'm not posting it anywhere else, so please do not share it yet in other forums like Reddit.
* There's an onboarding tutorial that explains the game, but to summarize: It's a word association game. You're shown one neuron ("puzzle") at a time, and its highest activations ("clues"). You then either vote for an existing explanation, or submit your own explanation. Neuronpedia's first "campaign" is explaining gpt2-small, layer 6.
* There is an "advanced mode" that allows testing custom activation text and shows more details/filters. Click "Simple" at the top right to toggle it.

WHAT YOU CAN DO

1. Play @ neuronpedia.org - feel free to use a throwaway GitHub account to log in.
2. Give feedback, ideas, and ask questions.

THE VISION

1. Millions of casual and technical users play Neuronpedia daily, trying to solve each neuron (like NYT crossword/Wordle). There are weekly/monthly contests ("side quests"). Top scorers are ranked on leaderboards by country, region, etc.
2. Neuronpedia sparks interest in AI safety for thousands
Apologies for the issue with the Neuronpedia link. It's now been resolved.