Cross-posted from the Leap Labs blog. For many people, including me, the real promise of AI is massively accelerated scientific discovery. Chatbots, vibe coding, video generation: these things are magical, but what I really want is superhuman medicine, radical life extension, humanity blossoming out into the universe. Understanding the universe...
All examples in this post can be found in this notebook, which is also probably the easiest way to start experimenting with PIZZA. From the research & engineering team at Leap Laboratories (incl. @Arush, @sebastian-sosa, @Robbie McCorkell), where we use AI interpretability to accelerate scientific discovery from data. What is...
We are thrilled to introduce Leap Labs, an AI startup. We’re building a universal interpretability engine. We design robust interpretability methods with a model-agnostic mindset. These methods in concert form our end-to-end interpretability engine. This engine takes in a model, or ideally a model and its training dataset (or some...
The set of anomalous tokens which we found in mid-January are now being described as 'glitch tokens' and 'aberrant tokens' in online discussion, as well as (perhaps more playfully) 'forbidden tokens', 'unspeakable tokens' and 'cursed tokens'. We've mostly just called them 'weird tokens'. GPT-3 speaks of 'the unspeakable one' when...
tl;dr: This is a follow-up to our original post on prompt generation and the anomalous token phenomenon which emerged from that research. Work done by Jessica Rumbelow and Matthew Watkins in January 2023 at SERI-MATS. [Figure: part of a typical semantically coherent cluster we found in GPT2-small's embedding space] Clustering: As...
UPDATE (14th Feb 2023): ChatGPT appears to have been patched! However, very strange behaviour can still be elicited in the OpenAI playground, particularly with the davinci-instruct model. More technical details here. Further (fun) investigation into the stories behind the tokens we found is here. Work done at SERI-MATS over the past...
Work done @ SERI-MATS, idea from a conversation with Ivan Vendrov at Future Forum earlier this year. Misaligned systems are all around us. They are what make me watch another video of a man in filthy shorts building a hut using only tools made from rocks and his own armpit...