Dates: May 27-28
Location: D1.112, Science Park, University of Amsterdam 
Organizers: Jan Pieter van der Schaar (University of Amsterdam) and Karthik Viswanathan (University of Amsterdam)

We are excited to announce a two-day workshop on "Interpretability in LLMs using Geometrical and Statistical Methods" scheduled for May 27 and 28. We expect around 30 participants from Amsterdam, AREA Science Park (Trieste), and our invited speakers.

Image credit: Mechanistic Interpretability for AI Safety -- A Review

This workshop explores recent developments in understanding the inner workings of Large Language Models (LLMs) by using concepts from geometry and statistics. The workshop aims to provide an accessible introduction to these approaches, discussing their potential to address key challenges in AI alignment, safety, and efficiency, while providing an overview of the current research problems in LLM interpretability. By bridging theoretical insights with practical applications, this workshop seeks to foster an exchange of ideas and motivate research at the intersection of computational geometry, statistical mechanics, and AI interpretability.

Overview

The workshop spans two days. Day 1 focuses on the geometric and statistical properties of internal representations in LLMs, and its talks are expected to take a physics-oriented perspective. On Day 2, we aim to broaden the scope, covering mechanistic interpretability and its applications to AI safety, and exploring how the ideas from Day 1 can contribute to current research challenges in AI safety. We are in the process of inviting speakers, and you can find the list of prospective participants here.

Day 1: Geometric and Statistical Methods for Interpretability

On the first day, we will explore how large language models process and represent information through their internal representations. The discussions will focus on the geometry of embeddings: how they evolve across model layers and the insights they provide. The talks on Day 1 are expected to align with the themes discussed in this blogpost and paper.

Day 2: Mechanistic Interpretability and Applications to AI Safety

On the second day, the focus will shift toward the mechanistic aspects of interpretability, examining how specific circuits in a model’s architecture can be identified and analyzed. The discussions will also explore how these insights can be applied to AI safety research. The talks on Day 2 are expected to align with the themes discussed in this blogpost and paper.

Format

The workshop is still in its early planning stages, so the format may evolve. The current plan is to have 3-4 talks per day, with dedicated time for discussions and potential collaborations. The workshop is intended to be fully in-person, but this may be adjusted based on the level of interest from the online community. The speakers and the schedule are yet to be decided.

Registration

To participate, please register by completing this form if you haven't done so already. A confirmation of your registration will be sent by April 15.

Questions?

Reach out to me at k.viswanathan@uva.nl or comment below. We look forward to seeing you. In the meantime, here’s a fun comic to keep you occupied!

Image Credit: SMBC comics

Prospective participants

Name | Affiliation
Nabil Iqbal | Durham University/Visiting University of Amsterdam
Christoph Weniger | University of Amsterdam
Ro Jefferson | Utrecht University
Razhan Hameed | Vox AI
Tim Bakker | Qualcomm AI
Ana Lucic | University of Amsterdam
Leon Eshuijs | Vrije Universiteit Amsterdam
Angela van Sprang | UvA (IvI & ILLC)
Lorenzo Basile | AREA Science Park
Ege Erdogan | University of Amsterdam
Patrik Bartak | UvA
Isaak Mengesha | University of Amsterdam
Navonil Neogi | Durham University/Visiting University of Amsterdam
Yuri Gardinazzi | University of Trieste, Area Science Park
Rob Romijnders | UvA, AMLAB
Jasmin Kareem | UvA / Eindhoven University of Technology
Shradha Ramakrishnan | Utrecht University
Lucrezia Valeriani | University of Trieste / AREA Science Park
Francesco Ortu | AREA Science Park, University of Trieste
Lohithsai Yadala Chanchu | UVA
Matteo Biagetti | Area Science Park
Martin Carrasco | Vrije Universiteit Amsterdam
Stan van Wingerden | Timaeus
Stefan Schouten | Vrije Universiteit Amsterdam
Adrian Sauter | University of Amsterdam
Alessandro Pietro Serra | SISSA/Area Science Park
Alexander van Grootel | MIT