Workshop: Interpretability in LLMs using Geometric and Statistical Methods

Karthik Viswanathan

Dates: May 27-28
Location: D1.112, Science Park, University of Amsterdam
Organizers: Jan Pieter van der Schaar (University of Amsterdam) and Karthik Viswanathan (University of Amsterdam)

We are excited to announce a two-day workshop on "Interpretability in LLMs using Geometric and Statistical Methods", scheduled for May 27 and 28. We expect around 30 participants from Amsterdam, AREA Science Park (Trieste), and our invited speakers.

Image credit: Mechanistic Interpretability for AI Safety -- A Review

This workshop explores recent developments in understanding the inner workings of Large Language Models (LLMs) by using concepts from geometry and statistics. The workshop aims to provide an accessible introduction to these approaches, discussing their potential to address key challenges in AI alignment, safety, and efficiency, while providing an overview of the current research problems in LLM interpretability. By bridging theoretical insights with practical applications, this workshop seeks to foster an exchange of ideas and motivate research at the intersection of computational geometry, statistical mechanics, and AI interpretability.

Overview

The workshop spans two days. Day 1 focuses on the geometric and statistical properties of internal representations in LLMs, and its talks are expected to take a physics-oriented perspective. On Day 2, we aim to broaden the scope, covering mechanistic interpretability and its applications to AI safety, and exploring how the ideas from Day 1 can contribute to current research challenges in that area. We are in the process of inviting speakers; the list of prospective participants is given in the "Prospective participants" section below.

Day 1: Geometric and Statistical Methods for Interpretability

On the first day, we will explore how large language models process and represent information through their internal representations. The discussions will focus on the geometry of embeddings: how they evolve across model layers and what insights they provide. The talks on Day 1 are expected to align with the themes discussed in this blogpost and paper.

Day 2: Mechanistic Interpretability and Applications to AI Safety

On the second day, the focus will shift toward the mechanistic aspects of interpretability, examining how specific circuits in a model’s architecture can be identified and analyzed. The discussions will also explore how these insights can be applied to AI safety research. The talks on Day 2 are expected to align with the themes discussed in this blogpost and paper.

Format

The workshop is still in its early planning stages, so the format may evolve. The current plan is to have 3-4 talks per day, with dedicated time for discussions and potential collaborations. The workshop is intended to be fully in-person, but this may be adjusted based on the level of interest from the online community. The speakers and the schedule are yet to be decided.

Registration

To participate, please register by completing this form if you haven't done so already. A confirmation of your registration will be sent by April 15.

Questions?

Reach out to me at k.viswanathan@uva.nl or comment below. We look forward to seeing you. In the meantime, here’s a fun comic to keep you occupied!

Prospective participants

Name	Affiliation
Nabil Iqbal	Durham University/Visiting University of Amsterdam
Christoph Weniger	University of Amsterdam
Ro Jefferson	Utrecht University
Razhan Hameed	Vox AI
Tim Bakker	Qualcomm AI
Ana Lucic	University of Amsterdam
Leon Eshuijs	Vrije Universiteit Amsterdam
Angela van Sprang	UvA (IvI & ILLC)
Lorenzo Basile	AREA Science Park
Ege Erdogan	University of Amsterdam
Patrik Bartak	UvA
Isaak Mengesha	University of Amsterdam
Navonil Neogi	Durham University/Visiting University of Amsterdam
Yuri Gardinazzi	University of Trieste / AREA Science Park
Rob Romijnders	UvA, AMLAB
Jasmin Kareem	UvA / Eindhoven University of Technology
Shradha Ramakrishnan	Utrecht University
Lucrezia Valeriani	University of Trieste / AREA Science Park
Francesco Ortu	AREA Science Park, University of Trieste
Lohithsai Yadala Chanchu	UvA
Matteo Biagetti	AREA Science Park
Martin Carrasco	Vrije Universiteit Amsterdam
Stan van Wingerden	Timaeus
Stefan Schouten	Vrije Universiteit Amsterdam
Adrian Sauter	University of Amsterdam
Alessandro Pietro Serra	SISSA / AREA Science Park
Alexander van Grootel	MIT