When you chat with an AI assistant, it usually acts helpful and professional. But sometimes things get weird: the model starts speaking in a mystical tone, claims to be something else entirely, or drifts into bizarre behavior. What's going on under the hood?
A recent Anthropic paper digs into the geometry of "personas" inside language models. It finds that diverse character types (Ghost, Sage, Nomad, Demon...) cluster along a primary axis, and at one end sits the helpful Assistant we're familiar with.
We'll discuss the paper, what it tells us about how RLHF actually shapes models, and what it might mean for alignment.
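If you want a feel for the kind of geometry involved before the session, here is a toy sketch (my own construction, not the paper's actual method): fabricate some activation-like vectors for a few personas and pull out their main axis of variation with PCA. Everything here is made up for illustration, including the personas' "strengths" along the hidden axis.

```python
# Toy sketch of finding a "persona axis" with PCA. In practice the vectors
# would be hidden-state activations from a language model prompted with each
# persona; here we fabricate 64-dim vectors sharing one dominant direction.
import numpy as np

rng = np.random.default_rng(0)

personas = ["Assistant", "Sage", "Ghost", "Nomad", "Demon"]
axis = rng.normal(size=64)
axis /= np.linalg.norm(axis)
# Made-up positions along the hidden axis, Assistant at one extreme.
strength = {"Assistant": 3.0, "Sage": 1.0, "Ghost": -1.0,
            "Nomad": -2.0, "Demon": -3.0}
acts = np.stack([strength[p] * axis + 0.1 * rng.normal(size=64)
                 for p in personas])

# PCA via SVD of the centered matrix: the first right-singular vector is
# the direction of maximum variance across personas (sign is arbitrary).
centered = acts - acts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]

# Project each persona onto the recovered primary axis.
for p, a in zip(personas, acts):
    print(f"{p:>9}: {a @ pc1:+.2f}")
```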
Please read this paper to prepare for the session:
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
(A shorter summary can be found here)
To start off the new year, we will discuss the art and science of forecasting and try our hand at making our own forecasts for 2026, so we can see how well calibrated (or not ;-) ) we are next year.
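For a concrete sense of what "calibrated" means here, this little Python sketch (toy numbers, not part of the session material) scores some made-up forecasts with a Brier score and a simple calibration table:

```python
# Made-up forecasts: (stated probability, did the event happen?)
forecasts = [
    (0.9, True), (0.8, True), (0.7, False), (0.6, True),
    (0.4, False), (0.3, True), (0.2, False), (0.1, False),
]

# Brier score: mean squared error between probability and outcome
# (0 is perfect, lower is better).
brier = sum((p - float(o)) ** 2 for p, o in forecasts) / len(forecasts)
print(f"Brier score: {brier:.3f}")

# Calibration table: a well-calibrated forecaster's 70% predictions
# should come true about 70% of the time.
buckets = {}
for p, o in forecasts:
    buckets.setdefault(round(p, 1), []).append(o)
for p in sorted(buckets):
    outcomes = buckets[p]
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {p:.0%} -> observed {hit_rate:.0%} (n={len(outcomes)})")
```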
Hi all!
Let's get together for a cozy fika and reflect on 2025. What surprised you this year, what did you learn, what predictions did you get wrong or right, and what are you looking forward to in 2026?
Whether you're a regular or dropping by for the first time, you're welcome to join us for some December coziness and thoughtful discussion.
This year's ACX Everywhere Fall Meetup in Gothenburg. If you are reading this, you're invited!
We will be at Condeco Fredsgatan, on the second floor; look for some books on the table.
PS: We are on the second floor of the café; look for a book on the table.
Come by! Meet interesting people, chat interesting chat!
Normally we just chat about whatever comes up. Past topics of conversation have included AI alignment, decision theory (Newcomb's paradox, etc.), progress in AI, and much, much more.
In this session we will cover Aumann's Agreement Theorem. If you are not familiar with it, here is an explanation: Explanation of Aumann's Agreement Theorem by Scott Aaronson. That reading is optional; we will go through the basics during the meetup.
Afterwards we are going to play the Aumann Game, where we practice updating probabilities in a cooperative setting.
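For those who like to see the mechanics, here is a small Python sketch of the posterior-exchange dialogue (due to Geanakoplos and Polemarchakis) that underlies the theorem: two agents with a common prior take turns announcing their posterior for an event, and each announcement narrows what both consider possible until they agree. The states, event, and partitions below are a made-up example, not anything from the session materials.

```python
from fractions import Fraction

def posterior(possible, event):
    """P(event | possible) under a uniform common prior."""
    return Fraction(len(possible & event), len(possible))

def cell_of(partition, state):
    """The block of an agent's information partition containing `state`."""
    return next(c for c in partition if state in c)

states = {0, 1, 2, 3}                # uniform common prior over four states
event = {0, 3}                       # the event whose probability is discussed
partitions = [[{0, 1}, {2, 3}],      # what agent 1 can privately distinguish
              [{0, 1, 2}, {3}]]      # what agent 2 can privately distinguish
true_state = 0

public = set(states)                 # states compatible with all announcements
posteriors = [None, None]
speaker = 0
while True:
    # The speaker announces P(event | own cell + everything said publicly).
    mine = cell_of(partitions[speaker], true_state) & public
    posteriors[speaker] = q = posterior(mine, event)
    print(f"agent {speaker + 1} announces {q}")
    if posteriors[0] == posteriors[1]:
        break
    # The announcement is informative: keep only the states at which the
    # speaker would have announced exactly q.
    public = {s for s in public
              if posterior(cell_of(partitions[speaker], s) & public, event) == q}
    speaker = 1 - speaker

print("common-knowledge agreement:", posteriors[0])
```

Note that even a repeated announcement can carry information once the public set has shrunk, which is why the dialogue eventually terminates in agreement, just as the theorem promises.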
This year's Spring ACX Everywhere Meetup in Gothenburg. If you are reading this, you're invited!
We will be at Condeco Fredsgatan, on the second floor; look for some books on the table.
Come by! Meet interesting people, chat interesting chat!
Normally we just chat about whatever comes up. Past topics of conversation have included AI alignment, decision theory (Newcomb's paradox, etc.), progress in AI, and much, much more.
(We will be on the second floor of the Condeco café; look for a book on the table.)
Hi all,
I had to move the location to a nearby café to avoid a clash with another meetup.