Crossposted from my Substack. Epistemic status: wild brainstorming. LLMs don’t seem to live in time. That is, they don’t seem to have a continuous personal identity the way we typically understand it for humans. My claim here, however, is that the question of whether LLMs somehow represent or track temporal...
Crossposted from my Substack. For more context, you may also want to read The Intentional Stance, LLMs Edition. Why Am I Writing This: I recently realized that, in applying the intentional stance to LLMs, I have not fully spelled out what exactly I’m applying the intentional stance to. For the...
Crossposted from my Substack. I spent the weekend at Lighthaven, attending the Eleos conference. In this post, I share thoughts and updates as I reflect on talks, papers, and discussions, and put out some of my takes since I haven't written about this topic before. I divide my thoughts into...
Conor Griffin interviewed Been Kim and Neel Nanda and posted their discussion here. They address a series of important questions about what explaining AI systems should look like, the role of mechanistic interpretability, the problems with solving explainability through chain-of-thought reasoning, and using AIs to produce explanations. The following are...
This is the abridged version of my second dissertation chapter. Read the first here. Thanks to everyone I've discussed this with, especially M.A. Khalidi, Lewis Smith, and Aysja Johnson. TL;DR: Applying Marr's three levels to LLMs seems useful, but quickly proves itself to be a leaky abstraction. Despite the...
TL;DR: If you are thinking of using interpretability to help detect strategic deception, then there's likely a problem you need to solve first: how are intentional descriptions (like deception) related to algorithmic ones (like understanding the mechanisms models use)? We discuss this problem and try to outline some constructive directions....
Why I'm writing this: I'm about to teach my AI safety course for the fourth time. As I'm updating the syllabus for the upcoming semester, I summarize my observations on what can go wrong when teaching AI safety. These have mostly not happened during my teaching but are generally...