LLM Personas — LessWrong

LLM Personas

This page is a stub.

Posts tagged LLM Personas

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment (201 karma)
Cam, Puria, Kyle O’Brien, David Africa, Samuel Ratnam, andyk; posted 6mo ago; 25 comments

Constitutional AI Alignment (27 karma)
Posted 1mo ago; 9 comments

Experimental Evidence for Simulator Theory— Part 1: Emergent Misalignment and Weird Generalizations (25 karma)
Posted 3mo ago; 0 comments

Experimental Evidence for Simulator Theory— Part 2: The Scalers Strike Back (21 karma)
Posted 3mo ago; 0 comments

The Rise of Parasitic AI (769 karma)
Posted 9mo ago; 191 comments

A Three-Layer Model of LLM Psychology (266 karma)
Posted 1y ago; 17 comments

Persona Parasitology (177 karma)
Raymond Douglas; posted 4mo ago; 38 comments

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training (106 karma)
Posted 5mo ago; 12 comments

Shaping the exploration of the motivation-space matters for AI safety (83 karma)
Maxime Riché, Victor Gillioz, nielsrolf, Kajetan Dymkiewicz, Filip Sondej, RogerDearnaley, Daniel Tan, dillonkn; posted 4mo ago; 15 comments

A Case for Model Persona Research (121 karma)
nielsrolf, Maxime Riché, Daniel Tan; posted 6mo ago; 11 comments

The Bleeding Mind (68 karma)
Posted 6mo ago; 9 comments

Selection Pressures on LM Personas (40 karma)
Raymond Douglas; posted 1y ago; 0 comments

Concrete research ideas on AI personas (69 karma)
nielsrolf, Maxime Riché, Daniel Tan; posted 5mo ago; 10 comments
