KevinWei

Emergent Misalignment & Realignment

Reproduction, Extension & Mitigations

Authors: Elizaveta Tennant, Jasper Timm, Kevin Wei, David Quarel

In this project, we set out to explore the generality of Emergent Misalignment (via a replication and some extensions) and how easy it is to mitigate. This project was conducted during the capstone week at ARENA (Alignment...

Jun 27, 2025

2024 Summer AI Safety Intro Fellowship and Socials in Boston

Tl;dr: The AI Safety Student Team (a group of students at Harvard) will be running two 8-week introductory reading groups this summer (in Boston and online), as well as summer socials (in Boston). Apply to our technical fellowship here or our policy fellowship here; express interest in our socials here....

May 29, 2024
