atticusw

[CS2881r] Optimizing Prompts with Reinforcement Learning

This work was done as an experiment for Boaz Barak’s “CS 2881r: AI Safety and Alignment” at Harvard. The lecture where this work was presented can be viewed on YouTube here, and its corresponding blogpost can be found here.

Background

Prompt engineering has become a central idea in working with...

Oct 1, 2025

[CS 2881r AI Safety] [Week 1] Introduction

Authors: Jay Chooi, Natalia Siwek, Atticus Wang
Lecture slides: link
Lecture video: link
Student experiment slides: link
Student experiment blogpost: Some Generalizations of Emergent Misalignment

This is the first of a series of blog posts on Boaz’s AI Safety class. Each week, a group of students will post a blog...

Sep 14, 2025
