Antonio Clarke

Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation

I completed this blog post as my final project for BlueDot Impact's AI Alignment course. While it's nowhere near as polished as I'd like it to be given the short-term nature of the course, I'd love...

Sep 29, 2024

LESSWRONG

Abstract