A Conceptual Framework and Preliminary Proposals for AI Alignment and Safety in R&D
Preface
This blog post gives an overview of a research report I wrote over the summer as part of the CHERI fellowship program, under the supervision of Patrick Levermore. In the project, I explore the complexities of AI alignment, focusing in particular on reinterpreting the Eliciting Latent Knowledge problem through the lens of the Comprehensive AI Services (CAIS) model. I also examine how that model can be applied to ensuring the safety of R&D design and to its certification.
I preface this post by acknowledging my novice status in the field of AI safety research. As such, this work may contain both conceptual and technical...