acabodi

Eliciting Latent Knowledge in Comprehensive AI Services Models

Nov 17, 2023•6

5 karma

1 post

Member for 2 years

acabodi — LessWrong

Eliciting Latent Knowledge in Comprehensive AI Services Models

acabodi

A Conceptual Framework and Preliminary Proposals for AI Alignment and Safety in R&D

Preface

This post summarizes a research report I wrote over the summer as part of the CHERI fellowship program, under the supervision of Patrick Levermore. The project explores the complexities of AI alignment, with a specific focus on reinterpreting the Eliciting Latent Knowledge problem through the lens of the Comprehensive AI Services (CAIS) model. I also examine the model's applicability to ensuring R&D design safety and certification.

I preface this post by acknowledging my novice status in the field of AI safety research. As such, this work may contain both conceptual and technical...
