
LESSWRONG

If I were a well-intentioned AI...

I look at how some of the major problems in AI alignment - Goodhart problems, distributional shift, mesa-optimising, etc. - look from the perspective of a well-intentioned but ignorant AI, and at whether this perspective can suggest methods for improving safety.
