Model Integrity
Hi! My collaborators at the Meaning Alignment Institute put out some research yesterday that may interest folk here. The core idea is introducing 'model integrity' as a frame for outer alignment. It leverages the intuition that "most people would prefer a compliant assistant, but a cofounder with integrity." It makes...