My impression is that much more effort is being put into alignment than into containment, and that containment is treated as impossible while alignment is treated as merely very difficult. Is this accurate? If so, why? By containment I mean mostly hardware-coded strategies for limiting the compute and/or world-influence an AGI has access to. It's similar to alignment in that the most immediately obvious solutions ("box!") won't work, but more complex solutions may. A common objection is that an AI will learn the structure of the protection from the human that built it and work around it, bu...
Doesn't the exact same argument work for alignment though? "It's so different, it may be misaligned in ways you can't think of". Why is it treated as a solvable challenge for alignment and an impossibility for containment? Is the guiding principle that people do expect a foolproof alignment solution to be within our reach?
One difference is that the AI wants to escape containment by default, almost by definition, but is agnostic about preferring a goal function. But since alignment space is huge (i.e. "human-compatible goals are measure 0 in alignment space...