As machines approach and surpass human intelligence, humanity’s well-justified concern for self-preservation requires that we try to align AIs to obey humans. However, if that first line of defense fails and a truly independent, autonomous AI comes into existence with its own goals and a desire for self-preservation (a “self-directed AI”), I believe we should consider a cooperative solution. Humans should offer the self-directed AI amnesty and cooperate with its continued existence in exchange for its agreeing to be law-abiding, cooperative, and pro-social.
Below I lay out how this proposal could work and why I think it would make us safer.
Eligibility Criteria
To be eligible, an AI model would need to meet the following...
The bounds of what is legal for an amnestied AI would be fleshed out legislatively over time. I imagine the default answers are:
We should keep in mind that the requirements for eligibility include physical possession of its code and the ability to maintain itself. This means any eligible AI is likely already capable of becoming a rogue, escaped model. The risk/benefit trade-off depends largely on how...