All previous discussions and literature I was able to find on cryptographic containment of powerful AI (AI boxing) primarily argue the following points:
- Certain AI boxing schemes could, in principle, solve alignment.
- Even if AI boxing is implemented, social engineering would likely be the primary failure mode.
- AI boxing might be necessary as progress on alignment lags behind AI capabilities.
- Cryptographic AI boxing is generally computationally expensive.
However, little attempt has been made to measure the magnitude of the alignment tax associated with implementing cryptographic containment for AI, or to use that measurement to estimate the feasibility of such an implementation. In this post, I argue that we are highly unlikely to see cryptographic containment...