A new variation on the Gatekeeper idea, thanks to Zvi Schreiber.
We block all but one IO channel for our AGI.
We create a formally verified proof-verification system as a gatekeeper. We pass the Gatekeeper a decision problem together with a formal proof that it is in the complexity class UP ("Unambiguous Non-deterministic Polynomial-time"). The Gatekeeper verifies the proof and passes the decision problem to the AGI. If the AGI provides an answer, the Gatekeeper validates that it is the unique valid answer (which it can do in polynomial time) and, if so, passes the answer to us.
The advantage of this approach over one-bit and other Oracles is that the AI can do some useful problem solving with zero discretion as to the data it communicates. It can either give the unique answer or no answer at all. Such a gatekeeper is feasible with today's technology.
To keep the AGI from communicating by choosing the timing of its answer, the Gatekeeper always communicates the answer (or the fact that there is not yet one from the AGI) after exactly one hour.
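The protocol above can be sketched in a few lines. This is a toy illustration, not an implementation: the function names are hypothetical, `verifier` stands in for the polynomial-time UP checker, and the fixed delay models the timing-channel defense (a real gatekeeper would enforce it in hardware, not with a sleep call).

```python
import time

def gatekeeper(instance, verifier, agi_answer, delay=3600):
    """Toy sketch of the Gatekeeper protocol (names hypothetical).

    `verifier(instance, witness)` is assumed to be a polynomial-time
    check that accepts at most one witness per instance -- the UP
    guarantee.  The AGI's answer is released only if it verifies, so
    the AGI can transmit either the unique answer or nothing at all.
    """
    verdict = None
    if agi_answer is not None and verifier(instance, agi_answer):
        verdict = agi_answer  # the unique accepting answer
    # Fixed-latency release: answer (or absence of one) always comes
    # out after the same delay, closing the timing side channel.
    time.sleep(delay)
    return verdict
```

For example, with a verifier that accepts only the positive integer square root of the instance (a unique witness), the gatekeeper releases `7` for instance `49` and releases nothing for any other claimed answer.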
(The idea of restricting an Oracle AI to choosing from a predetermined list of verifiable answers has been mentioned before, but I don't remember anyone adding this formalism.)
The usual caveats on Oracles apply. Like other Oracles, this is a stopgap rather than a solution to the FAI problem.
- The laws of physics still give the AGI a causal link to its environment, even if it is in a Faraday cage.
- Even a one-bit Oracle AI can exert optimization power when asked multiple questions.
- Answers aren't enough to ensure safety: We may be asking the wrong questions or not understanding the implications of the answers.
- Only certain problems can be fed to the AGI, and perhaps not the relevant ones. However, UP includes integer factorization and other interesting problems, suggesting that this could be a useful tool.
- This idea does not assume that the AGI can solve all of UP in polynomial time, just that it has found fast solutions for a larger subset of UP than humans have. So the additional optimization power exerted by the AGI may not be very large.
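The factoring case mentioned above can be made concrete: by unique prime factorization, exactly one sorted list of primes multiplies out to a given n, so the check below accepts at most one answer. This is an illustrative sketch; the trial-division primality test stands in for a genuine polynomial-time test (e.g. Miller-Rabin or AKS), which a real gatekeeper would need.

```python
def is_prime(n):
    # Trial division: correct but exponential in the bit-length of n.
    # A real gatekeeper would use a polynomial-time primality test.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def verify_factorization(n, factors):
    """Check that `factors` is the sorted prime factorization of n.

    By unique factorization, at most one list passes this check, so an
    AGI asked to factor n has zero discretion over what it transmits.
    """
    if factors != sorted(factors):
        return False  # canonical order, so the accepted answer is unique
    product = 1
    for p in factors:
        if not is_prime(p):
            return False
        product *= p
    return product == n
```

So `verify_factorization(91, [7, 13])` passes, while any other list, including the unsorted `[13, 7]`, is rejected.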
This won't work, as with all other similar schemes, because you can't "prove" the gatekeeper down to the quark level of the hardware that implements it (so you're vulnerable to some kind of side-attack, like the memory bit-flipping attack that was discussed recently). Nor can you shield the AI from communicating through side channels (for example, varying the temperature of its internal processing unit, which in turn will influence the air conditioning system, ...).
And that's not even considering that the AI could actually discover new physics (new particles, ...) and have some ability to manipulate them with its own hardware.
This whole class of approaches can't work, because there are just too many possible side-attacks and side channels of communication, and you can't formally prove that none of them are available without extending the proof over the whole system (AI + gatekeeper + power generator + air conditioner + ...) down at the level of the Schrödinger equation.
You're quite right: these are among the standard objections to boxing, as mentioned in the post. However, AI boxing may have value as a stopgap at an early stage, so I'm wondering about the idea's value in that context.