JoshuaFox comments on Open Thread, Jun. 29 - Jul. 5, 2015 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (210)
Could someone point me to any existing articles on this variant of AI-Boxing and Oracle AGIs:
The boxed AGI's gatekeeper is a simpler system which runs formal proofs to verify that AGI's output satisfies a simple, formally definable. The constraint is not "safety" in general but rather is narrow enough that we can be mathematically sure that the output is safe. (This does limit potential benefits from the AGI.)
The questions about what the constraint should be remains open, and of course the fact that the AGI is physically embodied puts it in causal contact with the rest of the universe. But as a partial or short-term solution, has anyone written about it? The only one I can think of (though I can't find the specific article) is Goertzel's description of an architecture where the guardian component is separate from the main AGI.
You're probably thinking of GOLEM. The Gödel machine is another proposal along somewhat similar lines.
Some discussions more directly related to your suggestion could be:
Thank you, Kaj. Those references are what I was looking for.
It looks like there might be a somewhat new idea here. Previous suggestions, as you mention, restrict output to a single bit; or require review by human experts. Using multiple AGI oracles to check each other is a good one, though I'd worry about acausal coordination between by the AGIs, and I don't see that the safety is provable beyond checking that answers match.
This new variant gives the benefit of provable restrictions and the relative ease of implementing a narrow-AI proof system to check it. It's certainly not the full solution to the FAI problem, but it's a good addition to our lineup of partial or short-term solutions in the area of AI Boxing and Oracle AI.
I'll get this feedback to the originator of this idea and see what can be made of it.